Understanding load average – A Practitioner Guide

The term “load average” appears in many Linux/UNIX utilities. Most people know that it refers to a set of numbers, usually three, that somehow represent the load on the system’s CPU. In this post I’ll try to make these three numbers clearer and easier to understand.

The easiest way to see the “load average” of your system is with uptime. It also appears in top and can be graphed in the console with tload. In all three cases the load average is a group of three numbers. For example, in the following output of uptime

10:41:47 up 5 days, 48 min,  1 user,  load average: 0.82, 0.71, 0.66

the last three numbers are the “load average”. Each number represents the system’s load as a moving average over the last 1, 5 and 15 minutes respectively. Now, the important thing is to understand what is being averaged: the load metric.
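As a small illustration of where the three numbers sit in that line, here is a minimal Python sketch that pulls them out of the uptime output shown above. The function name parse_load_average is made up for this example, and the pattern assumes the English "load average:" label with dot decimals, which may not hold on every system or locale.

```python
import re

def parse_load_average(uptime_line):
    """Return the (1-min, 5-min, 15-min) load averages found in an uptime line."""
    # Assumes the "load average: a, b, c" layout shown in the example output.
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())

line = "10:41:47 up 5 days, 48 min,  1 user,  load average: 0.82, 0.71, 0.66"
one_min, five_min, fifteen_min = parse_load_average(line)
print(one_min, five_min, fifteen_min)
```

If you only need the numbers for the machine the script runs on, Python's standard os.getloadavg() returns the same three values on Unix-like systems without any parsing.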

The metric that represents the load at a given point in time is the number of processes queued to run at that moment (including the process that is currently running). On Linux, processes in uninterruptible sleep, typically waiting on disk I/O, are counted as well, so reading the number purely as CPU load is an approximation. Generally speaking, on a single-core machine, a load average of up to 1.00 can be read as the CPU utilization percentage when multiplied by 100. For example, a load average of 0.50 over the last minute means that the CPU was idle for half of that minute because it had no process to run. On the other hand, a load average of 2.50 means that over the last minute an average of 1.5 processes were waiting for their turn to run, so the CPU was overloaded by 150%.
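The single-core reading above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the paragraph, assuming one core and a purely CPU-bound load; the helper name interpret_single_core is hypothetical.

```python
def interpret_single_core(load):
    """Split a load average into (busy fraction, average processes waiting) for one core."""
    busy = min(load, 1.0)            # one core can run at most one process at a time
    waiting = max(load - 1.0, 0.0)   # anything above 1.0 is queued behind it
    return busy, waiting

for load in (0.50, 2.50):
    busy, waiting = interpret_single_core(load)
    print(f"load {load:.2f}: busy {busy:.0%} of the time, "
          f"{waiting:.2f} processes waiting on average")
```

With the two loads from the text, 0.50 comes out as a half-idle CPU with nothing waiting, and 2.50 as a fully busy CPU with 1.5 processes queued on average.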

On multi-core systems things are a bit different, but to avoid unnecessary complications one can usually divide the load average by the number of cores and treat the result as the load average of a single-core machine. For example, let’s say the load average of a two-core machine was 3.00 2.00 0.50. This means that over the last minute we had an average of three runnable processes, so on average one process was queued, as the machine’s two cores can run only two processes at a time. The machine therefore had a load of 150% of its capacity. Over the last 5 minutes, the load average of 2.00 means that roughly two processes were running at any time, so the machine was fully utilized but not overloaded. Over the last 15 minutes, on the other hand, the load average of 0.50 means that we could have handled four times that load without overloading the CPU: we only had (0.50/2)*100=25% CPU utilization in those 15 minutes.
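The division by the number of cores is easy to check with a short Python sketch. It simply replays the two-core example from the paragraph above; the helper name per_core_load is made up for illustration, and the core count is hard-coded to match the example rather than detected.

```python
def per_core_load(load_averages, cores):
    """Divide each load average by the core count to get a single-core equivalent."""
    return tuple(load / cores for load in load_averages)

cores = 2  # the two-core machine from the example
windows = ("1 min", "5 min", "15 min")
normalized = per_core_load((3.00, 2.00, 0.50), cores)

for window, load in zip(windows, normalized):
    print(f"{window}: {load:.0%} of capacity")
```

On a real machine you could replace the hard-coded values with os.getloadavg() and os.cpu_count() from Python's standard library, keeping in mind that the per-core figure is still only a rule of thumb.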

I hope the examples above made the load average a bit clearer. Load average is an important metric for measuring a system’s performance, and a good understanding of it is beneficial.
