Understanding load average – A Practitioner’s Guide

The term “load average” appears in many Linux/UNIX utilities. Most people know that it refers to a set of numbers, usually three, that somehow represent the load on the system’s CPU. In this post I’ll try to make these three numbers clearer and more understandable.

The easiest way to see the “load average” of your system is by using uptime. It also appears in top and can be graphed in the console by tload. In all three cases, the load average refers to a group of three numbers. For example, in the following output of uptime

10:41:47 up 5 days, 48 min,  1 user,  load average: 0.82, 0.71, 0.66

the last three numbers are the load average. Each number is a moving average of the system’s load over the last 1, 5, and 15 minutes, respectively. On Linux this is an exponentially damped moving average, so recent samples weigh more than older ones. The important thing is to understand what is being averaged: the load metric.
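In scripts you can get the same three numbers without parsing anything: Python’s `os.getloadavg()` returns them as a tuple on Unix-like systems and raises `OSError` if they cannot be obtained. When all you have is captured `uptime` output, a small parser does the job. The sketch below assumes either the Linux wording shown above (“load average:” with comma-separated values) or the BSD/macOS variant (“load averages:” with space-separated values). It also assumes a period as the decimal separator. The function name `parse_uptime_load` is hypothetical.

```python
import os
import re

# Matches Linux "load average: 0.82, 0.71, 0.66" and BSD/macOS
# "load averages: 0.82 0.71 0.66"; assumes '.' is the decimal separator.
_LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_uptime_load(line: str) -> tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from an uptime line."""
    match = _LOAD_RE.search(line)
    if match is None:
        raise ValueError(f"no load average found in: {line!r}")
    one, five, fifteen = (float(value) for value in match.groups())
    return one, five, fifteen


if __name__ == "__main__":
    sample = "10:41:47 up 5 days, 48 min,  1 user,  load average: 0.82, 0.71, 0.66"
    print(parse_uptime_load(sample))  # (0.82, 0.71, 0.66)

    # The live values, read directly (Unix only; may raise OSError).
    try:
        print(os.getloadavg())
    except (AttributeError, OSError) as exc:
        print("load average unavailable:", exc)
```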

The metric that represents the load at a given point in time is the number of processes that are running or queued to run, including the process currently on the CPU. On Linux, processes in uninterruptible sleep (usually waiting on disk I/O) are counted as well. Generally speaking, on a single-core machine, multiplying this number by 100 gives roughly the CPU utilization percentage. For example, a load average of 0.50 over the last minute means the CPU was idle about half of that minute because it had no process to run. A load average of 2.50, on the other hand, means that over the last minute one process was running and an average of 1.5 processes were waiting for their turn, so the CPU was overloaded by 150%.
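The “moving average” is not a simple mean over a fixed window. On Linux, the kernel samples the count of runnable (and uninterruptible) tasks about every 5 seconds. It folds each sample into an exponentially damped average with a per-sample decay factor of e^(-5/60) for the 1-minute figure, e^(-5/300) for the 5-minute figure, and e^(-5/900) for the 15-minute figure. The sketch below mimics this in floating point, whereas the kernel itself uses fixed-point arithmetic. The function name and the sample workloads are made up for illustration. It reproduces the single-core examples above: a CPU with one runnable process half of the time settles near 0.50, and one that alternates between 2 and 3 runnable processes settles near 2.50.

```python
import math

SAMPLE_INTERVAL_S = 5  # Linux samples the run queue roughly every 5 seconds


def damped_load_average(samples, window_s):
    """Exponentially damped moving average of runnable-task counts.

    samples:  task counts taken every SAMPLE_INTERVAL_S seconds, oldest first.
    window_s: averaging window in seconds (60, 300 or 900 for uptime's numbers).
    """
    decay = math.exp(-SAMPLE_INTERVAL_S / window_s)
    load = 0.0  # start from an idle system
    for n in samples:
        # The old value decays; the new sample contributes the remainder.
        load = load * decay + n * (1.0 - decay)
    return load


if __name__ == "__main__":
    ten_minutes = 10 * 60 // SAMPLE_INTERVAL_S       # 120 samples
    half_busy = [1, 0] * (ten_minutes // 2)           # one task, half of the time
    overloaded = [3, 2] * (ten_minutes // 2)          # 2.5 runnable tasks on average
    for name, samples in (("half busy", half_busy), ("overloaded", overloaded)):
        averages = [damped_load_average(samples, w) for w in (60, 300, 900)]
        print(name, " ".join(f"{a:.2f}" for a in averages))
```

The damping also explains why the 15-minute value lags. After one minute of constant single-task load starting from idle, it has only climbed to about 0.06, while the 1-minute value is already near 0.63.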

On a multi-core system things are a bit different, but to avoid unnecessary complications, one can usually divide the load average by the number of cores and treat the result as the load average of a single-core machine. For example, suppose the load average of a two-core machine was 3.00 2.00 0.50. Over the last minute there were, on average, three runnable processes. Since the two cores can run only two processes at a time, one process was queued on average. So the machine was overloaded, carrying 150% of its capacity (3.00/2 = 1.50). Over the last 5 minutes, the load average of 2.00 means that roughly two processes were running at any moment, so the machine was fully utilized but not overloaded. Over the last 15 minutes, the load average of 0.50 means the machine could have handled four times that load without overloading the CPU; CPU utilization was only (0.50/2)*100 = 25% during that period.
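The per-core rule of thumb is easy to script. The sketch below divides a load average by the core count and reports three figures in the article’s terms: how much of the machine’s capacity the load represents, how much capacity was actually in use (capped at 100%), and how many processes were waiting. The function names are hypothetical. Using `os.cpu_count()` as the divisor is an assumption: it counts logical CPUs, so hyperthreads count as cores, which matches how Linux reports CPUs. Linux also counts processes blocked on I/O, so treat the output as an approximation.

```python
import os


def per_core_load(load_avg, cores):
    """Normalize a load average to a single-core equivalent (1.0 == fully busy)."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return load_avg / cores


def describe(load_avg, cores):
    """Rough interpretation: share of capacity, utilization, queued processes."""
    per_core = per_core_load(load_avg, cores)
    utilization = min(per_core, 1.0) * 100  # CPUs cannot be more than 100% busy
    queued = max(load_avg - cores, 0.0)     # processes waiting for a free core
    return (f"{per_core * 100:.0f}% of capacity, "
            f"{utilization:.0f}% utilized, {queued:.2f} queued")


if __name__ == "__main__":
    # The two-core example from the text: 3.00 2.00 0.50
    for minutes, load in zip((1, 5, 15), (3.00, 2.00, 0.50)):
        print(f"{minutes:>2} min: {describe(load, cores=2)}")
    # Logical CPUs on this machine (os.cpu_count() may return None).
    print("this machine:", os.cpu_count(), "logical CPUs")
```

For the two-core example, it reports 150% of capacity with one process queued for the 1-minute value, 100% with nothing queued for 5 minutes, and 25% for 15 minutes. With `cores=1` it reduces to the single-core interpretation, so a load of 2.50 shows 1.50 processes queued.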

I hope I made the load average a bit clearer using the above example. Load average is an important metric for measuring a system’s performance, and a good understanding of it is beneficial.

7 thoughts on “Understanding load average – A Practitioner’s Guide”

Yep, you did! Especially the multi-core part…
Better than: http://en.wikipedia.org/wiki/Load_(computing)

I will copy some info about load average to my help guide for clusters.

Hey! Great article. Appreciate the effort to make it simple and clear.

Was a good read man! That was helpful.

Pingback: Problems with hosting in Ireland

Perfect explanation. Thanks

Finally a guide to load average that actually makes sense
