How is the "load average" interpreted in "top" output? Is it the same for all distributions?

The CPU load is the length of the run queue, i.e. the length of the queue of processes waiting to be run.

The uptime command shows the average length of the run queue over the last minute, the last five minutes and the last 15 minutes; these are the same three numbers that top usually displays.

A high load value means the run queue is long; a low value means it is short. So if the one-minute load average is 0.05, it means that, on average during that minute, there were 0.05 processes waiting to run in the run queue. It is not a percentage. This is, AFAIK, the same on all Unices (although some Unices may not count processes waiting for I/O, which I think Linux does; OpenBSD, for a while only, also counted kernel threads, so that the load was always 1 or more).

The Linux top utility gets the load values from the kernel, which writes them to /proc/loadavg. Looking at the sources for procps-3.2.8, we see that:

To display the load averages, the sprint_uptime() function is called in top.c.
This function lives in proc/whattime.c and calls loadavg() in proc/sysinfo.c.
That function simply opens LOADAVG_FILE to read the load averages.
LOADAVG_FILE is defined earlier as "/proc/loadavg".

The load average is typically calculated by the kernel. Applications such as top and uptime may use the getloadavg(3) library call to access this (it's meant to be portable across different Unix versions). On Linux this typically results in a read from /proc/loadavg. On FreeBSD it's a system call.

For example:

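```c
#define _DEFAULT_SOURCE   /* exposes getloadavg() in glibc's <stdlib.h> */
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
  double ld[3];

  /* getloadavg() returns the number of samples retrieved, or -1 on error */
  int n = getloadavg(ld, 3);
  if (n < 3) {
    fprintf(stderr, "getloadavg failed\n");
    return 1;
  }
  printf("Load %.2f %.2f %.2f\n", ld[0], ld[1], ld[2]);
  return 0;
}
```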

uptime and top both make similar calls to get their data.

Now, the 1/5/15 minute load averages are the average number of processes on the run queue. Different operating systems may calculate this in different ways; the biggest difference is normally whether processes waiting for I/O (e.g. blocked on disk) count as runnable or not. On Linux they do.

So a load average of 3.4 means there was an average of 3.4 processes on the run queue within the sample window (1, 5, 15 minutes).

A high load average doesn't necessarily mean an overloaded server, though. If you have 16 cores, your load average can be 16 without stress. You could also have an application making a lot of fork() calls, so that a large number of processes are created and destroyed; this leads to a high load average without massively impacting server performance. The load average should only be used as a guide, along with other metrics such as %CPU busy.

The load average is not specific to any particular tool or distribution. It is a measurement provided by the kernel (or, more precisely, the scheduler), so it is distribution-independent. On Linux, the measurement is exposed through the proc filesystem, in /proc/loadavg.

As for its interpretation, the load average is not an indication of how hard the CPU is working, but of how much work needs to be done. I don't think there is really a need to multiply it by anything, because it is derived directly from the number of processes in either a runnable or an uninterruptible state.

For more information, check the getloadavg(3) and uptime(1) man pages.

The load average can be a difficult concept to grasp at first. I think a lot of people assume it indicates how hard the CPU is working, but that's not really it.
