Proper way of interpreting system load on a 4 core 8 thread processor

Solution 1:

Not as a strict rule, but mostly at 1.00 * n_cpu.

The load means the following: if there are multiple processes on a single-CPU system, they seem to run in parallel, but that is not true. What practically happens: the kernel gives 1/100th of a second to a process, then breaks its run with an interrupt, and gives the next 1/100th of a second to another process.

In practice, the question "which process should get the next 1/100th-second interval?" is decided by a complex heuristic. This is called task scheduling.

Of course, processes that are blocked, for example waiting for data they are reading from the disk, are exempt from this task scheduling.

What the load says: how many processes are currently waiting for their next 1/100th-second time frame. Of course, it is a mean value; this is why you can see multiple numbers in cat /proc/loadavg.
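For example, the per-CPU interpretation boils down to dividing the first field by the core count. A minimal sketch (the sample line below is made up; on a live system you would read /proc/loadavg directly and use nproc):

```shell
# /proc/loadavg fields: 1-min avg, 5-min avg, 15-min avg,
# runnable/total entities, last PID
sample="8.00 4.50 2.25 3/512 12345"   # made-up sample; live: read -r sample < /proc/loadavg
set -- $sample
one_min=$1
cores=8                               # live: cores=$(nproc)
# Normalize the 1-minute average by the number of CPUs.
awk -v l="$one_min" -v c="$cores" \
    'BEGIN { printf "per-core load: %.2f\n", l / c }'
```

With the sample values above this prints `per-core load: 1.00`, i.e. every CPU has exactly one process wanting a time slice.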

The situation on a multi-CPU system is a little more complex. There are multiple CPUs, whose time frames can be given to multiple processes. That makes the task scheduling a little, but not much, more complex. Otherwise the situation is the same.

The kernel is intelligent: it tries to share the system resources for optimal efficiency, and it gets near to that (there are minor optimizations, for example it is better if a process runs as long as possible on the same CPU because of caching considerations, but they don't matter here). So if we have a load of 8, that means there are actually 8 processes waiting for their next time slice. If we have 8 CPUs, we can give these time slices to the CPUs one-to-one, and thus our system will be optimally used.

If you look at top, you can see that the number of actually running processes is surprisingly low: they are the processes marked R there. Even on a fairly busy system it is often below 5. This is partly because processes waiting for their data from the disks or from the network are suspended (marked with S in top). The load mostly shows the CPU usage.
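You can count the runnable processes yourself without opening top. A quick sketch using ps (the trailing `=` in the format specifier suppresses the header line):

```shell
# Print one state character per process, then count the runnable ones (R).
# Sleeping (S) and uninterruptible-wait (D) processes are not counted here.
ps -eo state= | grep -c '^R'
```

On an idle machine this typically prints a small number, matching the observation above.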

There are tools to measure the disk load as well; in my opinion they should be at least as important as CPU-usage monitoring, but somehow it isn't so well known in our professional sysadmin world.


Windows tools often divide the load by the actual number of CPUs. This causes some professional Windows system administrators to use the system load in this divided-by-CPU sense. They aren't right, and they will probably be happier after you explain this to them.


Multicore CPUs are practically multiple CPUs on the same silicon chip. There is no difference.

In the case of hyperthreaded CPUs there is an interesting side effect: loading a CPU makes its hyperthreaded sibling slower. But this happens on a deeper layer than what the normal task scheduling handles, although it can (and should) influence the process-migration decisions of the scheduler.

But from our current viewpoint, i.e. what determines the system load, it doesn't matter either.

Solution 2:

Load average doesn't mean what you think it means. It's not about instant CPU usage, but rather how many processes are waiting to run. Usually that's because of lots of things wanting CPU, but not always. A common culprit is a process waiting for IO - disk or network.

Try running ps -e v and looking for process state flags.

state    The state is given by a sequence of characters, for example, "RWNA". The first character indicates the run state of the process:
D    Marks a process in disk (or other short term, uninterruptible) wait.
I    Marks a process that is idle (sleeping for longer than about 20 seconds).  
L    Marks a process that is waiting to acquire a lock.
R    Marks a runnable process.
S    Marks a process that is sleeping for less than about 20 seconds.
T    Marks a stopped process.
W    Marks an idle interrupt thread.
Z    Marks a dead process (a "zombie").

This is from the ps manpage, so you can find more detail there. The R and D processes are probably of particular interest.
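To get a quick tally of how many processes are in each state, the ps output can be piped through sort and uniq. A sketch (D and R are the states that feed the load average on Linux):

```shell
# Count processes per state, most common state first.
ps -eo state= | sort | uniq -c | sort -rn
```

A large count next to D is the classic signature of a machine bogged down waiting for disk.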

You can end up with load average 'spikes' for all sorts of reasons, so they're not really a good measure of anything other than 'is this system busy-ish'. Getting bogged down in mapping load average to CPU cores isn't going to do you any good.


Solution 3:

As a hyperthread is not actually a second core, it will never take a core to 200%, but it will take it beyond 100% for certain workloads.

So your maximum load is somewhere, hard to say exactly where, between approximately 4 and 6.

(of course this can go up higher when overloaded because it actually counts runnable processes, particularly when they are waiting for IO)


Solution 4:

On a Linux system, not just the processes in the runnable queue are counted for the load, but also those in uninterruptible sleep states (see Wikipedia), causing the load to spike when you have lots of processes waiting for disk.
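To see exactly which processes are contributing to the load this way, you can filter the process list for states beginning with D. A sketch:

```shell
# List PID, state, and command for every process in uninterruptible
# (usually disk) wait; their state character starts with D.
ps -eo pid=,state=,comm= | awk '$2 ~ /^D/'
```

An empty result means nothing is currently stuck in disk wait, so the load is coming from runnable processes.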


Solution 5:

I did some experiments on our 24-core Xeon system (2 socket x 12 cores). The maximum load is 48.0 in this case due to the way Linux sets up hyperthreading.

However, you don't get the equivalent of 48 cores of throughput. What I have observed is that you get about 90% of the throughput from the first 24 logical processors, i.e. as the load runs up to 24.0. Then you get an additional throughput of about 10% from the remaining 24 logical processors (as the load runs up to 48.0). Another way of thinking about it is that if you run 48 threads on the 24 cores, you get a boost of about 10-20% with hyperthreading enabled versus disabled. It's not the 100% boost the marketing guys would imply.

For example, one way of testing this observation is to have a process that runs 48 threads (say, using TBB or a handrolled threading model), then run

time numactl --physcpubind=0-23  ./myprocess

and then run

time numactl --physcpubind=0-47  ./myprocess

The latter should run in about 10-20% less time. If your process is highly I/O blocked, then the result might be different.

The former will disable hyperthreading by only allowing the threads to run on a single logical processor (of each core), while the latter will enable hyperthreading by allowing the threads to run on 2 logical processors (of each core).
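Once you have the two wall-clock times, the hyperthreading gain is simple arithmetic. A sketch with made-up sample timings (the numbers below are hypothetical, just to show the calculation):

```shell
# Sample wall-clock seconds; in practice, capture these from the `time` output.
t_no_ht=100.0   # numactl --physcpubind=0-23 (one logical processor per core)
t_ht=85.0       # numactl --physcpubind=0-47 (both logical processors per core)
# Percentage of wall time saved by enabling hyperthreading.
awk -v a="$t_no_ht" -v b="$t_ht" \
    'BEGIN { printf "hyperthreading saved %.1f%% wall time\n", (a - b) / a * 100 }'
```

With these sample numbers it reports a 15.0% saving, in the middle of the 10-20% range observed above.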

The load in both cases will be reported as 48.0, which, as you can see, is very misleading.