Why do different 100% CPU loads cause different temperatures on the CPU?
CPU usage measures how much of the time the CPU is busy, not how hard it is working: there are many different types of instruction that can be processed, and they all have different processing and memory requirements.
A task that is memory intensive may cause the CPU to stall while it fetches data from memory and so reduce the effective instruction throughput while still having the CPU "in use".
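The effect of memory stalls on throughput can be sketched with the textbook cycles-per-instruction model. This is only a rough illustration: the function name and every number below are assumptions, not measurements of any real CPU.

```python
def effective_ipc(base_cpi, misses_per_instruction, miss_penalty_cycles):
    """Instructions per cycle once memory stall cycles are included."""
    # Average cycles per instruction = compute cycles + stall cycles.
    cpi = base_cpi + misses_per_instruction * miss_penalty_cycles
    return 1.0 / cpi


# Hypothetical workloads: both keep the core "in use" 100% of the time.
cache_friendly = effective_ipc(1.0, 0.0, 200)   # never misses the cache
memory_bound = effective_ipc(1.0, 0.02, 200)    # 2% of instructions miss

print(f"cache friendly: {cache_friendly:.2f} instructions/cycle")
print(f"memory bound:   {memory_bound:.2f} instructions/cycle")
```

With these made-up figures the memory-bound workload completes about a fifth as many instructions per cycle, yet both would show up as 100% usage.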
Also there are many different parts of the CPU that may be saturated differently.
From WikiChip's Sandy Bridge microarchitecture page (block diagram):
You can see we have an initial instruction decoder frontend, which for complex and diverse instruction streams might struggle to keep the rest of the pipeline full.
If you only have integer additions then you will be able to use three of the core's execution units, as the CPU has three integer ALUs. If you only have floating-point multiplications then you may only use the single FP MUL (multiply) unit.
The CPU also operates as a pipeline: while one operation is still in flight in an execution unit, you may be able to schedule another in the next cycle. A diverse instruction stream can therefore make better use of the available resources, because a unit that would otherwise sit idle can be given an instruction of a different type.
Different instructions also have different execution times and involve a larger or smaller amount of circuitry. A simple addition may take one or two clock cycles, while a floating-point instruction might take longer and involve more circuitry. Taking longer might mean it uses more power, as might the larger area of circuitry. Alternatively, an instruction taking longer might mean that the front-end scheduling circuitry pauses and briefly uses less power while it waits for an execution unit to become available, whereas smaller, faster instructions use more circuitry overall once you include the other parts of the CPU.
As a result, to make full use of the CPU you need a diverse instruction stream, and what fully exercises one CPU might not fully exercise another, because the number, arrangement and capabilities of the execution units differ.
Modern power gating lets idle execution units drop into a low-power state, so they contribute little or nothing to the heat output of the device.
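A toy model makes the point about instruction diversity concrete. It assumes, purely for illustration, that every instruction occupies one unit of its type for exactly one cycle and that scheduling is perfect; it uses only the two unit types mentioned above (three integer ALUs, one FP multiplier), whereas a real core has more. The function name is hypothetical.

```python
import math


def cycles_and_utilization(unit_counts, instruction_counts):
    """Toy model: each instruction occupies one unit of its type for one cycle."""
    # The most oversubscribed unit type determines how many cycles are needed.
    cycles = max(
        math.ceil(count / unit_counts[kind])
        for kind, count in instruction_counts.items()
    )
    total_units = sum(unit_counts.values())
    issued = sum(instruction_counts.values())
    # Fraction of all unit-cycles that actually did work.
    return cycles, issued / (cycles * total_units)


# Unit counts taken from the text above: 3 integer ALUs, 1 FP multiplier.
units = {"int_alu": 3, "fp_mul": 1}

int_only = cycles_and_utilization(units, {"int_alu": 300})
fp_only = cycles_and_utilization(units, {"fp_mul": 300})
mixed = cycles_and_utilization(units, {"int_alu": 300, "fp_mul": 100})

for name, (cycles, util) in [("int only", int_only),
                             ("fp only", fp_only),
                             ("mixed", mixed)]:
    print(f"{name}: {cycles} cycles, {util:.0%} of units busy")
```

In this sketch only the mixed stream keeps every unit busy; the integer-only and FP-only streams leave units idle even though the core would be reported as fully loaded in all three cases.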
Caches also contribute to power consumption. Using the cache will mean that instructions and data can be fetched and, as a result, executed faster than a routine with a data set in memory that is too large for a cache.
As a result, different programs or instruction streams may cause different peak power usage and so different temperatures.
Architectural differences across processor generations, and even within the same generation (cache sizes, processor options and the instructions available), may also have an effect.
I understand that you wish to know why running a multi-threaded crunch test does not heat up the CPU as much as a single-threaded test.
The simple explanation is that Turbo Boost is to blame: the highest boost frequencies are not available when the CPU is working equally hard on multiple cores. They are reached only when a single core is heavily used.
When Turbo boost is active, it shunts more power to the boosted core, reducing the power to the other cores and thus slowing them down.
The boosted core then runs at a higher speed and would heat up more than a non-boosted core. This is captured by the sensor, which would then report that one core's temperature as that of the entire CPU.
The CPU "load" (or usage) is an activity monitor to indicate what percentage of CPU time is spent on "useful" activity versus "idle" time. The operating system determines what is "useful" activity and what is "idle" time.
At zero per cent CPU load, the OS is not scheduling any user processes during that time interval.
At 50% CPU load, the OS has scheduled about half of the time interval for user processes, and the other half was spent in the idle loop.
Even if there is just one user process, it may not be able to reach 100% CPU load, because a process that is not CPU-intensive must give up the CPU while, for example, waiting for an I/O operation to complete.
At 100% CPU load, the OS has scheduled all of the time interval to user processes.
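The percentages above amount to a simple ratio of busy time to total time over an interval. A minimal sketch, assuming the OS exposes cumulative busy and idle time counters (the function name and the sample values are hypothetical):

```python
def cpu_load_percent(busy_before, idle_before, busy_after, idle_after):
    """CPU load over an interval from two samples of cumulative time counters."""
    busy = busy_after - busy_before
    idle = idle_after - idle_before
    total = busy + idle
    if total == 0:
        return 0.0  # no time elapsed between the two samples
    return 100.0 * busy / total


# Hypothetical counter samples (arbitrary time units, e.g. scheduler ticks).
print(cpu_load_percent(0, 0, 0, 1000))    # interval spent entirely idle
print(cpu_load_percent(0, 0, 500, 500))   # half busy, half idle
print(cpu_load_percent(0, 0, 1000, 0))    # interval spent entirely busy
```

Nothing in this calculation depends on which instructions ran during the busy time, only on how long the CPU was occupied.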
Note that a powered-up CPU is never simply left unscheduled: if no (user) process is ready to execute, the OS scheduler runs its idle loop, which on modern processors typically halts the core in a low-power state until the next interrupt.
The CPU temperature is a consequence of the electrical power consumed by the CPU circuits. The more transistors switch, the more power is required and consumed, and the higher the CPU temperature.
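The link between switching and power is commonly summarised by the standard first-order approximation for dynamic power in CMOS logic. The symbols are not from the answer above: α is the activity factor (the fraction of the circuit that switches per cycle), C the switched capacitance, V the supply voltage and f the clock frequency.

```latex
P_{\text{dynamic}} \approx \alpha \, C \, V^{2} \, f
```

Two workloads with the same load percentage can differ in α because they toggle different circuits, and a core running at a higher clock typically also needs a higher V, so power (and with it temperature) can vary widely at the same reported load.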
This power consumption is not indicated by the CPU "load", which is merely a time-based activity monitor.
A process can keep the CPU "busy" (time-wise) by simply copying or moving data around in memory (e.g. load and store instructions), which is not a significant additional power load above idle.
By contrast, a computationally intensive process could perform calculations (e.g. multiply and divide instructions) that exercise many other circuits in the CPU, such as the ALU (arithmetic/logic unit) and FPU (floating-point unit).
In other words, it is the instruction mix (i.e. the types of instructions) that the process executes that determines the electrical power consumed and the resulting temperature.
The CPU load figure does not measure this power consumption: the OS reports load as a purely time-based activity measurement, and temperature separately from the sensors.
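The dependence on instruction mix can be illustrated with a toy energy model. Every weight below is invented for illustration (real per-instruction energy depends heavily on the CPU), and the function and variable names are hypothetical.

```python
# Hypothetical relative energy cost per instruction type (arbitrary units).
# These weights are made up for illustration only.
ENERGY_PER_INSTRUCTION = {
    "load": 1.0,
    "store": 1.0,
    "int_add": 1.5,
    "fp_mul": 4.0,
    "fp_div": 6.0,
}


def average_power(instruction_mix, instructions_per_second):
    """Average power (arbitrary units) for a mix given as fractions summing to 1."""
    energy_per_instruction = sum(
        fraction * ENERGY_PER_INSTRUCTION[kind]
        for kind, fraction in instruction_mix.items()
    )
    return energy_per_instruction * instructions_per_second


# Two processes that both show 100% CPU load at the same instruction rate.
copy_mix = {"load": 0.5, "store": 0.5}
math_mix = {"load": 0.2, "fp_mul": 0.5, "fp_div": 0.3}

copy_power = average_power(copy_mix, 1_000_000)
math_power = average_power(math_mix, 1_000_000)
print(copy_power, math_power)
```

Under these assumed weights the calculation-heavy mix draws several times the power of the copy loop, even though a load monitor would show the same 100% for both.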