Why does Hyper-Threading provide 2 virtual cores but not more?
The manual Intel Hyper-Threading Technology Technical User’s Guide contains some hints about why Intel did not go beyond two threads per core in its consumer CPUs, although it did so in some server CPUs.
When explaining Hyper-Threading Technology, it says:
Each logical processor
- Has its own architecture state
- Executes its own code stream concurrently
- Can be interrupted and halted independently
The two logical processors share the same
- Execution engine and the caches
- Firmware and system bus interface
The important part is that the two logical processors share the same execution engine, meaning that the units that make up the core are not duplicated. While, for example, the arithmetic unit is in use by one thread, it cannot be used by the other thread. This prevents full parallelism: two threads cannot execute instructions of the same type at the same time on the same unit, so one has to wait for the other to finish.
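To make the sharing argument concrete, here is a minimal toy simulation. It is only a sketch under strong assumptions that are mine, not Intel's: the hypothetical core has exactly one unit of each type (`alu` and `mem`), every instruction occupies its unit for one cycle, and there is no out-of-order execution. The function name `run` and the unit names are made up for illustration.

```python
def run(streams, units=("alu", "mem")):
    """Return the number of cycles needed to finish all instruction streams.

    Each stream is a list of unit names, one per instruction, issued in order.
    Toy assumption: the core has ONE unit of each type, and a unit can serve
    only one thread per cycle.
    """
    for stream in streams:
        for op in stream:
            if op not in units:
                raise ValueError(f"unknown unit: {op}")

    pos = [0] * len(streams)  # next instruction index for each thread
    cycles = 0
    while any(p < len(s) for p, s in zip(pos, streams)):
        free = set(units)  # every unit is available at the start of a cycle
        # Rotate which thread gets first pick, so that neither one starves.
        start = cycles % len(streams)
        order = list(range(start, len(streams))) + list(range(start))
        for t in order:
            if pos[t] < len(streams[t]) and streams[t][pos[t]] in free:
                free.remove(streams[t][pos[t]])  # unit is now busy this cycle
                pos[t] += 1
        cycles += 1
    return cycles


one_thread = run([["alu"] * 4])
same_type = run([["alu"] * 4, ["alu"] * 4])
mixed = run([["alu"] * 4, ["mem"] * 4])
print(one_thread, same_type, mixed)
```

In this toy model, two threads that both need the single arithmetic unit take twice as long as one thread alone, so the second logical processor gains nothing, while two threads that use different units finish together. Real cores sit somewhere in between, because they have several units and threads frequently stall on memory.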
Intel has quantified the average performance gain by threads as follows:
A processor with Hyper-Threading Technology may provide a performance gain of 30 percent when executing multi-threaded operating system and application code over that of a comparable Intel architecture processor without Hyper-Threading Technology.
The statistical gain of two threads versus one is therefore only on the order of 30%, which is very far from the 100% that one would expect if two threads on the same core could do double the work of one.
I would therefore estimate that if Intel had enabled, say, three threads on the core, the additional statistical gain would be much lower, maybe on the order of 10% or less.
Given that some hardware needs to be duplicated for each thread, namely the architecture state and interrupt logic, the gain is probably not worth the cost that this additional hardware would add to the price of the core.
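The guess about a third thread can be sanity-checked with a back-of-the-envelope model. This is purely my own assumption, not anything Intel has published: suppose a single thread keeps the execution engine busy a fraction `u` of the time, and threads stall independently, so that `n` threads keep it busy a fraction `1 - (1 - u)**n` of the time. Calibrating `u` so that two threads give the quoted 30% gain yields `u = 0.7`.

```python
def utilization(n_threads, busy_fraction):
    """Fraction of time the shared execution engine is busy with n threads.

    Assumption (not from Intel): each thread stalls independently with
    probability (1 - busy_fraction), and the engine idles only when all
    threads are stalled at once.
    """
    return 1.0 - (1.0 - busy_fraction) ** n_threads


def busy_fraction_for(two_thread_speedup):
    """Solve 1 - (1 - u)**2 == s * u for u, which gives u = 2 - s."""
    return 2.0 - two_thread_speedup


u = busy_fraction_for(1.30)  # calibrate against the quoted 30% gain
gains = {}
for n in (2, 3, 4):
    # Marginal gain of adding the n-th thread over running n - 1 threads.
    gains[n] = utilization(n, u) / utilization(n - 1, u) - 1.0
    print(f"thread {n}: +{gains[n]:.1%} over {n - 1} thread(s)")
```

Under these assumptions the third thread adds roughly 7% and the fourth about 2%, which is consistent with the "10% or less" estimate. The model ignores cache and bandwidth contention, so real numbers could well be lower.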
For Hyper-Threading to be more effective, Intel would have had to increase the number of units of the same type inside each core. It has done just that in the Haswell microarchitecture, which has 4 ports for loads/stores, 4 for integer operations, and 2 for branches, so even two threads running identical integer workloads probably wouldn't introduce much contention. However, Intel has still kept to the model of two hyper-threads per core. I would guess this is to economize on the hardware needed to support more hyper-threads, or maybe also because modern operating systems cannot use such an architecture efficiently.
This discussion has been raised at ServerFault SE: How many CPUs should be utilised with Hyperthreading?
When does having two different threads cause one to run worse?
Although dependencies between instructions won't change, there is a new problem born - competition. These different threads compete over memory access, both in cache utilization and bandwidth, which is somewhat counterproductive.
The logical core can’t do much, but it does provide a little increased parallelism. It is far from being a real core. In fact, it offers approx 30% the performance of a real physical core.
Its purpose was simply to increase parallelism in a world dominated by I/O bound (non-CPU intensive) processes. When a CPU intensive (CPU bound) thread is switched to one of these cores, its performance will substantially degrade.
Now imagine a scenario where such a thread is tossed around multiple logical cores; this can subsequently degrade performance.
Source: When HyperThreading Hurts
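If a CPU-bound thread should not end up sharing a physical core with another busy thread, one option on Linux is to pick a single logical CPU per physical core and restrict the process to that set. The sketch below assumes the Linux sysfs files `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`, which list the logical CPUs sharing a core in forms such as `0,4` or `0-1`. The helper names are hypothetical, and on other operating systems the glob simply matches nothing.

```python
import glob


def parse_cpu_list(text):
    """Parse a Linux cpulist string such as '0,4' or '0-1' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists):
    """Keep the lowest-numbered logical CPU of each distinct sibling group."""
    groups = {tuple(parse_cpu_list(s)) for s in sibling_lists}
    return sorted(g[0] for g in groups if g)


def read_sibling_lists():
    """Read the sibling list of every logical CPU from sysfs (Linux only)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as f:
            lists.append(f.read())
    return lists


if __name__ == "__main__":
    # Prints an empty list where the sysfs topology files are not available.
    print(one_cpu_per_core(read_sibling_lists()))
```

On Linux, the resulting CPU numbers could then be passed to `os.sched_setaffinity(0, cpus)` to keep the current process on one logical CPU per core. Whether that actually helps depends on the workload, so it is worth measuring both ways.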