Why are some Intel family 6 CPU models (Core 2, Pentium M) not supported by intel_idle?
While researching Core 2 CPU power states ("C-states"), I actually managed to implement support for most of the legacy Intel Core/Core 2 processors. The complete implementation (Linux patch) with all of the background information is documented here.
As I accumulated more information about these processors, it started to become apparent that the C-states supported in the Core 2 model(s) are far more complex than those in both earlier and later processors. These are known as Enhanced C-states (or "CxE"), which involve the package, individual cores and other components on the chipset (e.g., memory). At the time the intel_idle driver was released, the code was not particularly mature and several Core 2 processors had been released that had conflicting C-state support.
Some compelling information on Core 2 Solo/Duo C-state support was found in this article from 2006. This is in relation to support on Windows, however it does indicate the robust hardware C-state support on these processors. The information regarding Kentsfield conflicts with the actual model number, so I believe they are actually referring to a Yorkfield below:
...the quad-core Intel Core 2 Extreme (Kentsfield) processor supports all five performance and power saving technologies — Enhanced Intel SpeedStep (EIST), Thermal Monitor 1 (TM1) and Thermal Monitor 2 (TM2), old On-Demand Clock Modulation (ODCM), as well as Enhanced C States (CxE). Compared to Intel Pentium 4 and Pentium D 600, 800, and 900 processors, which are characterized only by Enhanced Halt (C1) State, this function has been expanded in Intel Core 2 processors (as well as Intel Core Solo/Duo processors) for all possible idle states of a processor, including Stop Grant (C2), Deep Sleep (C3), and Deeper Sleep (C4).
This article from 2008 outlines support for per-core C-states on multi-core Intel processors, including Core 2 Duo and Core 2 Quad (additional helpful background reading was found in this white paper from Dell):
A core C-state is a hardware C-state. There are several core idle states, e.g. CC1 and CC3. As we know, a modern state of the art processor has multiple cores, such as the recently released Core Duo T5000/T7000 mobile processors, known as Penryn in some circles. What we used to think of as a CPU / processor, actually has multiple general purpose CPUs in side of it. The Intel Core Duo has 2 cores in the processor chip. The Intel Core-2 Quad has 4 such cores per processor chip. Each of these cores has its own idle state. This makes sense as one core might be idle while another is hard at work on a thread. So a core C-state is the idle state of one of those cores.
I found a 2010 presentation from Intel that provides some additional background about the intel_idle driver, but unfortunately does not explain the lack of support for Core 2:
This EXPERIMENTAL driver supersedes acpi_idle on Intel Atom Processors, Intel Core i3/i5/i7 Processors and associated Intel Xeon processors. It does not support the Intel Core2 processor or earlier.
The above presentation does indicate that the intel_idle driver is an implementation of the "menu" CPU governor, which has an impact on Linux kernel configuration (i.e., CONFIG_CPU_IDLE_GOV_LADDER vs. CONFIG_CPU_IDLE_GOV_MENU). The differences between the ladder and menu governors are succinctly described in this answer.
Dell has a helpful article that lists C-state C0 to C6 compatibility:
Modes C1 to C3 work by basically cutting clock signals used inside the CPU, while modes C4 to C6 work by reducing the CPU voltage. "Enhanced" modes can do both at the same time.
Mode   | Name                  | CPUs
C0     | Operating State       | All CPUs
C1     | Halt                  | 486DX4 and above
C1E    | Enhanced Halt         | All socket LGA775 CPUs
C1E    | -                     | Turion 64, 65-nm Athlon X2 and Phenom CPUs
C2     | Stop Grant            | 486DX4 and above
C2     | Stop Clock            | Only 486DX4, Pentium, Pentium MMX, K5, K6, K6-2, K6-III
C2E    | Extended Stop Grant   | Core 2 Duo and above (Intel only)
C3     | Sleep                 | Pentium II, Athlon and above, but not on Core 2 Duo E4000 and E6000
C3     | Deep Sleep            | Pentium II and above, but not on Core 2 Duo E4000 and E6000; Turion 64
C3     | AltVID                | AMD Turion 64
C4     | Deeper Sleep          | Pentium M and above, but not on Core 2 Duo E4000 and E6000 series; AMD Turion 64
C4E/C5 | Enhanced Deeper Sleep | Core Solo, Core Duo and 45-nm mobile Core 2 Duo only
C6     | Deep Power Down       | 45-nm mobile Core 2 Duo only
From this table (which I later found to be incorrect in some cases), it appears that C-state support varied considerably among the Core 2 processors. (Note that nearly all Core 2 processors are Socket LGA775, except for the Core 2 Solo SU3500, which is Socket BGA956, and the Merom/Penryn processors. "Intel Core" Solo/Duo processors are either Socket PBGA479 or PPGA478.)
An additional exception to the table was found in this article:
Intel’s Core 2 Duo E8500 supports C-states C2 and C4, while the Core 2 Extreme QX9650 does not.
Interestingly, the QX9650 is a Yorkfield processor (Intel family 6, model 23, stepping 6). For reference, my Q9550S is Intel family 6, model 23 (0x17), stepping 10, which supposedly supports C-state C4 (confirmed through experimentation). Additionally, the Core 2 Solo SU3500 has an identical CPUID (family, model, stepping) to the Q9550S but is available in a non-LGA775 socket, which confounds interpretation of the above table.
Clearly, the CPUID must be used at least down to the stepping in order to identify C-state support for this model of processor, and in some cases that may be insufficient (undetermined at this time).
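For readers who want to check family, model and stepping themselves, here is a minimal sketch of how those three numbers are unpacked from the EAX value returned by CPUID leaf 1. It assumes the standard Intel layout of that register; the struct and function names are made up for this illustration and do not come from the kernel.

```c
#include <stdint.h>

/* Family, model and stepping in the form /proc/cpuinfo reports them. */
struct cpu_sig {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/*
 * Decode the EAX value returned by CPUID leaf 1.
 * The extended model bits are only added for base family 6 and 15;
 * the extended family bits are only added for base family 15.
 */
struct cpu_sig decode_cpuid_signature(uint32_t eax)
{
    struct cpu_sig sig;
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model  = (eax >> 4) & 0xF;
    unsigned ext_model   = (eax >> 16) & 0xF;
    unsigned ext_family  = (eax >> 20) & 0xFF;

    sig.stepping = eax & 0xF;
    sig.family = base_family;
    sig.model = base_model;
    if (base_family == 0xF)
        sig.family += ext_family;
    if (base_family == 0x6 || base_family == 0xF)
        sig.model |= ext_model << 4;
    return sig;
}
```

Under that layout, family 6, model 23, stepping 10 corresponds to a signature of 0x1067A, and stepping 6 to 0x10676, so the two Yorkfield parts mentioned above differ only in the lowest four bits.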
The macro used for assigning CPU idle information is:
#define ICPU(model, cpu) \
    { X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY, (unsigned long)&cpu }
Where model is enumerated in asm/intel-family.h. Examining this header file, I see that Intel CPUs are assigned 8-bit identifiers that appear to match the Intel family 6 model numbers:
#define INTEL_FAM6_CORE2_PENRYN 0x17
From the above, we have Intel Family 6, Model 23 (0x17) defined as INTEL_FAM6_CORE2_PENRYN. This should be sufficient for defining idle states for most of the Model 23 processors, but could potentially cause issues with QX9650 as noted above.
So, minimally, each group of processors that has a distinct C-state set would need to be defined in this list.
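To make the "one entry per distinct C-state set" idea concrete, here is a user-space sketch of a lookup table that can also discriminate on stepping. Everything in it is hypothetical: the struct, the function, the table names and especially the stepping 6 row are placeholders for illustration, not kernel code, and whether stepping alone really isolates the QX9650 is not established here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: one row per group of CPUs sharing a C-state set. */
struct idle_match {
    unsigned family;
    unsigned model;
    uint16_t stepping_mask;  /* bit N set = stepping N matches; 0 = any stepping */
    const char *table_name;  /* stands in for a pointer to a state table */
};

static const struct idle_match idle_matches[] = {
    /* More specific rows go first, because the first hit wins. */
    { 6, 0x17, 1u << 6, "shallow" },  /* placeholder: model 23, stepping 6 only */
    { 6, 0x17, 0,       "core2"   },  /* every other model 23 stepping */
    { 6, 0x0F, 0,       "conroe"  },  /* model 15, any stepping */
};

/* Return the name of the first matching table, or NULL if none matches. */
const char *find_idle_table(unsigned family, unsigned model, unsigned stepping)
{
    size_t i;

    for (i = 0; i < sizeof(idle_matches) / sizeof(idle_matches[0]); i++) {
        const struct idle_match *m = &idle_matches[i];

        if (m->family != family || m->model != model)
            continue;
        if (m->stepping_mask == 0)
            return m->table_name;
        if (stepping < 16 && (m->stepping_mask & (1u << stepping)))
            return m->table_name;
    }
    return NULL;
}
```

With this table, find_idle_table(6, 0x17, 10) yields "core2", while a CPU with no row at all, such as model 0x1A, yields NULL, which is the user-space analogue of the driver declining to load.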
Zagacki and Ponnala, Intel Technology Journal 12(3):219-227, 2008 indicate that Yorkfield processors do indeed support C2 and C4. They also seem to indicate that the ACPI 3.0a specification supports transitions only between C-states C0, C1, C2 and C3, which I presume may also limit the Linux acpi_idle driver to transitions between that limited set of C-states. However, this article indicates that may not always be the case:
Bear in mind that is the ACPI C state, not the processor one, so ACPI C3 might be HW C6, etc.
Also of note:
Beyond the processor itself, since C4 is a synchronized effort between major silicon components in the platform, the Intel Q45 Express Chipset achieves a 28-percent power improvement.
The chipset I'm using is indeed an Intel Q45 Express Chipset.
The Intel documentation on MWAIT states is terse but confirms the BIOS-specific ACPI behavior:
The processor-specific C-states defined in MWAIT extensions can map to ACPI defined C-state types (C0, C1, C2, C3). The mapping relationship depends on the definition of a C-state by processor implementation and is exposed to OSPM by the BIOS using the ACPI defined _CST table.
My interpretation of the above table (combined with a table from Wikipedia, asm/intel-family.h and the above articles) is:
Model 9 0x09 (Pentium M and Celeron M):
- Banias: C0, C1, C2, C3, C4
Model 13 0x0D (Pentium M and Celeron M):
- Dothan, Stealey: C0, C1, C2, C3, C4
Model 14 0x0E INTEL_FAM6_CORE_YONAH (Enhanced Pentium M, Enhanced Celeron M or Intel Core):
- Yonah (Core Solo, Core Duo): C0, C1, C2, C3, C4, C4E/C5
Model 15 0x0F INTEL_FAM6_CORE2_MEROM (some Core 2 and Pentium Dual-Core):
- Kentsfield, Merom, Conroe, Allendale (E2xxx/E4xxx and Core 2 Duo E6xxx, T7xxxx/T8xxxx, Core 2 Extreme QX6xxx, Core 2 Quad Q6xxx): C0, C1, C1E, C2, C2E
Model 23 0x17 INTEL_FAM6_CORE2_PENRYN (Core 2):
- Merom-L/Penryn-L: ?
- Penryn (Core 2 Duo 45-nm mobile): C0, C1, C1E, C2, C2E, C3, C4, C4E/C5, C6
- Yorkfield (Core 2 Extreme QX9650): C0, C1, C1E, C2E?, C3
- Wolfdale/Yorkfield (Core 2 Quad, C2Q Xeon, Core 2 Duo E5xxx/E7xxx/E8xxx, Pentium Dual-Core E6xxx, Celeron Dual-Core): C0, C1, C1E, C2, C2E, C3, C4
From the amount of diversity in C-state support within just the Core 2 line of processors, it appears that a lack of consistent support for C-states may have been the reason for not attempting to fully support them via the intel_idle driver. I would like to fully complete the above list for the entire Core 2 line.
This is not really a satisfying answer, because it makes me wonder how much unnecessary power is used and excess heat has been (and still is) generated by not fully utilizing the robust power-saving MWAIT C-states on these processors.
Chattopadhyay et al. 2018, Energy Efficient High Performance Processors: Recent Approaches for Designing Green High Performance Computing is worth noting for the specific behavior I'm looking for in the Q45 Express Chipset:
Package C-state (PC0-PC10) - When the compute domains, Core and Graphics (GPU) are idle, the processor has an opportunity for additional power savings at uncore and platform levels, for example, flushing the LLC and power-gating the memory controller and DRAM IO, and at some state, the whole processor can be turned off while its state is preserved on always-on power domain.
As a test, I inserted the following at linux/drivers/idle/intel_idle.c line 127:
static struct cpuidle_state conroe_cstates[] = {
    {
        .name = "C1",
        .desc = "MWAIT 0x00",
        .flags = MWAIT2flg(0x00),
        .exit_latency = 3,
        .target_residency = 6,
        .enter = &intel_idle,
        .enter_s2idle = intel_idle_s2idle, },
    {
        .name = "C1E",
        .desc = "MWAIT 0x01",
        .flags = MWAIT2flg(0x01),
        .exit_latency = 10,
        .target_residency = 20,
        .enter = &intel_idle,
        .enter_s2idle = intel_idle_s2idle, },
//  {
//      .name = "C2",
//      .desc = "MWAIT 0x10",
//      .flags = MWAIT2flg(0x10),
//      .exit_latency = 20,
//      .target_residency = 40,
//      .enter = &intel_idle,
//      .enter_s2idle = intel_idle_s2idle, },
    {
        .name = "C2E",
        .desc = "MWAIT 0x11",
        .flags = MWAIT2flg(0x11),
        .exit_latency = 40,
        .target_residency = 100,
        .enter = &intel_idle,
        .enter_s2idle = intel_idle_s2idle, },
    {
        .enter = NULL }
};
static struct cpuidle_state core2_cstates[] = {
    {
        .name = "C1",
        .desc = "MWAIT 0x00",
        .flags = MWAIT2flg(0x00),
        .exit_latency = 3,
        .target_residency = 6,
        .enter = &intel_idle,
        .enter_s2idle = intel_idle_s2idle, },
    {
        .name = "C1E",
        .desc = "MWAIT 0x01",
        .flags = MWAIT2flg(0x01),
        .exit_latency = 10,
        .target_residency = 20,
        .enter = &intel_idle,
        .enter_s2idle = intel_idle_s2idle, },
    {
        .name = "C2",
        .desc = "MWAIT 0x10",
        .flags = MWAIT2flg(0x10),
        .exit_latency = 20,
        .target_residency = 40,
        .enter = &intel_idle,
        .enter_s2idle = intel_idle_s2idle, },
    {
        .name = "C2E",
        .desc = "MWAIT 0x11",
        .flags = MWAIT2flg(0x11),
        .exit_latency = 40,
        .target_residency = 100,
        .enter = &intel_idle,
        .enter_s2idle = intel_idle_s2idle, },
    {
        .name = "C3",
        .desc = "MWAIT 0x20",
        .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED,
        .exit_latency = 85,
        .target_residency = 200,
        .enter = &intel_idle,
        .enter_s2idle = intel_idle_s2idle, },
    {
        .name = "C4",
        .desc = "MWAIT 0x30",
        .flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TLB_FLUSHED,
        .exit_latency = 100,
        .target_residency = 400,
        .enter = &intel_idle,
        .enter_s2idle = intel_idle_s2idle, },
    {
        .name = "C4E",
        .desc = "MWAIT 0x31",
        .flags = MWAIT2flg(0x31) | CPUIDLE_FLAG_TLB_FLUSHED,
        .exit_latency = 100,
        .target_residency = 400,
        .enter = &intel_idle,
        .enter_s2idle = intel_idle_s2idle, },
    {
        .name = "C6",
        .desc = "MWAIT 0x40",
        .flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TLB_FLUSHED,
        .exit_latency = 200,
        .target_residency = 800,
        .enter = &intel_idle,
        .enter_s2idle = intel_idle_s2idle, },
    {
        .enter = NULL }
};
at intel_idle.c line 983:
static const struct idle_cpu idle_cpu_conroe = {
    .state_table = conroe_cstates,
    .disable_promotion_to_c1e = false,
};
static const struct idle_cpu idle_cpu_core2 = {
    .state_table = core2_cstates,
    .disable_promotion_to_c1e = false,
};
at intel_idle.c line 1073:
ICPU(INTEL_FAM6_CORE2_MEROM, idle_cpu_conroe),
ICPU(INTEL_FAM6_CORE2_PENRYN, idle_cpu_core2),
After a quick compile and reboot of my PXE nodes, dmesg now shows:
[ 0.019845] cpuidle: using governor menu
[ 0.515785] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[ 0.543404] intel_idle: MWAIT substates: 0x22220
[ 0.543405] intel_idle: v0.4.1 model 0x17
[ 0.543413] tsc: Marking TSC unstable due to TSC halts in idle states deeper than C2
[ 0.543680] intel_idle: lapic_timer_reliable_states 0x2
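The "MWAIT substates: 0x22220" value in that log can be decoded by hand. The following is a minimal sketch, assuming the CPUID leaf 5 EDX layout described in the Intel SDM (one 4-bit sub-state count per MWAIT C-state) and the usual MWAIT hint encoding (C-state index in bits 7:4, sub-state in bits 3:0). The function names are invented for this example.

```c
#include <stdint.h>

/* An MWAIT hint packs the C-state index in bits 7:4 and the sub-state in bits 3:0. */
unsigned mwait_hint_cstate(uint8_t hint)
{
    return (hint >> 4) & 0xF;
}

unsigned mwait_hint_substate(uint8_t hint)
{
    return hint & 0xF;
}

/*
 * CPUID leaf 5 EDX holds one 4-bit sub-state count per MWAIT C-state:
 * bits 3:0 for C0, bits 7:4 for C1, bits 11:8 for C2, and so on.
 * A hint with index 0 addresses C1, hence the "+ 1" when picking the nibble.
 */
unsigned mwait_num_substates(uint32_t substates_edx, uint8_t hint)
{
    unsigned shift = (mwait_hint_cstate(hint) + 1) * 4;

    return (substates_edx >> shift) & 0xF;
}
```

Read this way, 0x22220 advertises two sub-states each for the hints 0x0x, 0x1x, 0x2x and 0x3x, and none for 0x40. That would be consistent with a second C4 sub-state (0x31) existing on this CPU and with C6 never showing up, although it says nothing about which sub-state the hardware prefers.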
And now PowerTOP is showing:
Package | CPU 0
POLL 2.5% | POLL 0.0% 0.0 ms
C1E 2.9% | C1E 5.0% 22.4 ms
C2 0.4% | C2 0.2% 0.2 ms
C3 2.1% | C3 1.9% 0.5 ms
C4E 89.9% | C4E 92.6% 66.5 ms
| CPU 1
| POLL 10.0% 400.8 ms
| C1E 5.1% 6.4 ms
| C2 0.3% 0.1 ms
| C3 1.4% 0.6 ms
| C4E 76.8% 73.6 ms
| CPU 2
| POLL 0.0% 0.2 ms
| C1E 1.1% 3.7 ms
| C2 0.2% 0.2 ms
| C3 3.9% 1.3 ms
| C4E 93.1% 26.4 ms
| CPU 3
| POLL 0.0% 0.7 ms
| C1E 0.3% 0.3 ms
| C2 1.1% 0.4 ms
| C3 1.1% 0.5 ms
| C4E 97.0% 45.2 ms
I've finally accessed the Enhanced Core 2 C-states, and it looks like there is a measurable drop in power consumption - my meter on 8 nodes appears to be averaging at least 5% lower (with one node still running the old kernel), but I'll try swapping the kernels out again as a test.
An interesting note regarding C4E support - My Yorkfield Q9550S processor appears to support it (or some other sub-state of C4), as evidenced above! This confuses me, because the Intel datasheet on the Core 2 Q9000 processor (section 6.2) only mentions C-states Normal (C0), HALT (C1 = 0x00), Extended HALT (C1E = 0x01), Stop Grant (C2 = 0x10), Extended Stop Grant (C2E = 0x11), Sleep/Deep Sleep (C3 = 0x20) and Deeper Sleep (C4 = 0x30). What is this additional 0x31 state? If I enable state C2, then C4E is used instead of C4. If I disable state C2 (force state C2E) then C4 is used instead of C4E. I suspect this may have something to do with the MWAIT flags, but I haven't yet found documentation for this behavior.
I'm not certain what to make of this: The C1E state appears to be used in lieu of C1, C2 is used in lieu of C2E and C4E is used in lieu of C4. I'm uncertain if C1/C1E, C2/C2E and C4/C4E can be used together with intel_idle or if they are redundant. I found a note in this 2010 presentation by Intel Labs Pittsburgh that indicates the transitions are C0 - C1 - C0 - C1E - C0, and further states:
C1E is only used when all the cores are in C1E
I believe that is to be interpreted as the C1E state is entered on other components (e.g. memory) only when all cores are in the C1E state. I also take this to apply equivalently to the C2/C2E and C4/C4E states (Although C4E is referred to as "C4E/C5" so I'm uncertain if C4E is a sub-state of C4 or if C5 is a sub-state of C4E. Testing seems to indicate C4/C4E is correct). I can force C2E to be used by commenting out the C2 state - however, this causes the C4 state to be used instead of C4E (more work may be required here). Hopefully there aren't any model 15 or model 23 processors that lack state C2E, because those processors would be limited to C1/C1E with the above code.
Also, the flags, latency and residency values could probably stand to be fine-tuned, but just taking educated guesses based on the Nehalem idle values seems to work fine. More reading will be required to make any improvements.
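To show why those two numbers matter, here is a deliberately simplified selection rule in the spirit of what a cpuidle governor does: take the deepest state whose target residency fits the predicted idle time and whose exit latency fits the current latency limit. This is a sketch for illustration only, not the menu governor's actual algorithm; the struct and function names are made up, and the values (in microseconds) are copied from the core2_cstates table above.

```c
#include <stddef.h>

/* Only the two tunables discussed above; both in microseconds. */
struct idle_state {
    const char *name;
    unsigned exit_latency;
    unsigned target_residency;
};

/* Values copied from the core2_cstates table. */
const struct idle_state core2_states[] = {
    { "C1",    3,   6 },
    { "C1E",  10,  20 },
    { "C2",   20,  40 },
    { "C2E",  40, 100 },
    { "C3",   85, 200 },
    { "C4",  100, 400 },
    { "C4E", 100, 400 },
    { "C6",  200, 800 },
};
const size_t core2_state_count = sizeof(core2_states) / sizeof(core2_states[0]);

/*
 * Return the index of the deepest state that satisfies both limits,
 * or -1 if even the shallowest state does not qualify.
 */
int pick_idle_state(const struct idle_state *states, size_t count,
                    unsigned predicted_us, unsigned latency_limit_us)
{
    int best = -1;
    size_t i;

    for (i = 0; i < count; i++) {
        if (states[i].target_residency > predicted_us)
            continue;  /* would not stay idle long enough to pay off */
        if (states[i].exit_latency > latency_limit_us)
            continue;  /* would take too long to wake up */
        best = (int)i;
    }
    return best;
}
```

One consequence of a rule like this is that two entries with identical latency and residency values, such as C4 and C4E in the table, can never both be chosen: the later entry always wins. Giving them distinct values may therefore be a sensible first step when tuning.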
I tested this on a Core 2 Duo E2220 (Allendale), a Dual Core Pentium E5300 (Wolfdale), Core 2 Duo E7400, Core 2 Duo E8400 (Wolfdale), Core 2 Quad Q9550S (Yorkfield) and Core 2 Extreme QX9650, and I have found no issues beyond the aforementioned preference for state C2/C2E and C4/C4E.
Not covered by this driver modification:
- The original Core Solo/Core Duo (Yonah, non-Core 2) processors are family 6, model 14, so they are not matched by this modification. That is just as well, because they support the C4E/C5 (Enhanced Deeper Sleep) C-states but not the C1E/C2E states, and would need their own idle definition.
The only issues that I can think of are:
- Core 2 Solo SU3300/SU3500 (Penryn-L) are family 6, model 23 and will be detected by this driver. However, they are not Socket LGA775 so they may not support the C1E Enhanced Halt C-state. Likewise for the Core 2 Solo ULV U2100/U2200 (Merom-L). However, the intel_idle driver appears to choose the appropriate C1/C1E based on hardware support of the sub-states.
- Core 2 Extreme QX9650 (Yorkfield) reportedly does not support C-state C2 or C4. I have confirmed this by purchasing a used Optiplex 780 and QX9650 Extreme processor on eBay. The processor supports C-states C1 and C1E. With this driver modification, the CPU idles in state C1E instead of C1, so there is presumably some power savings. I expected to see C-state C3, but it is not present when using this driver so I may need to look into this further.
I managed to find a slide from a 2009 Intel presentation on the transitions between C-states (i.e., Deep Power Down):
In conclusion, it turns out that there was no real reason for the lack of Core 2 support in the intel_idle driver. It is clear now that the original stub code for "Core 2 Duo" only handled C-states C1 and C2, which would have been far less efficient than the acpi_idle function which also handles C-state C3. Once I knew where to look, implementing support was easy. The helpful comments and other answers were much appreciated, and if Amazon is listening, you know where to send the check.
This update has been committed to github. I will e-mail a patch to the LKML soon.
Update: I also managed to dig up a Socket T/LGA775 Allendale (Conroe) Core 2 Duo E2220, which is family 6, model 15, so I added support for that as well. This model lacks support for C-state C4, but supports C1/C1E and C2/C2E. This should also work for other Conroe-based chips (E4xxx/E6xxx) and possibly all Kentsfield and Merom (non Merom-L) processors.
Update: I finally found some MWAIT tuning resources. This Power vs. Performance writeup and this Deeper C states and increased latency blog post both contain some useful information on identifying CPU idle latencies. Unfortunately, this only reports those exit latencies that were coded into the kernel (but, interestingly, only those hardware states supported by the processor):
# cd /sys/devices/system/cpu/cpu0/cpuidle
# for state in `ls -d state*` ; do echo c-$state `cat $state/name` `cat $state/latency` ; done
c-state0/ POLL 0
c-state1/ C1 3
c-state2/ C1E 10
c-state3/ C2 20
c-state4/ C2E 40
c-state5/ C3 20
c-state6/ C4 60
c-state7/ C4E 100
Update: An Intel employee recently published an article on intel_idle detailing MWAIT states.
Is there a more appropriate way to configure a kernel for optimal CPU idle support for this family of processors (aside from disabling support for intel_idle)?
You have ACPI enabled, and you've checked that acpi_idle is in use. I sincerely doubt you have missed any helpful kernel config option. You can always check powertop for possible suggestions, but probably you already knew that.
This is not an answer, but I want to format it :-(.
Looking at the kernel source code, the current intel_idle driver contains a test to specifically exclude Intel family 6 from the driver.
No it doesn't :-).
id = x86_match_cpu(intel_idle_ids);
if (!id) {
    if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
        boot_cpu_data.x86 == 6)
        pr_debug(PREFIX "does not run on family %d model %d\n",
                 boot_cpu_data.x86, boot_cpu_data.x86_model);
    return -ENODEV;
}
The if statement does not exclude Family 6. Instead, the if statement provides a message when debugging is enabled, that this specific modern Intel CPU is not supported by intel_idle. In fact, my current i5-5300U CPU is Family 6 and it uses intel_idle.
What excludes your CPU is that there is no match in the intel_idle_ids table.
I noticed this commit which implemented the table. The code it removes had a switch statement instead. This makes it easy to see that the earliest model for which intel_idle was implemented (or successfully tested, or whatever) is 0x1A = 26. https://github.com/torvalds/linux/commit/b66b8b9a4a79087dde1b358a016e5c8739ccf186
I suspect this could just be a case of opportunity and cost. When intel_idle was added, it seems Core 2 Duo support was planned, but it never was fully implemented; perhaps by the time the Intel engineers got round to it, it wasn’t worth it any more. The equation is relatively complex: intel_idle needs to provide sufficient benefits over acpi_idle to make it worth supporting here, on CPUs which will see the “improved” kernel in sufficient numbers...
As sourcejedi’s answer says, the driver doesn’t exclude all of family 6. The intel_idle initialisation checks for CPUs in a list of CPU models, covering basically all micro-architectures from Nehalem to Kaby Lake. Yorkfield is older than that (and significantly different; Nehalem is very different from the architectures which came before it). The family 6 test only affects whether the error message is printed: the message is displayed only on Intel CPUs, not AMD CPUs (Intel family 6 includes all non-NetBurst Intel CPUs since the Pentium Pro).
To answer your configuration question, you could completely disable intel_idle, but leaving it in is fine too (as long as you don’t mind the warning).