Tuesday, October 30, 2012

Exploring EIST

My desktop PC has a Gigabyte GA-P35-DS3R motherboard and Core 2 Quad Q6600 stepping 11 CPU. The BIOS is the latest official release, F13.

The latency issue

On this and some other Gigabyte motherboards, enabling "CPU EIST Function" in "Advanced BIOS Features" causes audio interruptions in Windows 7. This is due to latency spikes which can be seen using the DPC Latency Checker (dpclat). They can be investigated and tracked down using xperf. The rest of this document is not specifically focused on this issue; it only served as the initial motivation.

About C1E

When the option is disabled, I can still see frequency and voltage changing between the two different P-states supported by the CPU. This is probably only due to the enhanced C1 (C1E) state, which is enabled via a separate "CPU Enhanced Halt (C1E)" BIOS option. It means that when all the cores are halted, the CPU goes to its lowest P-state. This is automatic, so there is no need for the OS to do anything special. It can be disabled in Windows by disabling the C1E option in RealTemp. I don't know how that works. The Intel® 64 and IA-32 Architectures Software Developer Manual documents MSR_POWER_CTL with a C1E Enable bit, but that MSR does not exist in the Q6600.

SpeedStep in Linux

In Ubuntu 12.10 with kernel 3.5.0-17-generic, it is obvious that there is no CPU frequency control when EIST is disabled in the BIOS. The /sys/devices/system/cpu/cpu?/cpufreq directories don't exist, and everything shows that the CPU is stuck at 2.4 GHz. However, it's still possible to see the voltage vary using sensors because of C1E.

If EIST is enabled in the BIOS, frequency control is enabled in the kernel. There is an "ACPI: Requesting acpi_cpufreq" message at startup, and the cpufreq directories appear. It's possible to take manual control of the frequency by changing the governor.

for i in /sys/devices/system/cpu/cpu?/cpufreq/scaling_governor ; do sudo sh -c "echo userspace > $i" ; done

It then becomes possible to manually set the frequency. For example, this shows how to switch between the two states supported by my Q6600:

for i in /sys/devices/system/cpu/cpu?/cpufreq/scaling_setspeed ; do sudo sh -c "echo 2400000 > $i" ; done
for i in /sys/devices/system/cpu/cpu?/cpufreq/scaling_setspeed ; do sudo sh -c "echo 1600000 > $i" ; done

Note how this sets all of the cores. The setting is for each individual core, but the actual change is for the whole CPU. The CPU will choose the highest selected frequency and voltage out of all the cores. So, to choose 1.6 GHz, all cores need to be set to that. If even one core is set to 2.4 GHz, all cores will run at that speed.

This undoubtedly works. The performance difference is easy to measure. It's also possible to use sensors to see how at 1.6 GHz, the voltage stays low even at full load. To go back to automatic control, just select the ondemand governor:

for i in /sys/devices/system/cpu/cpu?/cpufreq/scaling_governor ; do sudo sh -c "echo ondemand > $i" ; done

Note how when the ondemand governor is selected, all the scaling_setspeed files contain <unsupported> instead of a frequency. Other options are also available.

The MSRs

EIST can also be controlled and observed via model-specific registers in the CPU. It's all documented in the Intel Software Developer Manual. The frequency and voltage can be set via IA32_PERF_CTL and the current state can be read via IA32_PERF_STATUS.

For accessing MSRs in Linux, msrtool is useful. The version available in Ubuntu is old, so it needs to be compiled from source. If you get an error about pci.h, install libpci-dev. For accessing MSRs via /dev/cpu/?/msr, the msr module must be loaded.

For controlling EIST, c2ctl is a more convenient tool. It can set multiple cores with one command, and it can even enable EIST via IA32_MISC_ENABLE. As a result, it can be used to change the CPU speed even when EIST is disabled in the BIOS and kernel. It can even be used to set frequencies and voltages between the two officially supported states. However, it cannot set anything outside this range, because the CPU is locked.

The role of ACPI

The Linux kernel does not actually know how to use EIST. Instead, it is just told by the BIOS via ACPI that certain states can be set by writing certain values to certain addresses. This is why the driver is called acpi_cpufreq. The ACPI tables provided by the BIOS can be dumped, extracted and disassembled:

sudo acpidump > acpidump.out
acpixtract acpidump.out

iasl -d DSDT.dat
iasl -d SSDT.dat

This results in DSDT.dsl and SSDT.dsl files containing ACPI Source Language, which is a programming language and not a simple description. Information is available in the ACPI specification and elsewhere.

My BIOS provides the same DSDT table regardless of whether EIST is enabled. However, the SSDT table is only provided when EIST is enabled. A quick look inside the SSDT.dsl shows code relating to EIST, but it's not all there. The start of the file contains an SSDT array containing memory addresses, and the rest of the file contains code which loads more code from these addresses. The kernel shows ACPI: Dynamic OEM Table Load messages relating to this during startup. It's possible to use those messages to dump the tables:

dmesg | sed -n "s/^.*SSDT 0000\([^ ]*\) \([^ ]*\) .*PmRef *\(Cpu[0-9][^ ]*\) .*$/sudo acpidump -a 0x\1 -l 0x\2 > ssdt_\3.aml/p" | sh
for i in ssdt_*.aml ; do iasl -d "$i" ; done 

Now all the code is available for examination. The ssdt_Cpu?Ist files contain power management functions for each particular core. The _PCT method shows where to write to change P-states and where to read the current state.  The _PSS method gives a list of the available P-states, including information about them and the value to write to the location given by _PCT. For more information about what's going on here, check out this message board thread and the ACPI specification. What the code needs to do is actually very simple. It's just complicated by the need to support various different configurations.

In my case, there are no tables for C-states. This is because the Q6600 CPU only supports C1E, and the OS does not need to do anything special to use it.  Deeper sleep such as C2 requires a read from a chipset I/O port, and those ports would be described in a table.

It is possible to integrate all the SSDT code into the DSDT file. The files cannot simply be concatenated and some editing is needed. The code inside the DefinitionBlock in other files can go into the DSDT DefinitionBlock. Also, the TLD0 variable and the whole "If (LEqual (TLD0, 0x00))" block can be removed, because once the code is integrated there is no need to load it externally. This means the code will be available even if the OS does not show a need for it when calling the _PDC method, but that should not be a problem.

When compiling the DSDT, there can be errors due to bad code from the motherboard manufacturer. Gigabyte chose to use the Microsoft compiler which allows errors that the Intel compiler does not. I found a message board post showing the fixes needed for my motherboard.

The best way to load the custom DSDT is via grub. Recompiling should be a last resort, because it needs to be repeated every time when upgrading the kernel.

When EIST is disabled in the BIOS, the custom DSDT is sufficient to make Linux load the acpi_cpufreq driver and pretend that it can be used. However, the attempted CPU frequency changes have no effect. One obvious issue is that EIST is disabled via IA32_MISC_ENABLE. That can be fixed via c2ctl, but it is not sufficient to EIST work.

What's at ports 0x880 and 0x882?

The DSDT tells the operating system that performance states can be set by writing to port 0x880, and that the current state can be read from port 0x882. That is weird. I don't know what hardware resides at that location. Maybe it's not real hardware, and it instead runs System Management Mode code from the BIOS. This could explain the latency spikes I observed in Windows. Maybe that code takes a long time to run. I may investigate those ports more some other day. (Update: Yes, these ports trigger SMI and lead to SMM.)

It's possible to write to the MSRs instead. The _PCT method needs to be changed to return MSR addresses using FFixedHW:

Method (_PCT, 0, NotSerialized)
    Return (Package (0x02)
        ResourceTemplate ()
            Register (FFixedHW,         // PERF_CTL
                0x10,                   // Bit Width
                0x00,                   // Bit Offset
                0x00000199              // Address

        ResourceTemplate ()
            Register (FFixedHW,         // PERF_STATUS
                0x10,                   // Bit Width
                0x00,                   // Bit Offset
                0x00000198,             // Address

Also, it is important to make sure that the _PSS method returns actual MSR values consisting of the FID and VID instead of some other value. With these changes, EIST works after it is enabled via "sudo c2ctl 0-3 -e". The ACPI language doesn't seem to have the ability to access an MSR, I cannot simply enable EIST via some code in the DSDT.

Perhaps Windows would not have DPC latency spikes if it used MSRs instead of I/O ports? I attempted use my altered DSDT with Windows 7 by booting via Grub, but that resulted in an ACPI blue screen of death. Maybe Grub can't successfully change the DSDT for Windows?

The Performance Penalty

EIST has a cost. The CPU will increase speed when needed, but this is not instantaneous. A long process will effectively run at the maximum CPU speed, but shorter processes won't. Here is a totally artificial example:

time for (( i=0 ; i<1000 ; i++ )) ; do cat /dev/null ; done

Here, the ondemand governor is over 20% slower than when speed is manually set to 2.4 GHz. It's unlikely that the speed of running 1000 processes that do nothing matters, but it's possible that things like configure scripts are slowed down.

I guess I will continue as before, just using C1E and not using EIST. I don't really notice a difference in fan RPM and temperature, and I'd rather not have any performance penalty. I doubt that there's a big difference in power usage, because if the CPU runs at a higher speed, that means it gets work done quicker and spends more time in C1E. I nevertheless think this exploration was worthwhile, because it has demystified ACPI for me.

No comments: