How to adjust NVIDIA GPU fan speed on a headless node?
The following is a simple method that does not require scripting, connecting fake monitors, or fiddling and can be executed over SSH to control multiple NVIDIA GPUs' fans. It has been tested on Arch Linux.
Create xorg.conf
sudo nvidia-xconfig --allow-empty-initial-configuration --enable-all-gpus --cool-bits=7
This will create an /etc/X11/xorg.conf
with an entry for each GPU, similar to the manual method.
Note: Some distributions (Fedora, CentOS, Manjaro) have additional config files (eg in /etc/X11/xorg.conf.d/
or /usr/share/X11/xorg.conf.d/
), which override xorg.conf
and set AllowNVIDIAGPUScreens
. This option is not compatible with this guide. The extra config files should be modified or deleted. The X11 log file shows which config files have been loaded.
Alternative: Create xorg.conf manually
Identify your cards' PCI IDs:
nvidia-xconfig --query-gpu-info
Find the PCI BusID
fields. Note that these are not the same as the bus IDs reported in the kernel.
Alternatively, do sudo startx
, open /var/log/Xorg.0.log
(or whatever location startX lists in its output under the line "Log file:"), and look for the line NVIDIA(0): Valid display device(s) on GPU-<GPU number> at PCI:<PCI ID>
.
Edit /etc/X11/xorg.conf
Here is an example of xorg.conf
for a three-GPU machine:
Section "ServerLayout"
Identifier "dual"
Screen 0 "Screen0"
Screen 1 "Screen1" RightOf "Screen0"
Screen 1 "Screen2" RightOf "Screen1"
EndSection
Section "Device"
Identifier "Device0"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BusID "PCI:5:0:0"
Option "Coolbits" "7"
Option "AllowEmptyInitialConfiguration"
EndSection
Section "Device"
Identifier "Device1"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BusID "PCI:6:0:0"
Option "Coolbits" "7"
Option "AllowEmptyInitialConfiguration"
EndSection
Section "Device"
Identifier "Device2"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BusID "PCI:9:0:0"
Option "Coolbits" "7"
Option "AllowEmptyInitialConfiguration"
EndSection
Section "Screen"
Identifier "Screen0"
Device "Device0"
EndSection
Section "Screen"
Identifier "Screen1"
Device "Device1"
EndSection
Section "Screen"
Identifier "Screen2"
Device "Device2"
EndSection
The BusID
must match the bus IDs we identified in the previous step. The option AllowEmptyInitialConfiguration
allows X to start even if no monitor is connected. The option Coolbits
allows fans to be controlled. It can also allow overclocking.
Note: Some distributions (Fedora, CentOS, Manjaro) have additional config files (eg in /etc/X11/xorg.conf.d/
or /usr/share/X11/xorg.conf.d/
), which override xorg.conf
and set AllowNVIDIAGPUScreens
. This option is not compatible with this guide. The extra config files should be modified or deleted. The X11 log file shows which config files have been loaded.
Edit /root/.xinitrc
nvidia-settings -q fans
nvidia-settings -a [gpu:0]/GPUFanControlState=1 -a [fan:0]/GPUTargetFanSpeed=75
nvidia-settings -a [gpu:1]/GPUFanControlState=1 -a [fan:1]/GPUTargetFanSpeed=75
nvidia-settings -a [gpu:2]/GPUFanControlState=1 -a [fan:2]/GPUTargetFanSpeed=75
I use .xinitrc to execute nvidia-settings for convenience, although there's probably other ways. The first line will print out every GPU fan in the system. Here, I set the fans to 75%.
Launch X
sudo startx -- :0
You can execute this command from SSH. The output will be:
Current version of pixman: 0.34.0
Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Sat May 27 02:22:08 2017
(==) Using config file: "/etc/X11/xorg.conf"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
Attribute 'GPUFanControlState' (pushistik:0[gpu:0]) assigned value 1.
Attribute 'GPUTargetFanSpeed' (pushistik:0[fan:0]) assigned value 75.
Attribute 'GPUFanControlState' (pushistik:0[gpu:1]) assigned value 1.
Attribute 'GPUTargetFanSpeed' (pushistik:0[fan:1]) assigned value 75.
Attribute 'GPUFanControlState' (pushistik:0[gpu:2]) assigned value 1.
Attribute 'GPUTargetFanSpeed' (pushistik:0[fan:2]) assigned value 75.
Monitor temperatures and clock speeds
nvidia-smi
and nvtop
can be used to observe temperatures and power draw. Lower temperatures will allow the card to clock higher and increase its power draw. You can use sudo nvidia-smi -pl 150
to limit power draw and keep the cards cool, or use sudo nvidia-smi -pl 300
to let them overclock. My 1080 Ti runs at 1480 MHz if given 150W, and over 1800 MHz if given 300W, but this depends on the workload. You can monitor their clock speed with nvidia-smi -q
or more specifically, watch 'nvidia-smi -q | grep -E "Utilization| Graphics|Power Draw"'
Returning to automatic fan management.
Reboot. I haven't found another way to make the fans automatic.
I've written a pip-installable Python script to do something similar to @AlexsandrDubinsky's suggestion.
When you run fans.py, it sets up a temporary X server for each GPU with a fake display attached. Then, it loops over the GPUs every few seconds and sets the fan speed according to their temperature. When the script dies, it returns control of the fans to the drivers and cleans up the X servers.