How to "undrain" slurm nodes in drain state
I found an approach: enter the scontrol interpreter (type scontrol on the command line) and then run
scontrol: update NodeName=node10 State=DOWN Reason="undraining"
scontrol: update NodeName=node10 State=RESUME
Then
scontrol: show node node10
displays, among other information,
State=IDLE
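The same sequence can also be run non-interactively, which may be handy when several nodes need the treatment. The following is only a sketch: `undrain_node` is a hypothetical helper name, and it assumes `scontrol` is on the PATH and is run by root or the SlurmUser.

```shell
#!/bin/sh
# Sketch: repeat the interactive steps above from a plain shell.
# undrain_node is a hypothetical helper, not part of Slurm.
undrain_node() {
    node="$1"
    # Same two updates as in the interactive scontrol session.
    scontrol update NodeName="$node" State=DOWN Reason="undraining" || return 1
    scontrol update NodeName="$node" State=RESUME || return 1
    # Print only the State= field of the node record.
    scontrol show node "$node" | grep -o 'State=[^ ]*'
}

# Usage:
#   undrain_node node10
```

If the node drains again shortly afterwards, the Reason field in `scontrol show node` is the place to look next.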
Update: some of these nodes went back to the DRAIN state. Their root partition turned out to be full: for example, show node a10 showed Reason=SlurmdSpoolDir is full. On Ubuntu I therefore ran sudo apt-get clean to remove the contents of /var/cache/apt, and also gzipped some files in /var/log.
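A quick way to confirm this kind of problem is to look at the recorded drain reasons and at how full the root partition is. This is a rough sketch: `usage_percent` is a hypothetical helper that just parses the capacity column of `df -P` output, and the commands in the comments are meant to be run by hand.

```shell
#!/bin/sh
# Sketch: extract the "Capacity" column (e.g. 97%) from `df -P` output.
# usage_percent is a hypothetical helper; it reads df output on stdin.
usage_percent() {
    # Line 1 is the header; line 2 is the filesystem. Strip the % sign.
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# On the controller, list the reason recorded for down/drained nodes:
#   sinfo -R
# On the affected node, check the root partition:
#   df -P / | usage_percent
# After freeing space (apt-get clean, compressing logs), resume the node:
#   scontrol update NodeName=node10 State=RESUME
```

Until space is actually freed, the node would be expected to drain again with the same reason after being resumed.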
If no jobs are currently running on the node:
scontrol update nodename=node10 state=idle
If jobs are running on the node:
scontrol update nodename=node10 state=resume
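The choice between the two commands could be scripted along these lines. It is a sketch only: `resume_node` is a hypothetical helper, and it assumes that `squeue -h -w <node> -t RUNNING` printing nothing means no jobs are running on that node.

```shell
#!/bin/sh
# Sketch: pick state=idle or state=resume depending on running jobs.
# resume_node is a hypothetical helper, not part of Slurm.
resume_node() {
    node="$1"
    # -h: no header, -w: restrict to this node, -t RUNNING: running jobs only.
    if [ -n "$(squeue -h -w "$node" -t RUNNING)" ]; then
        scontrol update nodename="$node" state=resume
    else
        scontrol update nodename="$node" state=idle
    fi
}

# Usage:
#   resume_node node10
```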