SLURM sacct shows 'batch' and 'extern' job names
A Slurm job consists of multiple job steps, and Slurm accounts for the resource usage of each step separately. Usually, these steps are created with srun/mpirun and numbered starting from 0. In addition, there are sometimes two special steps. For example, take the following job:
sbatch -n 4 --wrap="srun hostname; srun echo Hello World"
This resulted in the following sacct output:
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
5163571            wrap     medium      admin          4  COMPLETED      0:0
5163571.bat+      batch                 admin          4  COMPLETED      0:0
5163571.ext+     extern                 admin          4  COMPLETED      0:0
5163571.0      hostname                 admin          4  COMPLETED      0:0
5163571.1          echo                 admin          4  COMPLETED      0:0
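A listing like this can be requested explicitly. The following is a minimal sketch, assuming sacct is on your PATH and job accounting is enabled on the cluster. The helper names sacct_fields and show_job_steps are made up for this illustration. The %20 width on JobID is only there so that the step names are printed in full instead of being truncated to bat+ and ext+.

```shell
#!/bin/sh
# Field list matching the columns of the listing above. JobID is widened to
# 20 characters so that step suffixes are not cut off with a trailing "+".
sacct_fields() {
    echo "JobID%20,JobName,Partition,Account,AllocCPUS,State,ExitCode"
}

# Show one job and all of its steps. Falls back to printing the command
# if sacct is not installed on this machine.
show_job_steps() {
    if command -v sacct >/dev/null 2>&1; then
        sacct -j "$1" --format="$(sacct_fields)"
    else
        echo "sacct not found; would run: sacct -j $1 --format=$(sacct_fields)"
    fi
}

show_job_steps 5163571
```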
The two srun calls created the steps 5163571.0 and 5163571.1. The 5163571.bat+ step accounts for the resources needed by the batch script itself, which in this case is just srun hostname; srun echo Hello World (--wrap simply puts that into a file and adds #!/bin/sh).
Many non-MPI programs do most of their computation directly in the batch script rather than through srun, so their resource usage is accounted for in the batch step.
And now for 5163571.ext+: this step accounts for all resource usage by the job that happens outside of Slurm's direct control. It only shows up if PrologFlags=Contain is set in slurm.conf.
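To see whether a cluster is set up this way, you can look at the running configuration. The sketch below is only an illustration: prolog_contains is a hypothetical helper, and the sample line stands in for the output of scontrol show config, whose exact spacing is assumed here.

```shell
#!/bin/sh
# Succeeds if the configuration text on stdin lists Contain in PrologFlags.
prolog_contains() {
    grep -i '^PrologFlags' | grep -qi 'contain'
}

# On a real cluster you would run: scontrol show config | prolog_contains
# Here a sample line (assumed format) is used instead.
if printf 'PrologFlags             = Alloc,Contain\n' | prolog_contains; then
    echo "extern step enabled"
else
    echo "extern step not enabled"
fi
```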
An ssh session is an example of a process that belongs to a Slurm job but is not directly controlled by Slurm. If you ssh into a node where one of your jobs is running, your session is placed into the context of that job (and, if cgroups are set up, you are limited to the resources allocated to it). All calculations you do in that ssh session are accounted for in the .extern job step.
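The naming scheme described here can be summed up in a small helper that tells the kinds of sacct rows apart by their JobID. This is a sketch and not part of Slurm: classify_step is a hypothetical function, and it accepts both the full suffixes (.batch, .extern) and the truncated ones (.bat+, .ext+) shown in the listing.

```shell
#!/bin/sh
# Classify a JobID value from sacct output by its step suffix.
classify_step() {
    case "$1" in
        *.batch|*.bat+)  echo "batch step" ;;     # the batch script itself
        *.extern|*.ext+) echo "extern step" ;;    # e.g. adopted ssh sessions
        *.[0-9]*)        echo "numbered step" ;;  # created by srun/mpirun
        *.*)             echo "other step" ;;
        *)               echo "job allocation" ;; # the job as a whole
    esac
}

# JobID values copied from the example listing.
for id in 5163571 5163571.bat+ 5163571.ext+ 5163571.0 5163571.1; do
    printf '%s\t%s\n' "$id" "$(classify_step "$id")"
done
```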