Limit the number of running jobs in SLURM
According to the SLURM Resource Limits documentation, you can limit the total number of jobs that you can run for an association/qos with the MaxJobs parameter. As a reminder, an association is a combination of cluster, account, user name and (optional) partition name.
You should be able to do something similar to:
sacctmgr modify user <userid> account=<account_name> set MaxJobs=10
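As a rough sketch of how an administrator might set the limit and then confirm it, the script below uses the `where name=...` form shown in the sacctmgr man page. The user `alice`, the account `physics`, and the `RUN` dry-run switch are placeholders for illustration, not part of the original answer.

```shell
#!/usr/bin/env bash
# Sketch only: USER_ID, ACCOUNT and LIMIT are hypothetical placeholders.
# Dry run by default (commands are printed); set RUN= (empty) to execute them.
RUN="${RUN-echo}"
USER_ID="${USER_ID:-alice}"
ACCOUNT="${ACCOUNT:-physics}"
LIMIT=10

# Set the limit on the user's association (requires Slurm administrator rights).
$RUN sacctmgr modify user where name="$USER_ID" account="$ACCOUNT" set MaxJobs="$LIMIT"

# Display the association to confirm that the limit is in place.
$RUN sacctmgr show assoc where user="$USER_ID" account="$ACCOUNT" format=Cluster,Account,User,MaxJobs
```

With the default dry run, the script only prints the two commands so they can be reviewed before being run for real.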
I found this presentation to be very helpful in case you have more questions.
If you are not the administrator, you can hold some jobs if you do not want them all to start at the same time, with scontrol hold <JOBID>, and you can defer the start of some jobs with sbatch --begin=YYYY-MM-DD.
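A minimal sketch of the hold and deferred-start approach, assuming a placeholder job ID `12345` and a hypothetical batch script `job.sh`. The `RUN` variable is an illustrative dry-run switch so the commands can be inspected without a Slurm installation.

```shell
#!/usr/bin/env bash
# Sketch only: JOBID and job.sh are hypothetical placeholders.
# Dry run by default (commands are printed); set RUN= (empty) to execute them.
RUN="${RUN-echo}"
JOBID="${JOBID:-12345}"

# Keep a pending job from being started.
$RUN scontrol hold "$JOBID"

# Later, make the job eligible for scheduling again.
$RUN scontrol release "$JOBID"

# Submit a job that will not start before one hour from now.
$RUN sbatch --begin=now+1hour job.sh
```

Besides a date, --begin also accepts a time of day such as 16:00 and a full timestamp such as 2030-01-20T12:34:00.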
Also, if it is a job array, you can limit the number of jobs in the array that are concurrently running with, for instance, --array=1-100%25 to have 100 jobs in the array but only 25 of them running at any one time.
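For illustration, a throttled array could be written as a batch script along these lines. The job name, output pattern, and the `input_<N>.dat` naming scheme are assumptions made up for the example. The fallback value for `SLURM_ARRAY_TASK_ID` is only there so the script can be tried outside Slurm.

```shell
#!/usr/bin/env bash
#SBATCH --job-name=array_demo      # hypothetical job name
#SBATCH --array=1-100%25           # 100 tasks, at most 25 running at once
#SBATCH --output=array_%A_%a.out   # %A = array job ID, %a = array task index

# Slurm sets SLURM_ARRAY_TASK_ID for every task of the array.
# Fall back to 1 so the script can be tested outside Slurm.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# Hypothetical naming scheme: each task handles its own input file.
input_file="input_${task_id}.dat"
echo "Task ${task_id} would process ${input_file}"
```

The throttle can also be changed after submission with scontrol update JobId=<JOBID> ArrayTaskThrottle=10, which is handy if 25 turns out to be too many or too few.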
Finally, you can use the --dependency=singleton option that will only allow one of a set of jobs with the same --job-name to be running at a time. If you choose three names and distribute those names to all your jobs and use that option, you are effectively restricting yourself to 3 running jobs max.
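The three-name trick can be sketched as a small submission loop. The `slot` name prefix, the job count, the script `job.sh`, and the `SUBMIT` dry-run switch are all hypothetical choices for this example.

```shell
#!/usr/bin/env bash
# Sketch: cap concurrency by spreading jobs over a fixed set of job names.
# job.sh, the "slot" prefix and the counts below are hypothetical placeholders.
set -euo pipefail

MAX_RUNNING=3    # desired number of simultaneously running jobs
TOTAL_JOBS=12    # number of jobs to submit

# Map a job index to one of MAX_RUNNING names, round-robin.
slot_name() {
    echo "slot$(( $1 % MAX_RUNNING ))"
}

# Dry run by default: the sbatch commands are printed, not executed.
# Set SUBMIT=sbatch to submit for real.
SUBMIT="${SUBMIT:-echo sbatch}"

for i in $(seq 1 "$TOTAL_JOBS"); do
    # Jobs sharing a name run one at a time, so at most MAX_RUNNING run at once.
    $SUBMIT --job-name="$(slot_name "$i")" --dependency=singleton job.sh
done
```

Because the singleton dependency applies per job name and user, any other job you submit under one of these names will also wait its turn in that slot.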