Bash: limit the number of concurrent jobs?
If you have GNU Parallel http://www.gnu.org/software/parallel/ installed you can do this:
parallel gzip ::: *.log
which will run one gzip per CPU core until all logfiles are gzipped.
If it is part of a larger loop you can use sem
instead:
for i in *.log ; do
echo $i Do more stuff here
sem -j+0 gzip $i ";" echo done
done
sem --wait
It will do the same, but give you a chance to do more stuff for each file.
If GNU Parallel is not packaged for your distribution you can install GNU Parallel simply by:
$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
12345678 883c667e 01eed62f 975ad28b 6d50e22a
$ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
cc21b4c9 43fd03e9 3ae1ae49 e28573c0
$ sha512sum install.sh | grep da012ec113b49a54e705f86d51e784ebced224fdf
79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
$ bash install.sh
It will download, check signature, and do a personal installation if it cannot install globally.
Watch the intro videos for GNU Parallel to learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
A small bash script could help you:
# content of script exec-async.sh
joblist=($(jobs -p))
while (( ${#joblist[*]} >= 3 ))
do
sleep 1
joblist=($(jobs -p))
done
$* &
If you call:
. exec-async.sh sleep 10
...four times, the first three calls will return immediately, the fourth call will block until there are less than three jobs running.
You need to start this script inside the current session by prefixing it with .
, because jobs
lists only the jobs of the current session.
The sleep
inside is ugly, but I didn't find a way to wait for the first job that terminates.
The following script shows a way to do this with functions. You can either put the bgxupdate()
and bgxlimit()
functions in your script, or have them in a separate file which is sourced from your script with:
. /path/to/bgx.sh
It has the advantage that you can maintain multiple groups of processes independently (you can run, for example, one group with a limit of 10
and another totally separate group with a limit of 3
).
It uses the Bash built-in jobs
to get a list of sub-processes but maintains them in individual variables. In the loop at the bottom, you can see how to call the bgxlimit()
function:
- Set up an empty group variable.
- Transfer that to
bgxgrp
. - Call
bgxlimit()
with the limit and command you want to run. - Transfer the new group back to your group variable.
Of course, if you only have one group, just use bgxgrp
variable directly rather than transferring in and out.
#!/bin/bash
# bgxupdate - update active processes in a group.
# Works by transferring each process to new group
# if it is still active.
# in: bgxgrp - current group of processes.
# out: bgxgrp - new group of processes.
# out: bgxcount - number of processes in new group.
bgxupdate() {
bgxoldgrp=${bgxgrp}
bgxgrp=""
((bgxcount = 0))
bgxjobs=" $(jobs -pr | tr '\n' ' ')"
for bgxpid in ${bgxoldgrp} ; do
echo "${bgxjobs}" | grep " ${bgxpid} " >/dev/null 2>&1
if [[ $? -eq 0 ]]; then
bgxgrp="${bgxgrp} ${bgxpid}"
((bgxcount++))
fi
done
}
# bgxlimit - start a sub-process with a limit.
# Loops, calling bgxupdate until there is a free
# slot to run another sub-process. Then runs it
# an updates the process group.
# in: $1 - the limit on processes.
# in: $2+ - the command to run for new process.
# in: bgxgrp - the current group of processes.
# out: bgxgrp - new group of processes
bgxlimit() {
bgxmax=$1; shift
bgxupdate
while [[ ${bgxcount} -ge ${bgxmax} ]]; do
sleep 1
bgxupdate
done
if [[ "$1" != "-" ]]; then
$* &
bgxgrp="${bgxgrp} $!"
fi
}
# Test program, create group and run 6 sleeps with
# limit of 3.
group1=""
echo 0 $(date | awk '{print $4}') '[' ${group1} ']'
echo
for i in 1 2 3 4 5 6; do
bgxgrp=${group1}; bgxlimit 3 sleep ${i}0; group1=${bgxgrp}
echo ${i} $(date | awk '{print $4}') '[' ${group1} ']'
done
# Wait until all others are finished.
echo
bgxgrp=${group1}; bgxupdate; group1=${bgxgrp}
while [[ ${bgxcount} -ne 0 ]]; do
oldcount=${bgxcount}
while [[ ${oldcount} -eq ${bgxcount} ]]; do
sleep 1
bgxgrp=${group1}; bgxupdate; group1=${bgxgrp}
done
echo 9 $(date | awk '{print $4}') '[' ${group1} ']'
done
Here’s a sample run, with blank lines inserted to clearly delineate different time points:
0 12:38:00 [ ]
1 12:38:00 [ 3368 ]
2 12:38:00 [ 3368 5880 ]
3 12:38:00 [ 3368 5880 2524 ]
4 12:38:10 [ 5880 2524 1560 ]
5 12:38:20 [ 2524 1560 5032 ]
6 12:38:30 [ 1560 5032 5212 ]
9 12:38:50 [ 5032 5212 ]
9 12:39:10 [ 5212 ]
9 12:39:30 [ ]
- The whole thing starts at
12:38:00
(timet = 0
) and, as you can see, the first three processes run immediately. - Each process sleeps for
10n
seconds and the fourth process doesn’t start until the first exits (at timet = 10
). You can see that process3368
has disappeared from the list before1560
is added. - Similarly, the fifth process
5032
starts when5880
(the second) exits at timet = 20
. - And finally, the sixth process
5212
starts when2524
(the third) exits at timet = 30
. - Then the rundown begins, the fourth process exits at time
t = 50
(started at10
with40
duration). - The fifth exits at time
t = 70
(started at20
with50
duration). - Finally, the sixth exits at time
t = 90
(started at30
with60
duration).
Or, if you prefer it in a more graphical time-line form:
Process: 1 2 3 4 5 6
-------- - - - - - -
12:38:00 ^ ^ ^ 1/2/3 start together.
12:38:10 v | | ^ 4 starts when 1 done.
12:38:20 v | | ^ 5 starts when 2 done.
12:38:30 v | | ^ 6 starts when 3 done.
12:38:40 | | |
12:38:50 v | | 4 ends.
12:39:00 | |
12:39:10 v | 5 ends.
12:39:20 |
12:39:30 v 6 ends.
Here's the shortest way:
waitforjobs() {
while test $(jobs -p | wc -w) -ge "$1"; do wait -n; done
}
Call this function before forking off any new job:
waitforjobs 10
run_another_job &
To have as many background jobs as cores on the machine, use $(nproc)
instead of a fixed number like 10.