Can GNU parallel output stdout before the program has exited?
I think you're looking for --ungroup. The man page says:
--group Group output. Output from each job is grouped
together and is only printed when the command is finished.
--group is the default. Can be reversed with -u.
-u is, of course, a synonym for --ungroup.
To watch progress for a few parallel jobs, try --tmuxpane --fg:
parallel --tmuxpane --fg seq {} 10000000 ::: {1..100}
You could also be looking for -u or (more likely) --lb.
From man parallel:
--line-buffer
--lb
Buffer output on line basis. --group will keep the output together
for a whole job. --ungroup allows output to mixup with half a line
coming from one job and half a line coming from another job.
--line-buffer fits between these two: GNU parallel will print a full
line, but will allow for mixing lines of different jobs.
--line-buffer takes more CPU power than both --group and --ungroup,
but can be much faster than --group if the CPU is not the limiting
factor.
Normally --line-buffer does not buffer on disk, and can thus process
an infinite amount of data, but it will buffer on disk when combined
with: --keep-order, --results, --compress, and --files. This will
make it as slow as --group and will limit output to the available
disk space.
With --keep-order --line-buffer will output lines from the first job
while it is running, then lines from the second job while that is
running. It will buffer full lines, but jobs will not mix. Compare:
parallel -j0 'echo {};sleep {};echo {}' ::: 1 3 2 4
parallel -j0 --lb 'echo {};sleep {};echo {}' ::: 1 3 2 4
parallel -j0 -k --lb 'echo {};sleep {};echo {}' ::: 1 3 2 4
See also: --group --ungroup
[...]
--ungroup
-u Ungroup output. Output is printed as soon as possible and bypasses
GNU parallel internal processing. This may cause output from
different commands to be mixed thus should only be used if you do not
care about the output. Compare these:
seq 4 | parallel -j0 \
'sleep {};echo -n start{};sleep {};echo {}end'
seq 4 | parallel -u -j0 \
'sleep {};echo -n start{};sleep {};echo {}end'
It also disables --tag. GNU parallel outputs faster with -u. Compare
the speeds of these:
parallel seq ::: 300000000 >/dev/null
parallel -u seq ::: 300000000 >/dev/null
parallel --line-buffer seq ::: 300000000 >/dev/null
Can be reversed with --group.
See also: --line-buffer --group
One example where -u shines is when stdout and stderr are mixed on the same line:
echo -n 'This is stdout (';echo -n stderr >&2 ; echo ')'
This will be formatted wrongly with --lb and --group, since both buffer stdout and stderr separately.
But even -u does not guarantee correct formatting, because half-lines from different processes can still be mixed: http://mywiki.wooledge.org/BashPitfalls#Non-atomic_writes_with_xargs_-P
My solution was to log the output to files, watch them change in real time with tail -f <file>, and delete them automatically when the job is done. I also found the --progress flag useful.
parallel --progress ./program {} '>' {}.log';' rm {}.log ::: A B C
Here the jobs consist of running program with the different inputs A, B, and C, sending each run's output to the corresponding log file.