Creating temp file vs process substitution vs variable expansion?
<(cmd)
is a ksh
feature also found nowadays in zsh
and bash
called process substitution.
On systems that support /dev/fd/n
or /proc/self/fd/n
, it's implemented with pipes, and otherwise with temporary named pipes. Either way, it's a form of pipe, which is an interprocess communication mechanism.
cmd1 <(cmd2)
Can be written (with normal pipes):
{ cmd2 4<&- | 3<&0 <&4 4<&- cmd1 /dev/fd/3; } 4<&0
Or (with named pipes):
mkfifo /tmp/named_pipe
cmd2 > /tmp/named_pipe & cmd1 /tmp/named_pipe
That is, both commands are started simultaneously and communicate with a pipe. You would usually use cmd2 | cmd1
for that, but process substitution is typically for those cases where cmd1
can only take input from a file name and not from standard input or when more than one input is needed like in diff <(cmd1) <(cmd2)
.
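As a rough illustration of the equivalence described above, the following bash sketch feeds two command outputs to comm, once with process substitution and once with hand-made named pipes. The sample data, the variable names and the temporary directory are assumptions made up for the example.

```shell
#!/usr/bin/env bash
# Sketch only: sample data and names are made up for illustration.

# Two inputs at once, which a plain pipeline cannot provide.
with_procsub=$(comm -12 <(printf '%s\n' a b c) <(printf '%s\n' b c d))

# Roughly the same thing done by hand with named pipes.
dir=$(mktemp -d) || exit 1
mkfifo "$dir/left" "$dir/right"
printf '%s\n' a b c > "$dir/left" &    # writers block until comm opens the FIFOs
printf '%s\n' b c d > "$dir/right" &
with_fifos=$(comm -12 "$dir/left" "$dir/right")
wait
rm -rf "$dir"

printf '%s\n' "$with_procsub"
```

Both variables should end up holding the lines common to the two inputs; the process substitution form simply saves the FIFO bookkeeping.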
There's no rlimit affecting it other than general ones like those on the number of processes, CPU time or memory.
The PIPE_BUF value reported by some implementations of ulimit
like bash
and some implementations of ksh
is not an rlimit but the maximum size for which a write to a pipe is guaranteed to be atomic, so it is irrelevant here. The size of the pipe itself (64kB on Linux as reported by @dsmsk80) is not really a limit either. It just means that's how much cmd2
can write to the pipe even after cmd1
has stopped reading from it.
There's a limitation though in that cmd1
may only read from that file. Because it's a pipe, it cannot write to that file or seek back and forth in the file.
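To see that the file handed to cmd1 really is a pipe, a small bash sketch can test the file type of the argument. The kind_of function is a hypothetical helper written for this example, and the result assumes a system where bash uses /dev/fd or named pipes, such as Linux.

```shell
#!/usr/bin/env bash
# Sketch only: kind_of is a hypothetical helper, not a standard command.
kind_of() {
  if [ -p "$1" ]; then
    echo pipe
  elif [ -f "$1" ]; then
    echo "regular file"
  else
    echo other
  fi
}

# The argument produced by process substitution.
from_procsub=$(kind_of <(echo hello))

# An ordinary temporary file for comparison.
tmp=$(mktemp) || exit 1
echo hello > "$tmp"
from_tempfile=$(kind_of "$tmp")
rm -f "$tmp"

echo "process substitution: $from_procsub"
echo "temporary file: $from_tempfile"
```

A command that needs to seek in its input or rewrite it would work with the temporary file but not with the pipe.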
zsh
has a third form of process substitution using regular temporary files:
cmd1 =(cmd2)
calls cmd1
with a temporary file which contains the output of cmd2
. In that case cmd1
is run after cmd2 instead of concurrently. The limit on the size of files may be reached there.
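In shells without =(...), roughly the same effect can be had by hand with mktemp. This is only a sketch: printf stands in for cmd2 and sort -o for a cmd1 that needs to rewrite its input file, both chosen for the example.

```shell
# Sketch only: printf plays cmd2, sort -o plays a cmd1 that rewrites its input.
tmp=$(mktemp) || exit 1
printf '%s\n' 3 1 2 > "$tmp"   # cmd2 runs to completion first
sort -n -o "$tmp" "$tmp"       # cmd1 then reads and rewrites the regular file
sorted=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$sorted"
```

Unlike with <(...), the two commands run one after the other, and the file can be reopened, seeked and modified.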
I don't know of any shell implementing a <<<(...)
operator. There's however a <<<
operator in zsh
(inspired by the same operator in the Unix port of rc
) also found in recent versions of ksh93
and bash
. It's a variation on the <<
heredoc operator called herestring.
In:
cmd <<< something
Which is the same as the standard:
cmd << EOF
something
EOF
The shell creates a temporary file with something\n
as the content and feeds that as standard input to a new process, unlinks that file and executes cmd
in that new process. Again, that's a regular file so the rlimit on the maximum size of a file may be reached.
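The equivalence between the two forms can be checked with a short bash sketch that dumps the bytes each one delivers on standard input. The dump function is a hypothetical helper for this example.

```shell
#!/usr/bin/env bash
# Sketch only: dump is a hypothetical helper that shows stdin byte by byte.
dump() { od -An -c; }

from_herestring=$(dump <<< something)
from_heredoc=$(dump << EOF
something
EOF
)

echo "here-string:$from_herestring"
echo "here-doc:   $from_heredoc"
```

Both dumps should show the word followed by a single newline character.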
Now you can combine the <<<
operator with $(...)
(command substitution) to somewhat emulate zsh
's =(...)
operator in bash
and ksh93
:
cmd1 <<<"$(cmd2)"
Would run cmd2
with its stdout redirected to a pipe. On the other end of the pipe, the shell reads the output of cmd2
and stores it, minus the trailing newline characters and with one newline character added, into a temporary file, then calls cmd1
with that temporary file open for reading as stdin (note there's another limitation in that it won't work if cmd2's
output contains NUL characters).
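The newline handling can be observed by counting bytes. In this bash sketch, printf stands in for cmd2 and wc -c for cmd1; both are just convenient choices for the example.

```shell
#!/usr/bin/env bash
# Sketch only: printf plays cmd2 and emits a, newline, b, then three newlines.
direct=$(printf 'a\nb\n\n\n' | wc -c)
via_herestring=$(wc -c <<< "$(printf 'a\nb\n\n\n')")

# $(( )) strips any padding some wc implementations add.
echo "piped directly:  $((direct)) bytes"
echo "via here-string: $((via_herestring)) bytes"
```

The direct pipe carries all 6 bytes, while the here-string form delivers 4: the three trailing newlines are stripped by the command substitution and one is added back by <<<.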
To be like =(...)
, you'd have to write it:
cmd1 /dev/fd/3 3<<<"$(cmd2)"
Note the shell has to read the whole output of cmd2 into memory before writing it to the temporary file, so in addition to the maximum file size, you may also reach the limit on memory usage.
Also note that since version 5, bash
strips the write permissions from the temporary file before calling cmd1
, so if you need cmd1
to be able to modify that file, you'd need to work around it with:
{
  chmod u+w /dev/fd/3 && # only needed in bash 5+
  cmd1 /dev/fd/3
} 3<<<"$(cmd2)"
Bash process substitution in the form of <(cmd)
and >(cmd)
is implemented with /dev/fd pipes where the system supports them, and with named pipes otherwise. The command cmd
is run with its input/output connected to a pipe. When you run e.g. cat <(sleep 10; ls)
you can find the created pipe under the directory /proc/pid_of_cat/fd
. The path of this pipe is then passed as an argument to the current command (cat
).
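A quick way to see what the command actually receives is to let echo print its argument instead of reading it. This is a minimal bash sketch; the exact path depends on the system (on Linux it is typically something like /dev/fd/63).

```shell
#!/usr/bin/env bash
# Sketch only: echo prints the file name that <(...) expands to.
path=$(echo <(true))
echo "cmd would receive: $path"
```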
The buffer capacity of a pipe can be estimated with a slightly tricky use of the dd
command, which writes NUL bytes to the standard input of the sleep
command (which never reads it). Since sleep just sleeps for a while, the buffer fills up:
(dd if=/dev/zero bs=1 | sleep 999) &
Give it a second and then send the USR1
signal to the dd
process:
pkill -USR1 dd
This makes the process print out its I/O statistics:
65537+0 records in
65536+0 records out
65536 bytes (66 kB) copied, 8.62622 s, 7.6 kB/s
In my test case, the buffer size is 64kB
(65536B
).
How do you use <<<(cmd)
expansion? I'm aware that it's a variation of here documents which is expanded and passed to the command on its standard input.
Hopefully, I've shed some light on the question about size. Regarding speed, I'm not so sure, but I would assume that both methods can deliver similar throughput.