Creating temp file vs process substitution vs variable expansion?

<(cmd) is process substitution, a ksh feature that is nowadays also found in zsh and bash.

On systems that support /dev/fd/n or /proc/self/fd/n, it is implemented with pipes; elsewhere, it falls back to temporary named pipes. Either way, it is a form of pipe, which is an interprocess communication mechanism.

cmd1 <(cmd2)

This can be written (with normal pipes) as:

{ cmd2 4<&- | 3<&0 <&4 4<&- cmd1 /dev/fd/3; } 4<&0

Or (with named pipes):

mkfifo /tmp/named_pipe
cmd2 > /tmp/named_pipe & cmd1 /tmp/named_pipe

That is, both commands are started simultaneously and communicate through a pipe. You would usually use cmd2 | cmd1 for that, but process substitution is typically for those cases where cmd1 can only take input from a file name and not from standard input, or where more than one input is needed, as in diff <(cmd1) <(cmd2).
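As a minimal sketch of the two-input case, assuming bash on a system where process substitution is available: comm stands in for a cmd1 that needs two file names, and the printf calls stand in for the producers. The variable names are only illustrative.

```shell
#!/usr/bin/env bash
# Each <(...) is replaced by a file name; on Linux it is typically /dev/fd/63.
psub_path=$(echo <(true))
echo "process substitution expanded to: $psub_path"

# comm needs two file names, which a single pipeline cannot provide.
# -12 suppresses columns 1 and 2, keeping only the lines common to both inputs.
common=$(comm -12 <(printf '%s\n' a b c) <(printf '%s\n' b c d))
printf '%s\n' "$common"
```

The exact file name printed on the first line depends on the system and the shell.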

There's no rlimit affecting it other than general ones like on the number of processes, cpu time or memory.

The pipe size reported by some implementations of ulimit (bash and some implementations of ksh) is PIPE_BUF. That is not an rlimit but the maximum size for which a write to a pipe is guaranteed to be atomic, so it is irrelevant here. The size of the pipe itself (64kB on Linux as reported by @dsmsk80) is not really a limit either. It only determines how much cmd2 can still write to the pipe, before blocking, once cmd1 has stopped reading from it.

There's a limitation though in that cmd1 may only read from that file. Because it's a pipe, it cannot write to that file or seek back and forth in the file.
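A small sketch of that limitation, assuming bash on a system with /dev/fd (on a system that falls back to named pipes, the second read could block instead). read_twice is a hypothetical helper, not a standard command.

```shell
#!/usr/bin/env bash
# [ -p FILE ] is true when FILE is a pipe (anonymous or named).
if [ -p <(true) ]; then kind=pipe; else kind=not-a-pipe; fi
echo "kind: $kind"

# Hypothetical helper that reads the same file name twice.
read_twice() {
  first=$(cat "$1")
  second=$(cat "$1" 2>/dev/null)
}
read_twice <(echo hello)
echo "first read: '$first', second read: '$second'"
```

The second read comes back empty because the data was consumed by the first one and there is no way to rewind a pipe.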

zsh has a third form of process substitution using regular temporary files:

cmd1 =(cmd2)

This calls cmd1 with a temporary file that contains the output of cmd2. In that case, cmd1 is run after cmd2 instead of concurrently. Since it is a regular file, the limit on the size of files may be reached there.
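In shells without =(...), roughly the same effect can be had by hand with mktemp. This is only a sketch: printf stands in for cmd2 and sort -o for a cmd1 that rewrites its input file, and unlike zsh, the cleanup here is explicit.

```shell
#!/usr/bin/env bash
tmpfile=$(mktemp) || exit 1

# Stand-in for cmd2: it runs to completion before the consumer starts.
printf '%s\n' pear apple fig > "$tmpfile"

# Stand-in for cmd1: sort -o writes its result back to the file it was given,
# which would not be possible with a pipe.
sort -o "$tmpfile" "$tmpfile"

first_line=$(head -n 1 "$tmpfile")
echo "first line after in-place sort: $first_line"

rm -f "$tmpfile"
```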

I don't know of any shell implementing a <<<(...) operator. There is, however, a <<< operator in zsh (inspired by the same operator in the Unix port of rc), also found in recent versions of ksh93 and bash. It is a variation on the << heredoc operator called a here-string.

In:

cmd <<< something

Which is the same as the standard:

cmd << EOF
something
EOF

The shell creates a temporary file with something\n as the content, opens it as standard input for a new process, unlinks the file and executes cmd in that new process. Again, that's a regular file, so the rlimit on the maximum size of a file may be reached. (bash 5.1 and later use a pipe instead of a temporary file when the content is small enough to fit in the pipe buffer.)
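A quick check of that equivalence, assuming bash (zsh and ksh93 should behave the same way for this case); the variable names are only illustrative:

```shell
#!/usr/bin/env bash
from_herestring=$(cat <<< 'something')
from_heredoc=$(cat << EOF
something
EOF
)
[ "$from_herestring" = "$from_heredoc" ] && echo "same content"

# wc -c also counts the newline that the shell appends to the here-string.
byte_count=$(wc -c <<< 'abc')
echo "bytes fed to wc: $((byte_count))"
```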

Now you can combine the <<< operator with $(...) (command substitution) to somewhat emulate zsh's =(...) operator in bash and ksh93:

cmd1 <<<"$(cmd2)"

That runs cmd2 with its stdout redirected to a pipe. On the other end of the pipe, the shell reads the output of cmd2, stores it (minus the trailing newline characters and with one newline character added) into a temporary file, and calls cmd1 with that temporary file open for reading as stdin. Note there's another limitation in that it won't work if the output of cmd2 contains NUL characters.
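The newline handling can be observed by counting bytes before and after the round trip. A minimal sketch, assuming bash, with printf standing in for cmd2:

```shell
#!/usr/bin/env bash
# Stand-in for cmd2: two lines followed by two empty lines, 6 bytes in total.
raw_bytes=$(printf 'a\nb\n\n\n' | wc -c)

# $(...) strips every trailing newline, then <<< adds exactly one back.
via_herestring=$(wc -c <<<"$(printf 'a\nb\n\n\n')")

echo "raw: $((raw_bytes)) bytes, via here-string: $((via_herestring)) bytes"
```

So the data cmd1 sees is not always byte-for-byte what cmd2 wrote.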

To be closer to =(...), where cmd1 receives a file name rather than data on stdin, you'd have to write it:

cmd1 /dev/fd/3 3<<<"$(cmd3)"

Note the shell has to read the whole output of cmd3 in memory before writing it to the temporary file, so in addition to the maximum file size, you may also reach the limit on memory usage.
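A concrete instance of the /dev/fd/3 form, as a sketch that assumes bash and a system providing /dev/fd. grep -c '' stands in for a command that wants a file name, and printf for the producer:

```shell
#!/usr/bin/env bash
# grep -c '' FILE counts the lines in FILE; here FILE is the here-string
# that the shell opened on file descriptor 3.
line_count=$(grep -c '' /dev/fd/3 3<<<"$(printf '%s\n' x y z)")
echo "lines seen through /dev/fd/3: $line_count"
```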

Also note that since version 5, bash strips the write permissions to the temporary file before calling cmd1, so if you need cmd1 to be able to modify that file, you'd need to work around it with:

{
  chmod u+w /dev/fd/3 && # only needed in bash 5+
  cmd1 /dev/fd/3
} 3<<<"$(cmd3)"

Bash process substitution in the form of <(cmd) and >(cmd) is implemented with pipes exposed through /dev/fd where the system supports it, and with named pipes otherwise. The command cmd is run with its input or output connected to the pipe. When you run e.g. cat <(sleep 10; ls), you can find the created pipe under the directory /proc/pid_of_cat/fd. The path to this pipe is then passed as an argument to the current command (cat).

The buffer capacity of a pipe can be estimated with a trick: use dd to send zero bytes to the standard input of a sleep command, which never reads them. Since sleep keeps the pipe open without draining it, the buffer fills up and dd blocks:

(dd if=/dev/zero bs=1 | sleep 999) &

Give it a second and then send a USR1 signal to the dd process:

pkill -USR1 dd

With GNU dd, this makes the process print its I/O statistics:

65537+0 records in
65536+0 records out
65536 bytes (66 kB) copied, 8.62622 s, 7.6 kB/s

In my test case, the buffer size is 64kB (65536B).

How do you use <<<(cmd) expansion? I'm aware that it is a variation of here documents, which is expanded and passed to the command on its standard input.

Hopefully this sheds some light on the question about size. Regarding speed, I'm not so sure, but I would assume that both methods can deliver similar throughput.
