Piping commands with very large output
When the data producer (tar) tries to write to the pipe too quickly for the consumer (lzip) to have time to read all of it, it will block until lzip has had time to read what tar is writing. There is a small buffer associated with the pipe, but its size is likely to be smaller than the size of most tar archives. There is no risk of filling up your system's RAM with your pipeline.
"Blocking" simply means that when tar
does a call to the write()
library function (or equivalent), the call won't return until the data has been delivered to the pipe buffer, which could take a bit of time if lzip
is slow to read from that same buffer. You should be able to see this in top
where tar
would slow down and sleep a lot compared to lzip
(assuming tar
is in fact quicker than lzip
).
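Besides top, you can catch the processes in the act with ps. A sketch reusing your paths, assuming Linux with procps ps: a STAT of "S" means the process is in an interruptible sleep, and WCHAN will often show something like pipe_write for a blocked writer.

tar -cf - /tmp/source-dir | lzip -o /media/my-usb/result.lz - &
sleep 2
# show the state and kernel wait channel of both ends of the pipe
ps -o pid,stat,wchan,comm -C tar,lzip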
You would therefore not fill up a significant amount of RAM with your pipeline. To do that (if you wanted to), you could use something like pv in the middle, with some large buffer (here, a gigabyte):
tar -cvf - /tmp/source-dir | pv --buffer-size 1G | lzip -o /media/my-usb/result.lz -
This would still block tar whenever pv blocks. pv would block when its buffer is full and it can't write to lzip.
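To see the decoupling effect of pv's buffer, here is a small sketch (sizes are arbitrary): the producer finishes almost immediately because pv absorbs its entire output, even though the consumer does not start reading for several seconds.

# dd writes 100 MiB into pv's 200 MiB buffer and exits right away,
# while the consumer sleeps before draining the data
dd if=/dev/zero bs=1M count=100 | pv --buffer-size 200M | { sleep 5; cat > /dev/null; }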
The reverse situation works in a similar way, i.e. if you have a slow left-hand side of a pipe writing to a fast right-hand side, the consumer on the right would block on read() until there is data to be read from the pipe.
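You can see that consumer-side blocking with a trivial sketch: cat below sits in read() for about five seconds, doing nothing, until the producer finally writes.

# cat blocks on read() until the slow producer writes, then exits on EOF
{ sleep 5; echo done; } | cat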
This (data I/O) is the only thing that synchronises the processes taking part in a pipeline. Apart from reading and writing (and occasionally blocking while waiting for someone else to read or write), they would run independently of each other.
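As a quick illustration of that independence, both sides of a pipeline are started at the same time; the timestamps in this sketch show the consumer starting immediately rather than waiting for the producer to finish:

# both commands start at once; only the data flow synchronises them
{ date '+producer started %T' >&2; sleep 3; echo data; } |
{ date '+consumer started %T' >&2; cat; }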
GNU tar has a --lzip option to "filter the archive through lzip", so you may want to use this instead:
tar --lzip -cvf /media/my-usb/result.lz /tmp/source-dir
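Recent GNU tar can also pick the compressor from the archive name with -a (--auto-compress); assuming your tar maps the .lz suffix to lzip (current versions do), this should be equivalent:

tar -cavf /media/my-usb/result.lz /tmp/source-dir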
To answer the question directly: in your case the system will manage the pipe properly on its own, using the default pipe buffer size (64 KiB on Linux).