Why use a named pipe instead of a file?

Almost everything in Linux can be treated as a file, but the main difference between a regular file and a named pipe is that a named pipe is a special kind of file that has no contents on the filesystem.

Here is a quote from man fifo:

A FIFO special file (a named pipe) is similar to a pipe, except that it is accessed as part of the filesystem. It can be opened by multiple processes for reading or writing. When processes are exchanging data via the FIFO, the kernel passes all data internally without writing it to the filesystem. Thus, the FIFO special file has no contents on the filesystem; the filesystem entry merely serves as a reference point so that processes can access the pipe using a name in the filesystem.

The kernel maintains exactly one pipe object for each FIFO special file that is opened by at least one process. The FIFO must be opened on both ends (reading and writing) before data can be passed. Normally, opening the FIFO blocks until the other end is opened also.
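A minimal sketch of the blocking behaviour described in the quote, assuming a POSIX shell with mktemp. The FIFO lives in a throwaway directory instead of a fixed path, and the message text is arbitrary. The writer is started first and is still stuck in open() a second later; the data only passes through once cat opens the read end.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
fifo="$dir/testpipe"
mkfifo "$fifo"

# Writer: the shell blocks in open() on the redirection, before echo even runs.
echo "hello through the FIFO" > "$fifo" &
writer=$!

sleep 1
# Still alive after a second: it is waiting for a reader, not doing any work.
if kill -0 "$writer" 2>/dev/null; then
  waiting=yes
else
  waiting=no
fi
echo "writer still waiting after 1s: $waiting"

# Opening the read end completes the rendezvous; the kernel hands the data over.
msg=$(cat "$fifo")
echo "$msg"
wait "$writer"
```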

So a named pipe actually does nothing until some process reads from it and writes to it. It takes no space on the hard disk (apart from a little metadata), and it does not use the CPU.

You can check it by doing this:

Create a named pipe

$ mkfifo /tmp/testpipe

Go to some directory, for example /home/user/Documents, and tar and gzip everything inside it into the named pipe:

$ cd /home/user/Documents
$ tar cvf - . | gzip > /tmp/testpipe &
[1] 28584

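The number printed is the PID of the last command in the pipeline, the one that will become gzip. In our example it was 28584. Because the shell has to open /tmp/testpipe before it can start gzip, and that open blocks until a reader appears, the process still shows up as bash.

Now check what this process is doing:

$ ps u -p 28584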
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
c0rp     28584  0.0  0.0  29276  7800 pts/8    S    00:08   0:00 bash

You will see that it is using essentially no resources: 0% CPU and 0% memory.

Verify the hunch about disk space usage:

$ du -h /tmp/testpipe
0       /tmp/testpipe

Again 0: nothing. The same testpipe can be reused whenever needed.

Don't forget to clean up afterwards: kill the waiting process with kill -15 28584 (SIGTERM), then remove the named pipe with rm /tmp/testpipe.
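The same experiment can be scripted with automatic cleanup. This is a sketch under a few assumptions: a temporary directory from mktemp stands in for /home/user/Documents and /tmp/testpipe, and the stat column of ps -o is the procps (Linux) format. $! holds the PID of the last stage of the background pipeline, i.e. the shell that is still waiting to start gzip.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
job=""
# On exit: terminate the waiting process (if any) and remove everything.
trap 'if [ -n "$job" ]; then kill "$job" 2>/dev/null || true; fi; rm -rf "$work"' EXIT

mkdir "$work/docs"
echo "some document" > "$work/docs/a.txt"
mkfifo "$work/testpipe"

# Same pipeline as above, run in the background.
( cd "$work/docs" && tar cf - . ) | gzip > "$work/testpipe" &
job=$!
sleep 1

ps -o pid=,stat=,comm= -p "$job"   # expected state: S (sleeping in open())
size=$(du "$work/testpipe" | cut -f1)
echo "du reports: $size"           # nothing is stored in the FIFO itself
```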

Example Usages

You can redirect almost anything through a named pipe. One example is this one-line proxy.
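The linked proxy itself is not reproduced here. As a hedged, network-free sketch of the kind of trick one-line relays typically use, the FIFO below feeds the output of the last command in a pipeline back into the input of the first one (A < backpipe | B > backpipe). A and B are toy stand-ins, and the backpipe name is an assumption.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
bp="$dir/backpipe"
mkfifo "$bp"

# fd 3 carries A's final answer out of the loop.
# A: sends "ping" down the ordinary pipe, then reads B's reply from the FIFO.
# B: reads one request from the ordinary pipe and answers into the FIFO.
result=$(
  exec 3>&1
  { echo "ping"; IFS= read -r reply; echo "A got: $reply" >&3; } < "$bp" |
  { IFS= read -r request; echo "pong ($request)"; } > "$bp"
)
echo "$result"
```

Swapping A and B for real client and server commands turns this loop into a relay; their exact options depend on the tool, so they are left out.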

Here is another nice use of named pipes: you can configure two processes on the same server to communicate through a named pipe instead of the TCP/IP stack. This is typically faster and does not consume network resources. For example, your web server can talk to the database directly through a named pipe instead of connecting to a localhost address or listening on a port.
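A minimal sketch of two independently started processes talking through FIFOs at well-known paths instead of a TCP port. Everything here is hypothetical: the request.fifo/response.fifo names, the one-line "protocol" and the SQL-looking request are illustrations, not how any particular web server or database does it.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
req="$dir/request.fifo"
resp="$dir/response.fifo"
mkfifo "$req" "$resp"

# "Server": its own process; reads one request line and writes one reply line.
(
  IFS= read -r line < "$req"
  printf 'reply to: %s\n' "$line" > "$resp"
) &
server=$!

# "Client": sends a request, then blocks until the reply arrives.
printf 'SELECT 1\n' > "$req"
IFS= read -r answer < "$resp"
wait "$server"
echo "$answer"
```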

It is true that the pipe won't use system memory, but your example shows no CPU usage only because nothing reads from the pipe, so the process is just waiting.

Consider the following example:

mkfifo /tmp/testpipe
tar cvf - / | gzip > /tmp/testpipe

Now open a new console and run:

watch -n 1 'ps u -p $(pidof tar)'

And in a third console:

cat /tmp/testpipe > /dev/null

If you look at the watch output in the second console, you will see the CPU consumption go up as soon as the pipe is being read!
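A non-interactive variant of this experiment, as a sketch: 1,000,000 random bytes stand in for tar cvf - / (an assumption to keep it fast), and the reader decompresses and counts what comes out of the FIFO. The writer only starts compressing once the reader opens the FIFO, and the FIFO still reports 0 bytes on disk afterwards.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkfifo "$dir/testpipe"

# Writer: blocked until a reader opens the FIFO, then compresses 1,000,000 bytes.
head -c 1000000 /dev/urandom | gzip > "$dir/testpipe" &

# Reader: opening the FIFO lets the writer start working.
bytes=$(gunzip < "$dir/testpipe" | wc -c | tr -d ' ')
wait

echo "bytes after decompression: $bytes"
echo "size of the FIFO on disk: $(du "$dir/testpipe" | cut -f1)"
```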

Here is a use case where named pipes can save you a lot of time by removing I/O.

Let's suppose you have a BigFile, for example 10G.

You also have splits of this BigFile in pieces of 1G, BigFileSplit_01 to BigFileSplit_10.

Now you have doubts about the correctness of BigFileSplit_05.

Naively, without named pipes, you would create a new split from BigFile and compare:

dd if=BigFile of=BigFileSplitOrig_05 bs=1G skip=4 count=1
diff -s BigFileSplitOrig_05 BigFileSplit_05
rm BigFileSplitOrig_05

With a named pipe you would do:

mkfifo BigFileSplitOrig_05
dd if=BigFile of=BigFileSplitOrig_05 bs=1G skip=4 count=1 &
diff -s BigFileSplitOrig_05 BigFileSplit_05
rm BigFileSplitOrig_05
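A scaled-down, runnable version of the named-pipe variant, as a sketch: BigFile here is about 9 KB of generated text and the pieces are 1 KiB (bs=1024) instead of 1G, so the file names match the example but the sizes are assumptions. The dd that would have written BigFileSplitOrig_05 to disk streams straight into diff instead.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

# Hypothetical stand-ins: a small text BigFile and its 5th 1 KiB piece.
awk 'BEGIN { for (i = 1; i <= 2000; i++) print i }' > BigFile
dd if=BigFile of=BigFileSplit_05 bs=1024 skip=4 count=1 2>/dev/null

# Named-pipe check: the reference piece never lands on disk.
mkfifo BigFileSplitOrig_05
dd if=BigFile of=BigFileSplitOrig_05 bs=1024 skip=4 count=1 2>/dev/null &
result=$(diff -s BigFileSplitOrig_05 BigFileSplit_05)
wait
rm BigFileSplitOrig_05
echo "$result"
```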

At first sight that may not seem like a big difference, but in terms of time the difference is huge!

Option 1:

dd: read 1G / write 1G (1)
diff: read 2G
rm: free allocated clusters / remove directory entry

Option 2:

dd: read 1G / write nothing to disk (the data goes into the named pipe)
diff: read 1G from disk, plus 1G straight from the pipe
rm: no allocated cluster to manage (we didn't actually write anything to the filesystem) / remove directory entry

So here the named pipe saves you a read and a write of 1G, plus some filesystem cleanup (since nothing was written to the filesystem except the empty FIFO node).

Avoiding I/O, especially writes, also reduces wear on your disks. This matters even more with SSDs, since their cells survive only a limited number of writes.

(1) Obviously, another option would be to create that temporary file in RAM, for example if /tmp is mounted as a RAM disk (tmpfs). However, you would then be limited by the size of the RAM disk, whereas the "named pipe trick" has no such size limit.
