Why does tar appear to skip file contents when output file is /dev/null?
It is a documented optimization:
"When the archive is being created to /dev/null, GNU tar tries to minimize input and output operations. The Amanda backup system, when used with GNU tar, has an initial sizing pass which uses this feature."
This can happen with a variety of programs. For example, I once saw this behavior with a plain cp file /dev/null: instead of giving me an estimate of my disk read speed, the command returned after a few milliseconds.
As far as I remember, that was on Solaris or AIX, but the principle applies to all kinds of Unix-like systems.
In the old days, when a program copied a file somewhere, it would alternate between read calls, which get some data from disk (or whatever the file descriptor refers to) into memory (with a guarantee that everything is there when read returns), and write calls, which take that chunk of memory and send its content to the destination.
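The alternating read/write loop described above can be sketched roughly as follows. This is only an illustration in Python using the thin os.read and os.write wrappers, not the code of cp or tar; the function name copy_read_write and the 64 KiB chunk size are made up for the example.

```python
import os

def copy_read_write(src_path, dst_path, chunk_size=64 * 1024):
    """Copy src_path to dst_path with alternating read and write calls.

    Hypothetical helper for illustration; returns the number of bytes copied.
    """
    total = 0
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            while True:
                # When read returns, the data is guaranteed to be in memory.
                chunk = os.read(src, chunk_size)
                if not chunk:  # empty result means end of file
                    break
                view = memoryview(chunk)
                while view:
                    # write may accept fewer bytes than offered, so loop.
                    written = os.write(dst, view)
                    view = view[written:]
                total += len(chunk)
        finally:
            os.close(dst)
    finally:
        os.close(src)
    return total
```

With this pattern every byte of the source passes through the program's memory, so the file really is read even when the destination is /dev/null.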
However, there are at least two newer ways to achieve the same thing:
- Linux has the system calls copy_file_range (originally Linux-only and still not widely portable) and sendfile (somewhat portable; originally intended to send a file to the network, but on Linux it can now use any destination). They are intended to optimize transfers; if a program uses one of them, it is easily conceivable that the kernel recognizes that the target is /dev/null and turns the system call into a no-op.
- Programs can use mmap to get the file contents instead of read. This basically means "make sure the data is there when I try to access that chunk of memory" instead of "make sure the data is there when the system call returns". So a program can mmap the source file, then call write on that chunk of mapped memory. However, as writing to /dev/null doesn't need to access the written data, the "make sure it's there" condition is never triggered, so the file is never read either.
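To make the two mechanisms above more concrete, here is a rough Python sketch of each, assuming Linux (os.sendfile to a non-socket destination is Linux behavior). The function names copy_sendfile and copy_mmap are invented for this example; whether any given tool actually works this way is not something I've checked.

```python
import mmap
import os

def copy_sendfile(src_path, dst_path):
    """Let the kernel move the bytes; they never enter this process's memory.

    Assumes Linux, where sendfile accepts a regular file as destination.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = os.fstat(src.fileno()).st_size
        offset = 0
        while remaining > 0:
            sent = os.sendfile(dst.fileno(), src.fileno(), offset, remaining)
            if sent == 0:  # nothing more to transfer
                break
            offset += sent
            remaining -= sent
        return offset

def copy_mmap(src_path, dst_path):
    """Map the source file and hand the mapped memory straight to write.

    The program itself never touches the mapped pages; only the kernel's
    write path does, and only if the destination needs the data.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        if size == 0:  # an empty file cannot be mapped
            return 0
        written = 0
        with mmap.mmap(src.fileno(), 0, access=mmap.ACCESS_READ) as mapped, \
                memoryview(mapped) as view:
            while written < size:
                # Pass the mapping without copying it into a bytes object.
                written += os.write(dst.fileno(), view[written:])
        return written
```

In both cases the decision whether the source data is ever fetched from disk is left to the kernel, which is what makes a shortcut for /dev/null possible.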
I'm not sure whether GNU tar uses either of these two mechanisms when it detects that it's writing to /dev/null, but they are the reason why any program used to check read speeds should be run with | cat > /dev/null instead of > /dev/null, and why | cat > /dev/null should be avoided in all other cases.
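The same idea as piping through cat can be sketched in Python: give the program a pipe as its standard output, so it cannot tell that the data is ultimately discarded, and throw the bytes away on the reading side. This is just an illustrative helper (the name drain_through_pipe is made up), not a proper benchmarking tool; it ignores caching effects entirely.

```python
import subprocess
import time

def drain_through_pipe(command, chunk_size=1 << 20):
    """Run command with stdout connected to a pipe and discard its output.

    Returns (bytes_seen, seconds). Hypothetical helper for illustration.
    """
    start = time.monotonic()
    total = 0
    with subprocess.Popen(command, stdout=subprocess.PIPE) as proc:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:  # end of output
                break
            total += len(chunk)  # count the bytes, then drop them
    elapsed = time.monotonic() - start
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, command)
    return total, elapsed
```

For example, drain_through_pipe(["tar", "cf", "-", "some_dir"]) (with some_dir standing in for a real directory) makes tar write the archive to a pipe, so it has to produce the full archive.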