What is the difference between the TAR and CPIO archive file formats?
In addition to what was said by grawity and Paul:
History
In the "old days", cpio (with option -c
used) was the tool to use when it came to move files to other UNIX derivates since it was more portable and flexible than tar. But the tar portabilityissues may be considered as solved since the late 1980s.
Unfortunately, it was about that time that different vendors mangled the -c format of cpio (just look at the manual page for GNU cpio and its -H option). At that time tar became more portable than cpio... It took almost a whole decade until the different UNIX vendors sorted that out. Having GNU tar and GNU cpio installed was a must for all admins who had to deal with tapes from different sources back then (and even nowadays, I presume).
User Interface
tar may use a tape configuration file in which the administrator configures the tape drives connected to the system. The user can then just say "well, I'll take tape drive 1" instead of having to remember the exact device node for the tape (which can be very confusing and is also not standardized across different UNIX platforms).
But the main difference is:
tar is able to search directories on its own and takes the list of files or directories to be backed up from command line arguments.
cpio archives only the files or directories it is explicitly told to, and does not search subdirectories recursively on its own. Also, cpio gets the list of items to be archived from stdin; this is why it is almost always used in combination with find.
A cpio command often looks frightening to the beginner compared with tar:
$ find myfiles -depth -print0 | cpio -ovc0 | gzip -7 > myfiles.cpio.gz
$ tar czvf myfiles.tar.gz myfiles
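To restore, the same pipelines run in reverse (a quick sketch; cpio's -i reads an archive from stdin and -d creates leading directories as needed):
$ gzip -dc myfiles.cpio.gz | cpio -ivd
$ tar xzvf myfiles.tar.gz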
I think that's the main reason why most people use tar to create archive files: for simple tasks like bundling a complete directory, it's just easier to use.
Also, GNU tar offers the option -z, which causes the archive to be compressed with GNU zip on the fly, making things even easier.
On the other hand, one may do nifty things with find and cpio. In fact, it's a more UNIX-like approach: why build directory tree search into cpio when there's already a tool that takes care of almost everything one can think of, namely find? Things that come to mind are backing up only files newer than a certain date, restricting the files to those residing on the same filesystem, or filtering the find output with grep -v to exclude certain files... A sketch of such a combination follows below.
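To make that concrete, here is a sketch combining those ideas (the path and the 7-day window are made up for illustration): find selects only files changed within the last week, -xdev keeps it on one filesystem, and grep -v drops editor backup files before cpio archives the result:
$ find /home -xdev -type f -mtime -7 -print | grep -v '~$' | cpio -ov > recent.cpio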
The people of GNU tar put a lot of work into including many of those things that were previously only possible with cpio. In fact, both tools learned from each other; but only cpio can read the format of tar, not the other way around.
tar and output processing
One last note on something you said:
Also I was told TAR cannot compress from STDOUT. I want to archive / compress ZFS snapshots for backups. I was wondering if I could combine CPIO with bzip2 to get this effect.
Well, every version of tar (GNU or not) may be used in a pipe. Just use a minus sign (-) as the archive name:
$ tar cvf - myfiles | bzip2 > myfiles.tar.bz2
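The extraction side mirrors it: bzip2 decompresses to stdout, and tar again reads the archive from the pipe via -:
$ bzip2 -dc myfiles.tar.bz2 | tar xvf -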
Also, GNU tar offers the option --to-command to specify a postprocessor command, although I'd still prefer the pipe. Maybe it's of use when writing to certain hardware devices.
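For illustration, a small sketch (assuming GNU tar): on extraction, --to-command runs the given command once per archive member and feeds that member's contents to its stdin, so the following checksums every file in the archive without writing anything to disk:
$ tar xf myfiles.tar --to-command=md5sum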
Both tar and cpio have a single purpose: concatenate many separate files into a single stream. They don't compress data. (These days tar is more popular due to its relative simplicity: it can take input files as arguments instead of having to be coupled with find, as cpio does.)
In your case, you do not need either of these tools; they would have no useful effect, because you don't have many separate files. zfs send already did the same thing that tar would have done. So you don't have any files, only a nameless stream.
To compress the snapshot, all you have to do is pipe the zfs output through a compression program:
zfs send media/mypictures@20070607 | gzip -c > ~/backups/20070607.gz
gzip -dc ~/backups/20070607.gz | zfs receive media/mypictures@20070607
(You can substitute gzip with xz or bzip2 or any other stream-compression tool, if you want.)
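For instance, the same pipeline with xz instead of gzip (usually better compression at the cost of speed):
$ zfs send media/mypictures@20070607 | xz > ~/backups/20070607.xz
$ xz -dc ~/backups/20070607.xz | zfs receive media/mypictures@20070607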
tar and cpio have essentially the same function, which is to create a single contiguous file from an input of multiple files and directories. Originally this was to put the result onto tape, but these days it is generally used to feed into a compression utility as you have above. This is because compressing a single large file is both more time- and space-efficient than compressing lots of small files. You should note that many image formats (png, jpg, etc.) are already highly compressed, and may actually get a bit bigger if put through a compression utility.
Neither tar nor cpio does any compression itself. tar has effectively "won" the "what shall we use to make aggregate files" war, but cpio still gets a look-in in various places. I am not aware of any benefits of one over the other; tar wins through being more commonly used.
tar can indeed take input on stdin and output to stdout, which can then be piped into bzip2 as you have above, or something similar. If called with the "z" option, it will automatically invoke gzip on the output.
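A quick sketch of both variants (the archive names are just examples):
$ tar cf - pictures | bzip2 > pictures.tar.bz2   # explicit pipe into bzip2
$ tar czf pictures.tar.gz pictures               # "z" invokes gzip internally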