Indexed archive format?
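The Zip format compresses each file separately and then combines them (with a directory of the archive's contents) into a single archive file, so any one file can be located through that directory and extracted without decompressing the rest.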
In addition to the zip format already mentioned, the dar and dump utilities also handle this well and, unlike zip, retain Unix permissions. With dar, you want to avoid the solid archive option, as that goes back to the tar/gzip method of compressing the whole thing at once. That gives better compression, but it makes extracting an individual file slower, because the archive must be decompressed from the start until the desired file is found. dump handles large sets of smallish files (tens of thousands) rather well and can do multithreaded compression, but it only reads ext[234] filesystems.
pixz is a parallel, indexing version of xz.
# Compress:
tar -I pixz -cf foo.tar.xz ./foo
# Decompress:
tar -I pixz -xf foo.tar.xz
# Very quickly list the contents of the compressed tarball:
pixz -l foo.tar.xz
# Very quickly extract a single file (give the path as it appears in the `pixz -l` listing):
pixz -x dir/file < foo.tar.xz | tar x