Which file compression software for Linux offers the highest size reduction?
lrzip is what you're really looking for, especially if you're compressing source code!
Quoting the README:
This is a compression program optimised for large files. The larger the file and the more memory you have, the better the compression advantage this will provide, especially once the files are larger than 100MB. The advantage can be chosen to be either size (much smaller than bzip2) or speed (much faster than bzip2). [...] The unique feature of lrzip is that it tries to make the most of the available ram in your system at all times for maximum benefit.
lrzip works by first scanning for and removing any long-distance data redundancy with an rzip-based algorithm, then compressing the non-redundant data.
Con Kolivas provides a fantastic example on the Linux Kernel Mailing List, in which he compresses a 10.3GB tarball of forty Linux kernel releases down to 163.9MB (1.6%), and does so faster than xz. He wasn't even using the most aggressive second-pass algorithm!
I'm sure you'll have great results compressing massive tarballs of source code :)
sudo apt-get install lrzip
Example (using the defaults for the other options):
Ultra compression, dog slow:
lrzip -z file
For folders, just replace lrzip with lrztar.
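To make the commands above concrete, here is a minimal round-trip sketch. It assumes lrzip is installed; the helper names lrz_pack and lrz_unpack are made up for illustration, and the directory name in the usage note is a placeholder.

```shell
# Hypothetical helpers, not part of lrzip itself.

# lrz_pack DIR: archive DIR with the ZPAQ backend (-z).
# Writes DIR.tar.lrz into the current directory. Slow, but smallest output.
lrz_pack() {
    command -v lrztar >/dev/null 2>&1 || {
        echo "lrztar not found; install lrzip first" >&2
        return 127
    }
    lrztar -z "$1"
}

# lrz_unpack ARCHIVE: extract a .tar.lrz archive into the current directory.
lrz_unpack() {
    lrztar -d "$1"
}
```

Usage would look like lrz_pack linux-src followed later by lrz_unpack linux-src.tar.lrz. Dropping -z falls back to lrzip's default backend, which is much faster.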
7zip is more of an archiver (like PKZIP) than a compressor. It's available for Linux, but it can only create compressed archives in regular files; it cannot compress a stream, for instance. Nor can it store most Unix file attributes, such as ownership, ACLs, extended attributes, or hard links.
On Linux, as a compressor, you've got xz, which uses the same compression algorithm as 7zip (LZMA2). You can use it to compress tar archives.
As with gzip and bzip2, there's a parallel variant, pixz, that can leverage several processors to speed up compression (xz can also do this natively since version 5.2.0 with the -T option). pixz also supports indexing a compressed tar archive, which means it can extract a single file without having to decompress the archive from the start.
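Because xz works on streams and tar records the Unix metadata, the two are usually combined in a pipeline. A minimal sketch, assuming xz 5.2.0 or later for -T0 (use all available cores); the helper names and the optional LEVEL argument are illustrative only:

```shell
# Hypothetical helpers built on standard tar and xz options.

# xz_pack DIR [LEVEL]: stream a tar archive of DIR through xz.
# LEVEL is an xz preset from 0 to 9 (default 6); higher presets give
# smaller output but need more time and memory.
xz_pack() {
    tar -cf - "$1" | xz "-${2:-6}" -T0 > "$1.tar.xz"
}

# xz_unpack ARCHIVE: decompress to stdout and let tar restore the files
# into the current directory.
xz_unpack() {
    xz -dc "$1" | tar -xf -
}
```

For example, xz_pack project 9 writes project.tar.xz using the highest preset. Nothing is written to an intermediate file, which is exactly the stream use case 7zip does not cover.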
If you're looking for the greatest size reduction regardless of compression speed, LZMA is likely your best option.
When comparing the various compressors, the tradeoff is generally time vs. size. gzip tends to compress and decompress relatively quickly while yielding a good compression ratio. bzip2 is somewhat slower than gzip in both compression and decompression, but yields better compression ratios. LZMA has the longest compression time but yields the best ratios, while also decompressing faster than bzip2.
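Ratios depend heavily on the input, so it is worth measuring on your own data. A rough sketch, assuming gzip, bzip2 and xz are on the PATH (missing tools are skipped); compare_sizes is a made-up helper name and LEVEL is an optional preset:

```shell
# compare_sizes FILE [LEVEL]: print the original size and the compressed
# size in bytes for each available tool. LEVEL defaults to 9, the highest
# preset all three tools accept. The input file is left untouched because
# -c writes the compressed data to stdout.
compare_sizes() {
    printf '%-8s %s\n' original "$(wc -c < "$1" | tr -d ' ')"
    for tool in gzip bzip2 xz; do
        command -v "$tool" >/dev/null 2>&1 || continue
        size=$("$tool" "-${2:-9}" -c "$1" | wc -c | tr -d ' ')
        printf '%-8s %s\n' "$tool" "$size"
    done
}
```

Prefixing the call with time, or timing each tool separately, shows the speed side of the tradeoff as well.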
Sources: http://bashitout.com/2009/08/30/Linux-Compression-Comparison-GZIP-vs-BZIP2-vs-LZMA-vs-ZIP-vs-Compress.html
http://tukaani.org/lzma/benchmarks.html