Unarchive file while reducing size of archive?

Here is another solution. It won't let you extract individual files from an archive while shrinking it, but it does let you extract all the files, freeing the archive's disk space as you go:

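#!/bin/bash

# $1, the first parameter, is the .tar.gz file to unarchive.
# Needs bash (for (( ))), GNU dd (status=none) and util-linux fallocate.
archive="$1"

(
    size=$(wc -c < "$archive")   # apparent size in bytes (number only)
    bs=4096
    block=0                      # dd's skip= counts blocks, not bytes
    while (( block * bs < size )); do
        # Send the next block to tar; stop before punching if the read fails
        # (e.g. tar exited and the pipe is closed)
        dd if="$archive" bs=$bs count=1 skip=$block status=none || exit 1
        # Deallocate the block just sent; -p keeps the apparent file size
        fallocate -p -o $(( block * bs )) -l $bs "$archive"
        block=$(( block + 1 ))
    done
) | tar xz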

Save this into a file such as untar_and_destroy.sh and run it as:

bash untar_and_destroy.sh whatever.tar.gz

What this does is feed the .tar.gz file to tar one block at a time: after each block is handed over, it asks Linux to deallocate (punch a hole in) that part of the file, then repeats for the next block. This needs a filesystem that supports hole punching, such as ext4, XFS or Btrfs. When it is done, ls -l will report the same size for the .tar.gz file as before, but du will report its size as 0. This is because the .tar.gz has been turned into a sparse file: it has the same length as before, but it now consists entirely of zeros that don't need to be stored on disk.
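To see the ls -l versus du difference on a small scale, here is a minimal sketch that punches a hole in a 1 MiB scratch file and compares its apparent size with its allocated blocks before and after. It assumes Linux, util-linux fallocate, GNU stat and cmp, and that the temporary directory lives on a filesystem that supports hole punching; the scratch file is created with mktemp and is not part of the original answer.

```shell
#!/bin/bash
# Sketch: punching a hole keeps a file's apparent size (what ls -l shows)
# but frees its allocated blocks (what du counts).
# Assumptions: Linux, util-linux fallocate, GNU stat/cmp, and a filesystem
# that supports hole punching (e.g. ext4, XFS, Btrfs, tmpfs).
set -euo pipefail

demo=$(mktemp)                            # scratch file, removed at the end
head -c 1048576 /dev/urandom > "$demo"    # 1 MiB of real data

apparent_before=$(stat -c %s "$demo")     # length in bytes
blocks_before=$(stat -c %b "$demo")       # allocated blocks (units of %B)

# Deallocate the whole 1 MiB range; -p implies --keep-size
fallocate -p -o 0 -l 1048576 "$demo"

apparent_after=$(stat -c %s "$demo")
blocks_after=$(stat -c %b "$demo")

echo "apparent size:    $apparent_before -> $apparent_after bytes"
echo "allocated blocks: $blocks_before -> $blocks_after"

# The punched range now reads back as zeros
zeros=no
cmp -s -n 1048576 "$demo" /dev/zero && zeros=yes
echo "all zeros: $zeros"

rm -f "$demo"
```

The apparent size should be unchanged while the block count drops, which is exactly what happens to the .tar.gz as the script works through it.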

Don't use this in production, or anywhere where having that archive deleted would be bad. The archive starts becoming unreadable as soon as the script starts, so if anything goes wrong (for example, you run out of disk space while extracting), you won't get a second chance to run it.


Although it may be impractical to expand your primary storage, perhaps you could extract the file contents to an external storage device.
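If you go the external-storage route, a minimal sketch of a pre-flight check might look like the following: it measures the uncompressed tar stream, compares it with the free space on the target, and only then extracts there with tar's -C option. The function name extract_to and the mount point /mnt/external are hypothetical, and it assumes GNU coreutils df (for --output) and gzip. Note that it decompresses the archive twice (once just to measure).

```shell
#!/bin/bash
# Sketch: extract straight onto another device after checking it has room.
# Assumptions: GNU coreutils df (--output, -B1), gzip, and a target such as a
# hypothetical external drive mounted at /mnt/external.
set -euo pipefail

extract_to() {
    local archive=$1 dest=$2 needed avail
    # Size of the uncompressed tar stream; it slightly overstates the file
    # data (tar headers, padding) but ignores filesystem block overhead.
    needed=$(gzip -dc -- "$archive" | wc -c)
    # Free bytes on the filesystem holding $dest (last line of df output)
    avail=$(df --output=avail -B1 -- "$dest" | tail -n 1 | tr -d ' ')
    if (( needed > avail )); then
        echo "need $needed bytes, only $avail free on $dest" >&2
        return 1
    fi
    # Extract directly onto the target; nothing is written locally
    tar xzf "$archive" -C "$dest"
}
```

Usage: `extract_to archive.tar.gz /mnt/external`.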

Alternatively, generate a list of files in the archive, then write a script which extracts some of those. Move those files to the cloud, select another batch to extract, lather, rinse, repeat.
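The batch approach can be sketched roughly as follows. It lists the members once, splits the list into fixed-size batches, extracts one batch, "uploads" it, and deletes the local copies before moving on. It assumes GNU tar (for -T) and GNU split, simple file names without newlines or unusual characters, and a hypothetical upload_batch function that merely copies files into UPLOAD_DIR as a stand-in for your real cloud upload tool. Each batch re-reads the whole archive, so this trades time for disk space, and the archive itself still takes its full size.

```shell
#!/bin/bash
# Sketch of the batch approach: extract a batch, "upload" it, delete it,
# repeat. Assumptions: GNU tar (-T) and split, simple file names, and a
# hypothetical upload_batch standing in for a real cloud upload command.
set -euo pipefail
shopt -s nullglob

# Hypothetical stand-in for cloud storage: copies the batch, keeping paths.
UPLOAD_DIR=${UPLOAD_DIR:-./uploaded}
upload_batch() {
    mkdir -p "$UPLOAD_DIR"
    tar cf - -T "$1" | tar xf - -C "$UPLOAD_DIR"
}

extract_in_batches() {
    local archive=$1 batch_size=$2 tmp batch name
    tmp=$(mktemp -d)
    tar tzf "$archive" > "$tmp/all"
    # Directory entries end in "/"; skip them so tar never pulls in a whole
    # directory. Parent directories are created as files are extracted.
    grep -v '/$' "$tmp/all" > "$tmp/members" || true
    split -l "$batch_size" "$tmp/members" "$tmp/batch_"
    for batch in "$tmp"/batch_*; do
        tar xzf "$archive" -T "$batch"   # re-reads the archive each time
        upload_batch "$batch"
        # Free the local space before extracting the next batch
        while IFS= read -r name; do rm -f -- "$name"; done < "$batch"
    done
    rm -rf "$tmp"
}
```

Usage: `extract_in_batches archive.tar.gz 1000`, after replacing upload_batch with your actual upload command.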

But every archiving tool I know of needs the original archive intact while it writes a new archive without the parts you don't want, so external storage is going to be very, very useful.


I don't know of any tools that can do this, and I don't think any of the common archiving formats support this.

One possible solution to your problem would be to keep the archive on a different machine and pipe it over to the machine you want to decompress it onto. For example, you could run this command on the machine with the archive:

cat archive.tar.gz | ssh YOUR_SERVER tar xfz -

The archive will be streamed to the tar process running on the server, which decompresses and extracts it without the archive ever having to be stored on the server.
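The same streaming idea can be wrapped in a small function that drops the extra cat (the shell feeds the file to ssh's stdin directly) and extracts into an explicit directory on the server. This is a minimal sketch: the function name stream_extract, the host and the target path are placeholders, and it assumes working ssh access and GNU tar on the server.

```shell
#!/bin/bash
# Sketch: stream a local archive over ssh and extract it remotely, so the
# .tar.gz is never written to the server's disk.
# Assumptions: ssh access to the host and GNU tar on the server; the host
# and paths are placeholders.
set -euo pipefail

stream_extract() {
    local archive=$1 host=$2 dest=$3 qdest
    # Quote the remote path so the server's shell sees it as one word
    qdest=$(printf '%q' "$dest")
    # tar on the server reads the archive from stdin ("-")
    ssh "$host" "mkdir -p $qdest && tar xzf - -C $qdest" < "$archive"
}
```

Usage: `stream_extract archive.tar.gz YOUR_SERVER /srv/restore`.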