Make grown extracted tar file small again
[this answer is assuming GNU tar and GNU cp]
There is absolutely no diff and the checksum is the exact same value. Yet, one file is twice as big as the original one.
1.1M /path/to/old/folder/subfolder/file.mcapm
2.4M /path/to/extracted/folder/subfolder/file.mcapm
That .mcapm file is probably sparse. Use the -S (--sparse) tar option when creating the archive.
Example:
$ dd if=/dev/null seek=100 of=dummy
...
$ mkdir extracted
$ tar -zcf dummy.tgz dummy
$ tar -C extracted -zxf dummy.tgz
$ du -sh dummy extracted/dummy
0 dummy
52K extracted/dummy
$ tar -S -zcf dummy.tgz dummy
$ tar -C extracted -zxf dummy.tgz
$ du -sh dummy extracted/dummy
0 dummy
0 extracted/dummy
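To check whether a given file is sparse without going through tar at all, you can compare its apparent size with the space actually allocated for it. This is only a sketch, assuming GNU stat and a filesystem that supports holes; the temporary directory and the variable names are made up for the illustration.

```shell
# Recreate the all-hole file from the dd example above in a scratch directory.
tmpdir=$(mktemp -d)
dd if=/dev/null of="$tmpdir/dummy" seek=100 2>/dev/null

# GNU stat: %s = apparent size in bytes, %b = allocated blocks, %B = bytes per block
apparent=$(stat -c %s "$tmpdir/dummy")
allocated=$(( $(stat -c %b "$tmpdir/dummy") * $(stat -c %B "$tmpdir/dummy") ))

echo "apparent=$apparent allocated=$allocated"
if [ "$allocated" -lt "$apparent" ]; then
    echo "file is (at least partly) sparse"
fi
```

du --apparent-size versus plain du gives the same comparison in human-readable form.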
You can also "re-sparse" a file afterwards with cp --sparse=always:
$ dd if=/dev/zero of=junk count=100
...
$ du -sh junk
52K junk
$ cp --sparse=always junk junk.sparse && mv junk.sparse junk
$ du -sh junk
0 junk
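Re-sparsing only changes how the zeros are stored, not the data itself, which is why diff and the checksum in the question saw no difference. A small sketch to convince yourself, assuming GNU coreutils (the scratch directory and variable names are just for the demo):

```shell
tmpdir=$(mktemp -d)
# 100 blocks of real zero bytes, as in the junk example above.
dd if=/dev/zero of="$tmpdir/junk" count=100 2>/dev/null

before=$(sha256sum < "$tmpdir/junk")
cp --sparse=always "$tmpdir/junk" "$tmpdir/junk.sparse" &&
    mv "$tmpdir/junk.sparse" "$tmpdir/junk"
after=$(sha256sum < "$tmpdir/junk")

# The checksum covers the file's content, holes read back as zeros.
[ "$before" = "$after" ] && echo "content unchanged"

# Same apparent size, possibly very different disk usage.
du -h --apparent-size "$tmpdir/junk"
du -h "$tmpdir/junk"
```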
@mosvy points out that your files were probably sparse. Re-doing the archive + extract with tar --sparse works, or you can make existing files in the filesystem sparse again using fallocate -d (from util-linux) to punch holes in place.
# In bash, ** only recurses after `shopt -s globstar`; zsh handles it by default.
for f in **/*some*pattern*; do
    fallocate --dig-holes "$f"
done
The man page describes this option as:
You can think of this option as doing a cp --sparse and then renaming the destination file to the original, without the need for extra disk space.
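Here is a rough way to try the in-place variant on a single throwaway file first. It assumes util-linux fallocate and GNU stat / sha256sum are installed; whether hole punching succeeds depends on the filesystem, so the sketch just reports a failure instead of assuming it works. The file name is hypothetical.

```shell
tmpdir=$(mktemp -d)
f="$tmpdir/zeros"

# 1 MiB of zero bytes that were really written, not left as a hole.
dd if=/dev/zero of="$f" bs=1M count=1 2>/dev/null
sum_before=$(sha256sum < "$f")
blocks_before=$(stat -c %b "$f")

if fallocate --dig-holes "$f" 2>/dev/null; then
    echo "allocated blocks: $blocks_before -> $(stat -c %b "$f")"
else
    echo "hole punching not supported here; fall back to cp --sparse=always"
fi

# Either way the file content must be untouched.
sum_after=$(sha256sum < "$f")
[ "$sum_before" = "$sum_after" ] && echo "content unchanged"
```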
Linux supports the fallocate(2) system call, which allows cool stuff like this, including closing up or expanding page-sized holes in a file to shorten or grow a file, instead of just turning a range into a hole. Each of the various fallocate features depends on separate support from the underlying FS, as do sparse files / extents in general.
It also lets you preallocate unwritten extents (like hole but with space reserved on disk), e.g. before a torrent download to avoid fragmentation. That's where the "allocate" in the name comes from.
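Preallocation is roughly the opposite of digging holes: the file gets its full size and the space is reserved, but nothing is written. A minimal sketch, assuming util-linux fallocate and a filesystem that supports it (the 10 MiB length and the file name are arbitrary):

```shell
tmpdir=$(mktemp -d)
f="$tmpdir/prealloc"
len=$((10 * 1024 * 1024))   # 10 MiB, arbitrary size for the demo

if fallocate -l "$len" "$f" 2>/dev/null; then
    prealloc_ok=yes
    size=$(stat -c %s "$f")
    # Unlike a sparse file, the allocated block count should now be non-trivial.
    echo "apparent size: $size bytes, allocated blocks: $(stat -c %b "$f")"
else
    prealloc_ok=no
    echo "fallocate is not supported on this filesystem"
fi
```

Compare the output with the sparse-file case, where the apparent size is large but almost no blocks are allocated.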
Other kernels that util-linux can run under may support some or all of this functionality, IDK. If it doesn't work, cp --sparse and rename should work; sparse files in general (seek instead of writing zeros) are well-established and widespread in Unix, dating back much, much farther than preallocated extents, punching holes, or especially expanding or collapsing holes between existing data.