Estimate compressibility of file
You could try compressing one block out of every 10, for instance, to get an idea:
perl -MIPC::Open2 -nE 'BEGIN{$/=\4096; open2(\*I,\*O,"gzip|wc -c")}
                       if ($. % 10 == 1) {print O $_; $l+=length}
                       END{close O; $c = <I>; say $c/$l}' file
(here with 4K blocks).
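As a quick sanity check (my addition, not part of the original answer): random bytes are essentially incompressible, so feeding them in should print a value of roughly 1, while very compressible input would push the number towards 0.

# random bytes are effectively incompressible, so this should print roughly 1;
# with no file argument the one-liner reads standard input
head -c 10000000 /dev/urandom |
  perl -MIPC::Open2 -nE 'BEGIN{$/=\4096; open2(\*I,\*O,"gzip|wc -c")}
    if ($. % 10 == 1) {print O $_; $l+=length}
    END{close O; $c = <I>; say $c/$l}'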
Here's a (hopefully equivalent) Python version of Stéphane Chazelas's solution:
python -c "
import zlib
from itertools import islice
from functools import partial
import sys
with open(sys.argv[1], 'rb') as f:
    compressor = zlib.compressobj()
    t, z = 0, 0.0
    # read the file in 4K blocks, keeping only every 10th block
    for chunk in islice(iter(partial(f.read, 4096), b''), 0, None, 10):
        t += len(chunk)
        z += len(compressor.compress(chunk))
    z += len(compressor.flush())
    print(z/t)
" file
I had a multi-gigabyte file and wasn't sure whether it was already compressed, so I test-compressed the first 10 MB:
head -c 10000000 large_file.bin | gzip | wc -c
It's not perfect, but it worked well for me.
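One caveat worth adding (my own note, not from the original answer): the start of a file is not always representative, since headers or an index may compress differently from the bulk of the data. If the file is big enough, it can be worth test-compressing a chunk from further in as well; this assumes the file is larger than the 1 GiB offset used here:

# 10 MiB taken 1 GiB into the file; compare the printed count to 10485760
dd if=large_file.bin bs=1M skip=1024 count=10 2>/dev/null | gzip | wc -c

A count close to 10485760 means that region is already compressed (or encrypted); a much smaller count means the file would likely benefit from compression.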