How to partially extract a huge zipped plain text file?
Note that gzip can extract zip files (at least the first entry in the zip file). So if there's only one huge file in that archive, you can do:
gunzip < file.zip | tail -n +3000 | head -n 20
to extract, for instance, the 20 lines starting with the 3000th one.
Or:
gunzip < file.zip | tail -c +3000 | head -c 20
For the same thing with bytes (assuming a head implementation that supports -c).
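The line-based pipeline above can be wrapped in a small shell function. This is only a sketch: the name zip_lines is hypothetical, and it assumes the archive holds a single deflated member, which is the case where gunzip can read a zip file. The same function also works unchanged on a plain .gz file.

```shell
# zip_lines ARCHIVE START COUNT  (hypothetical helper name)
# Print COUNT lines starting at line START (1-based) of the first,
# and ideally only, member of ARCHIVE.
zip_lines() {
    # tail -n +START begins output at line START; head stops after COUNT lines,
    # after which gunzip is simply killed by SIGPIPE.
    gunzip < "$1" | tail -n "+$2" | head -n "$3"
}
```

Usage: zip_lines file.zip 3000 20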
For any arbitrary member in the archive, in a Unixy way:
bsdtar xOf file.zip file-to-extract | tail... | head...
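A sketch of that with bsdtar (libarchive): first list the stored member paths, then slice one of them. zip_member_lines is a hypothetical name, and the member argument has to match a path exactly as printed by bsdtar tf.

```shell
# List the member paths stored in the archive:
#   bsdtar tf file.zip
#
# zip_member_lines ARCHIVE MEMBER START COUNT  (hypothetical helper name)
zip_member_lines() {
    # x = extract, O = write the member to stdout instead of to disk,
    # f = read from the named archive file
    bsdtar xOf "$1" "$2" | tail -n "+$3" | head -n "$4"
}
```

Usage: zip_member_lines file.zip file-to-extract 3000 20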
With the head builtin of ksh93 (like when /opt/ast/bin is ahead in $PATH), you can also do:
.... | head -s 2999 -c 20
.... | head --skip=2999 --bytes=20
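Note the off-by-one between the byte forms: tail -c +3000 starts output at the 3000th byte, i.e. it skips 2999 bytes, which is what head -s 2999 expresses directly. Without ksh93, a helper taking the number of bytes to skip could look like the sketch below (slice_bytes is a hypothetical name; it assumes a head that supports -c, which is not POSIX but is available in GNU and BSD head).

```shell
# slice_bytes SKIP COUNT  (hypothetical helper name)
# Copy COUNT bytes from stdin after discarding the first SKIP bytes,
# like ksh93's  head -s SKIP -c COUNT
slice_bytes() {
    # tail -c +N starts at byte N (1-based), so skipping SKIP bytes means N = SKIP + 1
    tail -c "+$(( $1 + 1 ))" | head -c "$2"
}
```

Usage: gunzip < file.zip | slice_bytes 2999 20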
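Note that in any case gzip/bsdtar/unzip will always need to uncompress (and here discard) the entire section of the file that leads up to the portion you want to extract. That's down to how the compression algorithm works: deflate-compressed data is decoded sequentially from the start of the member, so these tools cannot jump straight to an offset in the uncompressed output.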
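One solution using unzip -p and dd, for example to extract 10 KiB at an offset of 1000 blocks of 1 KiB (with GNU dd, iflag=fullblock keeps short reads from the pipe from being counted as whole blocks):
$ unzip -p my.zip | dd ibs=1024 count=10 skip=1000 iflag=fullblock > /tmp/out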
Note: I didn't try this with really huge data...
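A reusable form of that dd call, as a sketch assuming GNU coreutils dd (iflag=fullblock and status=none are GNU extensions). dd_slice is a hypothetical name; it reads stdin so it can follow unzip -p, gunzip or any other decompressor.

```shell
# dd_slice BLOCKSIZE SKIP COUNT  (hypothetical helper name)
# From stdin, discard SKIP blocks of BLOCKSIZE bytes, then copy COUNT blocks to stdout.
dd_slice() {
    # iflag=fullblock: keep reading until each block is full, so short reads
    #                  from a pipe don't shrink the skip or the output (GNU dd)
    # status=none:     suppress the transfer statistics on stderr (GNU dd)
    dd bs="$1" skip="$2" count="$3" iflag=fullblock status=none
}
```

Usage: unzip -p my.zip | dd_slice 1024 1000 10 > /tmp/out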
If you have control over the creation of that big zip file, why not consider using a combination of gzip and zless?
This would allow you to use zless as a pager and view the contents of the file without having to bother with extraction.
If you cannot change the compression format, then this obviously won't work. If you can, I find zless rather convenient.
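A sketch of that gzip-plus-zless workflow on a generated sample file (sample.txt is a placeholder for your own data). zless is interactive, so it only appears in a comment; inside less, typing a line number followed by g jumps to that line.

```shell
# Work in a scratch directory; sample.txt stands in for the real huge file.
cd "$(mktemp -d)" || exit 1
seq 1 100000 > sample.txt
# Compress to gzip instead of zip, keeping the original (-c writes to stdout).
gzip -c sample.txt > sample.txt.gz
# Interactive viewing:  zless sample.txt.gz   (then type 3000g to jump to line 3000)
# Non-interactive slicing still works; gzip -dc is used rather than zcat,
# which on some systems only handles .Z files.
gzip -dc sample.txt.gz | tail -n +3000 | head -n 20
```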