How to partially extract zipped huge plain text file?

Note that gzip can decompress zip files, or at least the first entry of one, provided that entry is compressed with the deflate method. So if there's only one huge file in that archive, you can do:

gunzip < file.zip | tail -n +3000 | head -n 20

This extracts the 20 lines starting at line 3000, for instance.
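
The same line range can also be selected by a single process that quits as soon as the last wanted line has been printed. This is a minimal sketch, assuming a POSIX sed and a count of at least 1; the function name extract_lines is made up for illustration:

```shell
# extract_lines START COUNT: print COUNT lines of stdin, beginning at line START.
# (Hypothetical helper, not part of any tool mentioned here.)
extract_lines() {
    start=$1
    count=$2
    end=$((start + count - 1))
    # Print the range, then quit at its last line so that the
    # decompressor feeding the pipe is stopped early.
    sed -n "${start},${end}p;${end}q"
}
# Usage, with file.zip as above:
#   gunzip < file.zip | extract_lines 3000 20
```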

Or:

gunzip < file.zip | tail -c +3000 | head -c 20

This does the same thing with bytes (assuming a head implementation that supports -c).
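
Because tail -c +N is 1-based (output starts at byte N), skipping a given number of bytes means adding one to the offset. A small sketch of that arithmetic, again assuming a head that supports -c; extract_bytes is a hypothetical name:

```shell
# extract_bytes OFFSET COUNT: skip OFFSET bytes of stdin, then print COUNT bytes.
# (Hypothetical helper for illustration.)
extract_bytes() {
    # tail -c +N starts at byte N, so skipping OFFSET bytes means N = OFFSET + 1.
    tail -c "+$(( $1 + 1 ))" | head -c "$2"
}
# Usage: the command above is equivalent to
#   gunzip < file.zip | extract_bytes 2999 20
```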

For any arbitrary member in the archive, in a Unixy way:

bsdtar xOf file.zip file-to-extract | tail... | head...

With the head builtin of ksh93 (like when /opt/ast/bin is ahead in $PATH), you can also do:

.... | head     -s 2999      -c 20
.... | head --skip=2999 --bytes=20

Note that in any case gzip/bsdtar/unzip will always need to uncompress (and here discard) the entire section of the file that precedes the portion you want to extract. That's down to how the compression algorithm works: the compressed stream has no index, so it has to be decoded sequentially from the start of the member.

One solution uses unzip -p and dd, for example to extract 10 KiB (10 blocks of 1024 bytes) at an offset of 1000 blocks:

$ unzip -p my.zip | dd ibs=1024 count=10 skip=1000 > /tmp/out

Note: I didn't try this with really huge data...
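
One caveat with dd on a pipe: a read may return fewer than ibs bytes, and count= then counts those partial reads as blocks (GNU dd has iflag=fullblock for this case). As a sketch that sidesteps the issue, the same block arithmetic can be done with tail and head, assuming a head that supports -c; extract_blocks is a hypothetical helper name:

```shell
# extract_blocks BLOCK_SIZE SKIP_BLOCKS COUNT_BLOCKS
# Skip SKIP_BLOCKS blocks of stdin, then print COUNT_BLOCKS blocks.
# (Hypothetical helper for illustration.)
extract_blocks() {
    bs=$1
    skip=$2
    count=$3
    # tail -c +N is 1-based, hence the +1 after the skipped bytes.
    tail -c "+$(( bs * skip + 1 ))" | head -c "$(( bs * count ))"
}
# Usage, matching the dd command above:
#   unzip -p my.zip | extract_blocks 1024 1000 10 > /tmp/out
```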

If you have control over the creation of that big zip file, why not consider using a combination of gzip and zless?

This would allow you to use zless as a pager and view the contents of the file without having to bother with extraction.

If you cannot change the compression format, this obviously will not work. If you can, zless is rather convenient.
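
As a rough sketch of that workflow, assuming the uncompressed text is available when the archive is created; to_gzip and the file name big.txt are hypothetical:

```shell
# to_gzip SRC: write a gzip-compressed copy of SRC to SRC.gz, keeping SRC.
# (Hypothetical helper for illustration.)
to_gzip() {
    gzip -c "$1" > "$1.gz"
}
# Usage:
#   to_gzip big.txt
#   zless big.txt.gz                              # page through it interactively
#   gunzip -c big.txt.gz | tail -n +3000 | head -n 20
```

The earlier point still applies: the pager has to decompress everything that precedes the part being viewed.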
