Perl6: large gzipped files read line by line
Update Now tested, which revealed an error, now fixed.
Solution #2
use Compress::Zlib;
my $file = "data.txt.gz" ;
my $handle = try open $file or die "Error reading $file: $!" ;
my $zwrap = zwrap($handle, :gzip) ;
for $zwrap.lines {
.print
}
CATCH { default { die "Error reading $file: $_" } }
$handle.close ;
I've tested this with a small gzipped text file.
I don't know much about gzip etc. but figured this out based on:
Knowing P6;
Reading
Compress::Zlib
'sREADME
and choosing thezwrap
routine;Looking at the module's source code, in particular the signature of the
zwrap
routineour sub zwrap ($thing, :$zlib, :$deflate, :$gzip)
;And trial and error, mainly to guess that I needed to pass the
:gzip
adverb.
Please comment on whether my code works for you. I'm guessing the main thing is whether it's fast enough for the large files you have.
A failed attempt at solution #5
With solution #2 working I would have expected to be able to write just:
use Compress::Zlib ;
.print for "data.txt.gz".&zwrap(:gzip).lines ;
But that fails with:
No such method 'eof' for invocant of type 'IO::Path'
This is presumably because this module was written before the reorganization of the IO
classes.
That led me to @MattOates' IO::Handle
like object with .lines
? issue. I note no response and I saw no related repo at https://github.com/MattOates?tab=repositories.
I am focusing on the Inline::Perl5
solution that you tried.
For the call to $gz.gzreadline($_)
: it seems like gzreadline
tries to return the line read from the zip file by modifying its input argument $_
(treated as an output argument, but it is not a true Perl 5 reference variable[1]), but the modified value is not returned to the Perl 6 script.
Here is a possoble workaround: Create a wrapper module in the curent directory, e.g. ./MyZlibWrapper.pm:
package MyZlibWrapper;
use strict;
use warnings;
use Compress::Zlib ();
use Exporter qw(import);
our @EXPORT = qw(gzopen);
our $VERSION = 0.01;
sub gzopen {
my ( $fn, $mode ) = @_;
my $gz = Compress::Zlib::gzopen( $fn, $mode );
my $self = {gz => $gz};
return bless $self, __PACKAGE__;
}
sub gzreadline {
my ( $self ) = @_;
my $line = "";
my $res = $self->{gz}->gzreadline($line);
return [$res, $line];
}
sub gzclose {
my ( $self ) = @_;
$self->{gz}->gzclose();
}
1;
Then use Inline::Perl5
on this wrapper module instead of Compress::Zlib
. For example ./p.p6:
use v6;
use lib:from<Perl5> '.';
use MyZlibWrapper:from<Perl5>;
my $file = 'data.txt.gz';
my $mode = 'rb';
my $gz = gzopen($file, $mode);
loop {
my ($res, $line) = $gz.gzreadline();
last if $res == 0;
print $line;
}
$gz.gzclose();
[1]
In Perl 5 you can modify an input argument that is not a reference, and the change will be reflected in the caller. This is done by modifying entries in the special @_
array variable. For example: sub quote { $_[0] = "'$_[0]'" } $str = "Hello"; quote($str)
will quote $str
even if $str
is not passed by reference.