Does Perl's Glob have a limitation?
The glob
first creates all possible file name expansions, so it will first generate the complete list from the shell-style glob/pattern it is given. Only then will it iterate over it, if used in scalar context. That's why it's so hard (impossible?) to escape the iterator without exhausting it; see this post.
In your first example that's 265 strings (11_881_376
), each five chars long. So a list of ~12 million strings, with (naive) total in excess of 56Mb ... plus the overhead for a scalar, which I think at minimum is 12 bytes or such. So at the order of a 100Mb's, at the very least, right there in one list.†
I am not aware of any formal limits on lengths of things in Perl (other than in regex) but glob
does all that internally and there must be undocumented limits -- perhaps some buffers are overrun somewhere, internally? It is a bit excessive.
As for a way around this -- generate that list of 5-char strings iteratively, instead of letting glob
roll its magic behind the scenes. Then it absolutely should not have a problem.
However, I find the whole thing a bit big for comfort, even in that case. I'd really recommend to write an algorithm that generates and provides one list element at a time (an "iterator"), and work with that.
There are good libraries that can do that (and a lot more), some of which are Algorithm::Loops recommended in a previous post on this matter (and in a comment), Algorithm::Combinatorics (same comment), Set::CrossProduct
from another answer here ...
Also note that, while this is a clever use of glob
, the library is meant to work with files. Apart from misusing it in principle, I think that it will check each of (the ~ 12 million) names for a valid entry! (See this page.) That's a lot of unneeded disk work. (And if you were to use "globs" like *
or ?
on some systems it returns a list with only strings that actually have files, so you'd quietly get different results.)
† I'm getting 56 bytes for a size of a 5-char scalar. While that is for a declared variable, which may take a little more than an anonymous scalar, in the test program with length-4 strings the actual total size is indeed a good order of magnitude larger than the naively computed one. So the real thing may well be on the order of 1Gb, in one operation.
Update A simple test program that generates that list of 5-char long strings (using the same glob
approach) ran for 15-ish minutes on a server-class machine and took 725 Mb of memory.
It did produce the right number of actual 5-char long strings, seemingly correct, on this server.
Everything has some limitation.
Here's a pure Perl module that can do it for you iteratively. It doesn't generate the entire list at once and you start to get results immediately:
use v5.10;
use Set::CrossProduct;
my $set = Set::CrossProduct->new( [ ([ 'a'..'z' ]) x 5 ] );
while( my $item = $set->get ) {
say join '', @$item
}