How do I make an arbitrary Perl regex wholly non-capturing?
In general, you can't.
Even if you could transform all (...)s into (?:...)s, this would not work in the general case because the pattern might require backreferences: e.g. /(.)X\1/, which matches any character, followed by an X, followed by the originally matched character.
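To make the backreference problem concrete, here is a minimal sketch. The helper name compiles is made up for illustration; the pattern is passed as a string so that it is compiled at run time, where eval can trap the failure.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the pattern source compiles, false otherwise.
# The source is a string so that compilation happens at run time inside eval.
sub compiles {
    my ($source) = @_;
    return defined eval { qr/$source/ };
}

my $capturing = qr/(.)X\1/;
print "aXa: ", ("aXa" =~ $capturing ? "match" : "no match"), "\n";   # aXa: match
print "aXb: ", ("aXb" =~ $capturing ? "match" : "no match"), "\n";   # aXb: no match

# Naive rewrite of (...) into (?:...): \1 now refers to a group that no longer exists,
# so the pattern is rejected outright.
print "naive rewrite compiles: ", (compiles('(?:.)X\1') ? "yes" : "no"), "\n";   # no
```

So a blanket rewrite does not merely change what gets captured; it can turn a valid pattern into one that Perl refuses to compile.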
So, absent a Perl mechanism for discarding captured results "after the fact", there is no way to solve your problem for all regexes. The best you can do (or could do if you had Perl 5.10) is to use gbacon's suggestion and hope to generate a unique name for the capture buffer.
One way to protect the subpatterns you care about is to use named capture buffers:
Additionally, as of Perl 5.10.0 you may use named capture buffers and named backreferences. The notation is (?<name>...) to declare and \k<name> to reference. You may also use apostrophes instead of angle brackets to delimit the name; and you may use the bracketed \g{name} backreference syntax. It's possible to refer to a named capture buffer by absolute and relative number as well. Outside the pattern, a named capture buffer is available via the %+ hash. When different buffers within the same pattern have the same name, $+{name} and \k<name> refer to the leftmost defined group.
In the context of your question, check becomes
sub check {
    use 5.10.0;
    my($line, $regex) = @_;
    if ($line =~ /(^.*)($regex)(.*$)/) {
        print "<", $+{one}, "><", $+{two}, "><", $+{three}, ">\n";
    }
}
Then calling it with
my $pat = qr/(?<one>(?<two>B|(?<three>C))fo(o)?(?:D|d)?)/;
check "ABCfooDE", $pat;
outputs
<CfooD><C><C>
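The same idea can be turned around so that check names its own wrapper groups rather than relying on names inside the caller's pattern. The following is only a sketch: split_around and the group names gr_pre, gr_match and gr_post are hypothetical, chosen in the hope that no supplied pattern uses them.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: split $line around the place where $regex matches.
# The wrapper groups are named, so the result does not depend on how many
# capture groups $regex itself contains.
sub split_around {
    my ($line, $regex) = @_;
    return unless $line =~ /(?<gr_pre>^.*)(?<gr_match>$regex)(?<gr_post>.*$)/;
    # Copy the values out of %+ before any other match can overwrite them.
    my ($pre, $match, $post) = ($+{gr_pre}, $+{gr_match}, $+{gr_post});
    return ($pre, $match, $post);
}

my @parts = split_around('ABCfooDE', qr/((B|(C))fo(o)?(?:D|d)?)/);
print "<$parts[0]><$parts[1]><$parts[2]>\n";   # <AB><CfooD><E>
```

Two caveats apply. If the supplied pattern happens to define a group with one of these names, %+ reports the leftmost defined one, which may not be the wrapper's. And because the wrapper adds a group in front of $regex, a numbered backreference such as \1 inside $regex would point at the prefix group; relative forms such as \g{-1} are not affected by the shift.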
This does not address the general case, but your specific example can be handled with the /g option in scalar context, which would allow you to divide the problem into two matches, the second picking up where the first left off:
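sub check {
    my ($line, $regex) = @_;
    # /g in scalar context leaves pos($line) at the end of this match.
    return unless $line =~ /(^.*)($regex)/g;
    my ($left_side, $regex_match) = ($1, $2);
    # The second /g match resumes from pos($line), so $1 is the remainder.
    my $right_side = $line =~ /(.*$)/g ? $1 : '';
    print "<$left_side> <$regex_match> <$right_side>\n"; # <AB> <CfooD> <E123>
}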
check( 'ABCfooDE123', qr/((B|(C))fo(o)?(?:D|d)?)/ );
If all you need is the portion of the string before and after the match, you can use the @- and @+ arrays to get the offsets into the matched string:
sub check {
    my ($line, $regex) = @_;
    if ($line =~ /$regex/) {
        my $pre   = substr $line, 0, $-[0];
        my $match = substr $line, $-[0], $+[0] - $-[0];
        my $post  = substr $line, $+[0];
        print "<$pre><$match><$post>\n";
    }
}
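As a usage sketch of the offset approach: the variant below (split_by_offsets is a made-up name) returns the three pieces instead of printing them. Because $regex is matched on its own rather than embedded in a larger pattern, its group numbering is untouched, so a pattern containing a backreference should work as well.

```perl
use strict;
use warnings;

# Hypothetical variant of check() that returns the pieces instead of printing.
sub split_by_offsets {
    my ($line, $regex) = @_;
    return unless $line =~ /$regex/;
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    my ($start, $end) = ($-[0], $+[0]);
    return (
        substr($line, 0, $start),
        substr($line, $start, $end - $start),
        substr($line, $end),
    );
}

my @cases = (
    [ 'ABCfooDE123', qr/((B|(C))fo(o)?(?:D|d)?)/ ],
    [ 'abcXcde',     qr/(.)X\1/ ],
);
for my $case (@cases) {
    my ($pre, $match, $post) = split_by_offsets(@$case);
    print "<$pre><$match><$post>\n";
}
# prints:
# <AB><CfooD><E123>
# <ab><cXc><de>
```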