Why does multiple use of `<( )>` token within `comb` not behave as expected?
TL;DR Multiple `<(...)>`s don't mean multiple captures. Even if they did, `.comb` reduces each match to a single string in the list of strings it returns. If you really want to use `.comb`, one way is to go back to your original regex but also store the desired data using additional code inside the regex.
Multiple `<(...)>`s don't mean multiple captures
The default start point for the overall match of a regex is the start of the regex. The default end point is the end.
Writing `<(` resets the start point for the overall match to the position you insert it at. Each time you insert one and it gets applied during processing of a regex, it resets the start point. Likewise, `)>` resets the end point. At the end of processing a regex, the final settings for the start and end are applied in constructing the final overall match.
Given that your code just unconditionally resets each point three times, the last start and end resets "win".
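A minimal sketch of the "last reset wins" behaviour. The string and regex are made up for illustration and are not the ones from the question:

```raku
# Hypothetical input, chosen only for illustration.
my $str = 'abc-def-ghi';

# Three <( ... )> pairs, but still only ONE overall match.
# Each <( moves the start of that match and each )> moves its end,
# so the final pair decides what the overall match contains.
my $match = $str ~~ / <( abc )> '-' <( def )> '-' <( ghi )> /;

say $match.Str;    # ghi
say $match.from;   # 8
say $match.to;     # 11
```

The whole pattern still has to match from `abc` onwards; only the reported start and end of the match are moved.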
`.comb` reduces each match to a single string
`foo.comb(/.../)` is equivalent to `foo.match(:g, /.../)>>.Str`.
That means you only get one string for each match against the regex.
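A small sketch of that equivalence, assuming a made-up input string and a regex with two positional captures (both are illustrative, not from the question):

```raku
my $str   = 'a1 b2 c3';                  # hypothetical input
my $regex = rx/ (<[a..z]>) (\d) /;       # two positional captures per match

# Both forms give one string per overall match; the captures are flattened away.
say $str.comb($regex);                   # (a1 b2 c3)
say $str.match($regex, :g)».Str;         # (a1 b2 c3)

# Keeping the Match objects keeps the individual captures reachable.
say $str.match($regex, :g).map({ .list».Str });   # ((a 1) (b 2) (c 3))
```

So if you need the sub-captures, working with the `Match` objects from `.match(:g, ...)` is one way to get at them.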
One possible solution is to use the approach @ohmycloudy shows in their answer.
But that comes with the caveats that @jubilatious1 and I raised in comments on their answer.
Add `{ @comb-result .push: |$/».Str }` to the regex
You can work around `.comb`'s normal functioning. I'm not saying it's a good thing to do. Nor am I saying it's not. You asked, I'm answering, and that's it. :)
Start with your original regex that worked with your other solutions.
Then add `{ @comb-result .push: |$/».Str }` to the end of the regex to store the result of each match. Now you will get the result you want.
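Put together, the idea might look like the sketch below. The regex here is an assumption standing in for the asker's original three-capture regex, which is not reproduced in this answer. `@whole-matches` is a hypothetical name used only to show what `.comb` itself returns.

```raku
my $str = '28_2820201112122420516_000000 column=d:bcp_startSoc, timestamp=1605155065124, value=64.0';

my @comb-result;   # filled as a side effect of matching

# ASSUMPTION: this regex is a stand-in for the asker's original one.
my @whole-matches = $str.comb: /
    ^ ([\d+]+ % '_') \s+
    'column=d:' (\w+) ','
    .*? 'value=' (\S+)
    { @comb-result .push: |$/».Str }
/;

say @whole-matches.elems;   # 1
say @comb-result;
```

`.comb` still returns a single string for the single overall match; the three fields should end up in `@comb-result` because the code block pushes the string form of each positional capture.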
$str.comb( / ^ [\d+]+ % '_' | <?after d\:> \w+ | <?after value\=> .*/ )
Since you have a comma-separated 'row' of information you're examining, you could try using `split()` to break your matches up, and assign to an array. Below in the Raku REPL:
> my $str = '28_2820201112122420516_000000 column=d:bcp_startSoc, timestamp=1605155065124, value=64.0';
28_2820201112122420516_000000 column=d:bcp_startSoc, timestamp=1605155065124, value=64.0
> my @array = $str.split(", ")
[28_2820201112122420516_000000 column=d:bcp_startSoc timestamp=1605155065124 value=64.0]
> dd @array
Array @array = ["28_2820201112122420516_000000 column=d:bcp_startSoc", "timestamp=1605155065124", "value=64.0"]
Nil
> say @array.elems
3
Match on individual elements of the array:
> say @array[0] ~~ m/ ([\d+]+ % '_') \s 'column=d:' (\w+) /;
「28_2820201112122420516_000000 column=d:bcp_startSoc」
0 => 「28_2820201112122420516_000000」
1 => 「bcp_startSoc」
> say @array[0] ~~ m/ ([\d+]+ % '_') \s 'column=d:' <(\w+)> /;
「bcp_startSoc」
0 => 「28_2820201112122420516_000000」
> say @array[0] ~~ m/ [\d+]+ % '_' \s 'column=d:' <(\w+)> /;
「bcp_startSoc」
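As a script rather than a REPL session, the matches above could be collected into the three desired values roughly like this. The variable names `$id`, `$column` and `$value` are hypothetical:

```raku
my $str = '28_2820201112122420516_000000 column=d:bcp_startSoc, timestamp=1605155065124, value=64.0';
my @array = $str.split(', ');

# Two positional captures from the first element ...
my ($id, $column) = (@array[0] ~~ m/ ([\d+]+ % '_') \s 'column=d:' (\w+) /).list».Str;

# ... and the overall match, narrowed by <( )>, from the third element.
my $value = (@array[2] ~~ m/ 'value=' <( <-[=]>+ )> /).Str;

say [$id, $column, $value];   # [28_2820201112122420516_000000 bcp_startSoc 64.0]
```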
Boolean tests on matches to one-or-more array elements:
> say True if ( @array[0] ~~ m/ [\d+]+ % '_' \s 'column=d:' <(\w+)> /)
True
> say True if ( @array[2] ~~ m/ 'value=' <(<-[=]>+)> / )
True
> say True if ( @array[0] ~~ m/ [\d+]+ % '_' \s 'column=d:' <(\w+)> /) & ( @array[2] ~~ m/ 'value=' <(<-[=]>+)> / )
True
HTH.