Anti-matching against an infinite family of <!before> patterns in Raku

It is a little difficult to discern what you are asking for.


You could be looking for something as simple as this:

say 'x_x___x________' ~~ / 'x'+ % '_' ** 1..3 /
# 「x_x___x」

or

say 'x_x___x________' ~~ / 'x'+ % '_' ** 1..2 /
# 「x_x」

or

say 'x_x___x________' ~~ / 'x'+ % '_'+ /
# 「x_x___x」
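The % modifier can be read, roughly, as shorthand for "one item, then zero or more separator-item pairs". A small sketch comparing the first pattern above with that longhand form; both should stop at the same place:

```raku
my $run = 'x_x___x________';

# 'x'+ % ('_' ** 1..3) written out longhand: one 'x', then zero or
# more pairs of (1 to 3 underscores, 'x').
my $short = $run ~~ / 'x'+ % '_' ** 1..3 /;
my $long  = $run ~~ / 'x' [ '_' ** 1..3 'x' ]* /;

say ~$short;    # x_x___x
say ~$long;     # x_x___x
```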

I would suggest using a Capture..., thusly:

'x_x___x________' ~~ /(.*?) _* $/; 
say $0;     #「x_x___x」

(The ? modifier makes the * non-greedy, or "frugal" in Raku's terminology.) Please let me know if I have missed the point!
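To see why the ? matters, here is a sketch comparing the frugal capture with a greedy one on the same string:

```raku
my $str = 'x_x___x________';

# Frugal: (.*?) grows one character at a time, so the capture stops as
# soon as the rest of the pattern (_* $) can succeed.
$str ~~ / (.*?) _* $ /;
my $frugal = ~$0;
say $frugal;    # x_x___x

# Greedy: (.*) grabs the whole string first; _* then matches zero
# underscores and $ succeeds, so the trailing underscores stay in $0.
$str ~~ / (.*) _* $ /;
my $greedy = ~$0;
say $greedy;    # x_x___x________
```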


avoid matching whitespace at the end of a string while still matching whitespace in the middle of words

Per Brad's answer, and your comment on it, something like this:

/ \w+ % \s+ /
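A quick check of that pattern on a string with whitespace both between words and at the end (the sample string is made up for illustration):

```raku
my $text = 'foo bar  baz   ';

# \w+ % \s+ : words separated by runs of whitespace. A separator must
# always be followed by another \w+, so trailing whitespace is never
# consumed.
my $words = $text ~~ / \w+ % \s+ /;
say ~$words;    # foo bar  baz
```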

what I'm looking for is a way to match arbitrarily long streams that end with a known pattern

Per @user0721090601's comment on your Q, and as a variant of @p6steve's answer, something like this:

/ \w+ % \s+ )> \s* $ /

The )> capture marker marks where capture is to end.

You can use arbitrary patterns on the left and right of that marker.
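A minimal sketch of the )> marker in use: the whole pattern still has to reach the end of the string, but the reported match ends where )> sits. The brackets in the output are only there to make any trailing spaces visible:

```raku
my $line = 'foo bar  ';

# \s* $ must succeed for the match as a whole to succeed, but )> sets
# the end of the match to just after the last word.
my $trimmed = ~($line ~~ / \w+ % \s+ )> \s* $ /);
say "[$trimmed]";    # [foo bar]
```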

an infinite family of <!before> patterns

Generalizing to an infinite family of patterns of any type, whether they are zero-width or not, the most natural solution in a regex is iteration using any of the standard open-ended quantifiers. For example, \s+ for one or more whitespace characters.[1] [2]
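As a sketch, assume the family in question is <!before \s $>, <!before \s \s $>, <!before \s \s \s $>, and so on (this is a hypothetical reading of the question's ...). A single open-ended quantifier inside one lookahead then covers every member of the family:

```raku
my $input = 'foo bar  ';

# Consume one character at a time, but refuse any position from which
# the rest of the string is nothing but whitespace. <!before \s+ $>
# stands in for the whole family <!before \s $>, <!before \s \s $>, ...
my $kept = ~($input ~~ / [ <!before \s+ $> . ]+ /);
say "[$kept]";    # [foo bar]
```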

Is there a way to construct the rest of the pattern implied by the ...?

I'll generalize that to "Is there a way in a Raku regex to match some arbitrary pattern that could in theory be recognized by a computer program?"

The answer is always "Yes":

  • While Raku rules/regexes might look like traditional regexes, they are in fact arbitrary functions embedded in an arbitrary program over which you ultimately have full control.

  • Rules have arbitrary read access to capture state.[3]

  • Rules can do arbitrary Turing-complete computation.[4]

  • A collection of rules/regexes can arbitrarily consume input and drive the parse/match state, i.e. can implement any parser.

In short, if it can be matched/parsed by any program written in any programming language, it can be matched/parsed using Raku rules/regexes.
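For instance, a code assertion <?{ ... }> lets ordinary Raku code decide whether the match succeeds at a given point. A small sketch (the input string and the primality test are just for illustration):

```raku
my $numbers = '91 97 100 101';

# << and >> are word boundaries, so (\d+) must be a whole number.
# <?{ ... }> runs arbitrary Raku code; the match only succeeds if the
# block returns a true value.
my @primes = $numbers.comb(/ << (\d+) >> <?{ $0.Int.is-prime }> /);
say @primes;    # [97 101]
```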

Footnotes

[1] If you use an open-ended quantifier, you need to make sure that each match iteration/recursion either consumes at least one character or fails, so that you avoid an infinite loop. For example, the * quantifier will succeed even if the pattern it qualifies does not match, so make sure that cannot lead to an infinite loop.
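A minimal illustration of the point about *: it can succeed without consuming anything, whereas + cannot:

```raku
my $abc = 'abc';

# x* is satisfied by zero 'x's, so it "matches" at position 0 without
# consuming anything. A loop expecting each match to advance the
# position would never make progress on a match like this.
my $star = $abc ~~ / x* /;
say so $star;                 # True
say $star.to - $star.from;    # 0

# x+ demands at least one 'x', so here it simply fails.
say so $abc ~~ / x+ /;        # False
```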

[2] Given the way you wrote your example, perhaps you are curious about recursion rather than iteration. Suffice it to say, it's easy to do that too.[1]
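A minimal sketch of the recursive alternative, using a hypothetical grammar XRun whose run rule calls itself. It matches the same prefix as the / 'x'+ % '_' ** 1..3 / example in the first answer:

```raku
# XRun and its rule names are hypothetical, chosen for this sketch.
grammar XRun {
    token TOP { <run> }

    # A run is one 'x', optionally followed by 1 to 3 underscores and
    # then (recursively) another run. Each call consumes an 'x' before
    # recursing, so the recursion always makes progress.
    token run { 'x' [ '_' ** 1..3 <run> ]? }
}

# subparse, unlike parse, does not require the match to reach the end
# of the string, so the trailing underscores are simply left over.
my $parsed-run = XRun.subparse('x_x___x________');
say ~$parsed-run;    # x_x___x
```

Because every call to run consumes an 'x' first, this recursion satisfies the caution in footnote 1.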

[3] In Raku rules, captures form a hierarchy. There are two special variables that track the capture state of two key levels of this hierarchy:

  • $¢ is the capture state of the innermost enclosing overall capture. Think of it as something analogous to a return value being constructed by the current function call in a stack of function calls.

  • $/ is the capture state of the innermost enclosing capture. Think of it as something analogous to a value being constructed by a particular block of code inside a function.

For example:

'123' ~~ / 1* ( 2* { print "$¢ $/" } ) 3* { print "$¢ $/" } / ; # prints "1 2" then "123 123", i.e. 1 2123 123

  • The overall / ... / is analogous to an ordinary function call. The first 1 and first 123 of the output show what has been captured by that overall regex.

  • The ( ... ) sets up an inner capture for a part of the regex. The 2* { print "$¢ $/" } within it is analogous to a block of code. The 2 shows what it has captured.

  • The final 123 shows that, at the top level of the regex, $/ and $¢ have the same value.
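The same hierarchy is visible from outside the regex once the match has finished. A small sketch using the regex from the example above, without the code blocks:

```raku
my $outer = '123' ~~ / 1* ( 2* ) 3* /;

# $outer is the top of the hierarchy; $outer[0] (what $0 would refer
# to right after the match) is the Match for the ( ... ) group nested
# inside it.
say ~$outer;                              # 123
say ~$outer[0];                           # 2
say $outer[0].from, '..', $outer[0].to;   # 1..2
```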

[4] For example, the code in footnote 3 above includes arbitrary code inside the { ... } blocks. More generally:

  • Rules can be invoked recursively;

  • Rules can have full signatures and pass arguments;

  • Rules can contain arbitrary code;

  • Rules can use multiple dispatch semantics for resolution. Notably, this can include resolution based on longest match length.
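A sketch of two of these features together, using a hypothetical grammar Markup (the names are illustrative, not from the question): a rule with a signature, and a proto whose candidates are chosen by longest-token matching:

```raku
# Hypothetical grammar for illustration only.
grammar Markup {
    token TOP { <wrapped('*')> ' ' <wrapped('/')> ' ' <op> }

    # A rule with a signature: the caller passes the delimiter $d,
    # which is matched literally on both sides of a word.
    token wrapped($d) { $d \w+ $d }

    # A proto with two candidates. Dispatch uses longest-token
    # matching, so '++' is chosen over '+' when both could match.
    proto token op {*}
    token op:sym<+>  { <sym> }
    token op:sym<++> { <sym> }
}

my $parsed = Markup.parse('*bold* /it/ ++');
say ~$parsed<wrapped>[0];    # *bold*
say ~$parsed<wrapped>[1];    # /it/
say ~$parsed<op>;            # ++
```

The delimiters are deliberately not underscores, because \w also matches _ and would swallow a closing _ delimiter.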
