Only match unique string occurrences

A unique string occurrence can be matched with

<STRING_PATTERN>(?!.*<STRING_PATTERN>)  // Find the last occurrence
(?<!<STRING_PATTERN>.*)<STRING_PATTERN> // Find the first occurrence, only works in regex
                                        // that supports infinite-width lookbehind patterns

where <STRING_PATTERN> is the pattern whose unique occurrence you are searching for. Both patterns work with the .NET regex library, but the second one is not supported by most other libraries (only the PyPI regex module for Python and JavaScript engines implementing ECMAScript 2018 regex support it). Also note that . does not match line break characters by default, so you need to enable a DOTALL-style modifier: in most libraries you can add the (?s) inline modifier to the pattern (in Ruby, (?m) does the same), or you can pass the corresponding flag to the regex compile method. See more about this in How do I match any character across multiple lines in a regular expression?
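As a minimal sketch of the first (last-occurrence) pattern, here is how it could be used with Python's built-in re module. The helper name last_occurrence and the sample text are made up for illustration. The second pattern is not shown because re does not support infinite-width lookbehind.

```python
import re

def last_occurrence(pattern: str, text: str):
    """Return the match for the last occurrence of `pattern` in `text`, or None.

    Hypothetical helper: it builds <STRING_PATTERN>(?!.*<STRING_PATTERN>)
    and compiles it with re.DOTALL so that . also matches line breaks.
    """
    # Non-capturing groups keep alternations inside `pattern` from leaking out.
    rx = re.compile(r"(?:%s)(?!.*(?:%s))" % (pattern, pattern), re.DOTALL)
    return rx.search(text)

text = "foo 1\nbar 2\nfoo 3\n"
m = last_occurrence(r"foo \d", text)
print(m.group(), m.start())  # foo 3 12
```

Without re.DOTALL, the lookahead would only scan to the end of the current line, so "foo 1" on the first line would be reported as well.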

You seem to need a regex like this:

/\b((?!CR|cr)[A-Za-z]{2}\d{5,6})\b(?![\s\S]*\b\1\b)/

The regex demo is available here

Details:

  • \b - a leading word boundary
  • ((?!CR|cr)[A-Za-z]{2}\d{5,6}) - capturing Group 1:
    • (?!CR|cr) - a negative lookahead that fails the match if the next two characters are CR or cr
    • [A-Za-z]{2} - 2 ASCII letters
    • \d{5,6} - 5 or 6 digits
  • \b - trailing word boundary
  • (?![\s\S]*\b\1\b) - a negative lookahead that fails the match if, anywhere to the right, there are zero or more characters of any kind ([\s\S]*) followed by a word boundary (\b), the same value that was captured into Group 1 (the \1 backreference), and a trailing word boundary.
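
The breakdown above can be checked with a short Python sketch. The sample string and the helper name unique_codes are invented for illustration; the pattern itself is the one from this answer, and it works unchanged with Python's built-in re module because it only uses lookaheads.

```python
import re

# Same pattern as above: two ASCII letters (but not CR/cr) plus 5 or 6 digits,
# matched only when the same value does not appear again later in the text.
CODE_RX = re.compile(r"\b((?!CR|cr)[A-Za-z]{2}\d{5,6})\b(?![\s\S]*\b\1\b)")

def unique_codes(text: str) -> list[str]:
    """Return each matching code once, at the position of its last occurrence."""
    return [m.group(1) for m in CODE_RX.finditer(text)]

sample = "AB12345 CR12345 xy123456\nAB12345 cr999999 ZZ1234"
print(unique_codes(sample))  # ['xy123456', 'AB12345']
```

Here the first AB12345 is skipped because the same value occurs again on the second line, CR12345 and cr999999 are rejected by the (?!CR|cr) check, and ZZ1234 has too few digits.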

Tags:

Regex