OR condition in Regex
I think what you need might be simply:
\d( \w)?
Note that your regex would have worked too if it was written as \d \w|\d
instead of \d|\d \w
.
This is because in your case, once the regex matches the first option, \d
, it ceases to search for a new match, so to speak.
Try
\d \w |\d
or add a positive lookahead if you don't want to include the trailing space in the match
\d \w(?= )|\d
When you have two alternatives where one is an extension of the other, put the longer one first, otherwise it will have no opportunity to be matched.
A classic "or" would be |
. For example, ab|de
would match either side of the expression.
However, for something like your case you might want to use the ?
quantifier, which will match the previous expression exactly 0 or 1 times (1 times preferred; i.e. it's a "greedy" match). Another (probably more relyable) alternative would be using a custom character group:
\d+\s+[A-Z\s]+\s+[A-Z][A-Za-z]+
This pattern will match:
\d+
: One or more numbers.\s+
: One or more whitespaces.[A-Z\s]+
: One or more uppercase characters or space characters\s+
: One or more whitespaces.[A-Z][A-Za-z\s]+
: An uppercase character followed by at least one more character (uppercase or lowercase) or whitespaces.
If you'd like a more static check, e.g. indeed only match ABC
and A ABC
, then you can combine a (non-matching) group and define the alternatives inside (to limit the scope):
\d (?:ABC|A ABC) Street
Or another alternative using a quantifier:
\d (?:A )?ABC Street