Operator precedence in regular expressions
Using capturing groups to demonstrate the order of evaluation, the regex H|ha+ is equivalent to the following:
(H|(h(a+)))
This is because the precedence rules (as seen below) are applied in order from the highest precedence (the lowest numbered) one to the lowest precedence (the highest numbered) one:
Rule 5 → (a+)
The + is grouped with the a because this operator works on the preceding single character, back-reference, group (a "marked sub-expression" in Oracle parlance), or bracket expression (character class).
Rule 6 → (h(a+))
The h is then concatenated with the group from the preceding step.
Rule 8 → (H|(h(a+)))
The H is then alternated with the group from the preceding step.
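The grouping above can be checked with a short sketch. It uses Python's re module as a stand-in, on the assumption that it applies the same precedence as POSIX ERE for duplication, concatenation, and alternation; it is not Oracle's engine, and the sample string is made up for illustration.

```python
import re

# Fully parenthesized form from the walk-through above.
explicit = re.compile(r"(H|(h(a+)))")
# The original, unparenthesized pattern.
implicit = re.compile(r"H|ha+")

def spans(pattern, text):
    """Return the substrings matched while scanning text left to right."""
    return [m.group(0) for m in pattern.finditer(text)]

sample = "H haaa Haa ha"  # hypothetical test input
implicit_hits = spans(implicit, sample)
explicit_hits = spans(explicit, sample)

# Group 1 is the whole alternation, group 2 is h(a+), group 3 is a+.
groups_for_haaa = explicit.fullmatch("haaa").groups()
groups_for_H = explicit.fullmatch("H").groups()

print(implicit_hits == explicit_hits)
print(groups_for_haaa)
print(groups_for_H)
```

If the two patterns group the same way, both scans return the same matches, and the capture groups show which part of the input each nesting level consumed.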
Precedence table from section 9.4.8 of the POSIX docs for regular expressions (there doesn't seem to be an official Oracle table):
+---+----------------------------------------------------------+
|   | ERE Precedence (from high to low)                        |
+---+-----------------------------------+----------------------+
| 1 | Collation-related bracket symbols | [==] [::] [..]       |
| 2 | Escaped characters                | \<special character> |
| 3 | Bracket expression                | []                   |
| 4 | Grouping                          | ()                   |
| 5 | Single-character-ERE duplication  | * + ? {m,n}          |
| 6 | Concatenation                     |                      |
| 7 | Anchoring                         | ^ $                  |
| 8 | Alternation                       | |                    |
+---+-----------------------------------+----------------------+
The table above is for Extended Regular Expressions. For Basic Regular Expressions see 9.3.7.
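Rules 7 and 8 in the table interact in a way that is easy to misread: since anchoring binds tighter than alternation, ^a|b$ should parse as (^a)|(b$) rather than ^(a|b)$. The following is a minimal sketch of that reading, again using Python's re module on the assumption that it agrees with POSIX ERE here; the sample strings are hypothetical.

```python
import re

# Anchoring (rule 7) binds tighter than alternation (rule 8),
# so ^a|b$ should read as (^a)|(b$), not as ^(a|b)$.
loose = re.compile(r"^a|b$")
explicit = re.compile(r"(^a)|(b$)")
whole = re.compile(r"^(a|b)$")

samples = ["a", "b", "ax", "xb", "xa", "bx"]

def hits(pattern):
    """Return the samples in which the pattern matches somewhere."""
    return [s for s in samples if pattern.search(s)]

loose_hits = hits(loose)
explicit_hits = hits(explicit)
whole_hits = hits(whole)

print(loose_hits)
print(explicit_hits)
print(whole_hits)
```

The unparenthesized pattern accepts "ax" and "xb", which the fully anchored form rejects, so each anchor belongs to only one branch of the alternation.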
Given the Oracle doc:
Table 4-2 lists the list of metacharacters supported for use in regular expressions passed to SQL regular expression functions and conditions. These metacharacters conform to the POSIX standard; any differences in behavior from the standard are noted in the "Description" column.
And taking a look at the | entry in that table:
The expression a|b matches character a or character b.
Plus taking a look at the POSIX doc:
Operator precedence
The order of precedence of operators is as follows:
Collation-related bracket symbols [==] [::] [..]
Escaped characters \
Character set (bracket expression) []
Grouping ()
Single-character-ERE duplication * + ? {m,n}
Concatenation
Anchoring ^$
Alternation |
I would say that H|ha+ would be the same as (?:H|ha+).
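One way to test this claim is to enumerate every short string over a small alphabet and compare which ones each pattern accepts in full. This is only a sketch: it relies on Python's re module, which supports the (?:...) non-capturing syntax, and the helper name language, the alphabet "Hha", and the length limit of 4 are all arbitrary choices made for illustration.

```python
import re
from itertools import product

def language(pattern, alphabet="Hha", max_len=4):
    """Set of strings over alphabet (length 0..max_len) that the pattern matches in full."""
    compiled = re.compile(pattern)
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
        if compiled.fullmatch("".join(chars))
    }

plain = language(r"H|ha+")
grouped = language(r"(?:H|ha+)")
# The reading that precedence rules out: alternation applied before concatenation.
misread = language(r"(?:H|h)a+")

print(sorted(plain))
print(plain == grouped)
print(sorted(misread - plain))
```

Within that bounded search, the first two patterns accept the same strings, while (?:H|h)a+ accepts strings such as Ha and rejects a bare H.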