Does FreeBSD contain multiple variants of basic regex?
The description in the linked article is wrong.
The actual POSIX definition states that:
The interpretation of an ordinary character preceded by an unescaped <backslash> ( '\' ) is undefined, except for [(){}, digits and inside a bracket expression]
And ordinary characters are defined as any except the BRE special characters .[^$* and the backslash itself.
So, contrary to what that page claims, \+ is undefined in BRE, and so is \|.
Some regex implementations, particularly the GNU ones, define them as equivalent to ERE + and |. But you shouldn't count on that; stick to the defined features instead.
The problem here, of course, is that the ERE alternation operator | doesn't exist at all in BRE, and the BRE equivalent of ERE + is hideously ugly (it's \{1,\}). So you probably want to use ERE instead.
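As a minimal sketch of the points above, here are the backslash sequences that POSIX does define in BRE next to their ERE counterparts. It assumes a sed that accepts -E (both GNU and FreeBSD sed do); the sample inputs and variable names are made up for illustration.

```shell
# Backslash sequences POSIX does define in BRE: \( \), \{ \}, and \1..\9.
grp_out=$(printf 'abab\n' | sed 's/\(ab\)\1/X/')

# BRE: "one or more" has to be written as an interval expression.
bre_out=$(printf 'xaaay\n' | sed 's/a\{1,\}/_/')

# ERE (-E): the same match with +, plus real alternation with |.
ere_out=$(printf 'xaaay\n' | sed -E 's/a+/_/')
alt_out=$(printf 'aaaaa bbbbb ccccc\n' | sed -E 's/aaaaa|bbbbb/_/g')

# Prints, one per line: X, x_y, x_y, "_ _ ccccc"
printf '%s\n' "$grp_out" "$bre_out" "$ere_out" "$alt_out"
```

None of these commands relies on \+ or \|, so they should behave the same regardless of which regex library the tool is linked against.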
$ echo ' aaaaa ' | sed 's/aaaaa|bbbbb/_/g'
aaaaa
$ echo ' aaaaa ' | sed -E 's/aaaaa|bbbbb/_/g'
_
$ echo ' aaaaa ' | sed -r 's/aaaaa|bbbbb/_/g'
_
$ echo '    aaaaa   ' | sed -E '/(aaaaa|bbbbb)/ s/ /_/g'
____aaaaa___
$ echo '    aaaaa   ' | sed -E '/aaaaa|bbbbb/ s/ /_/g'
____aaaaa___
The "or" operator (|) is not part of BRE (Basic Regular Expressions). You need to specify -E to use ERE (Extended Regular Expressions).
See Regex alternation/or operator (foo|bar) in GNU or BSD Sed
UPDATE
Why did grep work?
We can choose what kind of pattern we want to use with grep:
-E, --extended-regexp     PATTERN is an extended regular expression
-F, --fixed-strings       PATTERN is a set of newline-separated strings
-G, --basic-regexp        PATTERN is a basic regular expression
-P, --perl-regexp         PATTERN is a Perl regular expression
-e, --regexp=PATTERN      use PATTERN as a regular expression
By using these switches we can see that grep does indeed default to BRE and that the OP's expression fails with ERE:
$ echo ' aaaaa ' | grep '\(aaaaa\|bbbbb\)'
aaaaa
$ echo ' aaaaa ' | egrep '\(aaaaa\|bbbbb\)'
$ echo ' aaaaa ' | grep -E '\(aaaaa\|bbbbb\)'
$ echo ' aaaaa ' | grep -G '\(aaaaa\|bbbbb\)'
aaaaa
$ echo ' aaaaa ' | grep -G 'aaaaa\|bbbbb'
aaaaa
$ echo ' aaaaa ' | grep -G 'aaaaa|bbbbb'
$ echo ' aaaaa ' | grep -E 'aaaaa|bbbbb'
aaaaa
$ echo ' aaaaa ' | grep -E 'aaaaa\|bbbbb'
$ echo ' aaaaa ' | grep -G 'bbbbb\|aaaaa'
aaaaa
$ echo ' aaaaa ' | grep -E 'bbbbb\|aaaaa'
$ echo ' aaaaa ' | grep -G 'bbbbb|aaaaa'
$ echo ' aaaaa ' | grep -E 'bbbbb|aaaaa'
aaaaa
Both grep and sed reference re_format(7), which clearly states:
Obsolete ("basic") regular expressions differ in several respects. `|' is an ordinary character and there is no equivalent for its functionality.
But it does seem that if we "escape the pipe" then we do indeed get the functionality. That certainly has a smell to it. Furthermore, there seems to be recent breakage in that area; see regex(3): Add test to cover recent BRE regression.
And there seems to be some work to replace the regex in libc.
As Charles Duffy comments below
because some tools implement nonstandard extensions wherein you can use a backslash to get otherwise-ERE-only behavior in a BRE context
I am used to very good documentation with FreeBSD. This means that I am unsure whether this behaviour is intended but undocumented, or breakage.
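Rather than relying on the documentation, one can probe what the local grep actually does. The following is a minimal sketch that assumes only the POSIX grep options -q, -e and -E; the variable names are made up for illustration, and the probe result will differ between implementations.

```shell
# Probe: does this grep treat \| as alternation in BRE mode?
# POSIX leaves \| undefined, so either outcome is conforming.
if printf 'aaaaa\n' | grep -q 'bbbbb\|aaaaa' 2>/dev/null; then
    bre_pipe='alternation'        # nonstandard extension: \| acted like ERE |
else
    bre_pipe='no-alternation'     # no match (or an error) for this input
fi
printf 'BRE \\| is treated as: %s\n' "$bre_pipe"

# Alternatives that rely only on defined behaviour:
ere_match=$(printf 'aaaaa\n' | grep -E 'aaaaa|bbbbb')          # ERE alternation
multi_match=$(printf 'aaaaa\n' | grep -e 'aaaaa' -e 'bbbbb')   # BRE, one pattern per -e
printf '%s\n%s\n' "$ere_match" "$multi_match"
```

The last two commands both print aaaaa; giving several -e patterns is the way to get "or" semantics while staying within BRE.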