Regex alternation/or operator (foo|bar) in GNU or BSD Sed
By default sed uses POSIX Basic Regular Expressions, which don't include the | alternation operator. You can switch it to Extended Regular Expressions, which do include | alternation, with -E (or -r in some older versions of some implementations). You can use:
echo 'cat dog pear banana cat dog' | sed -E -e 's/cat|dog/Bear/g'
and it will work on compliant systems. (-e optionally marks the sed script itself. You can leave it out; it just guards against some kinds of mistake.)
Portability to very old seds is complicated, but you can also switch to awk if you need it, which uses EREs everywhere.
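As a minimal sketch of the awk route mentioned above (the variable name out is only there for illustration), gsub takes an ERE, so the alternation needs no option at all:

```shell
# awk regular expressions are EREs, so cat|dog works without any flag.
# gsub edits the current record ($0) in place; print then emits it.
out=$(echo 'cat dog pear banana cat dog' | awk '{ gsub(/cat|dog/, "Bear"); print }')
printf '%s\n' "$out"
```

This should print Bear Bear pear banana Bear Bear on a single line.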
This happens because (a|b) is an extended regular expression, not a Basic Regular Expression. Use the -E option to deal with this.
echo 'cat
dog
pear
banana
cat
dog'|sed -E 's/cat|dog/Bear/g'
From the sed man page:
-E Interpret regular expressions as extended (modern) regular expressions rather than basic regular expressions (BRE's).
Note that -r is another flag for the same thing, but -E is more portable and will even be in the next version of POSIX specifications.
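Where neither -E nor -r is available, one possible fallback (a sketch that is not taken from the answers here) is to stay within Basic Regular Expressions and run one substitution per alternative. This assumes the replacement text never contains one of the later patterns, which holds for Bear:

```shell
# Two plain BRE substitutions stand in for the single ERE s/cat|dog/Bear/g.
# Order matters only if a replacement could itself match a later pattern.
out=$(echo 'cat dog pear banana cat dog' | sed -e 's/cat/Bear/g' -e 's/dog/Bear/g')
printf '%s\n' "$out"
```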
The portable way to do this - and the more efficient way - is with addresses. You can do this:
printf %s\\n cat dog pear banana cat dog |
sed -e '/cat/!{/dog/!b' -e '};cBear'
In this way, if the line does not contain the string cat and does not contain the string dog, sed branches (b) out of the script, autoprints its current line, and pulls in the next to begin the next cycle. It therefore does not perform the next instruction, which in this example changes (c) the entire line to read Bear, but it could do anything.
It's probably worth noting also that any statement following the !b in that sed command can only match on a line containing either the string dog or cat, so you can perform further tests without any danger of matching a line that doesn't. That means you can now apply rules to only one or the other as well.
But that's next. Here's output from the above command:
###OUTPUT###
Bear
Bear
pear
banana
Bear
Bear
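To illustrate applying rules to only one or the other after the !b guard, here is a hedged sketch. The BrownBear and BlackBear replacements and the out variable are chosen for illustration, and s/.*/.../ is used in place of c to keep each command on one line:

```shell
# Lines with neither cat nor dog branch past everything and are autoprinted.
# Only lines that survive the guard reach the two address-restricted rules.
out=$(printf '%s\n' cat dog pear banana cat dog |
sed -e '/cat/!{/dog/!b' -e '}' \
    -e '/cat/s/.*/BrownBear/' \
    -e '/dog/s/.*/BlackBear/')
printf '%s\n' "$out"
```

With the same six input lines, the cat lines should come out as BrownBear, the dog lines as BlackBear, and pear and banana unchanged.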
You can also portably implement a lookup table with backreferences.
printf %s\\n cat dog pear banana cat dog |
sed '1{x;s/^/ cat dog /;x
};G;s/^\(.*\)\n.* \1 .*/Bear/;P;d'
It's a lot more work to set up for this simple example case, but it can make for much more flexible sed scripts in the long run.
In the first line I exchange (x) hold space and pattern space, then insert the string <space>cat<space>dog<space> into hold space before exchanging them back.
On that line and every following line I Get (G) hold space appended to pattern space, then check whether everything from the beginning of the line up to the newline that G just added also appears, surrounded by spaces, in the text after that newline. If so, I replace the entire lot with Bear. If not, there is no harm done, because I next Print (P) only up to the first newline in pattern space and then delete (d) it all.
###OUTPUT###
Bear
Bear
pear
banana
Bear
Bear
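To see what the substitution is matching against, the following sketch replaces the s, P and d commands with l, which prints pattern space unambiguously (newlines shown as \n, end of line marked with $). The -n option and the shortened two-line input are choices made for this illustration only:

```shell
# Same setup as above, but dump pattern space right after G instead of editing it.
out=$(printf '%s\n' cat pear |
sed -n '1{x;s/^/ cat dog /;x
};G;l')
printf '%s\n' "$out"
# Prints:
# cat\n cat dog $
# pear\n cat dog $
```

The first line matches because cat appears again after the newline between two spaces. The second does not, since pear is not in the table.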
And when I say flexible, I mean it. Here it is replacing cat with BrownBear and dog with BlackBear:
printf %s\\n cat dog pear banana cat dog |
sed '1{x;s/^/ 1cat Brown 2dog Black /;x
};G;s/^\(.*\)\n.* [0-9]\1 \([^ ]*\) .*/\2Bear/;P;d'
###OUTPUT###
BrownBear
BlackBear
pear
banana
BrownBear
BlackBear
You can of course expand a great deal on the contents of the lookup table. I picked up the idea from Greg Ubben's Usenet emails on the subject, in which, in the '90s, he described how he constructed a crude calculator out of a single sed s/// statement.
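As a small sketch of expanding the table, a third entry can be added without touching the substitution itself. The 3pear Polar entry is hypothetical and only there to show the mechanism:

```shell
# Only the table string inserted into hold space changes; the s/// is as before.
out=$(printf '%s\n' cat dog pear banana cat dog |
sed '1{x;s/^/ 1cat Brown 2dog Black 3pear Polar /;x
};G;s/^\(.*\)\n.* [0-9]\1 \([^ ]*\) .*/\2Bear/;P;d')
printf '%s\n' "$out"
```

Here pear should become PolarBear, while banana, which has no table entry, passes through unchanged.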