Is there a way to do multiple replacements with sed, without chaining the replacements?
You can't do the whole operation with a single substitution in sed, but you can do it correctly in different ways depending on whether the two substrings A and B are single characters or longer strings.
Assuming the two substrings A and B are single characters...
You want to transform AYB into BBYAA. To do this,
- Change each A to B and each B to A.
- Substitute each A in the new string with AA.
- Substitute each B in the new string with BB.
$ echo AYB | sed 'y/AB/BA/; s/B/BB/g; s/A/AA/g'
Combine the two last steps to get
$ echo AYB | sed 'y/AB/BA/; s/[AB]/&&/g'
In fact, the ordering of the operations here does not really matter:
$ echo AYB | sed 's/[AB]/&&/g; y/AB/BA/'
The sed editing command y/// translates the characters in its first argument to the corresponding characters in its second argument, a bit like the tr utility does. This is done in a single operation, so you don't need a temporary character to swap A and B in y/AB/BA/. In general, y/// is much faster at translating single characters than e.g. s///g (since no regular expressions are involved), and it can also insert newlines into strings with \n, which the standard s/// command can't do (s/// in GNU sed can do this as a non-portable convenience extension).
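As a small sketch of why the single-pass behaviour of y/// matters, the following compares naive chained substitutions with the y/// approach. The function names (chained, swapped, split_commas) are made up for illustration and are not part of sed.

```shell
# Naive chaining: the second substitution also rewrites the output of the first.
chained() { sed 's/A/BB/g; s/B/AA/g'; }

# y/// maps A->B and B->A in one pass, so nothing is rewritten twice.
swapped() { sed 'y/AB/BA/; s/[AB]/&&/g'; }

# y/// accepts \n in its second argument to produce a newline.
split_commas() { sed 'y/,/\n/'; }

echo AYB | chained        # AAAAYAA (wrong)
echo AYB | swapped        # BBYAA
echo a,b | split_commas   # prints "a" and "b" on separate lines
```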
The & character in the replacement part of the s/// command will be replaced by whatever the expression in the first argument matched, so s/[AB]/&&/g would double up any A or B character in the input data.
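To make the role of & a little more visible, here is a minimal sketch that wraps each match in angle brackets, next to a variant that escapes the ampersand to get a literal one. The function names are hypothetical.

```shell
# & is replaced by the matched text.
wrap_ab() { sed 's/[AB]/<&>/g'; }

# \& is a literal ampersand instead.
literal_amp() { sed 's/[AB]/\&/g'; }

echo AYB | wrap_ab       # <A>Y<B>
echo AYB | literal_amp   # &Y&
```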
For multi-character substrings, assuming the substrings are distinct (i.e. one substring is not found in the other, as in the case of oo and foo), use something like
$ echo fooxbar | sed 's/foo/@/g; s/bar/foofoo/g; s/@/barbar/g'
I.e., swap the two strings via an intermediate string not otherwise found in the data. Note that the intermediate string could be any string not found in the data, not just a single character.
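Since the approach only works if the intermediate string really is absent from the data, one possible safeguard is to check for it first. This is only a sketch: the function name swap_foo_bar and the placeholder @@SWAP@@ are assumptions chosen for illustration, and the whole input is read into a shell variable, so it suits small inputs only.

```shell
swap_foo_bar() {
    placeholder='@@SWAP@@'   # hypothetical; pick anything not in your data
    input=$(cat)

    # Refuse to run if the placeholder already occurs in the input.
    case $input in
        *"$placeholder"*)
            echo "placeholder '$placeholder' found in input" >&2
            return 1
            ;;
    esac

    printf '%s\n' "$input" |
        sed "s/foo/$placeholder/g; s/bar/foofoo/g; s/$placeholder/barbar/g"
}

echo fooxbar | swap_foo_bar   # barbarxfoofoo
```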
This is the kind of problem where you need a loop so you can search for both patterns simultaneously.
awk '
    BEGIN {
        regex = "A|B"
        map["A"] = "BB"
        map["B"] = "AA"
    }
    {
        str = $0
        result = ""
        # consume the line left to right, replacing each match as it is found
        while (match(str, regex)) {
            found = substr(str, RSTART, RLENGTH)
            result = result substr(str, 1, RSTART-1) map[found]
            str = substr(str, RSTART+RLENGTH)
        }
        print result str
    }
'
Of course, if Perl is available there's an equivalent oneliner:
perl -pe '
    BEGIN { %map = ("A" => "BB", "B" => "AA"); }
    s/(A|B)/$map{$1}/g
'
If none of the patterns contain special characters, you can also build the regex dynamically:
perl -pe '
    BEGIN { %map = ("A" => "BB", "B" => "AA");
            $regex = join "|", keys %map; }
    s/($regex)/$map{$1}/g
'
By the way, Tcl has a builtin command for this called string map, but it's not easy to write Tcl oneliners.
Demonstrating the effect that sorting the keys by length has:
without sorting
$ echo ABBA | perl -pe ' BEGIN { %map = (A => "X", BB => "Y", AB => "Z"); $regex = join "|", map {quotemeta} keys %map; print $regex, "\n"; } s/($regex)/$map{$1}/g '
with sorting
$ echo ABBA | perl -pe ' BEGIN { %map = (A => "X", BB => "Y", AB => "Z"); $regex = join "|", map {quotemeta $_->[1]} reverse sort {$a->[0] <=> $b->[0]} map {[length, $_]} keys %map; print $regex, "\n"; } s/($regex)/$map{$1}/g '
Benchmarking "plain" sort versus Schwartzian in perl. The code in the subroutines is lifted directly from the sort documentation.
use Benchmark qw/ timethese cmpthese /;

# make up some key=value data
my $key='a';
for $x (1..10000) {
    push @unsorted, $key++ . "=" . int(rand(32767));
}

# plain sorting: first by value then by key
sub nonSchwartzian {
    my @sorted =
        sort { ($b =~ /=(\d+)/)[0] <=> ($a =~ /=(\d+)/)[0] || uc($a) cmp uc($b) }
        @unsorted;
    return @sorted;
}

# using the Schwartzian transform
sub schwartzian {
    my @sorted =
        map  { $_->[0] }
        sort { $b->[1] <=> $a->[1] || $a->[2] cmp $b->[2] }
        map  { [$_, /=(\d+)/, uc($_)] }
        @unsorted;
    return @sorted;
}

# ensure the subs sort the same way
die "different" unless join(",", nonSchwartzian()) eq join(",", schwartzian());

# benchmark
cmpthese(
    timethese(-10, {
        nonSchwartzian => 'nonSchwartzian()',
        schwartzian    => 'schwartzian()',
    })
);
running it:
$ perl
Benchmark: running nonSchwartzian, schwartzian for at least 10 CPU seconds...
nonSchwartzian: 11 wallclock secs (10.43 usr + 0.05 sys = 10.48 CPU) @ 9.73/s (n=102)
schwartzian: 11 wallclock secs (10.13 usr + 0.03 sys = 10.16 CPU) @ 49.11/s (n=499)
                 Rate nonSchwartzian    schwartzian
nonSchwartzian 9.73/s             --           -80%
schwartzian    49.1/s           405%             --
The code using the Schwartzian transform is about 5 times as fast (405% faster).
Where the comparison function is only length of each element:
Benchmark: running nonSchwartzian, schwartzian for at least 10 CPU seconds...
nonSchwartzian: 11 wallclock secs (10.06 usr + 0.03 sys = 10.09 CPU) @ 542.52/s (n=5474)
schwartzian: 10 wallclock secs (10.21 usr + 0.02 sys = 10.23 CPU) @ 191.50/s (n=1959)
                Rate    schwartzian nonSchwartzian
schwartzian    191/s             --           -65%
nonSchwartzian 543/s           183%             --
Schwartzian is much slower with this inexpensive sort function.
Can we get past the abusive commentary now?
With awk, you can use pattern1 as the field separator FS and the replacement1 as output field separator OFS. Then, loop over each field and replace pattern2 by replacement2:
awk '{for (f=1;f<=NF;f++){gsub(p,r,$f)} $1=$1}1' FS=A OFS=BB p=B r=AA file
The point of $1=$1 is to force a rebuild of the record, else it would fail for 0A for example.
This is POSIX compliant and involves no intermediary string so it is foolproof.
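As a quick check of the field-separator approach, the sketch below wraps the command in a function (swap_awk is a made-up name) and reads standard input through the - operand. Note that p is treated as a regular expression by gsub and that & is special in r, so this assumes neither contains such characters.

```shell
swap_awk() {
    awk '{ for (f = 1; f <= NF; f++) gsub(p, r, $f); $1 = $1 } 1' \
        FS=A OFS=BB p=B r=AA -
}

printf 'AYB\n0A\n' | swap_awk
# BBYAA
# 0BB
```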