Is there a way to do multiple replacements with sed, without chaining the replacements?
You can't do the whole operation with a single substitution in sed, but you can do it correctly in different ways depending on whether the two substrings A and B are single characters or longer strings.
Assuming the two substrings A and B are single characters...
You want to transform AYB into BBYAA. To do this,
- Change each A to B and B to A using y/AB/BA/.
- Substitute each A in the new string with AA using s/A/AA/g.
- Substitute each B in the new string with BB using s/B/BB/g.
$ echo AYB | sed 'y/AB/BA/; s/B/BB/g; s/A/AA/g'
BBYAA
Combine the last two steps to get
$ echo AYB | sed 'y/AB/BA/; s/[AB]/&&/g'
BBYAA
In fact, the ordering of the operations here does not really matter:
$ echo AYB | sed 's/[AB]/&&/g; y/AB/BA/'
BBYAA
The sed editing command y/// translates the characters in its first argument to the corresponding characters in its second argument, a bit like the tr utility does. This is done in a single operation, so you don't need a temporary for the swap of A and B in y/AB/BA/. In general, y/// is much faster at translating single characters than e.g. s///g (since no regular expressions are involved), and it's also able to insert newlines into strings with \n, which the standard s/// command can't do with \n in its replacement text (s/// in GNU sed can do this as a non-portable convenience extension).
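As a small illustration of the newline point, the sketch below splits a comma-separated line with y///, and then does the same with s/// using a backslash followed by a literal newline in the replacement, which is the portable spelling. The function names split_y and split_s are made up for this example.

```shell
#!/bin/sh
# Hypothetical helpers, named only for this illustration.

# y/// accepts \n in its second argument, so each comma becomes a newline.
split_y() {
    sed 'y/,/\n/'
}

# Portable s/// equivalent: a backslash followed by a literal newline
# in the replacement text.
split_s() {
    sed 's/,/\
/g'
}

echo 'a,b,c' | split_y
echo 'a,b,c' | split_s
```

Both calls should print a, b and c on separate lines.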
The & character in the replacement part of the s/// command will be replaced by whatever the expression in the first argument matched, so s/[AB]/&&/g would double up any A or B character in the input data.
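To see & on its own, here is a short sketch that wraps every run of digits in brackets, plus the escaped form \& for when a literal ampersand is wanted. The function names bracket_digits and literal_amp are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers, named only for this illustration.

# & stands for whatever [0-9][0-9]* matched.
bracket_digits() {
    sed 's/[0-9][0-9]*/[&]/g'
}

# \& inserts a literal ampersand instead of the matched text.
literal_amp() {
    sed 's/and/\&/g'
}

echo 'abc 12 def 345' | bracket_digits   # abc [12] def [345]
echo 'salt and pepper' | literal_amp     # salt & pepper
```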
For multi-character substrings, assuming the substrings are distinct (i.e. neither substring is found inside the other, the way oo is found inside foo), use something like
$ echo fooxbar | sed 's/foo/@/g; s/bar/foofoo/g; s/@/barbar/g'
barbarxfoofoo
I.e., swap the two strings via an intermediate string not otherwise found in the data. Note that the intermediate string could be any string not found in the data, not just a single character.
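One possible variant of this idea: when sed processes input line by line, the pattern space normally contains no newline, so a newline can serve as the intermediate string without inspecting the data first. This is only a sketch under that assumption (it no longer holds once commands such as N or G have added newlines to the pattern space); swap_foo_bar is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper, named only for this illustration.
swap_foo_bar() {
    # 1. foo -> newline (backslash + literal newline in the replacement)
    # 2. bar -> foofoo
    # 3. newline -> barbar (\n in the regex matches an embedded newline)
    sed 's/foo/\
/g; s/bar/foofoo/g; s/\n/barbar/g'
}

echo fooxbar | swap_foo_bar   # barbarxfoofoo
```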
This is the kind of problem where you need a loop so you can search for both patterns simultaneously.
awk '
    BEGIN {
        regex = "A|B"
        map["A"] = "BB"
        map["B"] = "AA"
    }
    {
        str = $0
        result = ""
        while (match(str, regex)) {
            found = substr(str, RSTART, RLENGTH)
            result = result substr(str, 1, RSTART-1) map[found]
            str = substr(str, RSTART+RLENGTH)
        }
        print result str
    }
'
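As a usage sketch, the same loop applied to the multi-character foo and bar example needs no intermediate string. The wrapper name swap_awk is hypothetical; the awk body is the script above with only the regex and the map changed.

```shell
#!/bin/sh
# Hypothetical wrapper, named only for this illustration.
swap_awk() {
    awk '
        BEGIN {
            regex = "foo|bar"
            map["foo"] = "barbar"
            map["bar"] = "foofoo"
        }
        {
            str = $0
            result = ""
            # Consume the line left to right; text that has already been
            # replaced is moved to result and never scanned again.
            while (match(str, regex)) {
                found = substr(str, RSTART, RLENGTH)
                result = result substr(str, 1, RSTART-1) map[found]
                str = substr(str, RSTART+RLENGTH)
            }
            print result str
        }
    '
}

echo fooxbar | swap_awk   # barbarxfoofoo
```

Because each replacement is appended to result and the search continues only in the remainder, a replacement can safely contain the other pattern.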
Of course, if Perl is available there's an equivalent one-liner:
perl -pe '
    BEGIN { %map = ("A" => "BB", "B" => "AA"); }
    s/(A|B)/$map{$1}/g;
'
If none of the patterns contain special characters, you can also build the regex dynamically:
perl -pe '
    BEGIN {
        %map = ("A" => "BB", "B" => "AA");
        $regex = join "|", keys %map;
    }
    s/($regex)/$map{$1}/g;
'
By the way, Tcl has a builtin command for this called string map, but it's not easy to write Tcl one-liners.
Demonstrating the effect that sorting the keys by length has:
Without sorting:
$ echo ABBA | perl -pe '
    BEGIN {
        %map = (A => "X", BB => "Y", AB => "Z");
        $regex = join "|", map {quotemeta} keys %map;
        print $regex, "\n";
    }
    s/($regex)/$map{$1}/g
'
A|AB|BB
XYX
With sorting:
$ echo ABBA | perl -pe '
    BEGIN {
        %map = (A => "X", BB => "Y", AB => "Z");
        $regex = join "|",
            map {quotemeta $_->[1]}
            reverse sort {$a->[0] <=> $b->[0]}
            map {[length, $_]}
            keys %map;
        print $regex, "\n";
    }
    s/($regex)/$map{$1}/g
'
BB|AB|A
ZBX
Benchmarking "plain" sort versus Schwartzian in perl: the code in the subroutines is lifted directly from the sort documentation.
#!perl
use Benchmark qw/ timethese cmpthese /;
# make up some key=value data
my $key='a';
for $x (1..10000) {
    push @unsorted, $key++ . "=" . int(rand(32767));
}
# plain sorting: first by value then by key
sub nonSchwartzian {
    my @sorted =
        sort { ($b =~ /=(\d+)/)[0] <=> ($a =~ /=(\d+)/)[0] || uc($a) cmp uc($b) }
        @unsorted
}
# using the Schwartzian transform
sub schwartzian {
    my @sorted =
        map { $_->[0] }
        sort { $b->[1] <=> $a->[1] || $a->[2] cmp $b->[2] }
        map { [$_, /=(\d+)/, uc($_)] }
        @unsorted
}
# ensure the subs sort the same way
die "different" unless join(",", nonSchwartzian()) eq join(",", schwartzian());
# benchmark
cmpthese(
    timethese(-10, {
        nonSchwartzian => 'nonSchwartzian()',
        schwartzian    => 'schwartzian()',
    })
);
running it:
$ perl benchmark.pl
Benchmark: running nonSchwartzian, schwartzian for at least 10 CPU seconds...
nonSchwartzian: 11 wallclock secs (10.43 usr + 0.05 sys = 10.48 CPU) @ 9.73/s (n=102)
schwartzian: 11 wallclock secs (10.13 usr + 0.03 sys = 10.16 CPU) @ 49.11/s (n=499)
                 Rate nonSchwartzian    schwartzian
nonSchwartzian 9.73/s             --           -80%
schwartzian    49.1/s           405%             --
The code using the Schwartzian transform is about 5 times as fast (405% faster).
When the comparison function is only the length of each element:
Benchmark: running nonSchwartzian, schwartzian for at least 10 CPU seconds...
nonSchwartzian: 11 wallclock secs (10.06 usr + 0.03 sys = 10.09 CPU) @ 542.52/s (n=5474)
schwartzian: 10 wallclock secs (10.21 usr + 0.02 sys = 10.23 CPU) @ 191.50/s (n=1959)
                Rate    schwartzian nonSchwartzian
schwartzian    191/s             --           -65%
nonSchwartzian 543/s           183%             --
Schwartzian is much slower with this inexpensive sort function.
With awk, you can use pattern1 as the field separator FS and replacement1 as the output field separator OFS. Then, loop over each field and replace pattern2 by replacement2:
awk '{for (f=1;f<=NF;f++){gsub(p,r,$f)} $1=$1}1' FS=A OFS=BB p=B r=AA file
The point of $1=$1 is to force a rebuild of the record; otherwise a line such as 0A, where no field is changed by gsub, would be printed unmodified.
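This is POSIX compliant and involves no intermediary string, so it cannot clash with text already present in the data. Keep in mind that a multi-character FS is treated as a regular expression, and that & is special in the replacement given to gsub.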
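A usage sketch of this approach, with the variables passed through -v rather than as trailing operands so that it reads standard input; swap_fs is a hypothetical wrapper name.

```shell
#!/bin/sh
# Hypothetical wrapper, named only for this illustration.
swap_fs() {
    # FS=A splits each line on pattern1; OFS=BB rejoins with replacement1.
    # gsub replaces pattern2 (B) with replacement2 (AA) inside each field.
    # $1 = $1 forces the record to be rebuilt with OFS.
    awk -v FS=A -v OFS=BB -v p=B -v r=AA \
        '{ for (f = 1; f <= NF; f++) gsub(p, r, $f); $1 = $1 } 1'
}

printf 'AYB\n0A\n' | swap_fs
# BBYAA
# 0BB
```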