Is there a way to do multiple replacements with sed, without chaining the replacements?

You can't do the whole operation with a single substitution in sed, but you can do it correctly in different ways depending on whether the two substrings A and B are single characters or longer strings.

Assuming the two substrings A and B are single characters...

You want to transform AYB into BBYAA. To do this,

Change each A to B and B to A using y/AB/BA/.
Substitute each A in the new string with AA using s/A/AA/g.
Substitute each B in the new string with BB using s/B/BB/g.

$ echo AYB | sed 'y/AB/BA/; s/B/BB/g; s/A/AA/g'
BBYAA

Combine the last two steps to get

$ echo AYB | sed 'y/AB/BA/; s/[AB]/&&/g'
BBYAA

In fact, the ordering of the operations here does not really matter:

$ echo AYB | sed 's/[AB]/&&/g; y/AB/BA/'
BBYAA

The sed editing command y/// translates the characters in its first argument to the corresponding characters in its second argument, a bit like the tr utility does. This is done in a single operation, so you don't need a temporary for the swap of A and B in y/AB/BA/. In general, y/// is much faster at translating single characters than e.g. s///g (since no regular expressions are involved). It is also able to insert newlines into strings with \n, which the replacement part of the standard s/// command can't do (standard s/// needs a backslash followed by a literal newline there; GNU sed accepts \n as a non-portable convenience extension).

The & character in the replacement part of the s/// command will be replaced by whatever the expression in the first argument matched, so s/[AB]/&&/g would double up any A or B character in the input data.
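A few quick experiments may make the behaviour of y/// and & easier to see. This is only a sketch: the sample strings are made up for illustration, and the last command relies on the standard rule that s/// inserts a newline when the replacement contains a backslash followed by a literal newline.

```shell
# y/// maps characters by position, much like tr: l becomes 0, o becomes 1
echo 'hello world' | sed 'y/lo/01/'

# & in the replacement stands for whatever the pattern matched
echo 'AYB' | sed 's/[AB]/<&>/g'

# y/// can turn a character into a newline with \n
echo 'a,b' | sed 'y/,/\n/'

# portable s/// equivalent: a backslash followed by a literal newline
echo 'a,b' | sed 's/,/\
/'
```

The first two commands should print he001 w1r0d and <A>Y<B>; the last two should each print a and b on separate lines.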

For multi-character substrings, assuming the substrings are distinct (i.e. neither substring occurs inside the other, the way oo occurs inside foo), use something like

$ echo fooxbar | sed 's/foo/@/g; s/bar/foofoo/g; s/@/barbar/g'
barbarxfoofoo

I.e., swap the two strings via an intermediate string not otherwise found in the data. Note that the intermediate string could be any string not found in the data, not just a single character.
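The intermediate-string approach only works if the placeholder really is absent from the data, so it may be worth checking first. The following is a minimal sketch of that idea; the function name replace_pair and the marker @@MARK@@ are hypothetical, and it assumes the arguments contain no "/", "&", backslashes or regex metacharacters, and that the second search string does not match inside the marker.

```shell
# replace_pair FROM1 TO1 FROM2 TO2
# Hypothetical helper: reads stdin, replaces FROM1 with TO1 and FROM2 with TO2
# by way of an intermediate marker.
replace_pair() {
    marker='@@MARK@@'        # assumed placeholder; any unused string works
    input=$(cat)             # note: $(...) drops trailing blank lines
    case $input in
        *"$marker"*)
            # refuse to run rather than corrupt data that contains the marker
            echo "replace_pair: marker $marker occurs in the input" >&2
            return 1 ;;
    esac
    printf '%s\n' "$input" |
        sed "s/$1/$marker/g; s/$3/$4/g; s/$marker/$2/g"
}

echo fooxbar | replace_pair foo barbar bar foofoo
```

With the sample input this should print barbarxfoofoo, the same result as the command above.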

This is the kind of problem where you need a loop so you can search for both patterns simultaneously.

awk '
    BEGIN {
        regex = "A|B"
        map["A"] = "BB"
        map["B"] = "AA"
    }
    {
        str = $0
        result = ""
        while (match(str, regex)) {
            found = substr(str, RSTART, RLENGTH)
            result = result substr(str, 1, RSTART-1) map[found]
            str = substr(str, RSTART+RLENGTH)
        }
        print result str
    }
'
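As a usage sketch, the same loop can be wrapped in a shell function and fed the multi-character example from earlier. The function name multi_replace is made up for illustration, and the foo/bar mapping is simply substituted for the A/B one.

```shell
multi_replace() {
    awk '
        BEGIN {
            regex = "foo|bar"
            map["foo"] = "barbar"
            map["bar"] = "foofoo"
        }
        {
            str = $0
            result = ""
            # each replacement is appended to result, which is never searched again
            while (match(str, regex)) {
                found = substr(str, RSTART, RLENGTH)
                result = result substr(str, 1, RSTART-1) map[found]
                str = substr(str, RSTART+RLENGTH)
            }
            print result str
        }
    '
}

echo fooxbar | multi_replace
```

This should print barbarxfoofoo. Because replaced text moves into result and only the unprocessed remainder is searched, no intermediate placeholder is needed.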

Of course, if Perl is available there's an equivalent one-liner:

perl -pe '
    BEGIN { %map = ("A" => "BB", "B" => "AA"); }
    s/(A|B)/$map{$1}/g;
'

If none of the patterns contain special characters, you can also build the regex dynamically:

perl -pe '
    BEGIN {
        %map = ("A" => "BB", "B" => "AA");
        $regex = join "|", keys %map;
    }
    s/($regex)/$map{$1}/g;
'

By the way, Tcl has a builtin command for this called string map, but it's not easy to write Tcl one-liners.

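Demonstrating the effect that sorting the keys by length has. Perl tries the alternatives of a regex from left to right and takes the first one that matches, so a key that is a prefix of a longer key (A and AB here) must come after it:

without sorting (keys returns the keys in no particular order; this run happened to produce A|AB|BB)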

$ echo ABBA | perl -pe '
    BEGIN {
        %map = (A => "X", BB => "Y", AB => "Z");
        $regex = join "|", map {quotemeta} keys %map;
        print $regex, "\n";
    }
    s/($regex)/$map{$1}/g
'

A|AB|BB
XYX

with sorting

$ echo ABBA | perl -pe '
      BEGIN {
          %map = (A => "X", BB => "Y", AB => "Z");
          $regex = join "|", map {quotemeta $_->[1]}
                             reverse sort {$a->[0] <=> $b->[0]}
                             map {[length, $_]}
                             keys %map;
          print $regex, "\n";
      }
      s/($regex)/$map{$1}/g
  '

BB|AB|A
ZBX
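For comparison, the awk loop shown earlier should not need the sort, assuming an awk whose match() follows the POSIX rule of returning the longest of the leftmost matches. A small sketch with the same keys, deliberately left in the "bad" order (the function name awk_map is hypothetical):

```shell
awk_map() {
    awk '
        BEGIN {
            regex = "A|AB|BB"      # same order as the unsorted Perl run
            map["A"] = "X"; map["BB"] = "Y"; map["AB"] = "Z"
        }
        {
            str = $0
            result = ""
            while (match(str, regex)) {
                # with leftmost-longest matching, AB wins over A at the same position
                found = substr(str, RSTART, RLENGTH)
                result = result substr(str, 1, RSTART-1) map[found]
                str = substr(str, RSTART+RLENGTH)
            }
            print result str
        }
    '
}

echo ABBA | awk_map
```

Under that assumption this prints ZBX, matching the sorted Perl result.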

Benchmarking a "plain" sort versus the Schwartzian transform in Perl. The code in the subroutines is lifted directly from the sort documentation:

#!perl
use Benchmark   qw/ timethese cmpthese /;

# make up some key=value data
my $key='a';
for $x (1..10000) {
    push @unsorted,   $key++ . "=" . int(rand(32767));
}

# plain sorting: first by value then by key
sub nonSchwartzian {
    my @sorted = 
        sort { ($b =~ /=(\d+)/)[0] <=> ($a =~ /=(\d+)/)[0] || uc($a) cmp uc($b) } 
        @unsorted
}

# using the Schwartzian transform
sub schwartzian {
    my @sorted =
        map  { $_->[0] }
        sort { $b->[1] <=> $a->[1] || $a->[2] cmp $b->[2] }
        map  { [$_, /=(\d+)/, uc($_)] } 
        @unsorted
}

# ensure the subs sort the same way
die "different" unless join(",", nonSchwartzian()) eq join(",", schwartzian());

# benchmark
cmpthese(
    timethese(-10, {
        nonSchwartzian => 'nonSchwartzian()',
        schwartzian    => 'schwartzian()',
    })
);

running it:

$ perl benchmark.pl
Benchmark: running nonSchwartzian, schwartzian for at least 10 CPU seconds...
nonSchwartzian: 11 wallclock secs (10.43 usr +  0.05 sys = 10.48 CPU) @  9.73/s (n=102)
schwartzian: 11 wallclock secs (10.13 usr +  0.03 sys = 10.16 CPU) @ 49.11/s (n=499)
                 Rate nonSchwartzian    schwartzian
nonSchwartzian 9.73/s             --           -80%
schwartzian    49.1/s           405%             --

The code using the Schwartzian transform is about 5 times as fast (405% faster).

When the comparison is based only on the length of each element:

Benchmark: running nonSchwartzian, schwartzian for at least 10 CPU seconds...
nonSchwartzian: 11 wallclock secs (10.06 usr +  0.03 sys = 10.09 CPU) @ 542.52/s (n=5474)
schwartzian: 10 wallclock secs (10.21 usr +  0.02 sys = 10.23 CPU) @ 191.50/s (n=1959)
                Rate    schwartzian nonSchwartzian
schwartzian    191/s             --           -65%
nonSchwartzian 543/s           183%             --

Schwartzian is much slower with this inexpensive sort function.

With awk, you can use pattern1 as the field separator FS and the replacement1 as output field separator OFS. Then, loop over each field and replace pattern2 by replacement2:

awk '{for (f=1;f<=NF;f++){gsub(p,r,$f)} $1=$1}1' FS=A OFS=BB p=B r=AA file

The point of $1=$1 is to force a rebuild of the record, else it would fail for 0A for example.

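This is POSIX compliant and involves no intermediary string, so it cannot collide with text already in the data. Note that p (and a multi-character FS) is treated as a regular expression and that & is special in r, so strings containing such characters need escaping.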
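A short usage sketch of the command above, reading from standard input instead of a file (the operand - stands for standard input; the function name swap_fields is made up for illustration):

```shell
swap_fields() {
    # A is the field separator and BB the output separator; B becomes AA in each field
    awk '{for (f=1;f<=NF;f++){gsub(p,r,$f)} $1=$1}1' FS=A OFS=BB p=B r=AA -
}

printf '%s\n' AYB 0A | swap_fields
```

This should print BBYAA and 0BB. The second line is the 0A case mentioned above: neither field contains a B, so it is only the $1=$1 assignment that causes the record to be rebuilt with BB as the separator.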
