How to replace paired square brackets with other syntax with sed?
Use capture groups:
sed 's|\[\([^]]*\)\]|\\macro{\1}|g' file
sed -e 's/\[\([^]]*\)\]/\\macro{\1}/g' file.txt
This looks for an opening bracket, then any number of characters that are not a closing bracket, then a closing bracket. The text between the brackets is captured by the escaped parentheses and inserted into the replacement expression as \1.
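As a quick check, here is a small sketch that runs both commands above on a made-up input line through a pipe (the sample text and the variable names are illustrative, not from the question). It suggests that choosing | or / as the s delimiter makes no difference to the result:

```shell
# Hypothetical sample input; any line with bracketed text would do.
line='see [a] and [b]'

# Same substitution, once with | and once with / as the delimiter.
with_pipe=$(printf '%s\n' "$line" | sed 's|\[\([^]]*\)\]|\\macro{\1}|g')
with_slash=$(printf '%s\n' "$line" | sed -e 's/\[\([^]]*\)\]/\\macro{\1}/g')

printf '%s\n' "$with_pipe"
printf '%s\n' "$with_slash"
```

Both lines should print see \macro{a} and \macro{b}.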
It took a little doing, but here:
sed -i.bkup 's/\[\([^]]*\)\]/\\macro{\1}/g' test.txt
Let's see if I can explain this regular expression:
1. The \[ matches a literal opening square bracket. Since [ is a special regular expression character, the backslash means to match the literal character.
2. The \(...\) is a capture group. It captures the part of the match I want to keep. I can have many capture groups, and in sed I can reference them as \1, \2, etc.
3. Inside the capture group \(...\), I have [^]]*.
   3.1. The [^...] syntax means any character except the ones listed.
   3.2. The [^]] means any character except a closing square bracket.
   3.3. The * means zero or more of the preceding. That means I am capturing zero or more characters that are not closing square brackets.
4. The \] matches the closing square bracket.
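The individual pieces can be tried on their own. The following sketch uses made-up inputs and variable names: the first command shows two capture groups referenced as \1 and \2, and the second shows what [^]]* matches by deleting every such match.

```shell
# Two capture groups, referenced as \1 and \2 in the replacement: swap the words.
swapped=$(printf '%s\n' 'hello world' | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')

# [^]]* matches runs of characters that are not a closing bracket.
# Deleting every such run leaves only the closing brackets behind.
only_closers=$(printf '%s\n' 'a]bc]d' | sed 's/[^]]*//g')

printf '%s\n' "$swapped"        # world hello
printf '%s\n' "$only_closers"   # ]]
```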
Let's look at the line this is [some] more [text]:
- In #1 above (the \[), I match the first opening square bracket, the one in front of the word some. However, it's not in a capture group. This is the first character I'm going to substitute.
- I now start a capture group. I am capturing according to 3.2 and 3.3 above (the [^]]*): starting with the letter s in some, as many characters as possible that are not closing square brackets. This means I am matching [some, but only capturing some.
- In #4 (the \]), I have ended my capture group. So far I've matched [some for substitution purposes, and now I'm matching the closing square bracket. That means I'm matching [some]. Note that regular expressions are normally greedy. I'll explain below why this is important.
- Now, I can write the replacement string. This is much easier. It's \\macro{\1}. The \1 is replaced by my capture group. The \\ is just a backslash. Thus, I'll replace [some] with \macro{some}.
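To see the walkthrough end to end, this sketch feeds the sample line to the same substitution through a pipe instead of editing test.txt in place (the variable name result is just for illustration):

```shell
# Run the substitution on the sample line from the walkthrough.
result=$(printf '%s\n' 'this is [some] more [text]' |
  sed 's/\[\([^]]*\)\]/\\macro{\1}/g')

# Both bracket pairs are rewritten because of the trailing g flag.
printf '%s\n' "$result"
# prints: this is \macro{some} more \macro{text}
```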
It would be much easier if I could be guaranteed a single set of square brackets in each line. Then I could have done this:
sed -i.bkup 's/\[\(.*\)\]/\\macro{\1}/g' test.txt
The capture group is now saying anything between two square brackets. However, the problem is that regular expressions are greedy, which means I would have matched from the s in some all the way to the final t in text. The x's below show the capture group. The [ and ] show the square brackets I'm matching on:
this is [some] more [text]
        [xxxxxxxxxxxxxxxx]
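Running the greedy version on the same line makes the problem visible. This is a small sketch using a pipe rather than -i, with variable names of my own choosing; both commands use \macro{\1} as the replacement so the outputs are directly comparable:

```shell
line='this is [some] more [text]'

# Greedy .* runs from the first [ to the last ] on the line.
greedy=$(printf '%s\n' "$line" | sed 's/\[\(.*\)\]/\\macro{\1}/g')

# [^]]* stops at the first closing bracket, so each pair is handled separately.
careful=$(printf '%s\n' "$line" | sed 's/\[\([^]]*\)\]/\\macro{\1}/g')

printf '%s\n' "$greedy"    # this is \macro{some] more [text}
printf '%s\n' "$careful"   # this is \macro{some} more \macro{text}
```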
This became more complex because I had to match characters that have special meaning in regular expressions, so we see a lot of backslashing. Plus, I had to account for regular expression greediness, which produced the nice looking negated bracket expression [^]]* to match anything that is not a closing bracket. Add in the square brackets before and after, \[[^]]*\], and don't forget the \(...\) capture group: \[\([^]]*\)\]. And you get one big mess of a regular expression.
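If the backslashes bother you, one option is extended regular expressions. This is only a sketch, and it assumes your sed accepts -E (GNU sed and the BSD/macOS sed both do, as far as I know). In that mode the capture group is written with plain parentheses, while the literal square brackets still need escaping:

```shell
# -E switches sed to extended regular expressions: ( ) group without backslashes.
result=$(printf '%s\n' 'this is [some] more [text]' |
  sed -E 's/\[([^]]*)\]/\\macro{\1}/g')

printf '%s\n' "$result"
# prints: this is \macro{some} more \macro{text}
```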