Regex for square brackets in R

Pass perl = TRUE so that you can use Perl-like (PCRE) syntax, which is more straightforward (IMHO):

gsub("[\\[\\]$]","",mystring, perl = TRUE)

Or you can use "smart placement": put ] at the very start of the bracket expression, where it is treated as a literal ([ is not special inside a bracket expression, so there is no need to escape it):

gsub("[][$]","",mystring)

Result:

[1] "abcde"
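
For a self-contained check of both calls, here is a minimal sketch. The value assigned to mystring is an assumption made for illustration only (the original input is not shown here); any string containing [, ] and $ should behave the same way.

```r
# Hypothetical input: the real value of mystring is not given above.
mystring <- "[abc]de$"

# PCRE (perl = TRUE): backslash escapes work inside a character class.
pcre_result <- gsub("[\\[\\]$]", "", mystring, perl = TRUE)

# TRE (default engine): ] comes first so it is literal; [ and $ need no escaping.
tre_result <- gsub("[][$]", "", mystring)

print(pcre_result)
print(tre_result)
print(identical(pcre_result, tre_result))
```

With this assumed input, both calls should strip the three special characters and leave only the letters.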

More details

The [...] construct is treated as a POSIX bracket expression by the TRE regex engine, which is the default engine in base R regex functions such as (g)sub, grep(l) and (g)regexpr when they are called without perl=TRUE. Bracket expressions, unlike character classes in NFA regex engines, do not support escape sequences, i.e. the \ char is treated as a literal backslash inside them.

Thus, in a TRE regex, [\[\]] is parsed as the bracket expression [\[\] (it lists \, [ and \ again, so it is equivalent to [\[]) followed by a literal ]. It therefore matches only the two-character substrings \] and []. Just have a look at gsub("[\\[\\]]", "", "[]\\]ab]"): it outputs ab] because [] and \] are matched and removed, while the final ] is left alone.
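
To make the difference between the two engines concrete, the sketch below runs the same pattern on the same input with and without perl = TRUE. The variable names x, tre_out and pcre_out are only illustrative.

```r
# The input consists of the characters: [ ] \ ] a b ]
x <- "[]\\]ab]"

# TRE (default): "[\[\]" is the bracket expression, followed by a literal "]",
# so only the two-character sequences "[]" and "\]" are removed.
tre_out <- gsub("[\\[\\]]", "", x)

# PCRE: the backslashes escape [ and ], so the class matches a single [ or ].
pcre_out <- gsub("[\\[\\]]", "", x, perl = TRUE)

cat(tre_out, "\n")   # ab]
cat(pcre_out, "\n")  # \ab
```

Under PCRE every bracket is removed and the backslash survives, which is the opposite of what TRE does with the very same pattern.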

Note that the terms "POSIX bracket expression" and "NFA character class" are used here in the same sense as at https://www.regular-expressions.info. This terminology is not quite standard, but there is a need to differentiate between the two.


I would sidestep [ab] syntax and use (a|b). Besides working, it may also be more readable:

gsub("(\\[|\\]|\\$)","",mystring)
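
One practical upside of the alternation form is that the escapes sit outside any bracket expression, so the pattern should behave the same under the default engine and under perl = TRUE. A minimal sketch, again with an assumed value for mystring (alt_pattern, default_out and perl_out are illustrative names):

```r
# Hypothetical input: the real value of mystring is not given above.
mystring <- "[abc]de$"

# Outside a bracket expression, \[ \] and \$ are ordinary escaped literals.
alt_pattern <- "(\\[|\\]|\\$)"

default_out <- gsub(alt_pattern, "", mystring)               # TRE
perl_out    <- gsub(alt_pattern, "", mystring, perl = TRUE)  # PCRE

print(default_out)
print(perl_out)
```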

Tags:

Regex

R