Escaping regex string
Unfortunately, re.escape()
is not suited for the replacement string:
>>> re.sub('a', re.escape('_'), 'aa')
'\\_\\_'
A solution is to put the replacement in a lambda:
>>> re.sub('a', lambda _: '_', 'aa')
'__'
because the return value of the lambda is treated by re.sub()
as a literal string.
Use the re.escape()
function for this:
4.2.3 re
Module Contents
escape(string)
Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
A simplistic example, search any occurence of the provided string optionally followed by 's', and return the match object.
def simplistic_plural(word, text):
word_or_plural = re.escape(word) + 's?'
return re.match(word_or_plural, text)
You can use re.escape()
:
re.escape(string) Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
>>> import re
>>> re.escape('^a.*$')
'\\^a\\.\\*\\$'
If you are using a Python version < 3.7, this will escape non-alphanumerics that are not part of regular expression syntax as well.
If you are using a Python version < 3.7 but >= 3.3, this will escape non-alphanumerics that are not part of regular expression syntax, except for specifically underscore (_
).