What does the "r" in pythons re.compile(r' pattern flags') mean?
As @PauloBu
stated, the r
string prefix is not specifically related to regex's, but to strings generally in Python.
Normal strings use the backslash character as an escape character for special characters (like newlines):
>>> print('this is \n a test')
this is
a test
The r
prefix tells the interpreter not to do this:
>>> print(r'this is \n a test')
this is \n a test
>>>
This is important in regular expressions, as you need the backslash to make it to the re
module intact - in particular, \b
matches empty string specifically at the start and end of a word. re
expects the string \b
, however normal string interpretation '\b'
is converted to the ASCII backspace character, so you need to either explicitly escape the backslash ('\\b'
), or tell python it is a raw string (r'\b'
).
>>> import re
>>> re.findall('\b', 'test') # the backslash gets consumed by the python string interpreter
[]
>>> re.findall('\\b', 'test') # backslash is explicitly escaped and is passed through to re module
['', '']
>>> re.findall(r'\b', 'test') # often this syntax is easier
['', '']
No, as the documentation pasted in explains the r
prefix to a string indicates that the string is a raw string
.
Because of the collisions between Python escaping of characters and regex escaping, both of which use the back-slash \
character, raw strings provide a way to indicate to python that you want an unescaped string.
Examine the following:
>>> "\n"
'\n'
>>> r"\n"
'\\n'
>>> print "\n"
>>> print r"\n"
\n
Prefixing with an r
merely indicates to the string that backslashes \
should be treated literally and not as escape characters for python.
This is helpful, when for example you are searching on a word boundry. The regex for this is \b
, however to capture this in a Python string, I'd need to use "\\b"
as the pattern. Instead, I can use the raw string: r"\b"
to pattern match on.
This becomes especially handy when trying to find a literal backslash in regex. To match a backslash in regex I need to use the pattern \\
, to escape this in python means I need to escape each slash and the pattern becomes "\\\\"
, or the much simpler r"\\"
.
As you can guess in longer and more complex regexes, the extra slashes can get confusing, so raw strings are generally considered the way to go.
No. Not everything in regex syntax needs to be preceded by \
, so .
, *
, +
, etc still have special meaning in a pattern
The r''
is often used as a convenience for regex that do need a lot of \
as it prevents the clutter of doubling up the \