Python Regex Engine - "look-behind requires fixed-width pattern" Error
Python lookbehind assertions need to be fixed width, but you can try this:
>>> import re
>>> s = '"It "does "not "make "sense", Well, "Does "it"'
>>> re.sub(r'\b\s*"(?!,|$)', '" "', s)
'"It" "does" "not" "make" "sense", Well, "Does" "it"'
Explanation:
\b # Start the match at the end of a "word"
\s* # Match optional whitespace
" # Match a quote
(?!,|$) # unless it's followed by a comma or end of string
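As a quick check, here is a minimal sketch that reproduces the error from the title and then runs the word-boundary pattern in `re.VERBOSE` form. The failing pattern is the one quoted later in this thread; the names `failing`, `error_message` and `fixed` are only illustrative.

```python
import re

s = '"It "does "not "make "sense", Well, "Does "it"'

# Alternatives of different width inside a lookbehind (^ is zero-width,
# the comma is one character wide) are rejected by the standard re module.
failing = r'(?<!^|,)"(?!,|$)'
try:
    re.compile(failing)
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)

# The same pattern as in the answer, spelled out with re.VERBOSE.
fixed = re.compile(r'''
    \b        # start the match at the end of a "word"
    \s*       # match optional whitespace
    "         # match a quote
    (?!,|$)   # unless it is followed by a comma or the end of the string
''', re.VERBOSE)

result = fixed.sub('" "', s)
print(result)
```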
Python `re` lookbehinds really need to be fixed-width, and when you have alternations in a lookbehind pattern that are of different length, there are several ways to handle this situation:
- Rewrite the pattern so that you do not have to use alternation (e.g. Tim's above answer using a word boundary, or you might also use an exact equivalent `(?<=[^,])"(?!,|$)` of your current pattern that requires a char other than a comma before the double quote, or a common pattern to match words enclosed with whitespace, `(?<=\s|^)\w+(?=\s|$)`, can be written as `(?<!\S)\w+(?!\S)`), or
- Split the lookbehinds:
  - Positive lookbehinds need to be alternated in a group (e.g. `(?<=a|bc)` should be rewritten as `(?:(?<=a)|(?<=bc))`)
  - If the pattern in a lookbehind is an alternation of an anchor with a single char, you can reverse the sign of the lookbehind and use a negated character class with the char inside. E.g. `(?<=\s|^)` matches either a whitespace or start of a string/line (if `re.M` is used). So, in Python `re`, use `(?<!\S)`. The `(?<=^|;)` will be converted to `(?<![^;])`. And if you also want to make sure the start of a line is matched, too, add `\n` to the negated character class, e.g. `(?<![^;\n])` (see Python Regex: Match start of line, or semi-colon, or start of string, none capturing group). Note this is not necessary with `(?<!\S)` as `\S` does not match a line feed char.
  - Negative lookbehinds can be just concatenated (e.g. `(?<!^|,)"(?!,|$)` should look like `(?<!^)(?<!,)"(?!,|$)`).
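The rewrites in the list above can be tried side by side. This is only a sketch: the sample strings, the trailing `x` in the positive-lookbehind case, and all variable names are made up for illustration.

```python
import re

# Patterns from the list that the standard re module rejects, because the
# alternatives inside the lookbehind differ in width.
rejected = []
for pattern in (r'(?<=\s|^)\w+(?=\s|$)', r'(?<=a|bc)', r'(?<!^|,)"(?!,|$)'):
    try:
        re.compile(pattern)
    except re.error:
        rejected.append(pattern)

# Rewrite without alternation: words enclosed with whitespace.
words = re.findall(r'(?<!\S)\w+(?!\S)', 'foo bar-baz qux')  # ['foo', 'qux']

# Positive lookbehind split into a group of fixed-width lookbehinds.
split_positive = [m.start()
                  for m in re.finditer(r'(?:(?<=a)|(?<=bc))x', 'ax bcx cx x')]

# Anchor-or-char alternation with the sign reversed: (?<=^|;) -> (?<![^;]).
after_semicolon = re.findall(r'(?<![^;])\w+', 'a;b c;d')  # ['a', 'b', 'd']
# Adding \n to the class also accepts the start of a line.
after_semicolon_or_newline = re.findall(r'(?<![^;\n])\w+', 'a;b\nc d')

# Negative lookbehinds concatenated.
s = '"It "does "not "make "sense", Well, "Does "it"'
concatenated = [m.start() for m in re.finditer(r'(?<!^)(?<!,)"(?!,|$)', s)]
# The single-lookbehind equivalent from the first bullet finds the same quotes.
equivalent = [m.start() for m in re.finditer(r'(?<=[^,])"(?!,|$)', s)]

print(rejected)
print(words, split_positive, after_semicolon, after_semicolon_or_newline)
print(concatenated == equivalent)
```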
Or, simply install the PyPI `regex` module using `pip install regex` (or `pip3 install regex`) and enjoy infinite-width lookbehind.
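A small sketch of that last option, assuming the third-party `regex` package is installed (the import is guarded so the snippet still runs without it). It compares the variable-width pattern under `regex` with the split fixed-width version under `re`; the variable names are illustrative.

```python
import re

try:
    import regex  # third-party package, not part of the standard library
except ImportError:
    regex = None  # the comparison is skipped when the package is missing

s = '"It "does "not "make "sense", Well, "Does "it"'
variable_width = r'(?<!^|,)"(?!,|$)'    # rejected by re, accepted by regex
fixed_width = r'(?<!^)(?<!,)"(?!,|$)'   # split version that re accepts

expected = [m.start() for m in re.finditer(fixed_width, s)]
if regex is not None:
    found = [m.start() for m in regex.finditer(variable_width, s)]
    print(found == expected)
else:
    found = None
    print('regex is not installed; skipping the comparison')
```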