How to ignore comments inside string literals
Perhaps this might be another option.
Match 0+ times any character except a backslash, dot or exclamation mark using the first negated character class.
Then, when you reach a character that the first character class does not match, use an alternation to match one of the following, and repeat that 0+ times:
- a dot that is not directly followed by 2 dots
- or everything from 3 dots up to the next occurrence of 3 dots
- or an escaped character
To prevent catastrophic backtracking, you can mimic an atomic group in Python using a positive lookahead with a capturing group inside. If the assertion is true, use the backreference \1 to match what was captured.
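As a minimal sketch of that lookahead trick on its own, the snippet below compares a plain greedy group with the emulated atomic form. The digit patterns and the variable names are illustrative only and are not part of the answer's pattern. On Python 3.11 and later, `re` also accepts `(?>...)` directly.

```python
import re

# Plain greedy run: \d+ can give back one digit so the trailing \d succeeds.
backtracking = re.compile(r'\d+\d')

# Emulated atomic group: the lookahead captures the whole digit run in group 1
# and \1 then consumes exactly that text. The lookahead is not re-entered on
# backtracking, so the run cannot shrink to make room for the trailing \d.
atomic_like = re.compile(r'(?=(\d+))\1\d')

# Same emulation followed by something the digit run does not swallow.
atomic_then_x = re.compile(r'(?=(\d+))\1x')

print(backtracking.search('123'))    # a match object covering '123'
print(atomic_like.search('123'))     # None
print(atomic_then_x.search('123x'))  # a match object covering '123x'
```

The second pattern fails where the first succeeds, which is the behaviour that keeps the engine from retrying shorter and shorter captures.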
For example
(?<!\\)![^!\\.]*(?:(?:\.(?!\.\.)|(?=(\.{3}.*?\.{3}))\1|\\.)[^!\\.]*)*!
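A possible way to try the pattern in Python is sketched below. The sample strings and the helper name `find_spans` are made up for illustration, and the sketch assumes, as the pattern implies, that `!` delimits the spans to find and that three dots delimit the runs to skip over.

```python
import re

# The pattern from above, compiled once as a raw string.
SPAN_RE = re.compile(
    r'(?<!\\)![^!\\.]*(?:(?:\.(?!\.\.)|(?=(\.{3}.*?\.{3}))\1|\\.)[^!\\.]*)*!'
)

def find_spans(text):
    """Return the full text of every match in `text`.

    finditer is used instead of findall because the pattern contains a
    capturing group, so findall would return group 1 rather than the match.
    """
    return [m.group(0) for m in SPAN_RE.finditer(text)]

# Hypothetical inputs: a plain span, a span containing a ...-delimited run
# with a ! inside it, a span with an escaped !, and two separate spans.
samples = [
    r'x = 1 ! simple note ! y',
    r'! has ...a ! inside... here !',
    r'! a \! b !',
    r'! one ! mid ! two !',
]

for sample in samples:
    print(find_spans(sample))
```

In the second sample the `!` between the two runs of three dots does not end the match, and in the third the escaped `\!` is consumed as an escaped character.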
Explanation
- (?<!\\)! Match ! not directly preceded by \
- [^!\\.]* Match 0+ times any char except !, \ or .
- (?: Non capture group
  - (?:\.(?!\.\.) Open the inner group and match a dot not directly followed by 2 dots
  - | Or
  - (?=(\.{3}.*?\.{3}))\1 Assert and capture in group 1 from ... to the nearest ..., then match it using the backreference
  - | Or
  - \\. Match an escaped char
  - ) Close the inner group
  - [^!\\.]* Match 0+ times any char except !, \ or .
- )*! Close non capture group and repeat 0+ times, then match !
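If the pattern is going to live in code, one option is to keep this breakdown next to it with `re.VERBOSE`. The sketch below is only a suggestion; the names `COMPACT`, `VERBOSE_RE` and `same_result` are hypothetical, and the check at the end simply compares the two spellings on a few made-up inputs.

```python
import re

COMPACT = r'(?<!\\)![^!\\.]*(?:(?:\.(?!\.\.)|(?=(\.{3}.*?\.{3}))\1|\\.)[^!\\.]*)*!'

# Same pattern spelled out with re.VERBOSE: whitespace outside character
# classes is ignored and everything after # on a line is a comment.
VERBOSE_RE = re.compile(r'''
    (?<!\\)!                       # ! not directly preceded by a backslash
    [^!\\.]*                       # 0+ chars other than !, backslash or dot
    (?:                            # outer non-capturing group
        (?:
            \.(?!\.\.)             # a dot not directly followed by 2 dots
          |
            (?=(\.{3}.*?\.{3}))\1  # three dots up to the nearest three dots
          |
            \\.                    # an escaped char
        )
        [^!\\.]*                   # 0+ chars other than !, backslash or dot
    )*                             # repeat the outer group 0+ times
    !                              # closing !
''', re.VERBOSE)

def same_result(text):
    """Check that both spellings return the same list of full matches."""
    compact = [m.group(0) for m in re.finditer(COMPACT, text)]
    verbose = [m.group(0) for m in VERBOSE_RE.finditer(text)]
    return compact == verbose

for text in [r'! v1.2 ...x ! y... !', r'a \! b ! c !', 'no marks here']:
    print(same_result(text))
```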