How to ignore comments inside string literals

This might be another option.

Match 0+ times any character except a backslash, dot or exclamation mark using the first negated character class.

Then, when you reach a character that the first character class does not match, use an alternation, repeated 0+ times, to match either:

  • a dot that is not directly followed by 2 dots
  • or from 3 dots to the next occurrence of 3 dots
  • or an escaped character

To prevent catastrophic backtracking, you can mimic an atomic group in Python using a positive lookahead with a capturing group inside. If the assertion is true, then use the backreference to \1 to match.
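A minimal sketch of that trick on its own, using a toy pattern (the names `plain` and `atomic` are only for illustration). Because the lookahead is not re-entered once it has succeeded, the text captured in group 1 cannot be shortened by backtracking:

```python
import re

# Plain greedy quantifier: a+ can give back one "a" so the trailing "ab" matches.
plain = re.compile(r"a+ab")

# Emulated atomic group: the lookahead captures the whole run of "a"s,
# then \1 consumes exactly that text. The engine cannot backtrack into it.
atomic = re.compile(r"(?=(a+))\1ab")

print(plain.fullmatch("aaab"))   # a match object
print(atomic.fullmatch("aaab"))  # None, the run of "a"s is never given back
```

Python 3.11 and later also accept a native atomic group `(?>...)`, so the lookahead form is mainly useful on older versions.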

For example

(?<!\\)![^!\\.]*(?:(?:\.(?!\.\.)|(?=(\.{3}.*?\.{3}))\1|\\.)[^!\\.]*)*!

Explanation

  • (?<!\\)! Match ! not directly preceded by \
  • [^!\\.]* Match 0+ times any char except ! \ or .
  • (?: Outer non capture group
    • (?: Inner non capture group for the alternation
    • \.(?!\.\.) Match a dot not directly followed by 2 dots
    • | Or
    • (?=(\.{3}.*?\.{3}))\1 Assert and capture in group 1 from ... to the nearest ..., then match it with the backreference \1
    • | Or
    • \\. Match an escaped char
    • ) Close inner group
  • [^!\\.]* Match 0+ times any char except ! \ or .
  • )*! Close non capture group and repeat 0+ times, then match !
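The breakdown above can be tried out in Python roughly like this. It is only a sketch: the name `BANG_SPAN`, the helper `strip_spans` and the sample strings are made up for illustration, and the pattern is the same one as above, just laid out with `re.VERBOSE` so that each part carries its own comment.

```python
import re

BANG_SPAN = re.compile(r"""
    (?<!\\)!                       # opening ! not directly preceded by a backslash
    [^!\\.]*                       # 0+ chars other than !, backslash or dot
    (?:
        (?:
            \.(?!\.\.)             # a dot not directly followed by 2 dots
          |
            (?=(\.{3}.*?\.{3}))\1  # from ... to the nearest ..., taken atomically
          |
            \\.                    # an escaped char
        )
        [^!\\.]*                   # 0+ chars other than !, backslash or dot
    )*
    !                              # closing !
""", re.VERBOSE)


def strip_spans(text: str) -> str:
    """Remove every !...! span that the pattern matches."""
    return BANG_SPAN.sub("", text)


# The ! between the two runs of three dots does not end the span.
sample = "a = 1 !note ...has ! inside... done! b = 2"
print(strip_spans(sample))

# An escaped \! neither opens nor closes a span.
escaped = r"x \! y !c \! d! z"
print([m.group(0) for m in BANG_SPAN.finditer(escaped)])
```

Note that the pattern contains a capturing group, so `re.findall` would return the contents of group 1 rather than the whole match. That is why the sketch uses `finditer` with `m.group(0)`.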

Regex demo