What flavor of regex does git use
The Git source uses regcomp and regexec, which are defined by POSIX 1003.2. The code to compile a diff regexp is:
if (regcomp(ecbdata->diff_words->word_regex,
o->word_regex,
REG_EXTENDED | REG_NEWLINE))
which in POSIX means that these are "extended" regular expressions as defined here.
(Not every C library implements POSIX REG_EXTENDED in exactly the same way. Git includes its own implementation, which can be built in place of the system's.)
Edit (per updated question): POSIX EREs have neither lookahead nor lookbehind, nor do they have \w (but [_[:alnum:]] is probably close enough for most purposes).
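To check how a particular C library treats a pattern, a small helper can compile it with the same flags as the Git snippet above and try it against some text. This is only a sketch: ere_matches is a hypothetical name, not something from the Git source, and REG_NOSUB is added here because only a yes/no answer is needed.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of Git).
 * Returns 1 if `text` contains a match for the POSIX ERE `pattern`,
 * 0 if it does not, and -1 if the pattern fails to compile. */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_EXTENDED | REG_NEWLINE are the flags from the Git snippet;
     * REG_NOSUB is an addition because match offsets are not needed. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Calling ere_matches("^[_[:alnum:]]+$", "foo_bar1") should report a match on any conforming library. What a pattern containing \w does is exactly the part that can differ from one regex implementation to another, so it is worth testing on the machine in question rather than assuming.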
Thanks to the hints in @torek's answer above, I now realize that there are different flavors of regular expression engines, and that they can even have different syntax.
Even a single program, such as git, can be compiled with different regex engines. For example, this blog post hints that \w is supported by git, which contradicts what I observed on my machine and what the OP here asked about.
I ended up finding this section of the Wikipedia page you recommended most helpful. It presents the different syntaxes in one table, so that I could do some "translation" between, for example, [:alnum:] and \w, [:digit:] and \d, [:space:] and \s, etc.
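That kind of translation can be written down as a small lookup table. The sketch below is illustrative only: shorthand_table and posix_equivalent are made-up names, the POSIX side is given as a complete bracket expression (so \d becomes [[:digit:]]), and \w uses the [_[:alnum:]] approximation from the answer above. Unicode-aware engines may match more with the shorthands than these classes do.

```c
#include <stddef.h>
#include <string.h>

/* One row of the translation table: a Perl-style shorthand and an
 * approximately equivalent POSIX bracket expression. */
struct shorthand {
    const char *perl;
    const char *posix;
};

/* Hypothetical table; the equivalences are approximate. */
static const struct shorthand shorthand_table[] = {
    { "\\w", "[_[:alnum:]]" },
    { "\\d", "[[:digit:]]"  },
    { "\\s", "[[:space:]]"  },
};

/* Returns the POSIX bracket expression for a shorthand such as "\\d",
 * or NULL if the shorthand is not in the table. */
const char *posix_equivalent(const char *perl)
{
    size_t i;
    size_t n = sizeof(shorthand_table) / sizeof(shorthand_table[0]);

    for (i = 0; i < n; i++) {
        if (strcmp(shorthand_table[i].perl, perl) == 0)
            return shorthand_table[i].posix;
    }
    return NULL;
}
```

For instance, posix_equivalent("\\d") yields [[:digit:]], which can be pasted into a pattern wherever \d would have been used outside a bracket expression.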