In the regex world what's a flavor and which flavor does Java use?
Regex engines vary in which features they implement, which technique they use "under the hood", and which syntax they use for certain features.
There is a very good article and comparison table at regular-expressions.info.
The Java regex package (java.util.regex) implements a "Perl-like" regular expression engine, with some extra features such as possessive quantifiers (.*+) and variable-length (but finite) lookbehind assertions. On the other hand, it lacks a few features Perl has, namely conditional expressions and inline (?#comment) comments. All in all, it's a very full-featured implementation.
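A minimal sketch of the two extras mentioned above, in case it helps to see them run. The class name, method names, and sample strings are made up for illustration; only the java.util.regex calls are real.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
final class JavaFlavorDemo {

    // Greedy ".*" backtracks, so the trailing "c" can still match.
    static boolean greedyMatches(String input) {
        return Pattern.matches(".*c", input);
    }

    // Possessive ".*+" consumes everything and never gives characters back,
    // so the trailing "c" has nothing left to match.
    static boolean possessiveMatches(String input) {
        return Pattern.matches(".*+c", input);
    }

    // Lookbehind of variable but bounded length: "USD" or "EUR",
    // followed by zero to two spaces, immediately before the digits.
    static String amountAfterCurrency(String input) {
        Matcher m = Pattern.compile("(?<=(?:USD|EUR) {0,2})\\d+").matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(greedyMatches("abc"));
        System.out.println(possessiveMatches("abc"));
        System.out.println(amountAfterCurrency("total: EUR  42"));
    }
}
```

An unbounded lookbehind such as (?<=a+) is a different matter: java.util.regex needs to know a maximum length, which is what "finite" refers to here.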
The term "flavor" refers to the regex engine – the syntax and additional properties supported by the particular regex engine.
The Pattern class documents the properties of the Java regex engine.
Aside from basic things like the meaning of metacharacters, different regex engine implementations support different syntax.
For example:
- POSIX engines support [:digit:] for digits (same as [0-9]);
- Perl-compatible engines support the \d shortcut for digits;
- JavaScript did not support lookbehinds before ES2018;
- PHP and some others support lookbehinds, but need them to be fixed length;
- Regex engines built into some text editors (older Notepad++ versions, for example) don't support lookarounds.
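To relate the digit examples to Java: as far as I know, java.util.regex accepts the plain range, the Perl-style shorthand, and a POSIX-style class spelled \p{Digit} rather than the bracket form [:digit:]. The following is only a sketch; the class and method names are invented for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
final class DigitSyntaxDemo {

    // Three spellings of "one or more ASCII digits" accepted by java.util.regex.
    static final Pattern RANGE = Pattern.compile("[0-9]+");
    static final Pattern SHORTHAND = Pattern.compile("\\d+");
    static final Pattern POSIX_STYLE = Pattern.compile("\\p{Digit}+");

    // True when all three patterns give the same verdict on the whole input.
    static boolean allAgree(String input) {
        boolean range = RANGE.matcher(input).matches();
        boolean shorthand = SHORTHAND.matcher(input).matches();
        boolean posix = POSIX_STYLE.matcher(input).matches();
        return range == shorthand && shorthand == posix;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2024", "20x4", ""}) {
            System.out.println("\"" + s + "\" -> digits only: "
                    + SHORTHAND.matcher(s).matches()
                    + ", all spellings agree: " + allAgree(s));
        }
    }
}
```

The three spellings agree under the default flags; compiling with Pattern.UNICODE_CHARACTER_CLASS widens \d and \p{Digit} to non-ASCII digits, so the equivalence with [0-9] no longer holds there.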
Java uses Perl-like regex syntax.