Differences in RegEx syntax between Python and Java

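Java doesn't parse regular expressions the same way Python does in a small set of cases. In this particular case, the unescaped [ characters inside character classes were causing the problem. Python treats a [ inside a character class as an ordinary character, so it doesn't need to be escaped there. Java, however, treats it as the start of a nested character class (for example, [a-d[m-p]] is the union of a-d and m-p), so in Java it must be escaped as \[.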

The original RegEx (for Python):

/(\\.|[^[/\\\n]|\[(\\.|[^\]\\\n])*])+/([gim]+\b|\B)

The fixed RegEx (for Java and Python):

/(\\.|[^\[/\\\n]|\[(\\.|[^\]\\\n])*\])+/([gim]+\b|\B)
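Python can't reproduce Java's nested-class parsing, but it can check the other half of the claim above: that the escaped pattern still behaves exactly like the original in Python. The following is a minimal sketch assuming Python 3 and its standard re module. It applies both patterns to two made-up JavaScript snippets; first_regex_literal is a hypothetical helper name, and the sample inputs are illustrative only.

```python
import re
from typing import Optional

# The two patterns from above, as raw strings so backslashes are kept as-is.
ORIGINAL = r'/(\\.|[^[/\\\n]|\[(\\.|[^\]\\\n])*])+/([gim]+\b|\B)'
FIXED = r'/(\\.|[^\[/\\\n]|\[(\\.|[^\]\\\n])*\])+/([gim]+\b|\B)'


def first_regex_literal(pattern: str, source: str) -> Optional[str]:
    """Return the first /.../flags literal that `pattern` finds in `source`,
    or None if there is none (hypothetical helper, for illustration)."""
    m = re.search(pattern, source)
    return m.group(0) if m else None


if __name__ == "__main__":
    samples = [
        r'x = a.replace(/[/]\/x/g, "y");',  # '/' inside a class, escaped '/'
        r'if (/^\d+$/.test(s)) {}',          # no flags, ends at a \B position
    ]
    for s in samples:
        # Python reads '[' inside a class literally, so both columns agree.
        print(first_regex_literal(ORIGINAL, s), first_regex_literal(FIXED, s))
```

Both patterns find /[/]\/x/g in the first sample and /^\d+$/ in the second, which is why the escaped version can be shared between Java and Python.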

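The most obvious difference between Java and Python is that in Java you need to escape a lot more characters. Java has no raw string literals, so every backslash in a pattern must be doubled inside a Java string literal (\\[ instead of \[), whereas a Python raw string such as r"\[" holds the pattern exactly as written.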
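To see how much doubling that involves, here is a small sketch that turns a pattern written as a Python raw string into the equivalent Java string literal. to_java_string_literal is a hypothetical helper, not part of either language's library. It only escapes backslashes and double quotes, which is enough for the patterns in this answer but not for patterns containing literal control characters.

```python
# Fixed pattern from above, as a Python raw string (no doubling needed).
FIXED = r'/(\\.|[^\[/\\\n]|\[(\\.|[^\]\\\n])*\])+/([gim]+\b|\B)'


def to_java_string_literal(pattern: str) -> str:
    """Quote `pattern` as a Java string literal (hypothetical helper).

    Backslashes are doubled first, then double quotes are escaped, so the
    backslashes added for quotes are not doubled a second time. Literal
    control characters such as a raw newline are NOT handled.
    """
    escaped = pattern.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


if __name__ == "__main__":
    # The printed text can be pasted after `String regex = ` in Java source.
    print(to_java_string_literal(FIXED))
```

Every \ in the pattern becomes \\ in the Java source, so a sequence like \\\n (three backslashes and an n) turns into six backslashes and an n.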

Moreover, you are probably running into a mismatch between the matching methods, not a difference in the actual regex notation:

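Given the following Java code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
String regex = "...", input = "..."; // initialize to your pattern and text
Matcher matcher = Pattern.compile( regex ).matcher( input );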
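  • Java's matcher.matches() (also Pattern.matches( regex, input )) matches the entire string. In Python 3.4 and later, re.fullmatch( regex, input ) is the direct equivalent. On older versions, the same result can be achieved with re.match( regex, input ) if the pattern is wrapped as (?:regex)\Z; a bare trailing $ is not quite the same, because it also matches just before a final newline.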
  • Java's matcher.find() and Python's re.search( regex, input ) match any part of the string.
  • Java's matcher.lookingAt() and Python's re.match( regex, input ) match the beginning of the string.
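The correspondence in the list above can be checked from the Python side with a small sketch. java_style is a hypothetical helper that mimics what each Matcher method would report for a single call. It assumes Python 3.4+ for re.fullmatch and does not model Java's stateful repeated find() calls or matcher regions.

```python
import re
from typing import Optional


def java_style(method: str, regex: str, text: str) -> Optional[str]:
    """Return group 0 as the named Java Matcher method would see it on a
    single call, or None if it reports no match (hypothetical helper)."""
    if method == "matches":        # Java matcher.matches(): whole input
        m = re.fullmatch(regex, text)   # Python 3.4+
    elif method == "lookingAt":    # Java matcher.lookingAt(): anchored at start
        m = re.match(regex, text)
    elif method == "find":         # Java matcher.find(): anywhere in the input
        m = re.search(regex, text)
    else:
        raise ValueError("unsupported method: " + method)
    return m.group(0) if m else None


if __name__ == "__main__":
    for text in ("123abc", "abc123"):
        for method in ("matches", "lookingAt", "find"):
            print(text, method, java_style(method, r"\d+", text))

    # Pre-3.4 workaround: wrap the pattern and anchor with \Z, not $,
    # because $ also matches just before a trailing newline.
    print(re.match(r"(?:\d+)$", "123\n") is not None)   # True
    print(re.match(r"(?:\d+)\Z", "123\n") is not None)  # False
```

For "123abc" only lookingAt and find report 123; for "abc123" only find does; matches reports no match for either input. The (?:...) wrapper matters for patterns with alternation: appending $ to a|ab would anchor only the second alternative.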

For more details, read Java's documentation of Matcher and compare it with the Python documentation of the re module.

Since you said that isn't the problem, I ran a test at http://ideone.com/6w61T and it looks like Java is doing exactly what you need it to do (group 0, the entire match, doesn't contain the ;). Your problem is elsewhere.

Tags:

Python

Java

Regex