Differences in RegEx syntax between Python and Java
Java doesn't parse regular expressions the same way as Python in a small set of cases. In this particular case the nested ['s were causing problems. Java treats an unescaped [ inside a character class as the start of a nested class (a union), whereas Python treats it as a literal [. So in Python you don't need to escape a nested [, but in Java you do.
The original RegEx (for Python):
/(\\.|[^[/\\\n]|\[(\\.|[^\]\\\n])*])+/([gim]+\b|\B)
The fixed RegEx (for Java and Python):
/(\\.|[^\[/\\\n]|\[(\\.|[^\]\\\n])*\])+/([gim]+\b|\B)
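As a quick sanity check, the fixed pattern can be tried from Python on a few inputs. This is only a sketch: the sample strings are made up for illustration, and it assumes the pattern is meant to match /pattern/flags style literals. Note that the flags end up in group 3, because the two earlier pairs of parentheses are groups 1 and 2.

```python
import re

# The fixed pattern from above, written as a raw string so the
# backslashes reach the regex engine unchanged.
LITERAL = re.compile(r'/(\\.|[^\[/\\\n]|\[(\\.|[^\]\\\n])*\])+/([gim]+\b|\B)')

# Hypothetical sample inputs, chosen only for illustration.
with_class = LITERAL.fullmatch('/ab[/]c/g')     # a "/" inside [...] does not end the literal
escaped_slash = LITERAL.fullmatch(r'/a\/b/')    # an escaped "/" does not end it either
unterminated = LITERAL.fullmatch('/abc')        # no closing "/", so no match

print(with_class.group(3))       # the flags part of the first sample
print(escaped_slash.group(3))    # empty: the second sample has no flags
print(unterminated)
```

Because both [ and ] are escaped, the same pattern text should also be accepted by Java, where it additionally needs each backslash doubled inside a string literal.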
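The obvious difference between Java and Python is that in Java you need to escape a lot more characters: Java has no raw string literals, so every backslash in a pattern has to be doubled inside a string literal.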
Moreover, you are probably running into a mismatch between the matching methods, not a difference in the actual regex notation:
Given the Java code:
String regex, input; // initialized to something
Matcher matcher = Pattern.compile( regex ).matcher( input );
- Java's matcher.matches() (also Pattern.matches( regex, input )) matches the entire string. Python 3.4 and later offer re.fullmatch( regex, input ) as a direct equivalent; on older versions the same result can be achieved by using re.match( regex, input ) with a regex that ends with $.
- Java's matcher.find() and Python's re.search( regex, input ) match any part of the string.
- Java's matcher.lookingAt() and Python's re.match( regex, input ) match the beginning of the string.
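The Python side of that comparison can be sketched as follows. The pattern and inputs are hypothetical, and the sketch assumes Python 3.4 or later, where re.fullmatch is available as a whole-string match comparable to Java's matcher.matches().

```python
import re

regex = r'\d+'   # hypothetical pattern, for illustration only

# Comparable to Java's matcher.find(): the match may start anywhere.
found = re.search(regex, 'abc 123')

# Comparable to Java's matcher.lookingAt(): the match must start at position 0.
prefix = re.match(regex, '123 abc')
no_prefix = re.match(regex, 'abc 123')

# Comparable to Java's matcher.matches(): the whole string must match.
whole = re.fullmatch(regex, '123')
not_whole = re.fullmatch(regex, '123 abc')

# Caveat of the "re.match plus $" workaround: $ also matches just before
# a trailing newline, so these two calls disagree on '123\n'.
dollar = re.match(regex + '$', '123\n')
strict = re.fullmatch(regex, '123\n')

print(found.group(0), prefix.group(0), no_prefix, whole.group(0), not_whole)
print(dollar.group(0), strict)
```

If the trailing-newline case matters on an older Python, ending the pattern with \Z instead of $ avoids it.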
For more details, also read Java's documentation of Matcher and compare it to the Python documentation.
Since you said that isn't the problem, I decided to do a test: http://ideone.com/6w61T
It looks like Java is doing exactly what you need it to (group 0, the entire match, doesn't contain the ;). Your problem is elsewhere.