Regex execution is too slow in java

It isn't really an infinite loop, it's just taking a really long time. For all practical purposes, we can call it a hang.

Your Regex may be improved.

Try to put $ at the end of it. It will say that this is the end of the line. It may help you saving time.

Edit :

 String subject = "www-association-belgo-palestinienne-be";
 Pattern pattern = Pattern.compile("^[A-Za-z0-9]([-_A-Za-z0-9]*)(\\.[-_A-Za-z0-9]+)*\\.([-_A-Za-z0-9]+\\.)*([-_A-Za-z0-9]*)[A-Za-z0-9]$");

 Matcher m = pattern.matcher(subject);
 System.out.println("    Start");
 boolean hasFind = m.find();
 System.out.println("    Finish : " + hasFind);

See How do you debug a regex?.

Specifically, I would try regexpal, and change the java backslashes to single ones.


The problem is the ([A-Za-z0-9_-]+\\.?)* part of the regular expression. Note that it has a quantifier (+) inside another quantifier (*). This causes catastrophic backtracking - basically, it has to try an exponential number of matches in order to check the regular expression, at least the way most regular expression engines are implemented (including the Java one).

If you use possessive quantifiers, you will be able to avoid this problem, however that would change the meaning of your regex, and it would no longer match what you want it to match.

I think the trick here is to find a regex which expresses what you want to solve, without double quantifiers. For example, the following should work:

Pattern.compile("^[A-Za-z0-9]\\.?([A-Za-z0-9_-]|[A-Za-z0-9_-]\\.)*[A-Za-z0-9]\\.[A-Za-z]{2,6}$");

I think this expresses the same class of strings that you are trying to match, and should be much faster.


It's not an infinite loop. The problem is that it's checking every possible match and not finding one. If you could let it run for a gazillion years, it will eventually terminate. See this article for a good explanation of what's happening under the hood.

Perhaps this regular expression is satisfactory (it terminates on the given string): ^[A-Za-z0-9][A-Za-z0-9_-]*(\\.[A-Za-z0-9_-]+)*\\.[A-Za-z]{2,6}$ (see http://ideone.com/Z0rlg)

Tags:

Java

Regex