what is the fastest substring search method in Java
The accepted answer is not correct and not complete.
indexOf()
does a naive string search using backtracking on mismatches. This is quite fast on small patterns/texts but shows very poor performance on large textscontains("ja")
should be comparable to indexOf (because it delegates to it)matches("ja")
will not deliver the correct result, because it searches for an exact match (only the string"ja"
will match exactly)Pattern p = Pattern.compile("ja"); Matcher m = p.matcher("jack"); m.find();
would be the correct way to find texts with regular expressions. In practice (using large texts) it will be the most efficient way using only the java api. This is because a constant pattern (like"ja"
) will not be processed by the regex engine (which is slow) but by an Boyer-Moore-Algorithm (which is fast)
As far as the three you asked about, a regular expression is going to be much slower because it requires putting together a full state machine when you have a much simpler target. For contains
vs indexOf
...
2114 public boolean contains(CharSequence s) {
2115 return indexOf(s.toString()) > -1;
2116 }
(i.e., contains
just calls indexOf
, but you might incur an extra String
creation on each invocation. This is just one implementation of contains
, but since the contract of contains
is a simplification of indexOf
, this is probably how every implementation will work.)
String[] names = new String[]{"jack", "jackson", "jason", "dijafu"};
long start = 0;
long stop = 0;
//Contains
start = System.nanoTime();
for (int i = 0; i < names.length; i++){
names[i].contains("ja");
}
stop = System.nanoTime();
System.out.println("Contains: " + (stop-start));
//IndexOf
start = System.nanoTime();
for (int i = 0; i < names.length; i++){
names[i].indexOf("ja");
}
stop = System.nanoTime();
System.out.println("IndexOf: " + (stop-start));
//Matches
start = System.nanoTime();
for (int i = 0; i < names.length; i++){
names[i].matches("ja");
}
stop = System.nanoTime();
System.out.println("Matches: " + (stop-start));
Output:
Contains: 16677
IndexOf: 4491
Matches: 864018