java: List.contains() performance difference with manual searching
Writing a correct microbenchmark is hard. If you use a better benchmark, you'll likely see that the difference between the approaches is minor - at least, the following benchmark is far more robust, and shows a mere 10% difference in execution time between the two approaches:
public abstract class Benchmark {
final String name;
public Benchmark(String name) {
this.name = name;
}
abstract int run(int iterations) throws Throwable;
private BigDecimal time() {
try {
int nextI = 1;
int i;
long duration;
do {
i = nextI;
long start = System.nanoTime();
run(i);
duration = System.nanoTime() - start;
nextI = (i << 1) | 1;
} while (duration < 1000000000 && nextI > 0);
return new BigDecimal((duration) * 1000 / i).movePointLeft(3);
} catch (Throwable e) {
throw new RuntimeException(e);
}
}
@Override
public String toString() {
return name + "\t" + time() + " ns";
}
public static void main(String[] args) throws Exception {
final List<String> list = new ArrayList<String>();
for (int i = 0; i < 1000; i++) {
list.add("a");
}
Benchmark[] marks = {
new Benchmark("contains") {
@Override
int run(int iterations) throws Throwable {
for (int i = 0; i < iterations; i++) {
if (list.contains("b")) {
return 1;
}
}
return 0;
}
},
new Benchmark("loop") {
@Override
int run(int iterations) throws Throwable {
for (int i = 0; i < iterations; i++) {
for (String s : list) {
if (s.equals("b")) {
return 1;
}
}
}
return 0;
}
}
};
for (Benchmark mark : marks) {
System.out.println(mark);
}
}
}
prints (on my dated notebook, on a Java 7 Oracle JVM in server mode):
contains 10150.420 ns
loop 11363.640 ns
The slightly larger overhead of the loop is likely caused by the Iterator checking for concurrent modification and twice for the end of the list on every access, see the source code of java.util.ArrayList.Itr.next()
for details.
Edit: With very short lists, the difference is more pronounced. For instance for a list of length 1:
contains 15.316 ns
loop 69.401 ns
Still, nowhere near the 20:1 ratio your measurements indicate ...
First off, it is not wise to trust results coming from a singular test like that. There are too many variable factors, caching implications to consider, and other such things - you should rather consider writing a test that uses randomization over trials to some extent, and performs the different checks millions of times, not just once.
That said, I expect your results will remain the same; ArrayList
implements contains()
using its own indexOf()
method, which loops directly over the underlying array it stores. You can see this for yourself here
The foreach loop, on the other hand, requires instantiating an Iterator
, accessing the array through all of its methods, and just in general doing a lot more work than ArrayList
's own direct implementation does. Again, though, you should benchmark it more thoroughly!