Performance penalty of String.intern()
I did a little bit of benchmarking myself. For the search cost part, I decided to compare String.intern() with ConcurrentHashMap.putIfAbsent(s,s). Basically, those two methods do the same thing, except that String.intern() is a native method that stores to and reads from the JVM's internal string table (StringTable in HotSpot), while ConcurrentHashMap.putIfAbsent() is just a normal instance method.
You can find the benchmark code in a GitHub gist (for lack of a better place to put it). The options I used when launching the JVM (to verify that the benchmark is not skewed) are listed in the comments at the top of the source file.
Anyway, here are the results:
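To make the comparison concrete, here is a minimal sketch of the two pooling approaches side by side. It is not the benchmark code from the gist; the class name PoolSketch and the helper pool() are hypothetical names chosen for illustration. One detail worth noting: putIfAbsent returns null on first insertion, so a small wrapper is needed to always get the canonical instance back.

```java
import java.util.concurrent.ConcurrentHashMap;

public class PoolSketch {
    // Hypothetical application-level pool, standing in for the JVM's string table.
    private static final ConcurrentHashMap<String, String> POOL = new ConcurrentHashMap<>();

    /** Returns the canonical instance for s, adding s to the pool if it is not there yet. */
    static String pool(String s) {
        // putIfAbsent returns the previously mapped value, or null if s was just inserted.
        String previous = POOL.putIfAbsent(s, s);
        return previous != null ? previous : s;
    }

    public static void main(String[] args) {
        // Two distinct instances with equal contents.
        String a = new String("hello");
        String b = new String("hello");

        System.out.println(a == b);                   // false: different objects
        System.out.println(a.intern() == b.intern()); // true: both map to the JVM-pooled instance
        System.out.println(pool(a) == pool(b));       // true: both map to the map-pooled instance
    }
}
```

Unlike the JVM's string table, the map in this sketch holds strong references, so pooled strings are never garbage collected unless you remove them yourself.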
Search cost (single threaded)
Legend
- count: the number of distinct strings that we are trying to pool
- initial intern: the time in ms it took to insert all the strings in the string pool
- lookup same string: the time in ms it took to look up each of the strings again in the pool, using exactly the same instance as was previously entered in the pool
- lookup equal string: the time in ms it took to look up each of the strings again in the pool, but using a different instance with equal contents
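The three measurements in the legend can be sketched roughly as follows. This is only an illustration of the phases, not the gist's benchmark: the class name InternPhases, the helper timeMs, and the "key-" prefix are made up for the example, and there is no JIT warm-up or repetition, so the numbers it prints should not be taken as reliable.

```java
import java.util.function.UnaryOperator;

public class InternPhases {
    // Written to after each run so the JIT cannot discard the loop body.
    static volatile long sink;

    /** Applies the pooling function to every input and returns the elapsed time in ms. */
    static long timeMs(String[] inputs, UnaryOperator<String> pool) {
        long acc = 0;
        long start = System.nanoTime();
        for (String s : inputs) {
            acc += pool.apply(s).length();
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        sink = acc;
        return elapsedMs;
    }

    public static void main(String[] args) {
        int count = 100_000; // "count" in the legend: number of distinct strings
        String[] originals = new String[count];
        String[] copies = new String[count];
        for (int i = 0; i < count; i++) {
            originals[i] = "key-" + i;             // built at run time, so not pooled yet
            copies[i] = new String(originals[i]);  // equal contents, different instance
        }

        long initial = timeMs(originals, String::intern); // initial intern
        long same = timeMs(originals, String::intern);    // lookup same string
        long equal = timeMs(copies, String::intern);      // lookup equal string

        System.out.println("initial intern:      " + initial + " ms");
        System.out.println("lookup same string:  " + same + " ms");
        System.out.println("lookup equal string: " + equal + " ms");
    }
}
```

Passing a ConcurrentHashMap-based function instead of String::intern to timeMs gives the second set of measurements. For serious measurements, a harness such as JMH would be a better fit than hand-rolled timing.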
String.intern()
count        initial intern (ms)   lookup same string (ms)   lookup equal string (ms)
1'000'000    40206                 34698                     35000
400'000      5198                  4481                      4477
200'000      955                   828                       803
100'000      234                   215                       220
80'000       110                   94                        99
40'000       52                    30                        32
20'000       20                    10                        13
10'000       7                     5                         7
ConcurrentHashMap.putIfAbsent()
count        initial intern (ms)   lookup same string (ms)   lookup equal string (ms)
1'000'000    411                   246                       309
800'000      352                   194                       229
400'000      162                   95                        114
200'000      78                    50                        55
100'000      41                    28                        28
80'000       31                    23                        22
40'000       20                    14                        16
20'000       12                    6                         7
10'000       9                     5                         3
The conclusion for the search cost: String.intern() is surprisingly expensive to call. It scales extremely badly: the cost of a single lookup grows roughly as O(n), where n is the number of strings in the pool. As the pool grows, each lookup gets much slower (about 0.7 microseconds per lookup with 10'000 strings, about 40 microseconds per lookup with 1'000'000 strings).
ConcurrentHashMap scales as expected: the number of strings in the pool has no noticeable impact on the speed of a lookup.
Based on this experiment, I'd strongly suggest avoiding String.intern() if you are going to intern more than a few strings.
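The per-lookup figures come from dividing the total time by the number of strings. A small sketch of that arithmetic, using the "initial intern" values from the tables above (the class and method names are hypothetical):

```java
public class PerLookupCost {
    /** Converts a total time in ms over count operations into microseconds per operation. */
    static double microsPerOp(long totalMs, int count) {
        return totalMs * 1000.0 / count;
    }

    public static void main(String[] args) {
        // String.intern(), "initial intern" column
        System.out.println(microsPerOp(7, 10_000));         // about 0.7 microseconds
        System.out.println(microsPerOp(40206, 1_000_000));  // about 40 microseconds

        // ConcurrentHashMap.putIfAbsent(), "initial intern" column
        System.out.println(microsPerOp(9, 10_000));         // about 0.9 microseconds
        System.out.println(microsPerOp(411, 1_000_000));    // about 0.4 microseconds
    }
}
```

With 100 times more strings, the per-call cost of String.intern() goes up by a factor of roughly 57, while the per-call cost of the map stays below one microsecond.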
I have recently written an article about the String.intern() implementation in Java 6, 7 and 8: String.intern in Java 6, 7 and 8 - string pooling.
There is a -XX:StringTableSize JVM parameter, which allows you to make String.intern() extremely useful in Java 7+. So, unfortunately, I have to say that this question currently gives misleading information to its readers.
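If you want to check whether a given JVM was actually started with this parameter, a sketch like the following can help. It only inspects the launch arguments through the standard RuntimeMXBean; the class name StringTableFlagCheck and the value 1000003 in the comment are illustrative assumptions, not recommendations from the article.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class StringTableFlagCheck {
    /** Returns the value given to -XX:StringTableSize, or null if the flag is absent. */
    static String findStringTableSize(List<String> jvmArgs) {
        String prefix = "-XX:StringTableSize=";
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                return arg.substring(prefix.length());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Launch with, for example (the size is an illustrative value):
        //   java -XX:StringTableSize=1000003 StringTableFlagCheck
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String size = findStringTableSize(jvmArgs);
        System.out.println(size == null
                ? "StringTableSize not set, JVM default in use"
                : "StringTableSize = " + size);
    }
}
```

A larger table means fewer strings per bucket, which is presumably why the lookup times in the tables above degrade so sharply when the table is left at a small default size.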