Why is String.strip() 5 times faster than String.trim() for a blank string in Java 11?
On OpenJDK 11.0.1, `String.strip()` (actually `StringLatin1.strip()`) optimizes stripping to an empty `String` by returning an interned `String` constant:
```java
public static String strip(byte[] value) {
    int left = indexOfNonWhitespace(value);
    if (left == value.length) {
        return "";
    }
    // ... rest of the method omitted
}
```
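The early return can be illustrated with a small stand-alone sketch. `StripSketch` and its helper methods are hypothetical: they mirror the excerpt on a plain `String` rather than on the JDK's internal `byte[]`, and they assume `Character.isWhitespace` as the whitespace test, which only approximates the JDK's own check.

```java
public class StripSketch {

    // Index of the first non-whitespace character, or s.length() if there is none.
    static int indexOfNonWhitespace(String s) {
        int left = 0;
        while (left < s.length() && Character.isWhitespace(s.charAt(left))) {
            left++;
        }
        return left;
    }

    // One past the index of the last non-whitespace character.
    static int lastIndexOfNonWhitespace(String s) {
        int right = s.length();
        while (right > 0 && Character.isWhitespace(s.charAt(right - 1))) {
            right--;
        }
        return right;
    }

    // Hypothetical strip(): a blank input returns the shared "" literal, so nothing is allocated.
    static String strip(String s) {
        int left = indexOfNonWhitespace(s);
        if (left == s.length()) {
            return "";                       // interned constant, no new object
        }
        int right = lastIndexOfNonWhitespace(s);
        if (left == 0 && right == s.length()) {
            return s;                        // nothing to strip: reuse the input
        }
        return s.substring(left, right);     // only this path copies characters
    }

    public static void main(String[] args) {
        String blank = new String("   ");    // three spaces, a fresh object
        System.out.println(strip(blank) == "");         // true: the same interned literal
        System.out.println(strip("abc") == "abc");      // true: input returned unchanged
        System.out.println(strip("  a ").equals("a"));  // true
    }
}
```

Because string literals are interned, every blank input ends up sharing one empty `String` instance in this sketch.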
By contrast, `String.trim()` (actually `StringLatin1.trim()`) allocates a new `String` object whenever there is whitespace to remove, even if nothing is left afterwards. In your example `st = 3` and `len = 3`, so
```java
return ((st > 0) || (len < value.length)) ?
    newString(value, st, len - st) : null;
```
will, under the hood, copy the array and create a new `String` object:
```java
return new String(Arrays.copyOfRange(val, index, index + len),
                  LATIN1);
```
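The `trim()` path described above can be sketched the same way. `TrimSketch` is hypothetical: it replays the `st`/`len` loops on a plain `String` and uses `Arrays.copyOfRange` on a `char[]` as a stand-in for the JDK's internal `byte[]` copy. In the real JDK the no-whitespace case returns `null` from `StringLatin1.trim()`, which `String.trim()` then maps back to `this`; the sketch folds that into one method.

```java
import java.util.Arrays;

public class TrimSketch {

    // Hypothetical trim(): mirrors the st/len loops from the excerpt.
    static String trim(String s) {
        int len = s.length();
        int st = 0;
        // Skip leading characters <= ' ' (trim's definition of "space").
        while (st < len && s.charAt(st) <= ' ') {
            st++;
        }
        // Skip trailing characters <= ' '.
        while (st < len && s.charAt(len - 1) <= ' ') {
            len--;
        }
        if (st > 0 || len < s.length()) {
            // Copy the remaining range and wrap it in a brand-new String,
            // even when len - st == 0 (the blank case).
            char[] copy = Arrays.copyOfRange(s.toCharArray(), st, len);
            return new String(copy);
        }
        return s; // nothing to trim: String.trim() hands back `this`
    }

    public static void main(String[] args) {
        String trimmed = trim("   ");            // st = 3, len = 3
        System.out.println(trimmed.isEmpty());   // true
        System.out.println(trimmed == "");       // false: a new, empty String object
    }
}
```

For a blank input the only work left after the loops is that empty copy; skipping exactly this allocation is what the `strip()` early return achieves.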
Based on the above, we can update the benchmark to compare against a non-empty `String`, which shouldn't be affected by the `String.strip()` optimization mentioned above:
```java
import static java.util.concurrent.TimeUnit.MILLISECONDS;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Warmup;

@Warmup(iterations = 10, time = 200, timeUnit = MILLISECONDS)
@Measurement(iterations = 20, time = 500, timeUnit = MILLISECONDS)
@BenchmarkMode(Mode.Throughput)
public class MyBenchmark {

    public static final String EMPTY_STRING = "   "; // 3 whitespaces
    public static final String NOT_EMPTY_STRING = " a "; // 3 whitespaces with a in the middle

    @Benchmark
    public void testEmptyTrim() {
        EMPTY_STRING.trim();
    }

    @Benchmark
    public void testEmptyStrip() {
        EMPTY_STRING.strip();
    }

    @Benchmark
    public void testNotEmptyTrim() {
        NOT_EMPTY_STRING.trim();
    }

    @Benchmark
    public void testNotEmptyStrip() {
        NOT_EMPTY_STRING.strip();
    }
}
```
Running it shows only a small difference (about 4%) between `strip()` and `trim()` for a non-empty `String`. Oddly enough, trimming to an empty `String` is still the slowest:
```
Benchmark                       Mode  Cnt           Score           Error  Units
MyBenchmark.testEmptyStrip     thrpt  100  1887848947.416 ± 257906287.634  ops/s
MyBenchmark.testEmptyTrim      thrpt  100   206638996.217 ±  57952310.906  ops/s
MyBenchmark.testNotEmptyStrip  thrpt  100   399701777.916 ±   2429785.818  ops/s
MyBenchmark.testNotEmptyTrim   thrpt  100   385144724.856 ±   3928016.232  ops/s
```
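To put the throughput numbers in perspective, the speed-up of `strip()` over `trim()` is simply the ratio of their scores (higher ops/s means faster). The class `BenchmarkRatios` is a hypothetical helper; the constants are copied from the JMH output above and the error columns are ignored.

```java
import java.util.Locale;

public class BenchmarkRatios {

    // Throughput scores (ops/s) copied from the JMH run above.
    static final double EMPTY_STRIP = 1887848947.416;
    static final double EMPTY_TRIM = 206638996.217;
    static final double NOT_EMPTY_STRIP = 399701777.916;
    static final double NOT_EMPTY_TRIM = 385144724.856;

    // How many times more operations per second strip() completed than trim().
    static double speedup(double stripOps, double trimOps) {
        return stripOps / trimOps;
    }

    public static void main(String[] args) {
        // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
        System.out.println(String.format(Locale.ROOT, "blank:     strip/trim = %.2f",
                speedup(EMPTY_STRIP, EMPTY_TRIM)));          // 9.14
        System.out.println(String.format(Locale.ROOT, "non-blank: strip/trim = %.2f",
                speedup(NOT_EMPTY_STRIP, NOT_EMPTY_TRIM)));  // 1.04
    }
}
```

The blank-string error margins are large (roughly ±14% and ±28% of the scores), so the 9× figure is only a rough indication, while the non-blank ratio stays close to 1.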
After looking into the OpenJDK source code, and assuming the Oracle implementation is similar, I would imagine the difference is explained by the following facts:

- `strip` will try to find the first non-whitespace character and, if none is found, simply return `""`.
- `trim` will return a `new String(...the substring...)` whenever there is leading or trailing whitespace, even if the substring is empty.
One could argue that `strip` is just a tiny bit more optimised than `trim`, at least in OpenJDK, because it avoids creating a new object unless necessary.
(Note: I didn't take the trouble to check the UTF-16 (`StringUTF16`) versions of these methods.)