What is more efficient: sorted stream or sorting a list?
Below is my benchmark (not really sure if it is correct):
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OperationsPerInvocation;
import org.openjdk.jmh.annotations.OutputTimeUnit;
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(MyBenchmark.N)
public class MyBenchmark {
public static final int N = 50;
public static final int SIZE = 100000;
static List<Integer> sourceList = new ArrayList<>();
static {
System.out.println("Generating the list");
for (int i = 0; i < SIZE; i++) {
sourceList.add(i);
}
System.out.println("Shuffling the list.");
Collections.shuffle(sourceList);
}
@Benchmark
public List<Integer> sortingList() {
List<Integer> sortedList = new ArrayList<>(sourceList);
Collections.sort(sortedList);
return sortedList;
}
@Benchmark
public List<Integer> sortedStream() {
List<Integer> sortedList = sourceList.stream().sorted().collect(Collectors.toList());
return sortedList;
}
@Benchmark
public List<Integer> treeSet() {
Set<Integer> sortedSet = new TreeSet<>(sourceList);
List<Integer> sortedList = new ArrayList<>(sortedSet);
return sortedList;
}
}
Results:
Benchmark Mode Cnt Score Error Units
MyBenchmark.sortedStream avgt 200 300691.436 ± 15894.717 ns/op
MyBenchmark.sortingList avgt 200 262704.939 ± 5073.915 ns/op
MyBenchmark.treeSet avgt 200 856577.553 ± 49296.565 ns/op
As in @Eugene's benchmark, sorting list is slightly (ca. 20%) faster than sorted stream. What surprizes me a bit is that treeSet
is significantly slower. I did not expect that.
It is safe to say that two forms of sort will have the same complexity ... even without looking at the code. (If they didn't then one form would be severely broken!)
Looking at Java 8 source code for streams (specifically the internal class java.util.stream.SortedOps
), the sorted()
method adds a component to a stream pipeline that captures all of the stream elements into either an array or an ArrayList
.
An array is used if and only if the pipeline assembly code can deduce the number of elements in the stream ahead of time.
Otherwise, an
ArrayList
is used to gather the elements to be sorted.
If an ArrayList
is used, you incur the extra overhead of building / growing the list.
Then we return to two versions of the code:
List<Item> sortedItems = new ArrayList<>(items);
Collections.sort(sortedItems, itemComparator);
In this version, the ArrayList
constructor copies the elements items
to an appropriately sized array, and then Collections.sort
does an in-place sort of that array. (This happens under the covers).
List<Item> sortedItems = items
.stream()
.sorted(itemComparator)
.collect(Collectors.toList());
In this version, as we have seen above, the code associated with sorted()
either builds and sorts an array (equivalent to what happens above) or it builds the ArrayList
the slow way. But on top of that, there are the overheads of stream the data from items
and to the collector.
Overall (with the Java 8 implementation at least) code examination tells me that first version of the code cannot be slower than the second version, and in most (if not all) cases it will be faster. But as the list gets larger, the O(NlogN)
sorting will tend to dominate the O(N)
overheads of copying. That will mean that the relative difference between the two versions will get smaller.
If you really care, you should write a benchmark to test the actual difference with a specific implementation of Java, and a specific input dataset. (Or adapt @Eugene's benchmark!)
To be honest I don't trust myself too much either in JMH
(unless I understand the assembly, which takes lots of time in my case), especially since I've used @Setup(Level.Invocation)
, but here is a small test (I took the StringInput
generation from some other test I did, but it should not matter, it's just some data to sort)
@State(Scope.Thread)
public static class StringInput {
private String[] letters = { "q", "a", "z", "w", "s", "x", "e", "d", "c", "r", "f", "v", "t", "g", "b",
"y", "h", "n", "u", "j", "m", "i", "k", "o", "l", "p" };
public String s = "";
public List<String> list;
@Param(value = { "1000", "10000", "100000" })
int next;
@TearDown(Level.Invocation)
public void tearDown() {
s = null;
}
@Setup(Level.Invocation)
public void setUp() {
list = ThreadLocalRandom.current()
.ints(next, 0, letters.length)
.mapToObj(x -> letters[x])
.map(x -> Character.toString((char) x.intValue()))
.collect(Collectors.toList());
}
}
@Fork(1)
@Benchmark
public List<String> testCollection(StringInput si){
Collections.sort(si.list, Comparator.naturalOrder());
return si.list;
}
@Fork(1)
@Benchmark
public List<String> testStream(StringInput si){
return si.list.stream()
.sorted(Comparator.naturalOrder())
.collect(Collectors.toList());
}
Results show that Collections.sort
is faster, but not by a big margin:
Benchmark (next) Mode Cnt Score Error Units
streamvsLoop.StreamVsLoop.testCollection 1000 avgt 2 0.038 ms/op
streamvsLoop.StreamVsLoop.testCollection 10000 avgt 2 0.599 ms/op
streamvsLoop.StreamVsLoop.testCollection 100000 avgt 2 12.488 ms/op
streamvsLoop.StreamVsLoop.testStream 1000 avgt 2 0.048 ms/op
streamvsLoop.StreamVsLoop.testStream 10000 avgt 2 0.808 ms/op
streamvsLoop.StreamVsLoop.testStream 100000 avgt 2 15.652 ms/op