Stream with sorted() before findFirst() is no longer lazy
Here's a smaller example that illustrates the issue:
Stream.of("a", "ab", "abc", "abcd")
// .sorted() // uncomment and what follows becomes eager
.filter(s -> s.contains("b"))
.peek(s -> System.out.println("PEEK: " + s))
.findFirst()
.orElse("X");
As expected the output is:
PEEK: ab
If the sorted
line is uncommented, the output is:
PEEK: ab
PEEK: abc
PEEK: abcd
(The final result of the entire pipeline is "ab" in both cases, as expected.)
It's true that sorted
must consume all of its input before producing its first output element. In that sense it's eager. However, it does seem strange that it affects how elements are sent downstream.
Without sorting, the findFirst
operation "pulls" elements from upstream until it finds one, and then it stops. With sorting, the sorted()
operation eagerly gathers all the elements, sorts them, and since it has them all right there, it "pushes" them down the stream. Of course, findFirst
ignores all but the first element. But this means that intervening operations (such as the filter) may do unnecessary work.
The final result is correct, but the behavior is unexpected. This might be considered a bug. I'll investigate and file a bug if appropriate.
The sorted
operation forces traversal of all the items in the stream.
Stateful operations, such as distinct and sorted, may incorporate state from previously seen elements when processing new elements.
Stateful operations may need to process the entire input before producing a result. For example, one cannot produce any results from sorting a stream until one has seen all elements of the stream.
(Source)
I'm not sure, though, why the operations following the sorted
are also executed for all the elements in the stream.
If you perform the sort separately, and then use the stream for the rest of the processing, the processing will stop when the first match is found, as expected.
Arrays.sort(dataArray, Comparator.comparing(d -> d.getPriority())); // sort
Arrays.stream(dataArray)
.peek(o -> System.out.println("SORT: " + o))
.map(d -> d.getOriginalURL(shortUrl))
.peek(o -> System.out.println("MAP: " + o))
.filter(u -> u != null && !u.isEmpty())
.peek(o -> System.out.println("FILTER: " + o))
.findFirst().orElse("");
What you want to use is stream.collect(minBy(...))
. It has linear performance.
The minBy
static method is found on the Collectors
class.