Advantages of Stream and Spring Data
Providing it as a Stream gives the repository consumer the choice of how to collect the data.
In addition, it allows chaining/piping of operations on the stream, such as mapping to DTOs, augmenting data, and filtering.
If the only thing you're ever going to do is collect it to a list and send as a response, then there is no benefit.
But take, for example, the case where a Thing repository exposes List<Thing> findAllThings(), returning n Things, because most of the time the result is just sent as a list via the API.
But then someone builds a service in the application that needs to keep only the Things that exist in another set of m Things in the application.
We would have to build a new list by filtering on the set, like this:
List<Thing> acceptedThings = repo.findAllThings()
    .stream()
    .filter(t -> set.contains(t))
    .collect(toList());
So we've had to iterate the original list and reconstruct a new list. If there are further operations on this list, you can see how it may be sub-optimal.
If the response from the repository had been Stream<Thing>, then we could have chained the filter operation and passed on the Stream for any further processing.
Stream<Thing> acceptedThings = repo.findAllThings()
    .filter(t -> set.contains(t));
Only right at the end, when something consumes the stream, are the relevant operations executed for each item. This is much more efficient, as each element needs to be visited at most once and no intermediate collections need to be created.
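The laziness described above can be seen in a small plain-Java sketch. It does not use Spring: findAllThings() here is a hypothetical stand-in for a repository method, plain strings stand in for Thing, and the visited list exists only to record when the filter actually runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyFilterDemo {
    // Records every element the filter inspects, so we can see when work happens.
    static final List<String> visited = new ArrayList<>();

    // Hypothetical stand-in for a repository method returning Stream<Thing>.
    static Stream<String> findAllThings() {
        return Stream.of("a", "b", "c", "d");
    }

    // Chains a filter onto the stream; this is an intermediate operation only.
    static Stream<String> accepted(Set<String> allowed) {
        return findAllThings().filter(t -> {
            visited.add(t);
            return allowed.contains(t);
        });
    }

    public static void main(String[] args) {
        Stream<String> pipeline = accepted(Set.of("b", "d"));
        // No terminal operation has run yet, so the filter has not inspected anything.
        System.out.println("visited before collect: " + visited);

        // The terminal operation drives the pipeline; each element passes through once.
        List<String> result = pipeline.collect(Collectors.toList());
        System.out.println("result: " + result);
        System.out.println("visited after collect: " + visited);
    }
}
```

Further map or filter steps could be chained onto the returned stream before the terminal operation without creating another intermediate list.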
Given that Spring now supports returning a Stream as the @ResponseBody of a controller method, it's even better.
This is already supported in Spring Data JPA, so there's no real advantage in overriding the repository methods yourself to return Stream. If you really want a Stream and the potential advantages that come with it, use what Spring Data JPA already provides.
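One practical point when a repository method returns a Stream: the stream may hold on to data-store resources, so the Spring Data documentation advises closing it after use, for example with try-with-resources. The sketch below illustrates only that usage pattern in plain Java; streamAllThings() and the cursorOpen flag are hypothetical stand-ins for a repository method and its underlying cursor, not Spring Data APIs.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamClosingDemo {
    // Hypothetical stand-in for a database cursor kept open by the stream.
    static final AtomicBoolean cursorOpen = new AtomicBoolean(false);

    // Hypothetical stand-in for a repository method declared to return Stream<Thing>.
    static Stream<String> streamAllThings() {
        cursorOpen.set(true);
        // onClose registers a handler that runs when close() is called on the stream.
        return Stream.of("a", "b", "c").onClose(() -> cursorOpen.set(false));
    }

    static List<String> upperCased() {
        // try-with-resources calls close() even if the pipeline throws.
        try (Stream<String> things = streamAllThings()) {
            return things.map(String::toUpperCase).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        System.out.println(upperCased());
        System.out.println("cursor still open: " + cursorOpen.get());
    }
}
```

Terminal operations such as collect do not close a stream by themselves, which is why the explicit try block matters here.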
Another aspect is that in JPA 2.2 this could be the default return type of some queries. The JPA interfaces Query and TypedQuery will get a new method called getResultStream().
So Spring Data will use techniques specific to a particular provider, like Hibernate or EclipseLink, to stream the result.
By default, getResultStream() is just a getResultList().stream() implementation, but Hibernate already overrides that with ScrollableResults. This is far more efficient if you need to process a very large result set.
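The difference between a list-backed stream and a scrolling one can be sketched without any JPA provider. In the plain-Java example below, cursor() is a hypothetical stand-in for a database result set, and the fetched counter simulates row fetches; it does not reproduce Hibernate's actual implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class ResultStreamDemo {
    // Counts rows "fetched" from the hypothetical result set.
    static final AtomicInteger fetched = new AtomicInteger();

    // Hypothetical cursor over `total` rows; each next() simulates one row fetch.
    static Iterator<Integer> cursor(int total) {
        return new Iterator<Integer>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < total;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                fetched.incrementAndGet();
                return next++;
            }
        };
    }

    // Mirrors a list-backed default: load every row first, then stream the list.
    static Stream<Integer> listBacked(int total) {
        List<Integer> all = new ArrayList<>();
        cursor(total).forEachRemaining(all::add);
        return all.stream();
    }

    // Mirrors a scrolling approach: rows are pulled only as the stream demands them.
    static Stream<Integer> cursorBacked(int total) {
        Spliterator<Integer> rows =
                Spliterators.spliteratorUnknownSize(cursor(total), Spliterator.ORDERED);
        return StreamSupport.stream(rows, false);
    }

    public static void main(String[] args) {
        fetched.set(0);
        List<Integer> a = listBacked(1000).limit(3).collect(Collectors.toList());
        System.out.println(a + " list-backed rows fetched: " + fetched.get());

        fetched.set(0);
        List<Integer> b = cursorBacked(1000).limit(3).collect(Collectors.toList());
        System.out.println(b + " cursor-backed rows fetched: " + fetched.get());
    }
}
```

Both variants yield the same first three rows, but the list-backed one fetches all 1000 rows before the stream starts, while the cursor-backed one stops fetching once limit(3) is satisfied.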
You should see these options only as a way to improve your programming model, from an imperative style built on the JDK List to a more functional style built on streams. You should still push as much logic as possible down into the SQL query to benefit from indexing, better execution plans, etc. If your Stream.filter() is simple, then it can be expressed as a SQL / JPQL WHERE clause.
Please use SQL (or JPQL if it suffices) whenever querying the database. Don't filter in the client if you can avoid it. That would be like buying all the produce in a supermarket and throwing everything away, just to get a single yoghurt.