Saving to database in stream pipeline
The absolute easiest example is:
cars.stream()
.map(this:saveCar)
.count()
In this case, from java-9 and up, map
will not be executed; since you do not need it to know the count
, at all.
There are other multiple cases where side effects would cause you lots of pain; under certain conditions.
As per the documentation on Oracle's website [...]
That link is for Java 8. You may want to read the documentation for Java 9 (which came out in 2017) and later versions, as they are more explicit to this regard. Specifically:
A stream implementation is permitted significant latitude in optimizing the computation of the result. For example, a stream implementation is free to elide operations (or entire stages) from a stream pipeline -- and therefore elide invocation of behavioral parameters -- if it can prove that it would not affect the result of the computation. This means that side-effects of behavioral parameters may not always be executed and should not be relied upon, unless otherwise specified (such as by the terminal operations
forEach
andforEachOrdered
). (For a specific example of such an optimization, see the API note documented on thecount()
operation. For more detail, see the side-effects section of the stream package documentation.)Source: Java 9's Javadoc for the
Stream
interface.
And also the updated version of the doc you quoted:
Side-effects
Side-effects in behavioral parameters to stream operations are, in general, discouraged, as they can often lead to unwitting violations of the statelessness requirement, as well as other thread-safety hazards.
If the behavioral parameters do have side-effects, unless explicitly stated, there are no guarantees as to:
- the visibility of those side-effects to other threads;
- that different operations on the "same" element within the same stream pipeline are executed in the same thread; and
- that behavioral parameters are always invoked, since a stream implementation is free to elide operations (or entire stages) from a stream pipeline if it can prove that it would not affect the result of the computation.
The ordering of side-effects may be surprising. Even when a pipeline is constrained to produce a result that is consistent with the encounter order of the stream source (for example,
IntStream.range(0,5).parallel().map(x -> x*2).toArray()
must produce[0, 2, 4, 6, 8]
), no guarantees are made as to the order in which the mapper function is applied to individual elements, or in what thread any behavioral parameter is executed for a given element.The eliding of side-effects may also be surprising. With the exception of terminal operations
forEach
andforEachOrdered
, side-effects of behavioral parameters may not always be executed when the stream implementation can optimize away the execution of behavioral parameters without affecting the result of the computation. (For a specific example see the API note documented on thecount
operation.)Source: Java 9's Javadoc for the
java.util.stream
package.
All emphasis mine.
As you can see, the current official documentation goes into more detail on the issues that you may encounter if you decide to use side-effects in your stream operations. It is also very clear on forEach
and forEachOrdered
being the only terminal operations where execution of side-effects is guaranteed (mind you, thread-safety issues still apply, as the official examples show).
That being said, and regarding your specific code, and said code only:
public List<SavedCars> saveCars(List<Car> cars) {
return cars.stream()
.map(this::saveCar)
.collect(Collectors.toList());
}
I see no Streams-related issues with said code as-is.
- The
.map()
step will be executed because.collect()
(a mutable reduction operation, which is what the official doc recommends instead of things like.forEach(list::add)
) relies on.map()
's output and, since this (i.e.saveCar()
's) output is different than its input, the stream cannot "prove that [eliding] it would not affect the result of the computation". - It is not a
parallelStream()
so it should not introduce any concurrency problems that didn't previously exist (of course, if someone added a.parallel()
later then problems may arise —much like if someone decided to parallelize afor
loop by firing up new threads for the inner computations).
That doesn't mean that the code in that example is Good Code™. The sequence .stream.map(::someSideEffect()).collect()
as a way of performing side-effects operations for every item in a collection may look like more simple / short / elegant? than its for
counterpart, and it sometimes may be. However, as Eugene, Holger and some others told you, there are better ways to approach this.
As a quick thought: the cost of firing up a Stream
vs iterating a simple for
is not negligible unless you have a lot of items, and if you have a lot of items then you: a) probably don't want to make a new DB access for each one, so a saveAll(List items)
API would be better; and b) probably don't want to take the performance hit of processing a lot of items sequentially, so you would end up using parallelization and then a whole new set of problems arise.