How to get multiple values from an object using a single stream operation?
JDK 12 and above has Collectors.teeing
(webrev and CSR), which collects to two different collectors and then merges both partial results into a final result.
You could use it here to collect to two IntSummaryStatistics
for both the x
coordinate and the y
coordinate:
List<IntSummaryStatistics> stats = points.stream()
.collect(Collectors.teeing(
Collectors.mapping(p -> p.x, Collectors.summarizingInt()),
Collectors.mapping(p -> p.y, Collectors.summarizingInt()),
List::of));
int minX = stats.get(0).getMin();
int maxX = stats.get(0).getMax();
int minY = stats.get(1).getMin();
int maxY = stats.get(1).getMax();
Here the first collector gathers statistics for x
and the second one for y
. Then, statistics for both x
and y
are merged into a List
by means of the JDK 9 List.of
factory method that accepts two elements.
An aternative to List::of
for the merger would be:
(xStats, yStats) -> Arrays.asList(xStats, yStats)
If you happen to not have JDK 12 installed on your machine, here's a simplified, generic version of the teeing
method that you can safely use as a utility method:
public static <T, A1, A2, R1, R2, R> Collector<T, ?, R> teeing(
Collector<? super T, A1, R1> downstream1,
Collector<? super T, A2, R2> downstream2,
BiFunction<? super R1, ? super R2, R> merger) {
class Acc {
A1 acc1 = downstream1.supplier().get();
A2 acc2 = downstream2.supplier().get();
void accumulate(T t) {
downstream1.accumulator().accept(acc1, t);
downstream2.accumulator().accept(acc2, t);
}
Acc combine(Acc other) {
acc1 = downstream1.combiner().apply(acc1, other.acc1);
acc2 = downstream2.combiner().apply(acc2, other.acc2);
return this;
}
R applyMerger() {
R1 r1 = downstream1.finisher().apply(acc1);
R2 r2 = downstream2.finisher().apply(acc2);
return merger.apply(r1, r2);
}
}
return Collector.of(Acc::new, Acc::accumulate, Acc::combine, Acc::applyMerger);
}
Please note that I'm not considering the characteristics of the downstream collectors when creating the returned collector.
By analogy with IntSummaryStatistics
, create a class PointStatistics
which collects the information you need. It defines two methods: one for recording values from a Point
, one for combining two Statistics
.
class PointStatistics {
private int minX = Integer.MAX_VALUE;
private int maxX = Integer.MIN_VALUE;
private int minY = Integer.MAX_VALUE;
private int maxY = Integer.MIN_VALUE;
public void accept(Point p) {
minX = Math.min(minX, p.x);
maxX = Math.max(maxX, p.x);
minY = Math.min(minY, p.y);
maxY = Math.max(minY, p.y);
}
public void combine(PointStatistics o) {
minX = Math.min(minX, o.minX);
maxX = Math.max(maxX, o.maxX);
minY = Math.min(minY, o.minY);
maxY = Math.max(maxY, o.maxY);
}
// getters
}
Then you can collect a Stream<Point>
into a PointStatistics
.
class Program {
public static void main(String[] args) {
List<Point> points = new ArrayList<>();
// populate 'points'
PointStatistics statistics = points
.stream()
.collect(PointStatistics::new, PointStatistics::accept, PointStatistics::combine);
}
}
UPDATE
I was completely baffled by the conclusion drawn by OP, so I decided to write JMH benchmarks.
Benchmark settings:
# JMH version: 1.21
# VM version: JDK 1.8.0_171, Java HotSpot(TM) 64-Bit Server VM, 25.171-b11
# Warmup: 1 iterations, 10 s each
# Measurement: 10 iterations, 10 s each
# Timeout: 10 min per iteration
# Benchmark mode: Average time, time/op
For each iteration, I was generating a shared list of random Point
s (new Point(random.nextInt(), random.nextInt())
) of size 100K, 1M, 10M.
The results are
100K
Benchmark Mode Cnt Score Error Units
customCollector avgt 10 6.760 ± 0.789 ms/op
forEach avgt 10 0.255 ± 0.033 ms/op
fourStreams avgt 10 5.115 ± 1.149 ms/op
statistics avgt 10 0.887 ± 0.114 ms/op
twoStreams avgt 10 2.869 ± 0.567 ms/op
1M
Benchmark Mode Cnt Score Error Units
customCollector avgt 10 68.117 ± 4.822 ms/op
forEach avgt 10 3.939 ± 0.559 ms/op
fourStreams avgt 10 57.800 ± 4.817 ms/op
statistics avgt 10 9.904 ± 1.048 ms/op
twoStreams avgt 10 32.303 ± 2.498 ms/op
10M
Benchmark Mode Cnt Score Error Units
customCollector avgt 10 714.016 ± 151.558 ms/op
forEach avgt 10 54.334 ± 9.820 ms/op
fourStreams avgt 10 699.599 ± 138.332 ms/op
statistics avgt 10 148.649 ± 26.248 ms/op
twoStreams avgt 10 429.050 ± 72.879 ms/op
You could divide by two the iterations with summaryStatistics()
while keeping a straight code :
IntSummaryStatistics stat = points.stream().mapToInt(point -> point.x).summaryStatistics();
int minX = stat.getMin();
int maxX = stat.getMax();
And do the same thing with point.y
.
You could factor out in this way :
Function<ToIntFunction<Point>, IntSummaryStatistics> statFunction =
intExtractor -> points.stream()
.mapToInt(p -> intExtractor.applyAsInt(pp))
.summaryStatistics();
IntSummaryStatistics statX = statFunction.apply(p -> p.x);
IntSummaryStatistics statY = statFunction.apply(p -> p.y);
A custom collector is a possibility but note that you should implement the combiner part that will make your code harder to read.
So but if you need to use parallel stream you should stay to the imperative way.
While you could improve your actual code by relying the Math.min
and Math.max
functions :
for (Point p : points) {
minX = Math.min(p.x, minX);
minY = Math.min(p.y, minY);
maxY = Math.max(p.x, maxX);
maxY = Math.max(p.y, maxY);
}