Java Streams – How to group by value and find min and max value of each group?
I would like to propose a solution that, in my opinion, aims for the greatest readability, which in turn reduces the maintenance burden of such code. It is Collector-based, so as a bonus it can be used with a parallel Stream. It assumes the objects are non-null.
    import java.util.Comparator;
    import java.util.function.BinaryOperator;
    import java.util.stream.Collector;

    final class MinMaxFinder<T> {

        private final Comparator<T> comparator;

        MinMaxFinder(Comparator<T> comparator) {
            this.comparator = comparator;
        }

        Collector<T, ?, MinMaxResult<T>> collector() {
            return Collector.of(
                    MinMaxAccumulator::new,
                    MinMaxAccumulator::add,
                    MinMaxAccumulator::combine,
                    MinMaxAccumulator::toResult
            );
        }

        // Inner (non-static) class, so it can use the enclosing comparator.
        private class MinMaxAccumulator {

            T min = null;
            T max = null;

            MinMaxAccumulator() {
            }

            // Elements are assumed non-null, so a null min means nothing was added yet.
            private boolean isEmpty() {
                return min == null;
            }

            void add(T item) {
                if (isEmpty()) {
                    min = max = item;
                } else {
                    updateMin(item);
                    updateMax(item);
                }
            }

            // Merges two partial results, e.g. from a parallel stream.
            MinMaxAccumulator combine(MinMaxAccumulator otherAcc) {
                if (isEmpty()) {
                    return otherAcc;
                }
                if (!otherAcc.isEmpty()) {
                    updateMin(otherAcc.min);
                    updateMax(otherAcc.max);
                }
                return this;
            }

            private void updateMin(T item) {
                min = BinaryOperator.minBy(comparator).apply(min, item);
            }

            private void updateMax(T item) {
                max = BinaryOperator.maxBy(comparator).apply(max, item);
            }

            MinMaxResult<T> toResult() {
                return new MinMaxResult<>(min, max);
            }
        }
    }
The result holder, a simple value class:
    public class MinMaxResult<T> {

        private final T min;
        private final T max;

        public MinMaxResult(T min, T max) {
            this.min = min;
            this.max = max;
        }

        public T min() {
            return min;
        }

        public T max() {
            return max;
        }
    }
Usage:
    MinMaxFinder<Car> minMaxFinder = new MinMaxFinder<>(Comparator.comparing(Car::getPrice));
    Map<String, MinMaxResult<Car>> minMaxResultMap = carsDetails.stream()
            .collect(Collectors.groupingBy(Car::getMake, minMaxFinder.collector()));
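To see the parallel-stream claim in action, here is a condensed, self-contained sketch of the same idea. The names `ParallelMinMaxDemo`, `Acc`, `minMax`, `group`, and `sample`, as well as the simplified `Car` class and its data, are assumptions made for illustration and are not part of the code above. Like the original, it assumes non-null elements.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ParallelMinMaxDemo {

    // Hypothetical stand-in for the question's Car class.
    static final class Car {
        private final String make;
        private final double price;

        Car(String make, double price) {
            this.make = make;
            this.price = price;
        }

        String getMake() { return make; }
        double getPrice() { return price; }
    }

    // Mutable accumulator; min and max stay null until the first element arrives.
    static final class Acc<T> {
        T min;
        T max;

        void add(T item, Comparator<? super T> cmp) {
            if (min == null || cmp.compare(item, min) < 0) min = item;
            if (max == null || cmp.compare(item, max) > 0) max = item;
        }

        // Merges another partial result into this one (used for parallel runs).
        Acc<T> combine(Acc<T> other, Comparator<? super T> cmp) {
            if (other.min != null) {
                add(other.min, cmp);
                add(other.max, cmp);
            }
            return this;
        }
    }

    static <T> Collector<T, Acc<T>, Acc<T>> minMax(Comparator<? super T> cmp) {
        return Collector.of(
                Acc::new,
                (acc, item) -> acc.add(item, cmp),
                (left, right) -> left.combine(right, cmp));
    }

    static Map<String, Acc<Car>> group(List<Car> cars, boolean parallel) {
        Comparator<Car> byPrice = Comparator.comparingDouble(Car::getPrice);
        Stream<Car> stream = parallel ? cars.parallelStream() : cars.stream();
        return stream.collect(Collectors.groupingBy(Car::getMake, minMax(byPrice)));
    }

    // Made-up sample data.
    static List<Car> sample() {
        return Arrays.asList(
                new Car("Audi", 30_000), new Car("Audi", 50_000), new Car("Audi", 42_000),
                new Car("BMW", 28_000), new Car("BMW", 61_000));
    }

    public static void main(String[] args) {
        Map<String, Acc<Car>> sequential = group(sample(), false);
        Map<String, Acc<Car>> parallel = group(sample(), true);
        for (String make : sequential.keySet()) {
            System.out.println(make
                    + " sequential: " + sequential.get(make).min.getPrice()
                    + ".." + sequential.get(make).max.getPrice()
                    + ", parallel: " + parallel.get(make).min.getPrice()
                    + ".." + parallel.get(make).max.getPrice());
        }
    }
}
```

The combiner is what makes this safe for parallel use: each thread fills its own accumulator, and the partial results are merged afterwards.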
If you were interested in only one Car per group, you could use, for example:
    Map<String, Car> mostExpensives = carsDetails.stream()
        .collect(Collectors.toMap(Car::getMake, Function.identity(),
            BinaryOperator.maxBy(Comparator.comparing(Car::getPrice))));
    mostExpensives.forEach((make,car) -> System.out.println(make+" "+car));
But since you want both the cheapest and the most expensive car, you need something like this, where each list holds the cheapest car at index 0 and the most expensive one at index 1:
    Map<String, List<Car>> mostExpensivesAndCheapest = carsDetails.stream()
        .collect(Collectors.toMap(Car::getMake, car -> Arrays.asList(car, car),
            (l1,l2) -> Arrays.asList(
                (l1.get(0).getPrice()>l2.get(0).getPrice()? l2: l1).get(0),
                (l1.get(1).getPrice()<l2.get(1).getPrice()? l2: l1).get(1))));
    mostExpensivesAndCheapest.forEach((make,cars) -> System.out.println(make
        +" cheapest: "+cars.get(0)+" most expensive: "+cars.get(1)));
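The merge function is the dense part of this approach, so here is a self-contained sketch of the same technique with the two comparisons given names. It swaps the hand-written price comparisons for `BinaryOperator.minBy` and `BinaryOperator.maxBy`. The `ToMapMinMaxDemo` class, the simplified `Car`, and the sample data are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

final class ToMapMinMaxDemo {

    // Hypothetical stand-in for the question's Car class.
    static final class Car {
        private final String make;
        private final double price;

        Car(String make, double price) {
            this.make = make;
            this.price = price;
        }

        String getMake() { return make; }
        double getPrice() { return price; }

        @Override
        public String toString() { return make + " (" + price + ")"; }
    }

    // Each map value is a two-element list: [cheapest, most expensive].
    static Map<String, List<Car>> cheapestAndMostExpensive(List<Car> cars) {
        Comparator<Car> byPrice = Comparator.comparingDouble(Car::getPrice);
        BinaryOperator<Car> cheaper = BinaryOperator.minBy(byPrice);
        BinaryOperator<Car> pricier = BinaryOperator.maxBy(byPrice);
        return cars.stream().collect(Collectors.toMap(
                Car::getMake,
                // A single car is both the cheapest and the most expensive so far.
                car -> Arrays.asList(car, car),
                // Merge two partial results position by position.
                (l1, l2) -> Arrays.asList(
                        cheaper.apply(l1.get(0), l2.get(0)),
                        pricier.apply(l1.get(1), l2.get(1)))));
    }

    // Made-up sample data.
    static List<Car> sample() {
        return Arrays.asList(
                new Car("Audi", 30_000), new Car("Audi", 50_000), new Car("Audi", 42_000),
                new Car("BMW", 28_000), new Car("BMW", 61_000));
    }

    public static void main(String[] args) {
        cheapestAndMostExpensive(sample()).forEach((make, cars) -> System.out.println(
                make + " cheapest: " + cars.get(0) + " most expensive: " + cars.get(1)));
    }
}
```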
This solution is somewhat inconvenient because there is no generic statistics object equivalent to DoubleSummaryStatistics. If you need this more than once, it’s worth filling the gap with a class like this:
    import java.util.Comparator;
    import java.util.Objects;
    import java.util.function.Consumer;
    import java.util.stream.Collector;

    /**
     * Like {@code DoubleSummaryStatistics}, {@code IntSummaryStatistics}, and
     * {@code LongSummaryStatistics}, but for an arbitrary type {@code T}.
     */
    public class SummaryStatistics<T> implements Consumer<T> {
        /**
         * Collect to a {@code SummaryStatistics} for natural order.
         */
        public static <T extends Comparable<? super T>> Collector<T,?,SummaryStatistics<T>>
                      statistics() {
            return statistics(Comparator.<T>naturalOrder());
        }
        /**
         * Collect to a {@code SummaryStatistics} using the specified comparator.
         */
        public static <T> Collector<T,?,SummaryStatistics<T>>
                      statistics(Comparator<T> comparator) {
            Objects.requireNonNull(comparator);
            return Collector.of(() -> new SummaryStatistics<>(comparator),
                SummaryStatistics::accept, SummaryStatistics::merge);
        }
        private final Comparator<T> c;
        private T min, max;
        private long count;
        public SummaryStatistics(Comparator<T> comparator) {
            c = Objects.requireNonNull(comparator);
        }

        public void accept(T t) {
            if(count == 0) {
                count = 1;
                min = t;
                max = t;
            }
            else {
                if(c.compare(min, t) > 0) min = t;
                if(c.compare(max, t) < 0) max = t;
                count++;
            }
        }
        public SummaryStatistics<T> merge(SummaryStatistics<T> s) {
            if(s.count > 0) {
                if(count == 0) {
                    count = s.count;
                    min = s.min;
                    max = s.max;
                }
                else {
                    if(c.compare(min, s.min) > 0) min = s.min;
                    if(c.compare(max, s.max) < 0) max = s.max;
                    count += s.count;
                }
            }
            return this;
        }

        public long getCount() {
            return count;
        }

        public T getMin() {
            return min;
        }

        public T getMax() {
            return max;
        }

        @Override
        public String toString() {
            return count == 0? "empty": (count+" elements between "+min+" and "+max);
        }
    }
After adding this to your code base, you may use it like this:
    Map<String, SummaryStatistics<Car>> statsByMake = carsDetails.stream()
        .collect(Collectors.groupingBy(Car::getMake,
            SummaryStatistics.statistics(Comparator.comparing(Car::getPrice))));
    statsByMake.forEach((make,stats) -> System.out.println(make+": "+stats));
If getPrice returns double, it may be more efficient to use Comparator.comparingDouble(Car::getPrice) instead of Comparator.comparing(Car::getPrice).
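The reason is boxing: Comparator.comparing takes a Function, so a primitive double key is wrapped in a Double for every comparison, whereas Comparator.comparingDouble takes a ToDoubleFunction and compares the primitives directly. Both produce the same ordering. A small sketch, in which the `ComparatorChoiceDemo` class, the simplified `Car`, and the sample data are assumptions for illustration:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class ComparatorChoiceDemo {

    // Hypothetical stand-in for the question's Car class; getPrice returns a primitive.
    static final class Car {
        private final String make;
        private final double price;

        Car(String make, double price) {
            this.make = make;
            this.price = price;
        }

        String getMake() { return make; }
        double getPrice() { return price; }
    }

    // Boxes each price into a Double and compares via Double.compareTo.
    static final Comparator<Car> BOXED = Comparator.comparing(Car::getPrice);

    // Reads the primitive double and compares via Double.compare, without boxing.
    static final Comparator<Car> PRIMITIVE = Comparator.comparingDouble(Car::getPrice);

    static Car cheapest(List<Car> cars, Comparator<Car> order) {
        return Collections.min(cars, order);
    }

    // Made-up sample data.
    static List<Car> sample() {
        return Arrays.asList(
                new Car("Audi", 30_000), new Car("BMW", 28_000), new Car("Audi", 50_000));
    }

    public static void main(String[] args) {
        List<Car> cars = sample();
        Car viaBoxed = cheapest(cars, BOXED);
        Car viaPrimitive = cheapest(cars, PRIMITIVE);
        System.out.println("same car selected: " + (viaBoxed == viaPrimitive));
    }
}
```

Whether the difference is measurable depends on the data size and the JVM, so treat it as a cheap default rather than a guaranteed speedup.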
Here is a very concise solution. It collects all Cars into a SortedSet and thus works without any additional classes.
    // Assumes static imports of Collectors.groupingBy, Collectors.toCollection
    // and Comparator.comparingDouble.
    Map<String, SortedSet<Car>> grouped = carsDetails.stream()
        .collect(groupingBy(Car::getMake, toCollection(
            () -> new TreeSet<>(comparingDouble(Car::getPrice)))));
    grouped.forEach((make, cars) -> System.out.println(make
        + " cheapest: " + cars.first()
        + " most expensive: " + cars.last()));
A possible downside is performance, as all Cars are collected, not just the current min and max. But unless the data set is very large, I don't think it will be noticeable.
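Another point to be aware of with this approach: a TreeSet treats elements that compare as equal as duplicates, so two cars of the same make with the same price end up as a single entry. The first and last elements still carry the lowest and highest price, but the set should not be read as the complete group. The sketch below illustrates this; the `TreeSetGroupingDemo` class, the `model` field, and the sample data are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

final class TreeSetGroupingDemo {

    // Hypothetical stand-in for the question's Car class.
    static final class Car {
        private final String make;
        private final String model;
        private final double price;

        Car(String make, String model, double price) {
            this.make = make;
            this.model = model;
            this.price = price;
        }

        String getMake() { return make; }
        String getModel() { return model; }
        double getPrice() { return price; }

        @Override
        public String toString() { return make + " " + model + " (" + price + ")"; }
    }

    static Map<String, SortedSet<Car>> groupByMake(List<Car> cars) {
        Comparator<Car> byPrice = Comparator.comparingDouble(Car::getPrice);
        return cars.stream().collect(Collectors.groupingBy(
                Car::getMake,
                Collectors.toCollection(() -> new TreeSet<>(byPrice))));
    }

    // Made-up sample data: two Audis share the same price.
    static List<Car> sample() {
        return Arrays.asList(
                new Car("Audi", "A3", 30_000),
                new Car("Audi", "A4", 30_000),
                new Car("Audi", "A6", 50_000),
                new Car("BMW", "320i", 28_000));
    }

    public static void main(String[] args) {
        groupByMake(sample()).forEach((make, cars) -> System.out.println(make
                + " kept " + cars.size() + " car(s),"
                + " cheapest: " + cars.first()
                + " most expensive: " + cars.last()));
        // Audi has three cars in the input, but only two are kept in its set,
        // because the two cars priced at 30000 compare as equal.
    }
}
```

If every car needs to be kept, adding a tiebreaker to the comparator, for example `byPrice.thenComparing(Car::getModel)` with the hypothetical `getModel` above, would keep cars with equal prices apart.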