Java Streams GroupingBy and filtering by count (similar to SQL's HAVING)
In general, the filtering has to be performed after the grouping, as you need to fully collect a group before you can determine whether it fulfills the criteria.
Instead of collecting the map into another, similar map, you can use removeIf to remove non-matching groups from the result map and inject this finishing operation into the collector:
Map<KeyType, List<ElementType>> result =
    input.stream()
        .collect(collectingAndThen(groupingBy(x -> x.id(), HashMap::new, toList()),
            m -> {
                m.values().removeIf(l -> l.size() <= 5);
                return m;
            }));
Since the groupingBy(Function) collector makes no guarantees regarding the mutability of the created map, we need to specify a supplier for a mutable map. This in turn requires us to be explicit about the downstream collector, as there is no overloaded groupingBy for specifying only the function and the map supplier.
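To make this concrete, here is a minimal, self-contained sketch of the same approach. The class name GroupFilterSketch, the method groupsLargerThan, and the choice of grouping strings by length are assumptions made purely for illustration; they are not part of the original question.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class: groups words by length, then drops small groups.
class GroupFilterSketch {

    // Keeps only the groups that contain more than minSize words.
    static Map<Integer, List<String>> groupsLargerThan(List<String> words, int minSize) {
        return words.stream()
            .collect(Collectors.collectingAndThen(
                // Three-argument overload: classifier, map supplier, downstream.
                // HashMap::new guarantees a mutable map for removeIf below.
                Collectors.groupingBy(String::length, HashMap::new, Collectors.toList()),
                m -> {
                    m.values().removeIf(l -> l.size() <= minSize);
                    return m;
                }));
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "b", "c", "de", "fg", "hij");
        // The single-element group for length 3 is removed.
        System.out.println(groupsLargerThan(words, 1));
    }
}
```

Removing through the values() view works because it is backed by the map, so dropping a value also drops its entry.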
If this is a recurring task, we can write a custom collector that simplifies the code using it:
public static <T,K,V> Collector<T,?,Map<K,V>> having(
        Collector<T,?,? extends Map<K,V>> c, BiPredicate<K,V> p) {
    return collectingAndThen(c, in -> {
        Map<K,V> m = in;
        if(!(m instanceof HashMap)) m = new HashMap<>(m);
        m.entrySet().removeIf(e -> !p.test(e.getKey(), e.getValue()));
        return m;
    });
}
For higher flexibility, this collector accepts an arbitrary map-producing collector. Since that does not enforce a particular map type, the finisher ensures a mutable map afterwards by copying the result into a new HashMap whenever it is not already a HashMap. In practice, the copy won't happen, as the default is to use a HashMap. It also works without copying when the caller explicitly requests a LinkedHashMap to maintain the order, because LinkedHashMap is a subclass of HashMap. We could even support more cases by changing the line to
if(!(m instanceof HashMap || m instanceof TreeMap
  || m instanceof EnumMap || m instanceof ConcurrentMap)) {
    m = new HashMap<>(m);
}
Unfortunately, there is no standard way to determine whether a map is mutable.
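As a rough illustration of why this is awkward, the sketch below shows that a mutable and an unmodifiable map look identical through the Map interface, and the difference only surfaces as an exception when a modification is attempted. The class MutabilityProbeSketch and the method rejectsRemoval are hypothetical names; this is not a reliable mutability test, since some immutable implementations may only throw once an element would actually be removed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: probes a map by attempting a no-op removal.
class MutabilityProbeSketch {

    // Returns true if the map rejects removeIf on its entry set at runtime.
    // The predicate never matches, so a mutable map is left unchanged.
    static <K, V> boolean rejectsRemoval(Map<K, V> map) {
        try {
            map.entrySet().removeIf(e -> false);
            return false;
        } catch (UnsupportedOperationException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> mutable = new HashMap<>();
        Map<String, Integer> readOnly = Collections.unmodifiableMap(new HashMap<>());
        System.out.println(rejectsRemoval(mutable));
        System.out.println(rejectsRemoval(readOnly));
    }
}
```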
The custom collector can now be used nicely as
Map<KeyType, List<ElementType>> result =
    input.stream()
        .collect(having(groupingBy(x -> x.id()), (key,list) -> list.size() > 5));
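For completeness, here is a self-contained sketch that repeats the having collector from above and uses it with a LinkedHashMap supplier, to show that the group order is kept and no copy is made in that case. The class HavingSketch, the method groupsOfAtLeast, and the sample data are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical demo class around the having collector.
class HavingSketch {

    // Wraps a map-producing collector and removes entries failing the predicate.
    static <T, K, V> Collector<T, ?, Map<K, V>> having(
            Collector<T, ?, ? extends Map<K, V>> c, BiPredicate<K, V> p) {
        return Collectors.collectingAndThen(c, in -> {
            Map<K, V> m = in;
            // Copy only if the map is not known to be mutable.
            if (!(m instanceof HashMap)) m = new HashMap<>(m);
            m.entrySet().removeIf(e -> !p.test(e.getKey(), e.getValue()));
            return m;
        });
    }

    // Groups words by length in encounter order, keeping groups of at least min words.
    static Map<Integer, List<String>> groupsOfAtLeast(List<String> words, int min) {
        return words.stream().collect(having(
            Collectors.groupingBy(String::length, LinkedHashMap::new, Collectors.toList()),
            (length, list) -> list.size() >= min));
    }

    public static void main(String[] args) {
        List<String> words = List.of("aa", "b", "cc", "d", "eee");
        System.out.println(groupsOfAtLeast(words, 2));
    }
}
```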
The only way I am aware of is to use Collectors.collectingAndThen with the same implementation inside the finisher function:
Map<Integer, List<Item>> a = input.stream().collect(Collectors.collectingAndThen(
    Collectors.groupingBy(Item::id),
    map -> map.entrySet().stream()
        .filter(e -> e.getValue().size() > 5)
        .collect(Collectors.toMap(Entry::getKey, Entry::getValue))));
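A runnable sketch of this variant might look as follows. Since the Item type is not shown here, the example groups plain strings by length instead; the class RestreamSketch and the method groupsOfAtLeast are hypothetical names. Note that Entry in the snippet above refers to java.util.Map.Entry, which the sketch writes out as Map.Entry.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class: filters groups by re-streaming the grouped map.
class RestreamSketch {

    // Groups words by length and keeps groups with at least min words.
    static Map<Integer, List<String>> groupsOfAtLeast(List<String> words, int min) {
        return words.stream().collect(Collectors.collectingAndThen(
            Collectors.groupingBy(String::length),
            // The finisher streams the entries and collects the matching ones
            // into a new map, so the intermediate map is never modified.
            map -> map.entrySet().stream()
                .filter(e -> e.getValue().size() >= min)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue))));
    }

    public static void main(String[] args) {
        List<String> words = List.of("aa", "b", "cc", "d", "eee");
        System.out.println(groupsOfAtLeast(words, 2));
    }
}
```

Because this variant builds a second map rather than removing entries in place, it does not depend on the mutability of the map produced by groupingBy.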