Function like KeyMap that combines values in case of key collisions
keyCombineBy[assoc_?AssociationQ, by_, post_] := GroupBy[
Normal@assoc, by@*First -> Last, post
]
keyCombineBy[<|{a, 1} -> 1, {a, 2} -> 2, {b, 1} -> 3|>, First, ff]
<|a -> ff[{1, 2}], b -> ff[{3}]|>
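If you pass a concrete combiner instead of the symbolic ff, the colliding keys are actually reduced. Here is a small sketch that reuses keyCombineBy as defined above (repeated so the snippet runs on its own). Total and Length are only example choices for post.

```mathematica
(* definition from above, repeated so this snippet is self-contained *)
keyCombineBy[assoc_?AssociationQ, by_, post_] :=
  GroupBy[Normal@assoc, by@*First -> Last, post]

data = <|{a, 1} -> 1, {a, 2} -> 2, {b, 1} -> 3|>;

(* sum the values whose new keys collide *)
keyCombineBy[data, First, Total]
(* <|a -> 3, b -> 3|> *)

(* count how many original keys were mapped onto each new key *)
keyCombineBy[data, First, Length]
(* <|a -> 2, b -> 1|> *)
```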
This minimal modification makes it slightly faster:
keyCombineBy3[assoc_?AssociationQ, by_, post_] := GroupBy[
Normal@assoc, by@*First, post@*Values
]
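This is why post@*Values can stand in for the by@*First -> Last form. Without the -> Last part, each group is a list of complete rules, and Values applied to a list of rules returns their right-hand sides. Below is a quick check on the example from the question. Both definitions are repeated, and ff stays an undefined symbol.

```mathematica
keyCombineBy[assoc_?AssociationQ, by_, post_] :=
  GroupBy[Normal@assoc, by@*First -> Last, post]
keyCombineBy3[assoc_?AssociationQ, by_, post_] :=
  GroupBy[Normal@assoc, by@*First, post@*Values]

(* Values on a list of rules returns the right-hand sides *)
Values[{{a, 1} -> 1, {a, 2} -> 2}]
(* {1, 2} *)

data = <|{a, 1} -> 1, {a, 2} -> 2, {b, 1} -> 3|>;

(* both versions should produce identical associations *)
keyCombineBy3[data, First, ff] === keyCombineBy[data, First, ff]
(* True *)
```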
Or, alternatively (slower):
keyCombineBy2[assoc_?AssociationQ, by_, post_] :=
Merge[post] @ KeyValueMap[by@#1 -> #2 &] @ assoc
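Evaluating the intermediate step shows what Merge receives. KeyValueMap rewrites every key with by and produces a flat list of rules that still contains the collisions. Merge then passes all values that share a key to post. A sketch on the example from the question:

```mathematica
keyCombineBy2[assoc_?AssociationQ, by_, post_] :=
  Merge[post]@KeyValueMap[by@#1 -> #2 &]@assoc

data = <|{a, 1} -> 1, {a, 2} -> 2, {b, 1} -> 3|>;

(* intermediate list of rules; duplicate left-hand sides are kept *)
KeyValueMap[First@#1 -> #2 &]@data
(* {a -> 1, a -> 2, b -> 3} *)

(* Merge collects the values for each repeated key and applies post *)
keyCombineBy2[data, First, ff]
(* <|a -> ff[{1, 2}], b -> ff[{3}]|> *)
```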
You can use Query to improve performance.
ClearAll[keyCombine];
keyCombine[fun_, asc_?AssociationQ, comb_ : Identity] :=
asc //
Query[Normal /* GroupBy[fun@*Keys]] //
Query[All, (Values@# &) /* comb]
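Note that keyCombine takes the key function first and the association second, which is the reverse of keyCombineBy in the question. Here is a short usage sketch on the small association from the question, with the definition repeated so the snippet runs on its own:

```mathematica
(* definition from above, repeated so this snippet is self-contained *)
ClearAll[keyCombine];
keyCombine[fun_, asc_?AssociationQ, comb_ : Identity] :=
  asc //
   Query[Normal /* GroupBy[fun@*Keys]] //
   Query[All, (Values@# &) /* comb]

data = <|{a, 1} -> 1, {a, 2} -> 2, {b, 1} -> 3|>;

(* key function first, association second, combiner last *)
keyCombine[First, data, Total]
(* <|a -> 3, b -> 3|> *)
```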
Here Values@# & is used instead of Values to work around syntax sugar that goes against our intention in this case. After the first Query there is a list of rules under each key, and [All, Values] then maps all the way down into that list instead of acting on the list of rules as a whole. I think this is syntax sugar, but it might be a bug (ideas?). In any case, rolling our own pure function avoids this and lets us apply Values to the list of rules.
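You can see what the second Query operates on by printing the result of the first Query for the small association from the question. Each value is a list of complete rules, and Values@# & receives that list as a whole. The last line is only there so you can compare the plain Values operator yourself. Its output is not quoted here.

```mathematica
data = <|{a, 1} -> 1, {a, 2} -> 2, {b, 1} -> 3|>;

(* result of the first Query: every group is a list of complete rules *)
groups = Query[Normal /* GroupBy[First@*Keys]]@data
(* <|a -> {{a, 1} -> 1, {a, 2} -> 2}, b -> {{b, 1} -> 3}|> *)

(* the pure function receives each list of rules and extracts its values *)
Query[All, Values@# &]@groups
(* <|a -> {1, 2}, b -> {3}|> *)

(* evaluate this to compare with the plain Values operator *)
Query[All, Values]@groups
```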
With
SeedRandom[42];
aa = AssociationThread[RandomInteger[1000000, 100000], RandomInteger[1000000, 100000]];
Then
keyCombine[Mod[#, 5] &, aa, Total] // AbsoluteTiming
{0.158329, <|4 -> 9451454209, 2 -> 9485726007, 3 -> 9480421781, 0 -> 9443541021, 1 -> 9545354067|>}
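Every value lands in exactly one group, so a cheap sanity check is that the group totals add up to the total of the whole association. The sketch below also compares the result against keyCombineBy from the question, which takes its arguments in a different order. Both definitions are repeated so the snippet runs on its own.

```mathematica
keyCombineBy[assoc_?AssociationQ, by_, post_] :=
  GroupBy[Normal@assoc, by@*First -> Last, post]
keyCombine[fun_, asc_?AssociationQ, comb_ : Identity] :=
  asc //
   Query[Normal /* GroupBy[fun@*Keys]] //
   Query[All, (Values@# &) /* comb]

SeedRandom[42];
aa = AssociationThread[RandomInteger[1000000, 100000], RandomInteger[1000000, 100000]];

res = keyCombine[Mod[#, 5] &, aa, Total];

(* the five group sums must add up to the sum of all values *)
Total[res] == Total[aa]
(* True *)

(* both approaches group the same rules in the same order *)
res === keyCombineBy[aa, Mod[#, 5] &, Total]
(* True *)
```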
Or, alternatively
ClearAll[keyCombine2];
keyCombine2[fun_, asc_?AssociationQ, comb_ : Identity] :=
Query[Normal /* GroupBy[fun@*Keys] /* Map[comb@*Values]]@asc
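In keyCombine2 the whole pipeline is a single composed operator at the top level of the Query, so it can also be written as an ordinary composition without Query. The hypothetical keyCombine2Plain below sketches that. It skips any extra handling Query may do, so treat it as an illustration of the data flow rather than a drop-in replacement.

```mathematica
ClearAll[keyCombine2, keyCombine2Plain];
keyCombine2[fun_, asc_?AssociationQ, comb_ : Identity] :=
  Query[Normal /* GroupBy[fun@*Keys] /* Map[comb@*Values]]@asc

(* hypothetical variant: the same three steps, composed without Query *)
keyCombine2Plain[fun_, asc_?AssociationQ, comb_ : Identity] :=
  (Normal /* GroupBy[fun@*Keys] /* Map[comb@*Values])@asc

data = <|{a, 1} -> 1, {a, 2} -> 2, {b, 1} -> 3|>;

keyCombine2Plain[First, data, Total] === keyCombine2[First, data, Total]
(* True *)
```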
This replaces both the second Query and the pure Values function with a single Map[comb@*Values] step inside the first Query.
Then
keyCombine2[Mod[#, 5] &, aa, Total] // AbsoluteTiming
{0.16708, <|4 -> 9451454209, 2 -> 9485726007, 3 -> 9480421781, 0 -> 9443541021, 1 -> 9545354067|>}
In this single run it is ever so slightly slower than keyCombine above, but some may find it easier to read.
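Single AbsoluteTiming runs of roughly 0.16 s are easy to perturb, so a fairer comparison repeats each call. Here is a sketch that uses RepeatedTiming on the same test data. Absolute numbers depend on the machine and the Mathematica version, so none are quoted. The name mod5 is introduced only for this snippet.

```mathematica
keyCombineBy[assoc_?AssociationQ, by_, post_] :=
  GroupBy[Normal@assoc, by@*First -> Last, post]
keyCombineBy3[assoc_?AssociationQ, by_, post_] :=
  GroupBy[Normal@assoc, by@*First, post@*Values]
keyCombineBy2[assoc_?AssociationQ, by_, post_] :=
  Merge[post]@KeyValueMap[by@#1 -> #2 &]@assoc
keyCombine[fun_, asc_?AssociationQ, comb_ : Identity] :=
  asc // Query[Normal /* GroupBy[fun@*Keys]] // Query[All, (Values@# &) /* comb]
keyCombine2[fun_, asc_?AssociationQ, comb_ : Identity] :=
  Query[Normal /* GroupBy[fun@*Keys] /* Map[comb@*Values]]@asc

SeedRandom[42];
aa = AssociationThread[RandomInteger[1000000, 100000], RandomInteger[1000000, 100000]];
mod5 = Mod[#, 5] &;

(* RepeatedTiming returns {average seconds, result}; keep only the time *)
timings = <|
   "keyCombineBy" -> First@RepeatedTiming[keyCombineBy[aa, mod5, Total]],
   "keyCombineBy3" -> First@RepeatedTiming[keyCombineBy3[aa, mod5, Total]],
   "keyCombineBy2" -> First@RepeatedTiming[keyCombineBy2[aa, mod5, Total]],
   "keyCombine" -> First@RepeatedTiming[keyCombine[mod5, aa, Total]],
   "keyCombine2" -> First@RepeatedTiming[keyCombine2[mod5, aa, Total]]
   |>
```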
Hope this helps.