Is there a way to `Merge` with different functions across Keys?
Here are two ways:
data = AssociationThread[{"a", "b", "c"}, #] & /@ Tuples[{1, 2}, {3}]
(* {<|"a" -> 1, "b" -> 1, "c" -> 1|>, <|"a" -> 1, "b" -> 1, "c" -> 2|>,
<|"a" -> 1, "b" -> 2, "c" -> 1|>, <|"a" -> 1, "b" -> 2, "c" -> 2|>,
<|"a" -> 2, "b" -> 1, "c" -> 1|>, <|"a" -> 2, "b" -> 1, "c" -> 2|>,
<|"a" -> 2, "b" -> 2, "c" -> 1|>, <|"a" -> 2, "b" -> 2, "c" -> 2|>} *)
GroupBy[data, First -> Rest, Merge[Apply@Construct]@*Prepend[<|"b" -> b, "c" -> c|>]]
(* <|1 -> <|"b" -> b[1, 1, 2, 2], "c" -> c[1, 2, 1, 2]|>,
2 -> <|"b" -> b[1, 1, 2, 2], "c" -> c[1, 2, 1, 2]|>|> *)
GroupBy[data, First -> Rest, Query[{"b" -> b, "c" -> c}]@*Merge[Identity]]
(* <|1 -> <|"b" -> b[{1, 1, 2, 2}], "c" -> c[{1, 2, 1, 2}]|>,
2 -> <|"b" -> b[{1, 1, 2, 2}], "c" -> c[{1, 2, 1, 2}]|>|> *)
The first one prepends an additional association to the associations to be merged. It then uses Construct
to apply the function from the first association to the elements of the other associations.
The second approach merges the associations with Identity
, and uses Query
to apply the appropriate post-processing to each entry. In my opinion, this one is more readable. It also has the advantage that it doesn't break if the associations have keys not present in your "reduction list":
GroupBy[data, First -> Rest, Merge[Apply@Construct]@*Prepend[<|"b" -> b|>]]
(* <|1 -> <|"b" -> b[1, 1, 2, 2], "c" -> 1[2, 1, 2]|>,
2 -> <|"b" -> b[1, 1, 2, 2], "c" -> 1[2, 1, 2]|>|> *)
GroupBy[data, First -> Rest, Query[{"b" -> b}]@*Merge[Identity]]
(* <|1 -> <|"b" -> b[{1, 1, 2, 2}], "c" -> {1, 2, 1, 2}|>,
2 -> <|"b" -> b[{1, 1, 2, 2}], "c" -> {1, 2, 1, 2}|>|> *)
Notice how the first output contains expressions like 1[2, 1, 2]
(since we didn't supply a function in the first association, Construct
simply took the element from the first "proper" association for the head - in this case a 1
). The second output simply contains {1, 2, 1, 2}
in that place, since the output from Merge
has just been left untouched.
I created a Wolfram repository function specifically for this. It allows you to define different functions for merging different keys. For example, you can merge numerical and categorical columns with different functions in the Titanic dataset:
data = Normal @ ExampleData[{"Dataset", "Titanic"}];
ResourceFunction["MergeByKey"][data,
{"age" -> Histogram},
BarChart[Counts[#], ChartLabels -> Automatic] &
]
MergeByKey
is based on AssociationTranspose
(from the GeneralUtilities
package) and Query
. It's also worth noting that it's significantly faster on large rectangular datasets like the one from the example:
merge1 = ResourceFunction["MergeByKey"][data, {}]; // RepeatedTiming
merge2 = Merge[data, Identity]; // RepeatedTiming
merge1 === merge2
{0.0054, Null}
{0.022, Null}
True