Dataset collapsing/reducing
To get a list of C values for each A/B combination:
ds[GroupBy[{#A, #B}&] /* Values
, <| "A" -> Query[First, "A"]
, "B" -> Query[First, "B"]
, "CList" -> Query[All, "C"]
|>
]
This is not as succinct as SQL's GROUP BY operator, but it does allow us to easily perform multiple aggregations if desired:
ds[GroupBy[{#A, #B}&] /* Values
, <| "A" -> Query[First, "A"]
, "B" -> Query[First, "B"]
, "CList" -> Query[All, "C"]
, "CMean" -> Query[Mean, "C"]
, "CMin" -> Query[Min, "C"]
, "CMax" -> Query[Max, "C"]
|>
]
Failed to find a duplicate:
by = {"A", "B"};
Values @ GroupBy[ds, Query[by], MapAt[First, List /@ by] @* Merge[Identity]]
or:
ds // GroupBy[Query[{"A", "B"}] -> (#C &)] // KeyValueMap[<|#, "Clist" -> #2|> &]
ds // GroupBy[Query[{"A", "B"}]] // KeyValueMap[<|#, "Clist" -> #2[[;; , "C"]]|> &]
You may use GroupBy
and Merge
.
ds[GroupBy[{#"A", #"B"} &] /* Values,
Merge[Identity] /* Query[{"A" -> First, "B" -> First}]]
Hope this helps