Reshaping associations, generalization of GroupBy

One approach is to employ a helper function, defined as an upvalue, that unwraps singleton lists:

{delist[v_]} ^:= v

With this, the GroupBy expression is fairly succinct:

dataset // GroupBy[{#type&, #subtype& -> delist}]

(*
  <| "a" -> <| "I" -> <|"type" -> "a", "subtype" -> "I", "value" -> 1|>
             , "II" -> <|"type" -> "a", "subtype" -> "II", "value" -> 2|>
             |>
   , "b" -> <| "I" -> <|"type" -> "b", "subtype" -> "I", "value" -> 1|>
             , "II" -> <|"type" -> "b", "subtype" -> "II", "value" -> 2|>
             |>
   |>
*)
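The input is not shown above, so here is a minimal sketch for reproducing the result. It assumes `dataset` is the list of four associations implied by the output (reconstructed from that output, not quoted from the question). It also illustrates why the upvalue works: with a `key -> f` spec, GroupBy stores `f[elem]` in each group, so a one-element group is `{delist[elem]}`, which the upvalue rewrites to `elem`.

```mathematica
(* Assumed input, reconstructed from the output shown above *)
dataset = {
   <|"type" -> "a", "subtype" -> "I", "value" -> 1|>,
   <|"type" -> "a", "subtype" -> "II", "value" -> 2|>,
   <|"type" -> "b", "subtype" -> "I", "value" -> 1|>,
   <|"type" -> "b", "subtype" -> "II", "value" -> 2|>
   };

(* Upvalue on delist: a list whose only element is delist[v] evaluates to v *)
ClearAll[delist];
{delist[v_]} ^:= v

{delist[x]}              (* x *)
{delist[x], delist[y]}   (* unchanged: the pattern needs exactly one element *)

nested = dataset // GroupBy[{#type &, #subtype & -> delist}];

nested["a"]["II"]["value"]   (* 2 *)
```

Note the second check: if a type/subtype pair occurs more than once, the group has several elements and they stay wrapped in `delist`, so this trick fits data where each pair is unique.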

This generalizes to deeper nesting:

dataset // GroupBy[{#type&, #subtype&, #type&, #subtype& -> delist}]

(*
  <| "a" ->
      <| "I" -> <|"a" -> <|"I" -> <|"type" -> "a", "subtype" -> "I", "value" -> 1|>|>|>
       , "II" -> <|"a" -> <|"II" -> <|"type" -> "a", "subtype" -> "II", "value" -> 2|>|>|>
       |>
   , "b" ->
       <| "I" -> <|"b" -> <|"I" -> <|"type" -> "b", "subtype" -> "I", "value" -> 1|>|>|>
        , "II" -> <|"b" -> <|"II" -> <|"type" -> "b", "subtype" -> "II", "value" -> 2|>|>|>
        |>
   |>
*)

Instead of a nested association, would Query and Select be acceptable?

Query[Select[#type == "a" && #subtype == "I" &], "value"]@dataset

(* {1} *)

This form is more descriptive of what is happening and does not require reshaping the list of associations.

If your data is such that there is only ever one item for a particular "type" and "subtype", then tack on First.

First@Query[Select[#type == "a" && #subtype == "I" &], "value"]@dataset

(* 1 *)
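One caveat with First: if no row matches, Select returns an empty list and First issues a message instead of returning a value. A hedged sketch of a more forgiving lookup uses the built-in SelectFirst, which returns Missing["NotFound"] when nothing matches. It assumes `dataset` is the four-row list implied by the outputs in this thread.

```mathematica
(* Assumed input, reconstructed from the outputs shown in this thread *)
dataset = {
   <|"type" -> "a", "subtype" -> "I", "value" -> 1|>,
   <|"type" -> "a", "subtype" -> "II", "value" -> 2|>,
   <|"type" -> "b", "subtype" -> "I", "value" -> 1|>,
   <|"type" -> "b", "subtype" -> "II", "value" -> 2|>
   };

(* First matching row, or Missing["NotFound"] if there is none *)
row = SelectFirst[dataset, #type == "a" && #subtype == "I" &];
row["value"]   (* 1 *)

(* No row has type "z", so nothing is selected *)
SelectFirst[dataset, #type == "z" &]   (* Missing["NotFound"] *)
```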

Hope this helps.

Extension

You can extend this to a more general case in which you parametrise the filter by both key and value.

filterBy[filter_] := Function[Evaluate[And @@ ReleaseHold[Hold[Slot][First@#] == Last@# & /@ filter]]]

then with

target = {{"type", "a"}, {"subtype", "I"}};

Query[SelectFirst[filterBy[target]], "value"]@dataset

(* 1 *)
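If building the function by manipulating Slot feels opaque, the same kind of filter can be sketched with a named-argument Function and AllTrue. This is an alternative formulation, not the definition above; the name `filterByKeys` is made up for illustration, and `dataset` is assumed to be the four-row list implied by the earlier outputs.

```mathematica
(* Assumed input, reconstructed from the outputs shown in this thread *)
dataset = {
   <|"type" -> "a", "subtype" -> "I", "value" -> 1|>,
   <|"type" -> "a", "subtype" -> "II", "value" -> 2|>,
   <|"type" -> "b", "subtype" -> "I", "value" -> 1|>,
   <|"type" -> "b", "subtype" -> "II", "value" -> 2|>
   };

(* Hypothetical helper: True when every {key, value} pair in filter matches the row *)
ClearAll[filterByKeys];
filterByKeys[filter_List] :=
  Function[row, AllTrue[filter, row[First[#]] === Last[#] &]]

target = {{"type", "a"}, {"subtype", "I"}};

SelectFirst[dataset, filterByKeys[target]]["value"]           (* 1 *)
Lookup[Select[dataset, filterByKeys[{{"value", 2}}]], "type"]  (* {"a", "b"} *)
```

Using `===` rather than `==` means each comparison always yields True or False, even if a value is symbolic.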

I have posted code doing a very similar thing here - the functions pushUp and pushUpNested. That code was more general, since there I provided a declarative interface to group by values or their parts. To do what you need, I'll redefine pushUpNested slightly (assuming you have already run that code):

ClearAll[pushUpNested];
pushUpNested[{}, elemF_: Identity] := elemF;
pushUpNested[specs : {_List ..}, elemF_: Identity ] := 
   Composition[
     Map[pushUpNested[Rest[specs], elemF]], 
     pushUp@First[specs]
   ];

Now we create a transform:

transform = pushUpNested[{{"type"}, {"subtype"}}, First]


(* 
   Map[Map[First]@*GroupBy[#1[[Sequence[Key["subtype"]]]] &]]@*
   GroupBy[#1[[Sequence[Key["type"]]]] &]
*)

which we can now apply to get the nested structure:

nested = transform@dataset

(*

   <|
     "a" -> <|
       "I" -> <|"type" -> "a", "subtype" -> "I", "value" -> 1|>, 
       "II" -> <|"type" -> "a", "subtype" -> "II", "value" -> 2|>
     |>, 
     "b" -> <|
       "I" -> <|"type" -> "b", "subtype" -> "I", "value" -> 1|>, 
       "II" -> <|"type" -> "b", "subtype" -> "II", "value" -> 2|>
     |>
   |>

*)

The advantage of using pushUpNested is that it makes it very easy and declarative to construct such transforms, and the transform is available for inspection as a stand-alone fully-prepared function.
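The linked definition of pushUp is not reproduced in this post. To try the transform in isolation, here is a minimal stand-in that only handles a single top-level key per level. It is an assumption for illustration, not the original (more general) pushUp, and `dataset` is again assumed to be the four-row list implied by the outputs.

```mathematica
(* Assumed input, reconstructed from the outputs shown in this thread *)
dataset = {
   <|"type" -> "a", "subtype" -> "I", "value" -> 1|>,
   <|"type" -> "a", "subtype" -> "II", "value" -> 2|>,
   <|"type" -> "b", "subtype" -> "I", "value" -> 1|>,
   <|"type" -> "b", "subtype" -> "II", "value" -> 2|>
   };

ClearAll[pushUp, pushUpNested];

(* Hypothetical stand-in for the linked pushUp: group by one top-level key *)
pushUp[{key_}] := GroupBy[Key[key]];

(* pushUpNested as redefined above *)
pushUpNested[{}, elemF_: Identity] := elemF;
pushUpNested[specs : {_List ..}, elemF_: Identity] :=
  Composition[
   Map[pushUpNested[Rest[specs], elemF]],
   pushUp@First[specs]
   ];

nested = pushUpNested[{{"type"}, {"subtype"}}, First]@dataset;

nested["b"]["I"]["value"]   (* 1 *)
```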


