How to get control of Dataset display?
(Extended comment, not an answer...)
Only if I turn t to "t" and r to "r" to run [...]
Yes, I have noticed that too: with Dataset, if the column names are not strings, I get the non-tabular, hierarchical form. This means that I often have to take that into account in the data wrangling examples I provide (like those in WFR's CrossTabulate).
How to make Mathematica show complete Dataset?
In the particular example you provided, calling Transpose twice works (in V12).
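Here is a minimal sketch of the double transpose on the OP's original data (integer outer keys, symbol inner keys). It uses the Dataset query form ds[Transpose]. Whether the result displays in the fully tabular form may depend on the version; the report above is for V12.

```mathematica
(* OP's original data: integer outer keys, symbol inner keys *)
ds = Dataset[<|101 -> <|t -> 42, r -> 7.5|>,
    102 -> <|t -> 42, r -> 7.5|>,
    103 -> <|t -> 42, r -> 7.5, s -> 9|>|>];

(* transpose the two levels, then transpose them back *)
dsTT = ds[Transpose][Transpose]
```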
Update
Double transpose works for the example in my post. But it doesn't work for <|"101" -> <|t -> 42, r -> 7.5|>, "102" -> <|t -> 42, r -> 7.5|>, "103" -> <|t -> 42, r -> 7.5, s -> 9|>|>, in which I changed the first-level keys to strings.
I use a few packages I wrote for the data wrangling in my typical workflows. Functions from those packages can display the data in question in a tabular/matrix form that is similar to the requested dataset form.
Here the packages are loaded:
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/DataReshape.m"]
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/CrossTabulate.m"]
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/SSparseMatrix.m"]
Here is the data from the comment quoted above:
records = <|"101" -> <|t -> 42, r -> 7.5|>,
"102" -> <|t -> 42, r -> 7.5|>,
"103" -> <|t -> 42, r -> 7.5, s -> 9|>|>;
Here we get a tabular/matrix form by using a special object, a "sparse matrix with named rows and columns":
MatrixForm[ToSSparseMatrix[CrossTabulate[RecordsToLongForm[records]]]]
In case it is of interest to see the intermediate steps:
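The one-liner above can be split into its steps. This sketch only names the intermediate results; longForm, xtab and smat are illustrative names. It assumes the three packages load from the URLs above, so the Import lines are repeated to keep the block self-contained.

```mathematica
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/DataReshape.m"]
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/CrossTabulate.m"]
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/SSparseMatrix.m"]

records = <|"101" -> <|t -> 42, r -> 7.5|>,
   "102" -> <|t -> 42, r -> 7.5|>,
   "103" -> <|t -> 42, r -> 7.5, s -> 9|>|>;

(* step 1: convert the records to long form *)
longForm = RecordsToLongForm[records]

(* step 2: cross-tabulate the long form *)
xtab = CrossTabulate[longForm]

(* step 3: turn the cross-tabulation into a sparse matrix with named rows and columns *)
smat = ToSSparseMatrix[xtab];
MatrixForm[smat]
```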
I spent some time developing a function, my2DGrid, that generates a Grid with hierarchical row and column labels and some controls.
We need to create the raw data in a specific form, an association with list keys, like
data = Association@Flatten[
Table[{x, y, z, u} -> {x, y, z, u}, {x, {"A", "B"}}, {y, {"C",
"D"}}, {z, {"E", "F"}}, {u, {"G", "H"}}]]
where you can view the keys as a kind of "general coordinate".
Using my2DGrid, we can get different forms of grid output. For example:
my2DGrid[data, {1, 2}, {3, 4}]
gives
my2DGrid[data, {1, 2, 3}, {4}]
gives
my2DGrid[data, {1,2,3,4},{}]
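The second and third arguments of my2DGrid are lists of positions in the key. The following sketch reproduces the key-selection step (keys[[;; , rowTagSlotGroup]] in the code below) for the call my2DGrid[data, {1, 2}, {3, 4}]. The distinct sub-keys at those positions become the row and column labels.

```mathematica
(* the same raw data as above: 16 keys of the form {x, y, z, u} *)
data = Association@Flatten[
    Table[{x, y, z, u} -> {x, y, z, u},
     {x, {"A", "B"}}, {y, {"C", "D"}}, {z, {"E", "F"}}, {u, {"G", "H"}}]];

keys = Keys[data];

(* slot group {1, 2}: first two key components label the rows *)
rowLabels = DeleteDuplicates[keys[[All, {1, 2}]]]
(* {{"A", "C"}, {"A", "D"}, {"B", "C"}, {"B", "D"}} *)

(* slot group {3, 4}: last two key components label the columns *)
colLabels = DeleteDuplicates[keys[[All, {3, 4}]]]
(* {{"E", "G"}, {"E", "H"}, {"F", "G"}, {"F", "H"}} *)
```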
Besides, for the association in my post, we first need a function to flatten the association. Fortunately, the Wolfram Function Repository provides such a function, AssociationKeyFlatten. So
tmpAssoc =
ResourceFunction["AssociationKeyFlatten"]@<|
101 -> <|t -> 42, r -> 7.5`|>, 102 -> <|t -> 42, r -> 7.5`|>,
103 -> <|t -> 42, r -> 7.5`, s -> 9|>|>
tmpAssoc
now holds
<|{101, t} -> 42, {101, r} -> 7.5, {102, t} -> 42, {102, r} ->
7.5, {103, t} -> 42, {103, r} -> 7.5, {103, s} -> 9|>
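If the Function Repository is not available, a two-level flatten can be written with KeyValueMap. The helper name flattenTwoLevels is hypothetical; it is not part of this answer or of the repository. Unlike AssociationKeyFlatten, it handles exactly two levels.

```mathematica
(* hypothetical helper: flattens exactly two levels of nested associations *)
flattenTwoLevels[assoc_Association] :=
  Association@Flatten[
    KeyValueMap[
     (* for each outer key, build {outerKey, innerKey} -> value rules *)
     Function[{outerKey, inner}, KeyValueMap[{outerKey, #1} -> #2 &, inner]],
     assoc],
    1];

flat = flattenTwoLevels[<|101 -> <|t -> 42, r -> 7.5|>,
   102 -> <|t -> 42, r -> 7.5|>, 103 -> <|t -> 42, r -> 7.5, s -> 9|>|>]
```

It should give the same association as tmpAssoc above.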
then
my2DGrid[tmpAssoc, {1}, {2}]
gives
and
my2DGrid[tmpAssoc, {1, 2}, {}]
gives
Here is the code:
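(* genTag1DForGrid: keeps the first label of each run of equal labels and
   replaces the rest by SpanFromAbove ("row") or SpanFromLeft ("col") *)
ClearAll[genTag1DForGrid];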
genTag1DForGrid[tagList_, type_] := Module[{spanKeyword, splitRes},
spanKeyword =
Switch[type, "row", SpanFromAbove, "col", SpanFromLeft];
splitRes = Split[tagList];
Flatten[
If[Length@# > 1,
ReplacePart[
ConstantArray[spanKeyword, Length@#], {1 -> First@#}], #] & /@
splitRes]
];
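(* genTag2DForGrid: applies genTag1DForGrid to each key component;
   row labels are transposed back to one row of labels per data row *)
ClearAll[genTag2DForGrid];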
genTag2DForGrid[tagMat_, type_] := Module[{tmp},
tmp = genTag1DForGrid[#, type] & /@ Transpose[tagMat];
Switch[type,
"row",
If[Length@tmp == 0, {}, Transpose[tmp]],
"col",
tmp]
];
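(* genCornerForGrid: the top-left block (above the row labels, left of
   the column labels), merged into a single cell holding expr *)
ClearAll[genCornerForGrid];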
genCornerForGrid[expr_, rowLen_, colLen_] := Module[{tmp},
tmp = ConstantArray[SpanFromBoth, {rowLen, colLen}];
tmp[[1, 1]] = expr;
If[colLen > 1, tmp[[1, 2 ;;]] = SpanFromLeft];
If[rowLen > 1, tmp[[2 ;;, 1]] = SpanFromAbove];
tmp];
(* putGridElementTogether: assembles corner, column labels, row labels and data *)
ClearAll[putGridElementTogether]
putGridElementTogether[gridData_, rowTagForGrid_, colTagForGrid_] :=
Module[{corner},
corner =
genCornerForGrid[Null, Dimensions[colTagForGrid][[1]],
Dimensions[rowTagForGrid][[2]]];
ArrayFlatten[{{corner, colTagForGrid}, {rowTagForGrid, gridData}}]
];
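(* my2DGrid: rowTagSlotGroup and colTagSlotGroup are lists of key positions
   whose components are used as row and column labels, respectively *)
ClearAll[my2DGrid];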
my2DGrid[rawDataAssoc_, rowTagSlotGroup_, colTagSlotGroup_] := Module[
{rawDataAssocKeySorted, keys, rowTagPosIndex, colTagPosIndex,
arrayRules, gridData, rowTagForGrid, colTagForGrid, finalGridData},
rawDataAssocKeySorted = KeySort[rawDataAssoc];
keys = Keys@rawDataAssocKeySorted;
rowTagPosIndex =
First /@ (PositionIndex@
DeleteDuplicates@keys[[;; , rowTagSlotGroup]]);
colTagPosIndex =
First /@ (PositionIndex@
DeleteDuplicates@keys[[;; , colTagSlotGroup]]);
arrayRules =
KeyMap[{#[[rowTagSlotGroup]] /.
rowTagPosIndex, #[[colTagSlotGroup]] /. colTagPosIndex} &,
rawDataAssocKeySorted];
gridData =
ReplacePart[
ConstantArray[Null, CoordinateBounds[Keys@arrayRules][[;; , 2]]],
arrayRules];
rowTagForGrid = genTag2DForGrid[Keys@rowTagPosIndex, "row"];
colTagForGrid = genTag2DForGrid[Keys@colTagPosIndex, "col"];
finalGridData =
If[Length@colTagForGrid === 0 && Length@rowTagForGrid =!= 0,
Join[rowTagForGrid, gridData, 2],
If[Length@rowTagForGrid === 0 && Length@colTagForGrid =!= 0,
Join[colTagForGrid, gridData],
putGridElementTogether[gridData, rowTagForGrid, colTagForGrid]
]];
Grid[finalGridData,
Alignment -> {Center, Center},
Frame -> All,
Background -> {
If[rowTagForGrid === {}, {},
ConstantArray[Lighter@Lighter@Blue,
Dimensions[rowTagForGrid][[2]]]],
If[colTagForGrid === {}, {},
ConstantArray[Lighter@Yellow, Dimensions[colTagForGrid][[1]]]]}]
]
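The label merging in genTag1DForGrid rests on one idea. Split finds the runs of equal labels, and within each run every label after the first is replaced by SpanFromAbove (SpanFromLeft for column labels), so Grid merges the run into one cell. Here is a stand-alone sketch of that idea for a single column of row labels, with made-up labels and data.

```mathematica
(* a column of row labels with repeated runs *)
labels = {"A", "A", "B", "B"};

(* keep the first label of each run, replace the repeats by SpanFromAbove *)
spanned = Flatten[
   If[Length[#] > 1,
      Prepend[ConstantArray[SpanFromAbove, Length[#] - 1], First[#]], #] & /@
    Split[labels]]
(* {"A", SpanFromAbove, "B", SpanFromAbove} *)

(* pair the label column with a data column; the spans merge cells vertically *)
Grid[Transpose[{spanned, {1, 2, 3, 4}}], Frame -> All]
```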
Introduction
Initially I got confused and conflated the two questions, which are quite distinct. Here are my interpretations of them:
1. Can we select which keys are displayed in the rows and in the columns of a dataset?
- Yes, but we have to construct the TypeSystem` type structure, type, in Dataset[data, type, metadata] manually. Normally the type is constructed with TypeSystem`DeduceType, which uses heuristics to guess an appropriate structure for displaying the dataset.
2. Can we control how much of the dataset is displayed? (Originally I answered only this question.)
- To a certain extent, yes. There is an internal function that computes a limit on the number of rows of each Association that are displayed, and there is an accessible variable that gives a "target" for the overall limit on the total number of rows displayed. We can display all entries of the associations, which is what is desired in the OP. I'm not sure we can customize it for each association or each level in the dataset.
The methods below for Q1 and Q2 may be combined to show a complete table with the desired rows and columns.
Since each of the two questions in this Q&A is somewhat involved, I've divided my answer into two sections, plus a code dump at the end with the code for the answer to Q1.
Answer to Q1: A "table form" for Dataset[]
The (long) code is given at the end. The main strategy comes from the observation that when the type structure in Dataset[assoc, type, metadata] has one or more levels of TypeSystem`Assoc[] wrapped around one or more levels of TypeSystem`Struct[], the keys in the levels corresponding to Assoc[] appear in the rows and those corresponding to Struct[] appear in the columns. The only way I found to get all the keys in the columns is to use Dataset[{assoc}, TypeSystem`Vector[newtype, 1], metadata], where newtype consists of nested TypeSystem`Struct[] corresponding to the structure of assoc. The construction of the type structure is handled by structify[data, nrows], which is applied to the normal (transposed and padded) form of the dataset.
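To illustrate this observation, here is a type written out by hand for a two-level dataset with string keys. The outer level is a TypeSystem`Assoc[] (row labels) and the inner level a TypeSystem`Struct[] (column labels). This is roughly what structify[data, 1] builds, except that structify obtains the leaf types from TypeSystem`DeduceType. The data are made up for illustration.

```mathematica
(* string keys throughout, since Struct[] requires them *)
data = <|"101" -> <|"t" -> 42, "r" -> 7.5|>, "102" -> <|"t" -> 42, "r" -> 7.5|>|>;

(* outer level: Assoc[] of 2 entries -> rows; inner level: Struct[] -> columns *)
type = TypeSystem`Assoc[TypeSystem`Atom[String],
   TypeSystem`Struct[{"t", "r"}, {TypeSystem`Atom[Integer], TypeSystem`Atom[Real]}],
   2];

(* empty metadata passed explicitly *)
ds = Dataset[data, type, <||>]
```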
The basic strategy is (1) to transpose (permute) the levels of the dataset so that the top levels correspond to the rows (in order) and the deeper levels correspond to the columns (in order), and then (2) to construct the type as above.
One problem I ran into was when keys were missing. For my purposes, I just filled them in with "a" -> Missing[] at the lowest level and "a" -> <||> at the higher levels, where "a" is the missing key. They are filled in level by level, from the top down, so that the missing keys in the empty association in "a" -> <||> are filled in at the next step. The function that does this is padMissing[ds], which uses getKeys[ds, level] to get all the keys at a given level of the dataset ds. (This is one of those places where I feel there should be an easier solution.)
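For a single level, the built-in KeyUnion does a similar padding. It gives every association in a list the union of all keys and inserts Missing[...] for absent ones. This is a swapped-in alternative, not what padMissing uses. It covers only the lowest level; the nested case is what padMissing is for.

```mathematica
rows = <|"101" -> <|"t" -> 42, "r" -> 7.5|>,
   "103" -> <|"t" -> 42, "r" -> 7.5, "s" -> 9|>|>;

(* pad the inner associations to a common key set, then restore the outer keys *)
padded = AssociationThread[Keys[rows], KeyUnion[Values[rows]]]
```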
Another problem is that Struct[] requires keys to be strings. The utility toStringKeys[ds] converts keys to strings. I don't think there is currently any way around this if you want to use TypeSystem` to solve this problem.
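toStringKeys applies one KeyMap[ToString] per level as a Dataset query. The following sketch writes that query out by hand for a two-level dataset with made-up data.

```mathematica
ds = Dataset[<|101 -> <|t -> 42, r -> 7.5|>, 103 -> <|t -> 42, r -> 7.5, s -> 9|>|>];

(* first operator acts on the outer keys, second on the inner keys *)
dsStr = ds[KeyMap[ToString], KeyMap[ToString]];
Normal[dsStr]
```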
Examples
The syntax is dsTableForm[ds, rows, columns], where rows is a list of levels to appear in the rows of the table (in that order) and columns likewise is a list of levels to appear in the columns. Together, Join[rows, columns] should be a permutation of {1, ..., n}, where n is the number of levels of the dataset ds.
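dsTableForm passes InversePermutation[Flatten[{rows, cols}]] to dsTranspose, which permutes levels the way Transpose does. Applying the same level specification to an ordinary array shows where each level ends up. Here it is for the call dsTableForm[dset, {2, 3}, {1, 4}] below.

```mathematica
rows = {2, 3}; cols = {1, 4};
order = Flatten[{rows, cols}]     (* {2, 3, 1, 4} *)
PermutationListQ[order]           (* True: a valid choice of rows and columns *)

(* the level specification used for the transposition *)
perm = InversePermutation[order]  (* {3, 1, 2, 4} *)

(* old level k moves to level perm[[k]], so the new levels are the old levels 2, 3, 1, 4 *)
Dimensions[Transpose[ConstantArray[0, {2, 3, 4, 5}], perm]]  (* {3, 4, 2, 5} *)
```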
OP's troublesome example.
dsOP = Dataset@<|
101 -> <|t -> 42, r -> 7.5`|>,
102 -> <|t -> 42, r -> 7.5`|>,
103 -> <|t -> 42, r -> 7.5`, s -> 9|>|>
dsOP = toStringKeys@dsOP
dsTableForm[dsOP, {2}, {1}]
dsTableForm[dsOP, {1, 2}, {}]
dsTableForm[dsOP, {}, {2, 1}]
A more complicated example.
dset = Dataset@
<|"11" -> <|
"21" -> <|
"31" -> <|"43" -> 27, "44" -> 28|>,
"32" -> <|"42" -> 24, "43" -> 25, "44" -> 26|>,
"33" -> <|"41" -> 22, "42" -> 21, "43" -> 23, "44" -> 20|>
|>,
"22" -> <|
"31" -> <|"43" -> 17, "42" -> 18, "44" -> 19|>,
"32" -> <|"43" -> 14, "44" -> 15, "42" -> 16|>
|>
|>,
"12" -> <|
"21" -> <|
"31" -> <|"41" -> 11, "43" -> 10, "44" -> 9|>,
"32" -> <|"41" -> 12, "44" -> 13|>|>,
"22" -> <|
"31" -> <|"41" -> 1, "42" -> 3, "43" -> 2, "44" -> 4|>,
"33" -> <|"41" -> 6, "42" -> 5, "43" -> 7, "44" -> 8|>|>|>
|>;
dsTableForm[dset, {2, 3}, {1, 4}]
Answer to Q2: Showing all the dataset (original answer)
To answer the simple question in the OP, "How to make Mathematica show complete Dataset?", there is a simple way: override the internal function TypeSystem`NestedGrid`PackagePrivate`meanLength.
This function is passed a list of randomly sampled lengths of the associations being displayed, and it selects the 80th percentile as the length to be displayed. When all the associations are short but of varied lengths, the limit on the length may seem ridiculously small.
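The following is a rough illustration of why the limit can be so small. It assumes the internal percentile behaves like Quantile with its default definition, and the list of lengths is hypothetical.

```mathematica
(* nine associations of length 2 and one of length 9 *)
lengths = {2, 2, 2, 2, 2, 2, 2, 2, 2, 9};

(* 80th percentile: 2, so the long association would be cut to 2 rows *)
Quantile[lengths, 4/5]
```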
One way to display all the keys is to set meanLength to return a value greater than or equal to the length of the longest association in the dataset. The function maxKeys computes that number. Note that meanLength is used to construct the box form for the display of the dataset, so it must be defined to return the desired length during the construction of the boxes.
myds = Dataset@<|101 -> <|t -> 42, r -> 7.5`|>,
102 -> <|t -> 42, r -> 7.5`|>,
103 -> <|t -> 42, r -> 7.5`, s -> 9|>|>;
ClearAll[maxKeys];
maxKeys[assoc_Association] := Max[Length@Keys[assoc], maxKeys /@ assoc];
maxKeys[ds_Dataset] := maxKeys[Normal@ds];
maxKeys[{assoc__Association}] := Max[maxKeys /@ {assoc}]; (* TypeSystem`Vector *)
maxKeys[x_] := 0;
Block[{
TypeSystem`NestedGrid`PackagePrivate`meanLength =
Evaluate@maxKeys[myds] &},
RawBoxes@MakeBoxes[#, StandardForm] &@myds
]
Alternatively, we could define TypeSystem`NestedGrid`PackagePrivate`meanLength = 3 & if we already know that no association has more than 3 items to display.
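Written out, the fixed-limit alternative looks like the sketch below. The internal symbol may change between versions, and the Block affects only the boxes built inside it.

```mathematica
myds = Dataset@<|101 -> <|t -> 42, r -> 7.5|>, 102 -> <|t -> 42, r -> 7.5|>,
    103 -> <|t -> 42, r -> 7.5, s -> 9|>|>;

(* fixed limit of 3 entries per association, in effect only while the boxes are built *)
out = Block[{TypeSystem`NestedGrid`PackagePrivate`meanLength = 3 &},
  RawBoxes@MakeBoxes[#, StandardForm] &@myds
  ]
```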
Aside: another parameter that controls the display is Dataset`$DatasetTargetRowCount, which controls the overall number of rows to be displayed. In a small dataset like the example, this does not matter.
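The following sketch sets this variable in the same way as meanLength. It assumes the variable is read while the boxes are built. The value 100 and the test dataset big are arbitrary choices for illustration.

```mathematica
(* a longer dataset: 100 small records *)
big = Dataset[Table[<|"i" -> i, "sq" -> i^2|>, {i, 100}]];

(* raise the overall row target only while the boxes are built *)
outBig = Block[{Dataset`$DatasetTargetRowCount = 100},
  RawBoxes@MakeBoxes[#, StandardForm] &@big
  ]
```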
Code for dsTableForm
This uses code for transposing a dataset from How to make arbitrary transpositions of associations and datasets. (By coincidence, I was trying to do what the OP wants to do at the same time, a week ago or so.)
I'm fairly new to using Dataset, and sometimes I may take a roundabout route to do something for which there is a direct one.
(* adjacentCycles[]
* factors permutations into cycles of the form (n n+1)
*)
adjacentCycles[p_?PermutationListQ] :=
Flatten@iAdjacentCycles[PermutationCycles[p]];
adjacentCycles[c : Cycles[{{__Integer} ..}]] :=
Flatten@iAdjacentCycles[c];
iAdjacentCycles[Cycles[c : {}]] := {};
iAdjacentCycles[Cycles[c : {c1_, c2__}]] :=(*Join@@*)
iAdjacentCycles /@ Cycles@*List /@ c;
iAdjacentCycles[Cycles[{c : {x_, y_, z__}}]] :=(*Join@@*)
iAdjacentCycles /@ Cycles@*List /@ Reverse@Partition[c, 2, 1];
iAdjacentCycles[Cycles[{c : {x_, y_}}]] := Module[{a, b},
{a, b} = MinMax[{x, y}];
With[{factors =
Cycles@*List /@ Reverse@Partition[Range[a, b], 2, 1]},
Reverse@Rest[factors]~Join~factors]
];
(* dsTranspose[]
* permutes the levels of a Dataset or nested Association
* in the same way as Transpose[]
*)
ClearAll[dsTranspose];
dsTranspose[assoc_Association, perm_?PermutationListQ] :=
With[{res = dsTranspose[Dataset@assoc, perm]},
Normal@res /; Dataset`ValidDatasetQ[res]
];
dsTranspose[ds_Dataset, perm_?PermutationListQ] :=
Module[{
xps, (* transpositions *)
xpFN,
res},
xps = adjacentCycles@perm;
xps = xps[[All, 1, 1, 1]] - 1; (* levels to be transposed *)
xpFN[0] = Transpose;
xpFN[n_Integer?Positive] :=
Map[Check[Query[Transpose][#],
Throw[$Failed, dsTranspose]] &, #, {n}] &;
res = Catch[Fold[xpFN[#2][#1] &, ds, xps], dsTranspose];
res /; Dataset`ValidDatasetQ[res]
];
ClearAll[dsTableForm,
structify, associationDepth, getKeys, toStringKeys, padMissing,
missingToNullAssoc];
(* structify[]
* makes the keys of the first n levels row labels and
* the keys of the remaining levels column labels:
* TypeSystem`Struct[..] yields columns when inside
* TypeSystem`Assoc[..], which yields rows, and
* TypeSystem`Vector[..] is a workaround to get all column indices
*)
structify[x_] := structify[x, 0];
structify[{a_Association}, _] := TypeSystem`Vector[structify@a, 1];
structify[a_Association, 0] := TypeSystem`Struct[Keys@a,
structify /@ Values@a];
structify[data_, 0] := TypeSystem`DeduceType[data];
structify[a_Association, n_Integer?Positive] :=
TypeSystem`Assoc[TypeSystem`Atom[String],
structify[a[[1]], n - 1], Length@Keys@a];
(* like ArrayDepth[] but for nested Association[] *)
associationDepth[a_Association] :=
Min[1 + Values[associationDepth /@ a]];
associationDepth[ds_Dataset] /; ! TrueQ[associationDepth$In] :=
Block[{associationDepth$In = True}, associationDepth[Normal@ds]];
associationDepth[x_] := 0;
(* get all the keys at a given level of the Dataset *)
getKeys[ds_Dataset, level_] := getKeys[Normal@ds, level];
getKeys[a_Association, level_] :=
Union @@ Reap[Map[Sow[Keys[#], getKeys] &, a, {level - 1}],
getKeys][[2, 1]];
(* convert keys to strings *)
toStringKeys[ds_Dataset] :=
ds @@ Table[KeyMap[ToString], {associationDepth[ds]}];
(* padMissing[]
* is like PadRight[] but for nested Association[]
* - Apply Query[keys] at each level; missing entries are
* converted to empty Associations at all but the lowest level
* - Must apply successively in the form
* ds[Query[keys]]
* ds[All,Query[keys]]
* ...
* ds[All,...,All,Query[keys]]
* - In Query[keys], keys should be a list of strings,
* of the form {"a", "b",...},
* not Keys (that is, not literally Query[Keys])
*)
missingToNullAssoc[keys_] :=
AssociationMap[
Function[{k}, (Replace[#[k], _Missing :> Association[]])], keys] &;
padMissing[ds_Dataset, level_: Infinity] :=
With[{depth = Min[associationDepth[ds], level] - 1},
Fold[ (* apply Query[keys] to each level *)
#1 @@ #2 &
, ds
, Append[ (* ds[Query[keys]],..., ds[All,...,All,Query[keys]] *)
PadLeft[{Query@getKeys[ds, depth + 1]}, depth + 1, All]
]@Table[
PadLeft[ (* can't specify an arbitrary MissingBehavior? *)
{Query[#] /* missingToNullAssoc[#] &@getKeys[ds, k]}
, k
, All],
{k, depth}]
]
];
(* dsTableForm[]
* transposes and structures a Dataset into rows and columns
* It must have string keys
* Specifying no rows {} adds a level, List[a], to
* the Dataset returned
*)
dsTableForm[ds_, rows : {___Integer?Positive},
cols : {___Integer?Positive}] :=
With[{newDS = If[Length@rows == 0, List, Identity]@
Normal@dsTranspose[
padMissing@ds,
InversePermutation@Flatten[{rows, cols}]]},
Dataset[
newDS,
structify[newDS, Length@rows],
<| (* metadata sometimes required even if empty <||> *)
"Origin" -> (HoldComplete[dsTableForm, #] &@
Dataset`ToDatasetHandle[ds]),
"ID" -> Dataset`GenerateUniqueID[]|>
] /;
AssociationQ[newDS] || VectorQ[newDS, AssociationQ]
]