Mathematica 10 Dataset doesn't format more than 4 columns?
Theoretically, Dataset supports any number of columns.
The behavior you are seeing is actually because the type deduction that Dataset is doing behind the scenes isn't perfect (and indeed in some sense cannot be perfect). Your synthetic example is such that your second list of associations is "most consistent" with a particular type that doesn't typeset as a table.
You can see what type Dataset deduced in a given case by using Dataset`GetType. First get TypeSystem` onto your context path, so that the types aren't fully qualified and are easier to read:
Needs["TypeSystem`"];
Then use GetType:
In[2]:= Dataset`GetType @ Dataset @ Table[Association[ToString[#] -> # & /@ Range[4]], {2}]
Out[2]= Vector[Struct[{"1", "2", "3", "4"},
{Atom[Integer], Atom[Integer], Atom[Integer], Atom[Integer]}], 2]
Notice that the type of your data has been deduced to be a Vector (homogeneous list) of Structs (heterogeneous associations), or in other words a row-oriented table.
But now do:
In[3]:= Dataset`GetType @ Dataset @ Table[Association[ToString[#] -> # & /@ Range[5]], {2}]
Out[3]= Vector[Assoc[Atom[String], Atom[Integer], 5], 2]
Here, your data has been deduced as a Vector of Assocs (homogeneous associations). An Assoc is a type that doesn't care which keys are present, only that the keys all have the same type and the values all have the same type.
That happened because, according to the internal heuristics, an Assoc is considered to be a more parsimonious type as soon as we cross the threshold of 4 fields. But this would not be true if we looked at an association whose values were of different types, instead of all being integers:
In[4]:= DeduceType @ Table[<|"A" -> 1, "B" -> 2, "C" -> 3, "D" -> 4, "E" -> "bar"|>, {5}]
Out[4]= Vector[Struct[{"A", "B", "C", "D", "E"},
{Atom[Integer], Atom[Integer], Atom[Integer], Atom[Integer], Atom[String]}], 5]
The only consistent type here is a Vector of Structs (notice I'm using DeduceType directly, which is what Dataset uses upon construction). And indeed, this more complex Dataset typesets as a table, owing to the inner Struct type.
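To see where the heuristic switches from Struct to Assoc on your own installation, you can tabulate the deduced row type for a range of column counts. This is only a sketch: innerHead is a name I made up for illustration, it relies on the undocumented Dataset`GetType used above, and the result may differ between versions.

```mathematica
Needs["TypeSystem`"];

(* innerHead is a hypothetical helper name, not part of Dataset or TypeSystem. *)
(* It builds two identical rows with n integer-valued string keys and returns *)
(* the head of the per-row type (for example Struct or Assoc) that was deduced. *)
innerHead[n_Integer?Positive] :=
  Head @ First @ Dataset`GetType @ Dataset @
    Table[Association[ToString[#] -> # & /@ Range[n]], {2}]

(* Tabulate the deduced row type for 1 to 8 columns. *)
Table[n -> innerHead[n], {n, 1, 8}]
```

In the version discussed here, the outputs shown earlier suggest the head should change between 4 and 5 columns.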
Although it isn't documented and is therefore of course subject to change, you can force a specific type to be used by supplying a second argument to Dataset:
Dataset[
Table[Association[ToString[#] -> # & /@ Range[5]], {5}],
Vector[Struct[{"1", "2", "3", "4", "5"},
{Atom[Integer], Atom[Integer], Atom[Integer], Atom[Integer], Atom[Integer]}]]]
This will typeset as a table, as you desire.
If you look at the InputForms of the produced Datasets, you will see a significant difference:
Dataset@Table[Association[ToString[#] -> # & /@ Range[4]], {2}] // InputForm
Dataset[{<|"1" -> 1, "2" -> 2, "3" -> 3, "4" -> 4|>, <|"1" -> 1, "2" -> 2, "3" -> 3, "4" -> 4|>}, TypeSystem`Vector[TypeSystem`Struct[{"1", "2", "3", "4"}, {TypeSystem`Atom[Integer], TypeSystem`Atom[Integer], TypeSystem`Atom[Integer], TypeSystem`Atom[Integer]}], 2], <|"ID" -> 34613158063874|>]
Dataset@Table[Association[ToString[#] -> # & /@ Range[5]], {2}] // InputForm
Dataset[{<|"1" -> 1, "2" -> 2, "3" -> 3, "4" -> 4, "5" -> 5|>, <|"1" -> 1, "2" -> 2, "3" -> 3, "4" -> 4, "5" -> 5|>}, TypeSystem`Vector[TypeSystem`Assoc[TypeSystem`Atom[String], TypeSystem`Atom[Integer], 5], 2], <|"ID" -> 93437030149901|>]
Let us try to reproduce the first structure with 5 columns:
Dataset[Table[Association[ToString[#] -> # & /@ Range[5]], {2}],
TypeSystem`Vector[
TypeSystem`Struct[ToString /@ Range[5],
Table[TypeSystem`Atom[Integer], {5}]], 2]]
Voilà! I have checked this method with 200 columns and it works!
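The same trick can be wrapped in a small helper so that the Struct type does not have to be written out by hand. This is a rough sketch under several assumptions: tableDataset is a hypothetical name, it assumes every row has the same keys and value types as the first row, it assumes DeduceType applied to a single value returns that value's type (for example Atom[Integer]), and it depends on the undocumented second argument of Dataset, so it may stop working in other versions.

```mathematica
Needs["TypeSystem`"];

(* tableDataset is a hypothetical helper name. It assumes every row has the *)
(* same keys, in the same order, with values of the same types as the first row. *)
tableDataset[rows : {__Association}] :=
  Dataset[
    rows,
    Vector[
      (* One Struct field per key; field types are deduced from the first row only. *)
      Struct[Keys[First[rows]], DeduceType /@ Values[First[rows]]],
      Length[rows]]]

(* The 5-column example from above, with the row-oriented type forced. *)
tableDataset[Table[Association[ToString[#] -> # & /@ Range[5]], {2}]]
```

Because only the first row is inspected, rows with missing keys or differently typed values would end up with a type that does not describe them, so this is best kept for regular, rectangular data.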
I think that ideally there would be a separate function for displaying a dataset in the preferred way. Automatic formatting should only produce a summary, i.e. truncate the data if it is too large. But we should have the option to display the complete dataset and tweak the way it is displayed, if necessary.
This is an attempt at writing a function that will take a dataset having the form of a 2D table and will try to format it in a reasonable way. It doesn't aim to be general and will probably fail on more complicated datasets. The way I'm trying to determine the type (schema) is probably clumsy, and I'll take any suggestions on how to do this better (I'm sure there are undocumented functions to do this properly).
Basic usage:
ds = Dataset[Association["a" -> Association["a" -> 1, "b" -> 2], "b" -> Association["a" -> 3, "b" -> 4]]];
formatDS[ds]
This is extended from something for my own use, hence the small font size and the RotateLabel option. People might want to tweak at least the font size when using this.
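Needs["TypeSystem`"]; (* so that Assoc, Struct and Vector below resolve to the TypeSystem` symbols *)
ClearAll[formatDS]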
Options[formatDS] = {RotateLabel -> False};
formatDS[ds_Dataset, opt : OptionsPattern[]] :=
Module[
{$headerColour, $frameColour,
hasRowKeys, hasColKeys,
colKeys, rowKeys, content,
rot, type, form}
,
$headerColour = GrayLevel[0.9];
$frameColour = GrayLevel[0.8];
type = Dataset`GetType@Dataset@Normal[ds]; (* force re-compute type *)
hasRowKeys = MatchQ[type, HoldPattern[Assoc[_Atom, __] | _Struct]];
hasColKeys =
MatchQ[type,
HoldPattern[
Assoc[_Atom, _Assoc | _Struct, ___] |
Vector[_Assoc | _Struct, ___]]];
If[hasRowKeys, rowKeys = Normal@ds[Keys]];
If[hasColKeys, colKeys = Normal@ds[1, Keys]];
rot = If[TrueQ@OptionValue[RotateLabel], Rotate[#, Pi/2] &,
Identity];
Which[
hasRowKeys && hasColKeys,
content = Normal@ds[Values, Values];
form = Sequence[ArrayFlatten[( {
{Null, {rot /@ colKeys}},
{List /@ rowKeys, content}
} )], Background -> {{$headerColour}, {$headerColour}}];
,
hasRowKeys,
content = Normal@ds[Values];
form =
Sequence[ArrayFlatten[{{List /@ rowKeys, content}}],
Background -> {{$headerColour}, None}];
,
hasColKeys,
content = Normal@ds[All, Values];
form = Sequence[ArrayFlatten[( {
{{rot /@ colKeys}},
{content}
} )], Background -> {None, {$headerColour}}];
,
True, form = content = Normal[ds]
];
Style[
Grid[
form,
Frame -> All,
FrameStyle -> $frameColour,
Spacings -> {1, 1},
ItemSize -> Full
],
"Text", FontSize -> 10
]
]