How to work with a Dataset?
There are two issues under discussion: 1) the distinct dataset visualizations for the same data and 2) ways to update dataset subelements in place. We will discuss these separately.
Distinct Dataset Visualizations
The way a dataset is displayed is sensitive to the data type of the dataset. That type, in turn, is sensitive to the history of the dataset. This is discussed at length in (143551). For the case at hand, we can see how the data type evolves with each AppendTo operation:
Needs["Dataset`"]
Needs["TypeSystem`"]
{ db = Dataset[{}]
, AppendTo[db,<|"a" -> 1, "b" -> 2|>]
, AppendTo[db,<|"a" -> 2, "b" -> 5|>]
, AppendTo[db,<|"x" -> 2, "y" -> 5|>]
} // Unevaluated // Map[{#, GetType[#]}&] // Grid[#, Frame->All, Alignment->Left]&
The principal data type is a Vector of Assoc. The last row shows how adding the incompatible keys "x" and "y" switched the key type from Enumeration to the generic AnyType.
Now contrast this with db2:
db2 = Dataset[{<|"a" -> 1, "b" -> 2|>, <|"a" -> 2, "b" -> 5|>}]
db2 // GetType
(* Vector[Struct[{a, b}, {Atom[Integer], Atom[Integer]}], 2] *)
The principal data type is now a Vector of Struct. A "struct" represents the case when the dataset is known to contain associations of consistent type. The type system deduced this at the time that db2 was constructed.
In the case of db, which was built incrementally, the type system infers the final data type from a combination of the initial data type and the type transformations of any applied operators (e.g. AppendTo). Such type inference is generally less specific than the type deduction that occurs at construction time. We can use Dataset as a query operator to force reconstruction of a dataset and thereby deduce its data type anew:
db = Dataset[{}];
AppendTo[db, <|"a" -> 1, "b" -> 2|>];
AppendTo[db, <|"a" -> 2, "b" -> 5|>]
db // GetType
(* Vector[Assoc[Atom[Enumeration["a", "b"]], Atom[Integer], 2], 2] *)
db = db[Dataset]
db // GetType
(* Vector[Struct[{"a", "b"}, {Atom[Integer], Atom[Integer]}], 2] *)
Updating Subelements of Datasets
There are presently very few ways to update a Dataset in place. See, for example, the discussion in (54491) or the work-around sketched in (141916). In particular, the kinds of update contemplated in the question are not presently supported.
The way to achieve such alterations presently is through query operators. For example, we can append a new key "c" to element 1:
db3 = db2[{1 -> Append["c" -> 7]}]
Note that by adding the key "c" to only one of the associations, the data type switched from Struct to Assoc and gave us the vertical key/value pair visualization we saw earlier. If we had added "c" to all associations, we would have retained the Struct tabular visualization:
db3 = db2[{1 -> Append["c" -> 7], 2 -> Append["c" -> 8]}]
The closest thing to updating a dataset in place is expressed as db = db[...ops...].
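As a rough sketch of the db = db[...ops...] pattern, the following applies one query to every row and rebinds the variable to the result. The derived column "c" and its formula are made up for illustration, and db2 is rebuilt here only so that the snippet stands on its own.

```mathematica
(* same two-row dataset as db2 above *)
db2 = Dataset[{<|"a" -> 1, "b" -> 2|>, <|"a" -> 2, "b" -> 5|>}];

(* add a hypothetical derived key "c" to every row, then reassign the variable *)
db2 = db2[All, <|#, "c" -> #a + #b|> &];

db2
```

Because every row receives the same new key, the rows stay consistent with one another, which is the situation in which the tabular display is retained.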
It is possible to update a simple list of associations in place:
$list = db2 // Normal
(* {<|"a" -> 1, "b" -> 2|>, <|"a" -> 2, "b" -> 5|>} *)
$list[[1, "c"]] = 7;
$list
(* {<|"a" -> 1, "b" -> 2, "c" -> 7|>, <|"a" -> 2, "b" -> 5|>} *)
Closing Comments
Beware that performing large numbers of incremental changes to a dataset will likely get progressively slower. This is the dataset analog of repeatedly applying AppendTo to a list, a strategy which exhibits a slow-down proportional to the square of the length of the list. The dataset infrastructure is best suited to operators that are applied to significant subsets of the dataset all at once (e.g. one or more complete columns).
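One way to sidestep the incremental pattern, assuming the rows can be generated up front, is to collect them in an ordinary list and construct the Dataset once. This is only a sketch; the keys and the row count are arbitrary.

```mathematica
(* build all rows first as a plain list of associations *)
rows = Table[<|"a" -> i, "b" -> i^2|>, {i, 1000}];

(* construct the dataset once, so its type is deduced a single time *)
ds = Dataset[rows];

(* then operate on a complete column in one query *)
ds[Total, "b"]
```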
The operation of the dataset type system is discussed in (89080). Choosing between datasets and associations is discussed in (87360).
Thank you both for the comment and the answer. They were very helpful. From them, and the postings referenced in the answer by @WReach, I came to the conclusion that a list of associations would best serve my purpose.
Here is what I came up with for functionality. (I deleted the output because the association forms aren't very readable outside the front end.)
(* an empty database *)
db={};
(* add a record with a single key *)
AppendTo[db,<|"a"->1|>]
(* add a second record *)
AppendTo[db,<|"a"->2|>]
(* and a third *)
AppendTo[db,AssociationThread[{"a","b","c"},{5,7,9}]]
(* add a Key-value pair to the first record *)
AssociateTo[db[[1]],"b"->5]
(* modify a value *)
db[[1,"b"]]=7;db
(* total the "b" values, with Nothing for missing keys *)
Total@Lookup[db,"b",Nothing]
(* select records based on key value *)
Select[db,#["b"]==7&]
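If a tabular view is wanted once the list is complete, it can be wrapped in Dataset a single time at the end. The sketch below assumes db holds the three records produced by the steps above; the KeyExistsQ filter is an optional extra that makes the handling of missing keys explicit.

```mathematica
(* the three records as they stand after the steps above *)
db = {<|"a" -> 1, "b" -> 7|>, <|"a" -> 2|>, <|"a" -> 5, "b" -> 7, "c" -> 9|>};

(* keep only the records that actually contain a "b" key *)
Select[db, KeyExistsQ[#, "b"] &]

(* wrap the finished list once for display *)
Dataset[db]
```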