Best way to convert numerical columns of a dataset with missing values into a matrix
It's not entirely clear if you want to replace the rows that contain a Missing
or if you want to leave them out. When you want to leave them out, the very verbose way would be to write it out like this:
Query[
Select[!MissingQ[#"passengerAge"] && ! MissingQ[#"passengerWeight"] &],
{#"passengerAge", #"passengerWeight"} &
][ds]
and then normalizing.
Query
is not strictly necessary, but due to the good advertisement of WReach, I learned to prefer it. This is possible too
ds[Select[! MissingQ[#"passengerAge"] && !MissingQ[#"passengerWeight"] &],
{#"passengerAge", #"passengerWeight"} &]
Edit
Thank you for your answer! Preferably, I would like the
Missing
to be replaced by 0's.
Then probably:
Query[ReplaceAll[Missing[] -> 0], {#"passengerAge", #"passengerWeight"} &][ds]
This works:
Block[{Missing},
Missing[] = 0;
ds[All, Select[NumberQ]][Values]]
ds[associationIndex][Transpose][
GroupBy[Query[DeleteMissing /* Query[Apply[And], NumericQ]]] /*
Query[{Key[True] -> Query[All, All, Replace[_Missing -> 0.]]}] /*
Values /* Apply[Join]][Transpose]
Where the utility associationIndex
is one of dozens one-liners included in the API of my forthcoming book "Functional Data Workflow".
associationIndex[a_Association]:=Query[Normal/*MapIndexed[First[#2]->#1&]/*Association][a]
associationIndex[l_List]:=Query[MapIndexed[First[#2]->#1&]/*Association][l]
Without it, after the Apply[Join]
you'd have to thread keys back in - note the use of bracketing Transpose
. Such commutation is a common pattern, (btw Transpose
will also impute Missing
values to normalize ragged data.) - but careful as the op-form is buggy (I avoided RightComposition
)
You might want to mod the logic below to suit various applications:
Query[DeleteMissing /* Query[Apply[And], NumericQ]]
If you just want the numeric matrix, there's no need for the Join:
ds[associationIndex][Transpose][
GroupBy[Query[DeleteMissing /* Query[Apply[And], NumericQ]]] /*
Key[True] /*
Query[Values, Values, Replace[_Missing -> 0.]]][Transpose]
{{36.5, 186.5}, {9., 90.}, {35., 0.}, {60., 160.}, {23., 0.}, {22., 122.}, {0., 120}, {0., 0.}, {17., 170.}}