Building a Row from a dict in PySpark
If the dict is not flat (it contains nested dicts or lists), you can convert it to a Row recursively:
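from pyspark.sql import Row

def as_row(obj):
    # Nested dicts become nested Rows; their values are converted first.
    if isinstance(obj, dict):
        dictionary = {k: as_row(v) for k, v in obj.items()}
        return Row(**dictionary)
    # Lists are converted element by element (e.g. a list of dicts).
    elif isinstance(obj, list):
        return [as_row(v) for v in obj]
    # Anything else (numbers, strings, None, ...) is returned unchanged.
    else:
        return obj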
You can use keyword argument unpacking as follows:
Row(**row_dict)
## Row(C0=-1.1990072635132698, C3=0.12605772684660232, C4=0.5760856026559944,
## C5=0.1951877800894315, C6=24.72378589441825, summary='kurtosis')
Note that Row internally sorts the fields by name (key) to address problems with older Python versions, where the order of keyword arguments was not preserved.
This behavior is likely to be removed in upcoming releases (see SPARK-29748, "Remove sorting of fields in PySpark SQL Row creation"). Once it is removed, you'll have to ensure that the order of values in the dict
is consistent across records.