How to create a new DataFrame from a dict
Here is an easy way to create a DataFrame in PySpark from a list of tuples and a list of column names (spark is assumed to be an existing SparkSession):
values = [("K1","true","false"),("K2","true","false")]
columns = ['Key', 'V1', 'V2']
df = spark.createDataFrame(values, columns)
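If the data starts out as a dict of columns rather than a list of tuples, it can be reshaped into the same rows-plus-column-names form. The following is only a minimal sketch, not part of the original answer: `dict_to_rows` and `dict_to_spark_df` are hypothetical helper names, `spark` is assumed to be an existing SparkSession, and every list in the dict is assumed to have the same length.

```python
def dict_to_rows(data):
    """Turn a dict of equal-length column lists into (rows, columns)."""
    columns = list(data.keys())
    lengths = {len(values) for values in data.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*...) transposes the column lists into one tuple per row
    rows = list(zip(*data.values()))
    return rows, columns


def dict_to_spark_df(spark, data):
    """Build a Spark DataFrame; `spark` is assumed to be a SparkSession."""
    rows, columns = dict_to_rows(data)
    return spark.createDataFrame(rows, columns)


rows, columns = dict_to_rows({"Key": ["K1", "K2"],
                              "V1": ["true", "true"],
                              "V2": ["false", "false"]})
print(rows)     # [('K1', 'true', 'false'), ('K2', 'true', 'false')]
print(columns)  # ['Key', 'V1', 'V2']
```

Keeping the reshaping step separate from the Spark call means it can be checked on its own, without a running Spark session.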
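import json
from pyspark.sql import SparkSession
# SparkSession is the entry point since Spark 2.0; it supersedes SQLContext
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext
# val1, val2 and val3 are placeholders for your own values
val_dict = {
    'key1': val1,
    'key2': val2,
    'key3': val3
}
# spark.read.json parses JSON text, so serialize the dict to a JSON string first
rdd = sc.parallelize([json.dumps(val_dict)])
bu_zdf = spark.read.json(rdd)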
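`spark.read.json` parses JSON text, so when the input is an RDD it helps to hand it one JSON string per record. Below is a minimal sketch of that step for several dicts at once. The names `dicts_to_json_lines` and `dicts_to_spark_df` are hypothetical, and `spark` is assumed to be an existing SparkSession.

```python
import json


def dicts_to_json_lines(records):
    """Serialize each dict to one JSON string (one record per string)."""
    return [json.dumps(record) for record in records]


def dicts_to_spark_df(spark, records):
    """Read the records with Spark's JSON reader, which infers the schema."""
    rdd = spark.sparkContext.parallelize(dicts_to_json_lines(records))
    return spark.read.json(rdd)


lines = dicts_to_json_lines([{"key1": 1, "key2": True, "key3": None}])
print(lines[0])  # {"key1": 1, "key2": true, "key3": null}
```

Using `json.dumps` matters for values such as `True` and `None`: Python's own `str()` would render them as `True` and `None`, which are not valid JSON literals.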
Here is a different and possibly easier way to solve this.
In my code I first convert the dict to a pandas DataFrame, which I find much easier, and then convert the pandas DataFrame directly to a Spark DataFrame.
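import pandas as pd
data = {'visitor': ['foo', 'bar', 'jelmer'],
        'A': [0, 1, 0],
        'B': [1, 0, 1],
        'C': [1, 0, 0]}
df = pd.DataFrame(data)  # dict of columns -> pandas DataFrame
ddf = spark.createDataFrame(df)  # pandas DataFrame -> Spark DataFrame
ddf.show()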
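Output (the column order shown comes from an older pandas version that sorted dict keys; recent versions keep insertion order, so visitor would come first):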
+---+---+---+-------+
|  A|  B|  C|visitor|
+---+---+---+-------+
|  0|  1|  1|    foo|
|  1|  0|  0|    bar|
|  0|  1|  0| jelmer|
+---+---+---+-------+