pandas dataframe to spark dataframe code example
Example 1: convert pandas dataframe to spark dataframe
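import pandas as pd
from pyspark.sql import SparkSession
filename = 'path to file'  # placeholder: replace with the path to your CSV file
# builder is an attribute of SparkSession, not a method
spark = SparkSession.builder.appName('pandasToSpark').getOrCreate()
pandas_df = pd.read_csv(filename)
# createDataFrame (lowercase "c") accepts a pandas DataFrame directly
spark_df = spark.createDataFrame(pandas_df)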
Example 2: convert a pandas dataframe column from string to float
df_raw['PricePerSeat_Outdoor'] = pd.to_numeric(df_raw['PricePerSeat_Outdoor'], errors='coerce')
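With errors='coerce', values that cannot be parsed as numbers become NaN instead of raising an error. The sketch below shows that behavior on a small made-up frame. The sample values are assumptions for illustration only; the column name is reused from the line above.

```python
import pandas as pd

# Hypothetical sample data: two numeric strings and one unparseable value.
df_raw = pd.DataFrame({"PricePerSeat_Outdoor": ["12.50", "8", "n/a"]})

# errors="coerce" turns anything that is not a valid number into NaN.
df_raw["PricePerSeat_Outdoor"] = pd.to_numeric(
    df_raw["PricePerSeat_Outdoor"], errors="coerce"
)

# The column is now float64, and the unparseable entry is NaN.
print(df_raw["PricePerSeat_Outdoor"].dtype)
print(df_raw["PricePerSeat_Outdoor"].isna().sum())
```

Counting the NaN values after the conversion is a quick way to see how many rows failed to parse before deciding whether to drop or fill them.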
Example 3: save pandas dataframe to parquet
df.to_parquet('df.parquet.gzip', compression='gzip')
Example 4: spark to pandas
pandas_df = some_df.toPandas()