pandas dataframe to spark dataframe code example

Example 1: convert pandas dataframe to spark dataframe

import pandas as pd
from pyspark.sql import SparkSession

filename = 'path to file'  # placeholder: replace with the actual path to your CSV file
spark = SparkSession.builder.appName('pandasToSpark').getOrCreate()
# Assuming the file is a CSV
pandas_df = pd.read_csv(filename)
spark_df = spark.createDataFrame(pandas_df)

Example 2: convert a pandas dataframe column from string to float

# errors='coerce' turns values that cannot be parsed as numbers into NaN instead of raising
df_raw['PricePerSeat_Outdoor'] = pd.to_numeric(df_raw['PricePerSeat_Outdoor'], errors='coerce')
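
The one-liner above assumes df_raw already exists. A minimal, self-contained sketch of the same conversion follows; the sample values are made up for illustration, and only the column name is taken from the example. It shows that unparseable strings become NaN rather than raising an error.

```python
import pandas as pd

# Hypothetical sample data: two numeric strings, one unparseable string, one missing value.
df_raw = pd.DataFrame({"PricePerSeat_Outdoor": ["12.50", "8", "n/a", None]})

# errors="coerce" replaces anything that cannot be parsed with NaN.
converted = pd.to_numeric(df_raw["PricePerSeat_Outdoor"], errors="coerce")
df_raw["PricePerSeat_Outdoor"] = converted

# Count the rows that could not be converted, so the data loss is visible.
n_missing = int(converted.isna().sum())
print(df_raw)
print("values coerced to NaN or already missing:", n_missing)
```

Checking the NaN count after coercion is a cheap way to notice when a column contains more bad values than expected.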

Example 3: save pandas dataframe to parquet

# Requires a parquet engine (pyarrow or fastparquet) to be installed
df.to_parquet('df.parquet.gzip', compression='gzip')

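Example 4: convert spark dataframe to pandas dataframe

# some_df is a Spark DataFrame; toPandas() collects all rows to the driver, so the data must fit in driver memory
pandas_df = some_df.toPandas()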