PySpark dataframe: convert multiple columns to float
float() is not a Spark function; you need the cast() function:
from pyspark.sql.functions import col
df_temp.select(*(col(c).cast("float").alias(c) for c in df_temp.columns))
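For example, a minimal sketch (assuming an active SparkSession named spark; the sample data here is made up for illustration):

from pyspark.sql.functions import col

df_temp = spark.createDataFrame([("1.0", "2.5"), ("3.5", "4.0")], ["a", "b"])
df_casted = df_temp.select(*(col(c).cast("float").alias(c) for c in df_temp.columns))
df_casted.printSchema()
# root
#  |-- a: float (nullable = true)
#  |-- b: float (nullable = true)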
If you want to cast only some columns to float and keep the other columns unchanged, you can do it in a single select statement:
from pyspark.sql.functions import col

columns_to_cast = ["col1", "col2", "col3"]
df_temp = (
    df.select(
        *(c for c in df.columns if c not in columns_to_cast),
        *(col(c).cast("float").alias(c) for c in columns_to_cast),
    )
)
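Note that this projection moves the cast columns to the end of the schema. If you need to keep the original column order, you can build the projection per column instead (same idea, just one expression per column):

df_temp = df.select(
    *(col(c).cast("float").alias(c) if c in columns_to_cast else col(c) for c in df.columns)
)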
I saw the withColumn answer, which will work, but since Spark DataFrames are immutable, each withColumn call generates a completely new DataFrame, so a single select is cheaper when casting many columns.
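If you are on Spark 3.3 or later, DataFrame.withColumns takes a dict of column expressions and applies them all in a single projection, which avoids that per-call overhead:

from pyspark.sql.functions import col

columns_to_cast = ["col1", "col2", "col3"]
df = df.withColumns({c: col(c).cast("float") for c in columns_to_cast})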
If you want to cast only some columns without changing the rest of the DataFrame, you can use the withColumn function:
from pyspark.sql.functions import col

cols = ["col1", "col2"]  # columns to cast; adjust to your schema
for col_name in cols:
    df = df.withColumn(col_name, col(col_name).cast("float"))
This casts the type of the columns in the cols list and leaves the other columns as they are.
Note: withColumn replaces or creates a column based on the column name; if a column with that name already exists it is replaced, otherwise a new column is created.
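For example (a small sketch, assuming df has a column named a):

from pyspark.sql.functions import col

df = df.withColumn("a", col("a").cast("float"))        # "a" exists, so it is replaced
df = df.withColumn("a_float", col("a").cast("float"))  # "a_float" does not, so it is appended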