Creating a new Spark DataFrame with a new column value based on a column in the first DataFrame (Java)
I believe you can use when to achieve that. Additionally, you can probably replace the old column directly. For your example, the code would be something like:
import static org.apache.spark.sql.functions.*;

// "A" -> "X", "B" -> "Y", anything else -> "Z"; passing the existing
// name "C" to withColumn replaces the old column in place.
Column newCol = when(col("C").equalTo("A"), "X")
        .when(col("C").equalTo("B"), "Y")
        .otherwise("Z");

DataFrame df2 = df1.withColumn("C", newCol);
For more details about when, check the Column Javadoc.
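If you are on Spark 2.x, the untyped DataFrame API in Java is Dataset&lt;Row&gt;, so the same answer would read as follows (a minimal sketch, assuming the same df1 as above):

import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Identical conditional logic; only the DataFrame type changes in 2.x
Column newCol = when(col("C").equalTo("A"), "X")
        .when(col("C").equalTo("B"), "Y")
        .otherwise("Z");

Dataset<Row> df2 = df1.withColumn("C", newCol);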
Thanks to Daniel I have resolved this :)
The missing piece was the static import of the SQL functions:
import static org.apache.spark.sql.functions.*;
I must have tried a million different ways of using when, but got compile failures or runtime errors because I didn't do the import. Once imported, Daniel's answer was spot on!
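For anyone who would rather avoid the static import, the same calls can be qualified through the functions class instead (a minimal sketch; df1 and column C are taken from the question):

import org.apache.spark.sql.Column;
import org.apache.spark.sql.functions;

// Same logic as the accepted answer, with the calls qualified explicitly
Column newCol = functions.when(functions.col("C").equalTo("A"), "X")
        .when(functions.col("C").equalTo("B"), "Y")
        .otherwise("Z");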
You may also use UDFs to do the same job. Just write a simple if-then-else structure:
import org.apache.spark.sql.functions.udf

// if-then-else construct, using the same example values as above
val customFunct = udf { d: String =>
  if (d == "A") "X"
  else if (d == "B") "Y"
  else "Z"
}

val newDF = df.withColumn("C", customFunct(df("C")))
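Since the question asks about Java, here is roughly the same UDF approach in the Java API. This is a sketch, assuming a SQLContext named sqlContext plus the df1 and column C from the question; the UDF name customFunct is arbitrary:

import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;

// Register the if-then-else as a named UDF, then call it by name
sqlContext.udf().register("customFunct", (UDF1<String, String>) d -> {
    if ("A".equals(d)) return "X";
    if ("B".equals(d)) return "Y";
    return "Z";
}, DataTypes.StringType);

DataFrame df2 = df1.withColumn("C", callUDF("customFunct", col("C")));

Note that the built-in when/otherwise from the accepted answer is generally preferable where it fits, since UDFs are opaque to Spark's optimizer.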