How to modify one column value in one row using PySpark
If you want to modify a subset of your DataFrame and keep the rest unchanged, the best option is to use pyspark.sql.functions.when(), because filter() or its alias where() (both are DataFrame methods) would remove all rows where the condition is not met.
from pyspark.sql.functions import col, when

valueWhenTrue = None  # for example

# withColumn returns a new DataFrame, so assign the result
df = df.withColumn(
    "existingColumnToUpdate",
    when(
        col("userid") == 22650984,
        valueWhenTrue
    ).otherwise(col("existingColumnToUpdate"))
)
when() evaluates its first argument as a boolean condition. If the condition is True, it returns the second argument. You can chain together multiple when() calls, as shown in this post and also this post, and use otherwise() to specify what to do when the condition is False. If otherwise() is omitted, rows that match no condition get null.
In this example, I am updating an existing column, "existingColumnToUpdate". When the userid is equal to the specified value, the column is updated with valueWhenTrue. Otherwise, the value in the column is kept unchanged.