How to filter a Spark DataFrame by a Boolean column
You're comparing data types incorrectly. open is listed as a Boolean value, not a string, so doing yelp_df["open"] == "true" is incorrect: "true" is a string.
Instead, you want to do:
yelp_df.filter(yelp_df["open"] == True).collect()
This correctly compares the values of open against the Boolean primitive True, rather than the non-Boolean string "true".
from pyspark.sql import functions as F
# A Boolean column is already a valid filter condition, so no comparison is needed
filtered_df = df.filter(F.col('my_bool_col'))