How to filter a Spark DataFrame by a Boolean column

You're comparing data types incorrectly. The open column is a Boolean, not a string, so yelp_df["open"] == "true" is incorrect: it compares the column against the string "true" rather than a Boolean value.

Instead, compare against the Boolean literal:

yelp_df.filter(yelp_df["open"] == True).collect()

This correctly compares the values of open against the Boolean primitive True, rather than the non-Boolean string "true".


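Because the column is already Boolean, you can also pass it directly as the filter condition, with no comparison at all:

from pyspark.sql import functions as F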

filtered_df = df.filter(F.col('my_bool_col'))