multiple conditions for filter in spark data frames

Instead of:

df2 = df1.filter("Status=2" || "Status =3")

Try:

df2 = df1.filter($"Status" === 2 || $"Status" === 3)

This question has been answered but for future reference, I would like to mention that, in the context of this question, the where and filter methods in Dataset/Dataframe supports two syntaxes: The SQL string parameters:

df2 = df1.filter(("Status = 2 or Status = 3"))

and Col based parameters (mentioned by @David ):

df2 = df1.filter($"Status" === 2 || $"Status" === 3)

It seems the OP'd combined these two syntaxes. Personally, I prefer the first syntax because it's cleaner and more generic.


In spark/scala, it's pretty easy to filter with varargs.

val d = spark.read...//data contains column named matid
val ids = Seq("BNBEL0608AH", "BNBEL00608H")
val filtered = d.filter($"matid".isin(ids:_*))