multiple conditions for filter in spark data frames
Instead of:
df2 = df1.filter("Status=2" || "Status =3")
Try:
df2 = df1.filter($"Status" === 2 || $"Status" === 3)
This question has been answered but for future reference, I would like to mention that, in the context of this question, the where
and filter
methods in Dataset/Dataframe supports two syntaxes:
The SQL string parameters:
df2 = df1.filter(("Status = 2 or Status = 3"))
and Col based parameters (mentioned by @David ):
df2 = df1.filter($"Status" === 2 || $"Status" === 3)
It seems the OP'd combined these two syntaxes. Personally, I prefer the first syntax because it's cleaner and more generic.
In spark/scala, it's pretty easy to filter with varargs.
val d = spark.read...//data contains column named matid
val ids = Seq("BNBEL0608AH", "BNBEL00608H")
val filtered = d.filter($"matid".isin(ids:_*))