Spark: subtract two DataFrames
In PySpark it is subtract:

df1.subtract(df2)

which behaves like SQL's EXCEPT DISTINCT (the result is deduplicated). Use exceptAll if duplicates need to be preserved:

df1.exceptAll(df2)
According to the Scala API docs, doing:
dataFrame1.except(dataFrame2)
will return a new DataFrame containing the rows of dataFrame1 that do not appear in dataFrame2.