Spark: subtract two DataFrames
In PySpark it is subtract:

df1.subtract(df2)

which behaves like SQL's EXCEPT DISTINCT (the result is deduplicated). Use exceptAll if duplicates need to be preserved:

df1.exceptAll(df2)
According to the Scala API docs, doing:
dataFrame1.except(dataFrame2)
will return a new DataFrame containing the rows of dataFrame1 that do not appear in dataFrame2.