How to compare two DataFrames and print the columns that differ in Scala
list_col = []
cols = df1.columns
# Build one DataFrame per column holding the values in df1 that are missing from df2
for col in cols:
    list_col.append(df1.select(col).subtract(df2.select(col)))
# Show only the columns that actually differ
for l in list_col:
    if l.count() > 0:
        l.show()
From the scenario described in the question, it looks like the difference has to be found between columns, not rows.
To do that, we can apply a selective difference, which gives us the columns that have different values, along with those values.
The code looks something like this:
First, get the column names. This assumes that df1 and df2 have the same columns.
val columns = df1.schema.fields.map(_.name)
Then find the difference column by column.
val selectiveDifferences = columns.map(col => df1.select(col).except(df2.select(col)))
Finally, show only the differences that are non-empty.
selectiveDifferences.foreach(diff => if (diff.count > 0) diff.show())
This prints only the columns that contain different values, like this:
+--------+
|emp_name|
+--------+
|  romino|
+--------+
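To make it clearer what the per-column difference compares, here is a minimal plain-Python sketch of the same idea, with no Spark involved. The helper name `column_differences` and the sample rows are hypothetical, made up for illustration. It models `subtract` / `except` on a single column as a set difference of that column's distinct values.

```python
# Plain-Python model of the per-column difference (no Spark required).
# `column_differences` and the sample rows below are hypothetical.

def column_differences(rows1, rows2, columns):
    """Return {column: sorted values that occur in rows1 but not in rows2}."""
    diffs = {}
    for col in columns:
        values1 = {row[col] for row in rows1}
        values2 = {row[col] for row in rows2}
        # Same idea as subtract/except on one column: distinct set difference
        missing = values1 - values2
        if missing:
            diffs[col] = sorted(missing)
    return diffs


expected = [
    {"emp_id": 1, "emp_name": "romino"},
    {"emp_id": 2, "emp_name": "anna"},
]
actual = [
    {"emp_id": 1, "emp_name": "romin"},
    {"emp_id": 2, "emp_name": "anna"},
]

result = column_differences(expected, actual, ["emp_id", "emp_name"])
print(result)  # {'emp_name': ['romino']}
```

Two things follow from this model. The comparison is one-directional, so values that exist only in the second DataFrame are not reported unless you also run the difference the other way. It also compares sets of values per column rather than aligned rows, so two values swapped between rows would not show up as a difference.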
I hope this helps!