python pandas select rows where two columns are (not) equal
Use the complement operator ~
hsp.loc[~(hsp['Type_old'] == hsp['Type_new'])]
which gives:
id Type_old Type_new Len_old Len_new
1 2 Num Char 12 12
2 3 Char Num 10 8
When dealing with Boolean operations, the complement operator is a handy way to invert True
with False
Ways to be confused by ==
versus !=
when comparing pd.Series
As expected
df[['Len_old', 'Len_new']].assign(NE=df.Len_old != df.Len_new)
Len_old Len_new NE
0 15 15 False
1 12 12 False
2 10 8 True
3 4 5 True
4 9 10 True
But if one of the column's values were strings!
df[['Len_old', 'Len_new']].assign(NE=df.Len_old.astype(str) != df.Len_new)
Len_old Len_new NE
0 15 15 True
1 12 12 True
2 10 8 True
3 4 5 True
4 9 10 True
Make sure both are the same types.
Your code, as piRSquared said, had an issue with types.
Besides that, you could use comparing methods, in this case pd.Series.ne
Using your data:
hsp.loc[hsp['Type_old'].ne(hsp['Type_new'])]
But again, as piRSquared mentioned, because of dtypes it didn't work.
Just in case, you have to take care about NaN
/None
values at your data... such:
hsp.loc[ ( hsp['Type_old'].ne(hsp['Type_new']) ) && (hsp['Type_old'].notna())]
In this case, .ne
has another argument, fill_value
, which fill missing data.
In addition, you could use "compare" method to show difference between two series (or DataFrames)
hsp.Len_old.compare(hsp.Len_new)
And it might return (if columns were of the same dtype):
self other
2 10.0 8.0
3 4.0 5.0
4 9.0 10.0
But just force to have another dtype:
hsp.Len_old.compare(hsp.Len_new.astype('str')) # string type new column
It will return all rows:
self other
0 15 15
1 12 12
2 10 8
3 4 5
4 9 10