Pandas "Can only compare identically-labeled DataFrame objects" error
Here's a small example to demonstrate this (the error applied only to DataFrames, not Series, until pandas 0.19; since then it applies to both):
In [1]: df1 = pd.DataFrame([[1, 2], [3, 4]])
In [2]: df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])
In [3]: df1 == df2
Exception: Can only compare identically-labeled DataFrame objects
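A self-contained version of the session above, extended to a pair of Series to illustrate that Series comparison raises as well since pandas 0.19. This assumes a recent pandas, where both cases raise ValueError; the exact message wording has changed between versions, so only the common fragment is worth relying on.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]])
df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])

s1 = pd.Series([1, 2])
s2 = pd.Series([2, 1], index=[1, 0])


def comparison_error(left, right):
    """Return the message raised by `left == right`, or None if the comparison succeeds."""
    try:
        _ = left == right
    except ValueError as exc:
        return str(exc)
    return None


frame_msg = comparison_error(df1, df2)
series_msg = comparison_error(s1, s2)
print(frame_msg)
print(series_msg)
```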
One solution is to sort the index first (Note: some functions require sorted indexes):
In [4]: df2.sort_index(inplace=True)
In [5]: df1 == df2
Out[5]:
      0     1
0  True  True
1  True  True
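Note that == returns an element-wise frame of booleans rather than a single answer. A minimal sketch of reducing it to one bool, assuming the frames contain no NaN (NaN never compares equal under ==). It uses the non-inplace form of sort_index, so df2 itself is left unchanged.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]])
df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])

# sort_index() without inplace returns a sorted copy; df2 keeps its original row order
elementwise = df1 == df2.sort_index()
print(elementwise)

# Collapse the element-wise result: True only if every cell matched
all_equal = bool(elementwise.to_numpy().all())
print(all_equal)
```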
Note: == is also sensitive to the order of columns, so you may have to use sort_index(axis=1):
In [11]: df1.sort_index().sort_index(axis=1) == df2.sort_index().sort_index(axis=1)
Out[11]:
      0     1
0  True  True
1  True  True
Note: This can still raise an error if the index or columns aren't identically labelled after sorting.
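A sketch of a helper that sorts both axes before comparing, as in the two notes above. sorted_eq is a hypothetical name, not a pandas function. The second call shows that sorting cannot help when the labels themselves differ.

```python
import pandas as pd


def sorted_eq(left, right):
    """Hypothetical helper: element-wise == after sorting rows and columns by label."""
    left_sorted = left.sort_index().sort_index(axis=1)
    right_sorted = right.sort_index().sort_index(axis=1)
    return left_sorted == right_sorted


a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
b = a[["y", "x"]].iloc[::-1]  # same data, with columns and rows in reverse order

print(sorted_eq(a, b))

c = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=[0, 2])  # label 2 instead of 1
try:
    sorted_eq(a, c)
    still_raises = False
except ValueError:
    # Sorting [0, 2] still does not give [0, 1], so the labels never line up
    still_raises = True
print(still_raises)
```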
When this question was asked, pandas had no other function to test equality, but one has since been added: DataFrame.equals
You use it like this:
df1.equals(df2)
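Using the df1 and df2 from the first example: equals does not raise on the mismatched index, it simply reports False, and it reports True once the row order matches.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]])
df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])

unsorted_result = df1.equals(df2)             # labels are in a different order
sorted_result = df1.equals(df2.sort_index())  # same labels, same order
print(unsorted_result, sorted_result)         # False True
```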
Some differences from == are:
- You don't get the error described in the question; mismatched labels simply give False.
- It returns a simple boolean.
- NaN values in the same location are considered equal
- Two DataFrames need to have the same dtypes to be considered equal; see this Stack Overflow question.
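The NaN and dtype points from the list can be checked directly; the frames below are made-up illustrations.

```python
import numpy as np
import pandas as pd

with_nan = pd.DataFrame({"x": [1.0, np.nan]})
copy_with_nan = with_nan.copy()

# == treats NaN as unequal to itself, so the element-wise check fails...
eq_all = bool((with_nan == copy_with_nan).to_numpy().all())
# ...while equals treats NaNs in the same position as equal
equals_nan = with_nan.equals(copy_with_nan)

ints = pd.DataFrame({"x": [1, 2]})
floats = pd.DataFrame({"x": [1.0, 2.0]})
# Same values, different dtypes (int64 vs float64): == says equal, equals does not
eq_dtype = bool((ints == floats).to_numpy().all())
equals_dtype = ints.equals(floats)

print(eq_all, equals_nan)
print(eq_dtype, equals_dtype)
```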
EDIT:
As pointed out in @paperskilltrees' answer, index alignment is important. Apart from the solution provided there, another option is to sort the index of the DataFrames before comparing them. For df1, that would be df1.sort_index(inplace=True).
If the index is not needed for the comparison, you can also drop it, so that rows are compared by position:
print(df1.reset_index(drop=True) == df2.reset_index(drop=True))
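reset_index(drop=True) replaces both indexes with a default 0..n-1 RangeIndex, so rows are matched purely by position. A sketch with made-up frames whose labels differ but whose rows line up; it assumes the frames have the same number of rows and the same column order, since == still raises otherwise.

```python
import pandas as pd

left = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["a", "b"])
right = pd.DataFrame({"x": [1, 2], "y": [3, 9]}, index=[10, 20])

# left == right would raise: the index labels differ.
# After dropping the index, row 0 is compared with row 0, row 1 with row 1.
positional = left.reset_index(drop=True) == right.reset_index(drop=True)
print(positional)
```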
I have used this same technique in a unit test like so:
from pandas.testing import assert_frame_equal  # pandas.util.testing is deprecated and removed in pandas 2.0
assert_frame_equal(actual.reset_index(drop=True), expected.reset_index(drop=True))
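A minimal pytest-style sketch of such a unit test; the frames and test function names are made up. assert_frame_equal raises AssertionError on a mismatch, so each test passes silently when the frames agree. If the index labels should still be checked but their order should not, assert_frame_equal also accepts check_like=True, which ignores the order of index and columns.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def test_values_match_ignoring_index():
    # Hypothetical frames: same rows, different index labels
    actual = pd.DataFrame({"x": [1, 2]}, index=[5, 6])
    expected = pd.DataFrame({"x": [1, 2]})
    assert_frame_equal(actual.reset_index(drop=True), expected.reset_index(drop=True))


def test_values_match_ignoring_order():
    # Same labels in a different order: check_like=True lines them up before comparing
    actual = pd.DataFrame({"x": [2, 1]}, index=[1, 0])
    expected = pd.DataFrame({"x": [1, 2]})
    assert_frame_equal(actual, expected, check_like=True)


if __name__ == "__main__":
    test_values_match_ignoring_index()
    test_values_match_ignoring_order()
    print("both checks passed")
```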