How do you Unit Test Python DataFrames

While Pandas' test functions are primarily used for internal testing, NumPy includes a very useful set of testing functions that are documented here: NumPy Test Support.

These functions compare NumPy arrays, but you can get the array that underlies a Pandas DataFrame using the values property. You can define a simple DataFrame and compare what your function returns to what you expect.

One technique you can use is to define one set of test data for a number of functions. That way, you can use Pytest Fixtures to define that DataFrame once, and use it in multiple tests.

In terms of resources, I found this article on Testing with NumPy and Pandas to be very useful. I also did a short presentation about data analysis testing at PyCon Canada 2016: Automate Your Data Analysis Testing.


you can use pandas testing functions:

It will give more flexbile to compare your result with computed result in different ways.

For example:

df1=pd.DataFrame({'a':[1,2,3,4,5]})
df2=pd.DataFrame({'a':[6,7,8,9,10]})

expected_res=pd.Series([7,9,11,13,15])
pd.testing.assert_series_equal((df1['a']+df2['a']),expected_res,check_names=False)

For more details refer this link