Need count of negative values in a dataframe

I can get this count for an array, but I can't find a way to do it for a DataFrame.

You can flatten the DataFrame and then use functions that operate on 1D arrays, if you're okay with that (it is likely to be slower than EdChum's answer):

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [-3, -2, 4], 'b': [-2, 2, 5], 'c': [-1, 3, 7], 'd': [1, 4, 8]})
>>> df.values
array([[-3, -2, -1,  1],
       [-2,  2,  3,  4],
       [ 4,  5,  7,  8]])
>>> df.values.flatten()
array([-3, -2, -1,  1, -2,  2,  3,  4,  4,  5,  7,  8])
>>> sum(n < 0 for n in df.values.flatten())
4

You can call .lt to compare the df against a scalar value and then call sum twice. The first sum adds down each column, giving a count of negatives per column:

In [66]:
df.lt(0).sum()

Out[66]:
a    2
b    1
c    1
d    0
dtype: int64

Call sum again to sum the Series:

In [58]:
df.lt(0).sum().sum()

Out[58]:
4
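
As a self-contained sketch of the two-step sum on the same example frame: the helper name count_negatives below is hypothetical (it is not part of pandas), and the per-row variant with axis=1 is included only to show which direction each sum runs.

```python
import pandas as pd

df = pd.DataFrame({'a': [-3, -2, 4], 'b': [-2, 2, 5],
                   'c': [-1, 3, 7], 'd': [1, 4, 8]})


def count_negatives(frame):
    """Hypothetical helper: total number of negative cells in a DataFrame."""
    # lt(0) gives a boolean frame; the first sum counts the Trues in each
    # column, and the second sum adds those per-column counts together.
    return int(frame.lt(0).sum().sum())


per_column = df.lt(0).sum()       # one count per column: a, b, c, d
per_row = df.lt(0).sum(axis=1)    # one count per row instead
total = count_negatives(df)

print(per_column, per_row, total, sep="\n")
```

Keeping the intermediate Series around is handy when you want to know which columns (or rows) hold the negative values, not just how many there are overall.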

You can also convert the boolean df to a 1-D array and call np.sum:

In [62]:
import numpy as np
np.sum((df < 0).values.ravel())

Out[62]:
4
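
A closely related variant, offered as a sketch and not benchmarked here: np.count_nonzero counts the True entries of the boolean array directly, so no flattening is needed. It assumes a pandas version that provides DataFrame.to_numpy() (0.24 or later); df.values would work the same way on older versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [-3, -2, 4], 'b': [-2, 2, 5],
                   'c': [-1, 3, 7], 'd': [1, 4, 8]})

# Boolean 2-D array: True wherever the cell is negative.
mask = df.to_numpy() < 0

# Count the True entries over the whole array, then per column.
total = int(np.count_nonzero(mask))
per_column = np.count_nonzero(mask, axis=0)

print(total)
print(per_column)
```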

Timings

For a 30K row df:

In [70]:
%timeit sum(n < 0 for n in df.values.flatten())
%timeit df.lt(0).sum().sum()
%timeit np.sum((df < 0).values.ravel())

1 loops, best of 3: 405 ms per loop
100 loops, best of 3: 2.36 ms per loop
1000 loops, best of 3: 770 µs per loop

The np method wins easily here: it is ~525x faster than the loop method and ~3x faster than the pure pandas method.
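
To rerun the comparison outside IPython, something like the following should work. The answer does not say how the 30K-row df was built, so the random 30,000 x 4 integer frame here is an assumption, and absolute timings will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: 30,000 rows x 4 columns of random integers in [-10, 10).
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.integers(-10, 10, size=(30_000, 4)),
                   columns=list('abcd'))

methods = {
    'python loop': lambda: sum(n < 0 for n in big.values.flatten()),
    'pandas lt/sum': lambda: big.lt(0).sum().sum(),
    'numpy ravel': lambda: np.sum((big < 0).values.ravel()),
}

# Sanity check: all three methods should report the same count.
counts = {name: int(fn()) for name, fn in methods.items()}
assert len(set(counts.values())) == 1

for name, fn in methods.items():
    # Best of 3 repeats, 5 calls each, reported as time per call.
    seconds = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```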
