Need count of negative values in a dataframe
I can get the count of negative values for an array, but I can't find how to do it for a DataFrame.
It's possible to flatten the DataFrame so you can use functions that operate on 1-D arrays. If you're okay with that (it's likely to be slower than EdChum's answer):
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [-3, -2, 4], 'b': [-2, 2, 5], 'c': [-1, 3, 7], 'd': [1, 4, 8]})
>>> df.values
array([[-3, -2, -1,  1],
       [-2,  2,  3,  4],
       [ 4,  5,  7,  8]])
>>> df.values.flatten()
array([-3, -2, -1,  1, -2,  2,  3,  4,  4,  5,  7,  8])
>>> sum(n < 0 for n in df.values.flatten())
4
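The same flatten-then-count idea can be wrapped in a small helper. This is a sketch: the name `count_negatives_flat` is hypothetical, it assumes every column is numeric, and it uses `DataFrame.to_numpy()`, which returns the same array as `.values` and is the accessor newer pandas versions recommend.

```python
import pandas as pd

def count_negatives_flat(df: pd.DataFrame) -> int:
    """Count negative entries by flattening the frame to 1-D (hypothetical helper)."""
    # to_numpy() gives a 2-D array; ravel() returns it as 1-D (a view when possible)
    flat = df.to_numpy().ravel()
    # Python-level loop over every element: simple, but slow on large frames
    return sum(1 for n in flat if n < 0)

df = pd.DataFrame({'a': [-3, -2, 4], 'b': [-2, 2, 5],
                   'c': [-1, 3, 7], 'd': [1, 4, 8]})
print(count_negatives_flat(df))  # 4
```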
You can call .lt to compare the df against a scalar value and then call sum twice, because the first sum works down each column (axis=0, the default) and returns one count per column:
In [66]:
df.lt(0).sum()
Out[66]:
a    2
b    1
c    1
d    0
dtype: int64
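Because `sum()` defaults to `axis=0`, it collapses the rows and yields one count per column; passing `axis=1` gives one count per row instead. A short sketch using the example `df` from the first answer, redefined here so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({'a': [-3, -2, 4], 'b': [-2, 2, 5],
                   'c': [-1, 3, 7], 'd': [1, 4, 8]})

mask = df.lt(0)               # boolean DataFrame, True where the value is < 0

per_column = mask.sum()       # axis=0 (default): one count per column
per_row = mask.sum(axis=1)    # axis=1: one count per row

print(per_column.tolist())    # [2, 1, 1, 0]
print(per_row.tolist())       # [3, 1, 0]
```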
Call sum again to sum the Series:
In [58]:
df.lt(0).sum().sum()
Out[58]:
4
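If the frame also holds text columns, `df.lt(0)` fails with a `TypeError` when it tries to compare strings with 0. One way around this, assuming those columns should simply be ignored, is to select the numeric columns first. `count_negatives` is a hypothetical helper name; note that NaN compares as False, so missing values are not counted.

```python
import numpy as np
import pandas as pd

def count_negatives(df: pd.DataFrame) -> int:
    """Count values < 0 across all numeric columns (hypothetical helper)."""
    numeric = df.select_dtypes(include='number')   # drop text/object columns
    # lt(0) -> boolean frame; first sum per column, second sum over the Series
    return int(numeric.lt(0).sum().sum())

mixed = pd.DataFrame({'a': [-3, -2, 4],
                      'b': [-2.0, np.nan, 5.0],   # NaN < 0 is False, so not counted
                      'label': ['x', 'y', 'z']})  # non-numeric, skipped
print(count_negatives(mixed))  # 3
```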
You can also convert the boolean df to a 1-D array and call np.sum (this needs import numpy as np):
In [62]:
np.sum((df < 0).values.ravel())
Out[62]:
4
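A close NumPy variant skips the intermediate boolean DataFrame: compare the underlying array directly and count the True entries with `np.count_nonzero`. This sketch assumes every column is numeric; `.ravel()` is not needed because `np.count_nonzero` counts over all axes by default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [-3, -2, 4], 'b': [-2, 2, 5],
                   'c': [-1, 3, 7], 'd': [1, 4, 8]})

# Compare the raw 2-D array against 0, then count True values over all axes
n_neg = np.count_nonzero(df.to_numpy() < 0)
print(n_neg)  # 4
```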
Timings
For a 30K-row df:
In [70]:
%timeit sum(n < 0 for n in df.values.flatten())
%timeit df.lt(0).sum().sum()
%timeit np.sum((df < 0).values.ravel())
1 loops, best of 3: 405 ms per loop
100 loops, best of 3: 2.36 ms per loop
1000 loops, best of 3: 770 µs per loop
The np method wins easily here: it is ~525x faster than the loop method (405 ms vs 770 µs) and ~3x faster than the pure pandas method (2.36 ms vs 770 µs).
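To rerun the comparison outside IPython, the standard-library `timeit` module works. The 30,000 x 4 frame of random integers below is an assumption (the original test data isn't shown), so absolute times will differ from those above, and the ratios may shift with hardware and pandas/NumPy versions.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed test data: 30,000 rows x 4 columns of random integers in [-10, 10)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(-10, 10, size=(30_000, 4)), columns=list('abcd'))

methods = {
    'python loop': lambda: sum(n < 0 for n in df.to_numpy().ravel()),
    'pandas lt/sum': lambda: df.lt(0).sum().sum(),
    'numpy sum': lambda: np.sum((df < 0).to_numpy()),
}

# All three approaches must agree before their speed is worth comparing
results = {name: int(fn()) for name, fn in methods.items()}
assert len(set(results.values())) == 1, results

for name, fn in methods.items():
    # Best of 3 repeats, 5 calls each, reported per call in milliseconds
    best = min(timeit.repeat(fn, repeat=3, number=5)) / 5
    print(f'{name:>14}: {best * 1e3:.3f} ms per call')
```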