Handling zeros in pandas DataFrame column division in Python
For completeness, here is another way to do the division, using DataFrame.apply:
df.loc[:, 'c'] = df.apply(div('a', 'b'), axis=1)
In full:
In [1]:
import pandas as pd
df = pd.DataFrame({"a": [1, 2, 0, 1, 5, 0], "b": [0, 10, 20, 30, 50, 0]}).astype('float64')
def div(numerator, denominator):
    # Return 0.0 when the denominator is zero; otherwise the ordinary quotient.
    return lambda row: 0.0 if row[denominator] == 0 else float(row[numerator]/row[denominator])
df.loc[:, 'c'] = df.apply(div('a', 'b'), axis=1)
Out[1]:
     a     b         c
0  1.0   0.0  0.000000
1  2.0  10.0  0.200000
2  0.0  20.0  0.000000
3  1.0  30.0  0.033333
4  5.0  50.0  0.100000
5  0.0   0.0  0.000000
This solution is roughly twice as slow as the one proposed by Jeff:
df.loc[:, 'c'] = df.apply(div('a', 'b'), axis=1)
# 1.27 ms ± 113 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
df.loc[:, 'c'] = df.a/df.b.replace({ 0 : np.inf })
# 651 µs ± 44.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
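As a sanity check, the two timed expressions can be compared directly. The sketch below rebuilds the example frame and confirms that both approaches produce the same column. The variable names `via_apply` and `via_replace` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 0, 1, 5, 0],
                   "b": [0, 10, 20, 30, 50, 0]}).astype("float64")

def div(numerator, denominator):
    # Row-wise division that returns 0.0 when the denominator is zero.
    return lambda row: 0.0 if row[denominator] == 0 else float(row[numerator] / row[denominator])

# Row-by-row version (illustrative variable name).
via_apply = df.apply(div("a", "b"), axis=1)

# Vectorized version: a finite value divided by inf is 0.0,
# so zero denominators end up as 0.0 in the result.
via_replace = df.a / df.b.replace({0: np.inf})

print(np.allclose(via_apply, via_replace))
```

The vectorized form relies on the numerators being finite; an infinite or missing numerator would not come out as 0.0.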
You need to work in floats; otherwise you will get integer division, which is probably not what you want.
In [12]: df = pandas.DataFrame({"a": [1, 2, 0, 1, 5],
                               "b": [0, 10, 20, 30, 50]}).astype('float64')
In [13]: df
Out[13]:
   a   b
0  1   0
1  2  10
2  0  20
3  1  30
4  5  50
In [14]: df.dtypes
Out[14]:
a    float64
b    float64
dtype: object
Here's one way: divide, then replace the infinite results with NaN.
In [15]: x = df.a/df.b
In [16]: x
Out[16]:
0         inf
1    0.200000
2    0.000000
3    0.033333
4    0.100000
dtype: float64
In [17]: x[np.isinf(x)] = np.nan
In [18]: x
Out[18]:
0         NaN
1    0.200000
2    0.000000
3    0.033333
4    0.100000
dtype: float64
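The divide-then-mask steps above can be wrapped in a small function. This is only a sketch: the helper name `divide_inf_to_nan` is made up for illustration, and it uses `Series.mask` instead of boolean-index assignment so the result is built without modifying a Series in place.

```python
import numpy as np
import pandas as pd

def divide_inf_to_nan(numerator, denominator):
    # Hypothetical helper: plain division, then replace +/-inf with NaN.
    result = numerator / denominator
    # Series.mask replaces entries where the condition is True (NaN by default).
    return result.mask(np.isinf(result))

df = pd.DataFrame({"a": [1, 2, 0, 1, 5],
                   "b": [0, 10, 20, 30, 50]}).astype("float64")
x = divide_inf_to_nan(df.a, df.b)
print(x)
```

With float inputs, 0/0 already evaluates to NaN, so only the infinite results need masking.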
Here's another way: replace the zeros in the denominator with NaN before dividing.
In [20]: df.a/df.b.replace({ 0 : np.nan })
Out[20]:
0         NaN
1    0.200000
2    0.000000
3    0.033333
4    0.100000
dtype: float64
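If a concrete value is wanted instead of NaN, the replace-based division can be followed by `fillna`. The sketch below assumes 0.0 is an acceptable fill value, which is a choice made here for illustration and not something the approach requires; `ratio` and `ratio_filled` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 0, 1, 5],
                   "b": [0, 10, 20, 30, 50]}).astype("float64")

# Zero denominators become NaN, so the quotient is NaN in those rows.
ratio = df.a / df.b.replace({0: np.nan})

# Optional: pick a fill value for the undefined quotients (0.0 is an assumption).
ratio_filled = ratio.fillna(0.0)
print(ratio_filled)
```

Note that `fillna` also fills NaN values that came from missing data in either column, so it may be worth keeping the NaN version when that distinction matters.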