Pandas Standard Deviation returns NaN
Not exactly what was asked in the question, but if you want to avoid NaN
values, calculate the population standard deviation instead by passing
ddof=0 to std:
>>> print(df.groupby('Category').std(ddof=0))
                 A         B         C         D         E         F
Category
A         0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
B         0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
C         0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
D         0.248192  0.195198  0.275101  0.194955  0.190215  0.052423
E         0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
F         0.288417  0.127854  0.065012  0.110096  0.354885  0.191643
G         0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
H         0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
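Note the different defaults for ddof (Delta Degrees of Freedom):
- Pandas: DataFrame.std has default ddof=1, for sample standard deviation (divisor: N − 1)
- NumPy: numpy.std has default ddof=0, for population standard deviation (divisor: N)
With the default ddof=1, a group that contains a single row has a divisor of N − 1 = 0, which is why pandas returns NaN for it.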
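To make the difference in defaults concrete, here is a minimal sketch. The values are made up for illustration and are not taken from the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical values, chosen so the results are easy to check by hand.
values = [2.0, 4.0, 4.0, 6.0]

# pandas defaults to ddof=1 (sample standard deviation, divisor N - 1).
sample_std = pd.Series(values).std()

# NumPy defaults to ddof=0 (population standard deviation, divisor N).
population_std = np.std(values)

# A single observation: the sample formula divides by N - 1 = 0.
single = pd.Series([3.0])
single_sample = single.std()             # NaN
single_population = single.std(ddof=0)   # 0.0

print(sample_std, population_std, single_sample, single_population)
```

Passing ddof explicitly to either library removes the ambiguity, which helps when results from pandas and NumPy are compared side by side.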
Alternatively, you could use fillna to replace the missing values, passing
in a DataFrame that holds the last value of each group:
In [86]: (df.groupby('Category').std()
...: .fillna(df.groupby('Category').last()))
Out[86]:
                 A         B         C         D         E         F
Category
A         0.500200  0.791039  0.498083  0.360320  0.965992  0.537068
B         0.714371  0.636975  0.153347  0.936872  0.000649  0.692558
C         0.295330  0.638823  0.133570  0.272600  0.647285  0.737942
D         0.350996  0.276052  0.389051  0.275708  0.269005  0.074137
E         0.639271  0.486151  0.860172  0.870838  0.831571  0.404813
F         0.407883  0.180813  0.091941  0.155699  0.501884  0.271024
G         0.384157  0.858391  0.278563  0.677627  0.998458  0.829019
H         0.109465  0.085861  0.440557  0.925500  0.767791  0.626924
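The original DataFrame is not shown here, so the following is only a small self-contained sketch of the same fillna approach. The frame, its column X, and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data: category "A" has one row, category "D" has two.
df = pd.DataFrame({
    "Category": ["A", "D", "D"],
    "X": [0.5, 1.0, 3.0],
})

grouped = df.groupby("Category")

# Default ddof=1: the single-row group "A" yields NaN.
sample_std = grouped.std()

# Fill each NaN with the last value of the corresponding group;
# fillna aligns the two frames on index and column labels.
filled = sample_std.fillna(grouped.last())

print(filled)
```

Keep in mind that the filled entries for single-row groups are the raw data values themselves, not standard deviations, so whether this is appropriate depends on how the result is used downstream.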