How to make a pandas crosstab with percentages?
From pandas 0.18.1 onwards, there's a normalize option:
In [1]: pd.crosstab(df.A,df.B, normalize='index')
Out[1]:
B             A         B         C
A
one    0.333333  0.333333  0.333333
three  0.333333  0.333333  0.333333
two    0.333333  0.333333  0.333333
You can normalise over all values (normalize='all'), over each row (normalize='index'), or over each column (normalize='columns').
More details are available in the documentation.
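As a minimal sketch of the three modes, assuming a small made-up DataFrame (the original df is not shown, so the data here is hypothetical):

```python
import pandas as pd

# Hypothetical sample data; not the df from the answer above.
df = pd.DataFrame({
    "A": ["one", "one", "two", "two", "three", "three"],
    "B": ["A", "B", "A", "C", "B", "C"],
})

# Each row sums to 1.
by_row = pd.crosstab(df.A, df.B, normalize="index")

# Each column sums to 1.
by_col = pd.crosstab(df.A, df.B, normalize="columns")

# The whole table sums to 1.
overall = pd.crosstab(df.A, df.B, normalize="all")

print(by_row, by_col, overall, sep="\n\n")
```

Which mode you want depends on the question being asked: "what share of each A falls in each B" is normalize='index', while "what share of each B comes from each A" is normalize='columns'.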
We can show it as percentages by multiplying by 100:
pd.crosstab(df.A,df.B, normalize='index')\
.round(4)*100
B          A      B      C
A
one    33.33  33.33  33.33
three  33.33  33.33  33.33
two    33.33  33.33  33.33
I've rounded the values for convenience.
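One possible variation, sketched here with hypothetical data: scaling to percentages first and rounding afterwards sidesteps small floating-point artifacts that can appear when a rounded fraction is multiplied by 100. The pct_display step is an optional extra for display and turns the values into strings.

```python
import pandas as pd

# Hypothetical data: every value of A occurs once with each value of B.
df = pd.DataFrame({
    "A": ["one"] * 3 + ["two"] * 3 + ["three"] * 3,
    "B": ["A", "B", "C"] * 3,
})

# Scale to percentages first, then round to two decimal places.
pct = pd.crosstab(df.A, df.B, normalize="index").mul(100).round(2)

# Optional: format as strings with a percent sign, for display only.
pct_display = pct.apply(lambda col: col.map("{:.2f}%".format))

print(pct)
print(pct_display)
```

Keep the numeric pct table if you need to do further arithmetic; the string version is only suitable for printing or export.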
pd.crosstab(df.A, df.B).apply(lambda r: r/r.sum(), axis=1)
Basically, you just have a function that computes row/row.sum(), and you use apply with axis=1 to apply it to each row.
(If doing this in Python 2, you should use from __future__ import division to make sure division always returns a float.)
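As a rough sketch, the row-wise apply can also be written without a lambda by dividing the counts by their row totals with DataFrame.div. The data below is hypothetical, and the comparison with normalize='index' is only there to show that the approaches agree on this sample.

```python
import pandas as pd

# Hypothetical sample data; not the df from the answers above.
df = pd.DataFrame({
    "A": ["one", "one", "one", "two", "two", "three"],
    "B": ["A", "B", "B", "A", "C", "C"],
})

counts = pd.crosstab(df.A, df.B)

# The apply-based approach: divide each row by its own sum.
via_apply = counts.apply(lambda r: r / r.sum(), axis=1)

# Equivalent without a lambda: divide by the row totals, aligned on the index.
via_div = counts.div(counts.sum(axis=1), axis=0)

# The built-in option available from pandas 0.18.1 onwards.
via_normalize = pd.crosstab(df.A, df.B, normalize="index")

print(via_div)
```

The div form avoids calling a Python function once per row, which may matter on large tables; for a typical crosstab the difference is unlikely to be noticeable.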