Pandas Pivot_Table : Percentage of row calculation for non-numeric values

The possible duplicate noted by @maxymoo is pretty close to a solution, but I'll go ahead and write it up as an answer since there are a couple of differences that are not completely straightforward.

table = pd.pivot_table(df, values=["Document"],
                       index=["Name"], columns=["Time"], 
                       aggfunc=len, margins=True, 
                       dropna=True, fill_value=0)

       Document                      
Time 1 - 2 HOUR 1 HOUR 2 - 3 HOUR All
Name                                 
A             1      1          1   3
B             1      1          0   2
C             0      1          1   2
All           2      3          2   7

The main tweak there is to add fill_value=0 because what you really want there is a count value of zero, not a NaN.

Then you can basically use the solution @maxymoo linked to, but you need to use iloc or similar b/c the table columns are a little complicated now (being a multi-indexed result of the pivot table).

table2 = table.div( table.iloc[:,-1], axis=0 )

       Document                         
Time 1 - 2 HOUR    1 HOUR 2 - 3 HOUR All
Name                                    
A      0.333333  0.333333   0.333333   1
B      0.500000  0.500000   0.000000   1
C      0.000000  0.500000   0.500000   1
All    0.285714  0.428571   0.285714   1

You've still got some minor formatting work to do there (flip first and second columns and convert to %), but those are the numbers you are looking for.

Btw, it's not necessary here, but you might want to think about converting 'Time' to an ordered categorical variable, which would be one way to solve the column ordering problem (I think), but may or may not be worth the bother depending on what else you are doing with the data.

Tags:

Python

Pandas