percentile rank in pandas in groups

You need to calculate rank within the group before normalizing within the group. The other answers will result in percentiles over 100%. I suggest:

df['percentile'] = df.groupby('Year')['LgRnk'].rank(pct=True)
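To make the point concrete, here is a minimal sketch on made-up data (the frame below is hypothetical, not the asker's). It also shows one way a percentile can end up above 100%: ranking over the whole frame and then dividing by the group size. That this is what the other answers do is an assumption on my part.

```python
import pandas as pd

# Hypothetical data: two seasons with different numbers of teams.
df = pd.DataFrame({
    "Year": [1985, 1985, 1985, 1985, 1986, 1986],
    "LgRnk": [1, 2, 3, 4, 1, 2],
})

# Rank within each year, expressed as a fraction of that year's group size.
df["percentile"] = df.groupby("Year")["LgRnk"].rank(pct=True)

# For contrast (assumed failure mode): rank over the whole frame, then divide
# by the group size. The global rank can be larger than the group, so the
# ratio can exceed 1.0.
global_rank = df["LgRnk"].rank()
group_size = df.groupby("Year")["LgRnk"].transform("count")
df["naive"] = global_rank / group_size

print(df)
```

The grouped version stays within (0, 1] for every year, while the naive column does not.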

You can do an apply on the LgRnk column:

# rank within each year first, so that the numbers go from 0 to 1 in this example
In [11]: df['LgRnk'] = df.groupby('Year').LgRnk.rank()

In [12]: g = df.groupby('Year')

In [13]: g.LgRnk.apply(lambda x: x / len(x))
Out[13]:
19    1.0
0     0.9
17    0.8
4     0.7
13    0.1
3     0.6
16    0.2
22    0.5
20    0.4
21    0.3
Name: 1985, dtype: float64

The Series groupby rank (which just applies Series.rank to each group) takes a pct argument to do just this:

In [21]: g.LgRnk.rank(pct=True)
Out[21]:
19    1.0
0     0.9
17    0.8
4     0.7
13    0.1
3     0.6
16    0.2
22    0.5
20    0.4
21    0.3
Name: 1985, dtype: float64
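If you want to convince yourself that the two approaches agree, a small check along these lines should do it. The data is made up, and I've used transform rather than apply here so the result keeps the original row index (apply's index handling has varied between pandas versions).

```python
import numpy as np
import pandas as pd

# Hypothetical data, not the frame from the question.
df = pd.DataFrame({
    "Year": [1985, 1985, 1985, 1986, 1986],
    "LgRnk": [3, 1, 2, 2, 1],
})
g = df.groupby("Year")

# Manual version: rank within the group, then divide by the group size.
manual = g["LgRnk"].transform(lambda x: x.rank() / len(x))

# Built-in version: let rank do the normalization.
builtin = g["LgRnk"].rank(pct=True)

same = np.allclose(manual, builtin)
print(same)
```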

and directly on the WLPer column (although this is slightly different due to ties, which share an averaged rank):

In [22]: g.WLPer.rank(pct=True, ascending=False)
Out[22]:
19    1.00
0     0.90
17    0.75
4     0.75
13    0.10
3     0.60
16    0.20
22    0.50
20    0.35
21    0.35
Name: 1985, dtype: float64
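The tied values above come out as 0.75 and 0.35 because rank uses method='average' by default. As a rough sketch on made-up numbers, you can pick a different tie-breaking rule with the method argument:

```python
import pandas as pd

# Hypothetical single-season data with one tie in WLPer.
df = pd.DataFrame({
    "Year": [1985] * 4,
    "WLPer": [0.60, 0.50, 0.50, 0.40],
})
g = df.groupby("Year")

# Default: tied rows share the average of the ranks they span (2 and 3 -> 2.5).
avg = g["WLPer"].rank(pct=True, ascending=False)

# method="min": tied rows all get the lowest rank they span (2).
low = g["WLPer"].rank(pct=True, ascending=False, method="min")

print(pd.DataFrame({"WLPer": df["WLPer"], "average": avg, "min": low}))
```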

Note: I've changed the numbers on the first line, so you'll get different scores on your complete frame.
