How can I sort each group so that the largest value is in the first row, the smallest in the second, the second largest in the third, and so on?
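The answers below operate on a DataFrame df with Group, Name and Performance columns. The original frame is not shown in the question; a sketch reconstructed from the sample output in the answers (the values match the output, the construction is an assumption) would be:

import pandas as pd

# Reconstructed from the answers' sample output; the asker's actual code may differ.
df = pd.DataFrame({
    'Group': ['A'] * 6 + ['B'] * 4,
    'Name': ['Sheldon Webb', 'Traci Dean', 'Chad Webster', 'Ora Harmon',
             'Elijah Mendoza', 'June Strickland',
             'Beth Vasquez', 'Betty Sutton', 'Joel Gill', 'Vernon Stone'],
    'Performance': [33, 64, 142, 116, 122, 68, 95, 127, 132, 80],
})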
Take the sorted order and then apply a quadratic function to it whose root is at half the length of the array (plus some small offset). This way the highest rank is given to the extremal values (the sign of the eps offset determines whether the highest value is ranked above the lowest value). I added a small group at the end to show how it properly handles repeated values and an odd group size.
import numpy as np
import pandas as pd

def extremal_rank(s):
    eps = 10**-4  # the sign of eps decides whether the high value of each pair outranks the low one
    # distance of each sorted position from the middle position (n+1)/2, squared:
    # the extremes get the largest rank, the middle values the smallest
    y = (pd.Series(np.arange(1, len(s) + 1), index=s.sort_values().index)
         - (len(s) + 1) / 2 + eps) ** 2
    return y.reindex_like(s)
df['rnk'] = df.groupby('Group', group_keys=False)['Performance'].apply(extremal_rank)
df = df.sort_values(['Group', 'rnk'], ascending=[True, False])
Group Name Performance rnk
2 A Chad Webster 142 6.2505
0 A Sheldon Webb 33 6.2495
4 A Elijah Mendoza 122 2.2503
1 A Traci Dean 64 2.2497
3 A Ora Harmon 116 0.2501
5 A June Strickland 68 0.2499
8 B Joel Gill 132 2.2503
9 B Vernon Stone 80 2.2497
7 B Betty Sutton 127 0.2501
6 B Beth Vasquez 95 0.2499
11 C b 110 9.0006
12 C c 68 8.9994
10 C a 110 4.0004
13 C d 68 3.9996
15 C f 70 1.0002
16 C g 70 0.9998
14 C e 70 0.0000
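As a quick sanity check of the rnk column (not part of the answer's code, just the formula evaluated by hand for a group of six, whose middle position is 3.5):

import numpy as np

eps = 10**-4
print((np.arange(1, 7) - 3.5 + eps) ** 2)
# roughly [6.2495 2.2497 0.2499 0.2501 2.2503 6.2505]
# the lowest value gets ~6.2495 and the highest ~6.2505, so sorting rnk in
# descending order alternates highest, lowest, second highest, second lowest, ...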
You can avoid groupby if you use sort_values on Performance twice, once in descending and once in ascending order, concat both sorted dataframes, then use sort_index and drop_duplicates to get the expected output:
df_ = (pd.concat([df.sort_values(['Group', 'Performance'], ascending=[True, False])
.reset_index(), #need the original index for later drop_duplicates
df.sort_values(['Group', 'Performance'], ascending=[True, True])
.reset_index()
.set_index(np.arange(len(df))+0.5)], # for later sort_index
axis=0)
.sort_index()
.drop_duplicates('index', keep='first')
.reset_index(drop=True)
[['Group', 'Name', 'Performance']]
)
print(df_)
Group Name Performance
0 A Chad Webster 142
1 A Sheldon Webb 33
2 A Elijah Mendoza 122
3 A Traci Dean 64
4 A Ora Harmon 116
5 A June Strickland 68
6 B Joel Gill 132
7 B Vernon Stone 80
8 B Betty Sutton 127
9 B Beth Vasquez 95
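To see why this works, it helps to look at the concatenated frame before drop_duplicates: the descending copy sits at integer positions 0, 1, 2, ... and the ascending copy at 0.5, 1.5, 2.5, ..., so sort_index interleaves them and keep='first' then retains the copy of each original row that appears earlier. A small sketch (assuming the same df as above) that prints this intermediate result:

import numpy as np
import pandas as pd

interleaved = (pd.concat([df.sort_values(['Group', 'Performance'], ascending=[True, False])
                            .reset_index(),
                          df.sort_values(['Group', 'Performance'], ascending=[True, True])
                            .reset_index()
                            .set_index(np.arange(len(df)) + 0.5)])
                 .sort_index())
print(interleaved.head(4))  # positions 0.0, 0.5, 1.0, 1.5: highest, lowest, 2nd highest, 2nd lowest of group A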
Apply the sorted concatenation of nlargest and nsmallest for each group:
>>> (df.groupby('Group')[df.columns[1:]]
       .apply(lambda x:
              pd.concat([x.nlargest(x.shape[0]//2, 'Performance').reset_index(),
                         x.nsmallest(x.shape[0] - x.shape[0]//2, 'Performance').reset_index()])
              .sort_index()
              .drop(columns='index'))
       .reset_index()
       .drop(columns='level_1'))
Group Name Performance
0 A Chad Webster 142
1 A Sheldon Webb 33
2 A Elijah Mendoza 122
3 A Traci Dean 64
4 A Ora Harmon 116
5 A June Strickland 68
6 B Joel Gill 132
7 B Vernon Stone 80
8 B Betty Sutton 127
9 B Beth Vasquez 95
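For a single group the two halves are easy to picture: nlargest picks the top half and nsmallest the rest, and after reset_index both halves occupy positions 0, 1, 2, ..., which is what sort_index uses to pair them up. A minimal illustration on group A of the sample data:

grp = df[df['Group'] == 'A']
top = grp.nlargest(len(grp) // 2, 'Performance')                  # 142, 122, 116
bottom = grp.nsmallest(len(grp) - len(grp) // 2, 'Performance')   # 33, 64, 68
print(top['Performance'].tolist(), bottom['Performance'].tolist())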