Pandas GroupBy and select rows with the minimum value in a specific column

I feel like you're overthinking this. Just use groupby and idxmin:

df.loc[df.groupby('A').B.idxmin()]

   A  B   C
2  1  2  10
4  2  4   4

df.loc[df.groupby('A').B.idxmin()].reset_index(drop=True)

   A  B   C
0  1  2  10
1  2  4   4
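A minimal, self-contained version of the idxmin approach is sketched below. The sample frame is hypothetical, chosen so that it reproduces the output above. idxmin returns index labels, and on ties it returns the first occurrence, so the .loc lookup assumes the index labels are unique.

```python
import pandas as pd

# Hypothetical sample data chosen so the result matches the output shown above
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2],
    "B": [4, 5, 2, 7, 4, 6],
    "C": [3, 4, 10, 2, 4, 6],
})

# For each group in A, idxmin gives the index label of the row with the smallest B
min_labels = df.groupby("A")["B"].idxmin()  # A=1 -> label 2, A=2 -> label 4

# .loc looks those labels up; this is only safe when the index has unique labels
result = df.loc[min_labels].reset_index(drop=True)
print(result)
```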

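I had a similar situation, but with a column name that is not a valid Python identifier (e.g. "B val", which contains a space). Attribute access such as .B does not work for such a name, so bracket indexing is needed: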

df.loc[df.groupby('A')['B val'].idxmin()]
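A short sketch of the bracket form in context; the frame and its "B val" column are hypothetical.

```python
import pandas as pd

# Hypothetical frame whose value column name contains a space
df = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B val": [5, 3, 8, 6],
})

# df.groupby("A").B val would be a syntax error, so select the column with ["B val"]
rows = df.loc[df.groupby("A")["B val"].idxmin()]
print(rows)
```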

The accepted answer (using idxmin) does not fit the pipe (method-chaining) pattern, because it has to refer to df twice. A pipe-friendly alternative is to sort by the value column first and then take the first row of each group with DataFrame.head:

df.sort_values('B').groupby('A').apply(pd.DataFrame.head, n=1)

This works because groupby preserves the order of rows within each group, so after sorting, the first row of each group is the one with the smallest value. This is documented behaviour (see pandas.DataFrame.groupby).
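The sort-then-head idea can be checked on the same hypothetical sample frame used for idxmin. This sketch swaps in GroupBy.head, which pandas provides directly on grouped objects, in place of apply(pd.DataFrame.head, n=1); it returns the selected rows with their original index labels. The kind="stable" argument is an addition here so that rows with equal B values keep their original relative order (the default quicksort is not stable).

```python
import pandas as pd

# Hypothetical sample data (same values as the idxmin example)
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2],
    "B": [4, 5, 2, 7, 4, 6],
    "C": [3, 4, 10, 2, 4, 6],
})

# Sort by B, then keep the first row of each A-group; because groupby keeps
# the sorted row order inside each group, that first row has the smallest B
first_rows = df.sort_values("B", kind="stable").groupby("A").head(1)
print(first_rows)
```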

This approach has additional benefits:

- It can easily be extended to select the n rows with the smallest values in a specific column, by passing a larger n to head.

- It can break ties by passing additional columns (as a list) to .sort_values(), e.g.:

data.sort_values(['final_score', 'midterm_score']).groupby('year').apply(pd.DataFrame.head, n=1)
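Both benefits can be sketched on a small hypothetical exam table (the student column and all values are made up for illustration). In 2020 two students tie on final_score, and midterm_score decides which one comes first; changing head(1) to head(2) returns the two lowest rows per year. As in the earlier sketch, GroupBy.head stands in for apply(pd.DataFrame.head, n=...).

```python
import pandas as pd

# Hypothetical exam data: ann and bob tie on final_score in 2020
data = pd.DataFrame({
    "year":          [2020, 2020, 2020, 2021, 2021],
    "student":       ["ann", "bob", "cid", "dee", "eve"],
    "final_score":   [70, 70, 85, 60, 90],
    "midterm_score": [80, 65, 75, 95, 55],
})

ordered = data.sort_values(["final_score", "midterm_score"])

# Lowest final_score per year; midterm_score breaks ties between equal finals
lowest = ordered.groupby("year").head(1)

# The same chain returns the n lowest rows per group by changing head(n)
lowest_two = ordered.groupby("year").head(2)
print(lowest)
print(lowest_two)
```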

As with the other answers, .reset_index(drop=True) is needed to exactly match the result requested in the question, which makes the final snippet:

df.sort_values('B').groupby('A').apply(pd.DataFrame.head, n=1).reset_index(drop=True)
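To use this inside a pipe, the chain can be wrapped in a small function and passed to DataFrame.pipe. The helper name min_row_per_group and its parameters are hypothetical. The extra sort_values(by) orders the result by group key, since GroupBy.head otherwise keeps the rows in the order of the sorted value column.

```python
import pandas as pd

def min_row_per_group(frame: pd.DataFrame, by: str, col: str) -> pd.DataFrame:
    """Hypothetical helper: one row per `by` group, the one with the smallest `col`."""
    return (
        frame.sort_values(col, kind="stable")  # smallest values first, ties keep order
        .groupby(by)
        .head(1)                               # first (smallest) row of each group
        .sort_values(by)                       # order the result by group key
        .reset_index(drop=True)
    )

# Hypothetical sample data (same values as the earlier examples)
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2],
    "B": [4, 5, 2, 7, 4, 6],
    "C": [3, 4, 10, 2, 4, 6],
})

# DataFrame.pipe calls min_row_per_group(df, by="A", col="B")
result = df.pipe(min_row_per_group, by="A", col="B")
print(result)
```

Because each step returns a new DataFrame, the helper can sit anywhere in a longer .pipe(...) chain.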
