Group by and find top n value_counts pandas

I think you can use nlargest; change the 1 to 5 to get the top 5 per group:

s = df['Neighborhood'].groupby(df['Borough']).value_counts()
print(s)
Borough                      
Bronx          Melrose            7
Manhattan      Midtown           12
               Lincoln Square     2
Staten Island  Grant City        11
dtype: int64

print(s.groupby(level=0).nlargest(1))
Bronx          Bronx          Melrose        7
Manhattan      Manhattan      Midtown       12
Staten Island  Staten Island  Grant City    11
dtype: int64

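Note the extra index level in the result: groupby prepends the group key (Borough), which is already present in the index of s, so Borough appears twice. That is why the level is specified explicitly.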
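To illustrate the duplicated level, here is a minimal sketch that rebuilds the Series s from the counts printed above (the construction with MultiIndex.from_tuples is my own, not from the question). It assumes that group_keys=False suppresses the extra level for nlargest, as it does in the pandas versions I have used:

```python
import pandas as pd

# Rebuild the counts shown above by hand (illustrative construction).
idx = pd.MultiIndex.from_tuples(
    [
        ("Bronx", "Melrose"),
        ("Manhattan", "Midtown"),
        ("Manhattan", "Lincoln Square"),
        ("Staten Island", "Grant City"),
    ],
    names=["Borough", "Neighborhood"],
)
s = pd.Series([7, 12, 2, 11], index=idx)

# Default: the group key is prepended, so Borough shows up twice.
with_keys = s.groupby(level=0).nlargest(1)

# group_keys=False keeps the original two index levels.
without_keys = s.groupby(level=0, group_keys=False).nlargest(1)

print(with_keys)
print(without_keys)
```

Both calls pick the same rows; only the shape of the index differs.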

df['Neighborhood'].groupby(df['Borough']).value_counts().head(5)

head(5) returns the first 5 rows of the result as a whole, not the top 5 of each borough.

Solution: to get the top n from every group

df.groupby(['Borough']).Neighborhood.value_counts().groupby(level=0, group_keys=False).head(5)

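.value_counts().nlargest(5), as used in other answers, only gives you the 5 largest counts overall rather than the top 5 of each group, which doesn't make sense for this question.
group_keys=False avoids a duplicated index level.
Because value_counts() has already sorted each group in descending order, head(5) is all that is needed.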
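
For anyone who wants to try this, a small self-contained sketch follows. The sample data and the helper name top_n_per_group are made up for illustration; only the column names Borough and Neighborhood come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Borough": ["Bronx"] * 3 + ["Manhattan"] * 6,
    "Neighborhood": [
        "Melrose", "Melrose", "Fordham",
        "Midtown", "Midtown", "Midtown",
        "Chelsea", "Chelsea", "Harlem",
    ],
})


def top_n_per_group(frame, n):
    """Return the n most frequent neighborhoods within each borough."""
    # value_counts() sorts counts in descending order inside each group.
    counts = frame.groupby("Borough")["Neighborhood"].value_counts()
    # Regroup on the Borough level and keep the first n rows of each group.
    return counts.groupby(level=0, group_keys=False).head(n)


top2 = top_n_per_group(df, 2)
print(top2)
```

With n=2 this keeps Melrose and Fordham for the Bronx, and Midtown and Chelsea for Manhattan; Harlem is dropped.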

You can do this in a single line by slightly extending your original groupby with 'nlargest':

>>> df.groupby(['Borough', 'Neighborhood']).Neighborhood.value_counts().nlargest(5)
Borough        Neighborhood    Neighborhood  
Bronx          Melrose         Melrose           1
Manhattan      Midtown         Midtown           1
Manhatten      Lincoln Square  Lincoln Square    1
               Midtown         Midtown           1
Staten Island  Grant City      Grant City        1
dtype: int64

