Group by and find top n value_counts pandas
I think you can use nlargest (you can change 1 to 5):
s = df['Neighborhood'].groupby(df['Borough']).value_counts()
print(s)
Borough
Bronx Melrose 7
Manhattan Midtown 12
Lincoln Square 2
Staten Island Grant City 11
dtype: int64
print(s.groupby(level=0).nlargest(1))
Bronx Bronx Melrose 7
Manhattan Manhattan Midtown 12
Staten Island Staten Island Grant City 11
dtype: int64
Additional columns were being created, so I specified the level explicitly.
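As a minimal sketch of the per-borough nlargest call, the snippet below rebuilds s by hand from the counts printed above (the variable names per_borough and per_borough_flat are only illustrative). It assumes the goal is one top row per borough, which means grouping on index level 0 only; passing group_keys=False is one way to avoid the repeated Borough level in the result.

```python
import pandas as pd

# Rebuild the counts printed above as a Series with a two-level index.
index = pd.MultiIndex.from_tuples(
    [
        ("Bronx", "Melrose"),
        ("Manhattan", "Midtown"),
        ("Manhattan", "Lincoln Square"),
        ("Staten Island", "Grant City"),
    ],
    names=["Borough", "Neighborhood"],
)
s = pd.Series([7, 12, 2, 11], index=index)

# Group on the borough level only: one row per borough, with the
# borough repeated as an extra leading index level.
per_borough = s.groupby(level=0).nlargest(1)

# group_keys=False keeps the original two-level index instead.
per_borough_flat = s.groupby(level=0, group_keys=False).nlargest(1)

print(per_borough)
print(per_borough_flat)
```

Grouping on both levels at once would put every (Borough, Neighborhood) pair in its own group, so nlargest(1) would simply return every row.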
df['Neighborhood'].groupby(df['Borough']).value_counts().head(5)
head(5) returns the first 5 rows of the result.
Solution: to get the top n from every group:
df.groupby(['Borough']).Neighborhood.value_counts().groupby(level=0, group_keys=False).head(5)
.value_counts().nlargest(5) in the other answers only gives you a single top 5 rather than one per group, which doesn't make sense to me either. group_keys=False avoids a duplicated index. Because value_counts() has already sorted the counts within each group, head(5) is all that is needed.
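To make the per-group behaviour concrete, here is a small sketch on made-up data. The rows in df and the value of n are assumptions for illustration only; just the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "Borough": ["Bronx"] * 3 + ["Manhattan"] * 6,
    "Neighborhood": [
        "Melrose", "Melrose", "Fordham",
        "Midtown", "Midtown", "Midtown",
        "Chelsea", "Chelsea", "Harlem",
    ],
})

n = 2  # number of neighborhoods to keep per borough

# Counts per (Borough, Neighborhood), sorted descending within each borough.
counts = df.groupby("Borough")["Neighborhood"].value_counts()

# Take the first n rows of each borough's already-sorted counts.
top_n = counts.groupby(level=0, group_keys=False).head(n)
print(top_n)
```

With this data each borough keeps its two most frequent neighborhoods, so Harlem (the third in Manhattan) is dropped while both Bronx neighborhoods remain.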
You can do this in a single line by slightly extending your original groupby with 'nlargest':
>>> df.groupby(['Borough', 'Neighborhood']).Neighborhood.value_counts().nlargest(5)
Borough Neighborhood Neighborhood
Bronx Melrose Melrose 1
Manhattan Midtown Midtown 1
Manhatten Lincoln Square Lincoln Square 1
Midtown Midtown 1
Staten Island Grant City Grant City 1
dtype: int64