Drop unused categories when using groupby on a categorical variable in pandas
Option 1: remove_unused_categories
df.groupby(df.cats.cat.remove_unused_categories()).mean()
values
cats
a 1
b 2
c 4
You can also make the assignment first, and then group by the column:
df.assign(cats=df.cats.cat.remove_unused_categories()).groupby('cats').mean()
Or,
df['cats'] = df.cats.cat.remove_unused_categories()
df.groupby('cats').mean()
values
cats
a 1
b 2
c 4
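The snippets above assume a frame df whose cats column declares a category that never occurs in the data. Here is a minimal, self-contained sketch of that situation. The sample data is made up for illustration and is not the asker's frame. The "values" column is selected explicitly and observed=False is passed explicitly, so the result should not depend on the defaults of the installed pandas version.

```python
import pandas as pd

# Hypothetical sample data: category "d" is declared but never appears.
df = pd.DataFrame({
    "cats": pd.Categorical(["a", "b", "c"], categories=["a", "b", "c", "d"]),
    "values": [1, 2, 4],
})

# Grouping on the raw categorical keeps a row for the unused category "d".
with_unused = df.groupby("cats", observed=False)["values"].mean()

# Dropping unused categories first leaves only the groups present in the data.
trimmed = df["cats"].cat.remove_unused_categories()
without_unused = df.groupby(trimmed, observed=False)["values"].mean()

print(with_unused)
print(without_unused)
```

Note that remove_unused_categories returns a new Series; df itself is unchanged unless you assign the result back, as in the second variant above.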
Option 2: astype to str conversion
df.groupby(df.cats.astype(str)).mean()
values
cats
a 1
b 2
c 4
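One thing to keep in mind with the string conversion is that any custom category order is lost, because the group keys become plain strings and are sorted alphabetically. A small sketch with made-up data (the labels and the ordered categorical below are assumptions for illustration):

```python
import pandas as pd

# Hypothetical ordered categorical with one unused category.
df = pd.DataFrame({
    "cats": pd.Categorical(
        ["low", "high", "mid"],
        categories=["low", "mid", "high", "unused"],
        ordered=True,
    ),
    "values": [1, 4, 2],
})

# Grouping on the categorical follows the declared category order
# and includes the unused category.
by_category = df.groupby("cats", observed=False)["values"].mean()

# Grouping on the string version drops the unused category,
# but the groups come back in alphabetical order.
by_string = df.groupby(df["cats"].astype(str))["values"].mean()

print(by_category.index.tolist())
print(by_string.index.tolist())
```

If the category order matters for later steps, Option 1 is probably the safer choice.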
Just chain with dropna, like so:
df.groupby("cats").mean().dropna()
values
cats
a 1.0
b 2.0
c 4.0
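A caveat with dropna: it removes every row whose aggregate is NaN, not only the rows for unused categories. A group that is present in the data but has only missing values disappears as well. A minimal sketch with made-up data (the frame is an assumption for illustration):

```python
import pandas as pd

# Hypothetical data: "d" is an unused category, and "c" is present
# but its only value is missing.
df = pd.DataFrame({
    "cats": pd.Categorical(["a", "b", "c"], categories=["a", "b", "c", "d"]),
    "values": [1.0, 2.0, float("nan")],
})

means = df.groupby("cats", observed=False).mean()

# dropna removes the unused "d" row, but also the genuine "c" group.
after_dropna = means.dropna()

print(after_dropna)
```

If groups like "c" need to survive, removing the unused categories before grouping avoids this side effect.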
Since pandas version 0.23, you can specify observed=True in the groupby call to achieve the desired behavior.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
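For example, with a made-up frame in which category "d" is declared but never used (the data is an assumption for illustration):

```python
import pandas as pd

# Hypothetical sample data: category "d" is declared but never appears.
df = pd.DataFrame({
    "cats": pd.Categorical(["a", "b", "c"], categories=["a", "b", "c", "d"]),
    "values": [1, 2, 4],
})

# observed=True restricts the result to categories that occur in the data.
result = df.groupby("cats", observed=True).mean()

print(result)
```

This leaves the cats column itself untouched, so the unused category is still part of its dtype afterwards.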