Drop unused categories when using groupby on a categorical variable in pandas

Option 1
remove_unused_categories

df.groupby(df.cats.cat.remove_unused_categories()).mean()

      values
cats        
a          1
b          2
c          4

You can also make the assignment first and then group:

df.assign(cats=df.cats.cat.remove_unused_categories()).groupby('cats').mean()

Or, modifying df in place:

df['cats'] = df.cats.cat.remove_unused_categories()
df.groupby('cats').mean()

      values
cats        
a          1
b          2
c          4
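The snippets above assume an existing df. As a minimal, self-contained sketch, the data below is made up for illustration (category "d" is declared but never occurs), and the numeric column is selected explicitly so the aggregation never touches the categorical column:

```python
import pandas as pd

# Hypothetical data: "d" is a declared category that never occurs.
df = pd.DataFrame({
    "cats": pd.Categorical(["a", "b", "c", "c"], categories=["a", "b", "c", "d"]),
    "values": [1, 2, 3, 5],
})

# Strip the unused category from the grouping key only; df itself is unchanged.
key = df["cats"].cat.remove_unused_categories()

# Select the numeric column explicitly before aggregating.
means = df.groupby(key)["values"].mean()
print(means)
```

Because only the key is trimmed, df["cats"] still carries "d" as a category afterwards, which may or may not be what you want.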

Option 2
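Convert the column to str with astype. The result is indexed by plain strings rather than categories, so any custom category order is lost: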

df.groupby(df.cats.astype(str)).mean()

      values
cats        
a          1
b          2
c          4
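One difference between the two options shows up when the categories have a meaningful order. The sketch below uses made-up labels ("low", "mid", "high", plus an unused one) to compare the resulting group order; the data and names are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical data with a deliberate, non-alphabetical category order.
levels = pd.Categorical(
    ["low", "high", "mid", "low"],
    categories=["low", "mid", "high", "unused"],
)
df = pd.DataFrame({"cats": levels, "values": [1, 2, 3, 5]})

# Grouping by strings sorts the labels alphabetically.
by_str = df.groupby(df["cats"].astype(str))["values"].mean()

# Grouping by the trimmed categorical keeps the declared category order.
by_cat = df.groupby(df["cats"].cat.remove_unused_categories())["values"].mean()

print(list(by_str.index))
print(list(by_cat.index))
```

Here the string version comes back as high, low, mid, while the categorical version keeps low, mid, high.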

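Alternatively, chain dropna onto the result. This works for mean because unused categories aggregate to NaN, but note that it also drops any observed group whose mean is NaN: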

df.groupby("cats").mean().dropna()

      values
cats
a        1.0
b        2.0
c        4.0
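The dropna approach depends on the aggregation producing NaN for empty groups. A small sketch on made-up data suggests where it stops working: sum returns 0 for an unused category, so there is nothing for dropna to remove. observed=False is passed explicitly here so the unused category appears regardless of the installed pandas version's default:

```python
import pandas as pd

# Hypothetical data: "d" is a declared category that never occurs.
df = pd.DataFrame({
    "cats": pd.Categorical(["a", "b", "c", "c"], categories=["a", "b", "c", "d"]),
    "values": [1, 2, 3, 5],
})

# Keep unused categories in the result on purpose.
grouped = df.groupby("cats", observed=False)["values"]

means = grouped.mean()  # the empty group "d" has no mean, so it is NaN
sums = grouped.sum()    # the empty group "d" sums to 0, not NaN

print(means.dropna())   # "d" is removed
print(sums.dropna())    # "d" is still present
```

For aggregations such as sum or count, one of the other approaches is likely the safer choice.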

Since pandas 0.23, you can pass observed=True to groupby so that only the categories that actually occur in the data appear in the result.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
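As a minimal sketch of the observed=True approach, using made-up data in which category "d" is declared but never occurs:

```python
import pandas as pd

# Hypothetical data: "d" is a declared category that never occurs.
df = pd.DataFrame({
    "cats": pd.Categorical(["a", "b", "c", "c"], categories=["a", "b", "c", "d"]),
    "values": [1, 2, 3, 5],
})

# observed=True keeps only the categories present in the data.
result = df.groupby("cats", observed=True).mean()

# For comparison, observed=False adds a row for the unused category "d".
full = df.groupby("cats", observed=False).mean()

print(result)
print(len(full))
```

Unlike the astype(str) route, this leaves the column's dtype and the original DataFrame untouched.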
