Seaborn countplot with normalized y axis per group

I might be confused. The difference between your output and the output of

occupation_counts = (df.groupby(['income'])['occupation']
                     .value_counts(normalize=True)
                     .rename('percentage')
                     .mul(100)
                     .reset_index()
                     .sort_values('occupation'))
p = sns.barplot(x="occupation", y="percentage", hue="income", data=occupation_counts)
_ = plt.setp(p.get_xticklabels(), rotation=90)  # Rotate labels

is, it seems to me, only the order of the columns.


And you seem to care about that, since you pass sort=False. But then, in your code, the order is determined purely by chance (the order in which the dictionary is iterated can even change from run to run with Python 3.5).
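If the column order matters, one option is to state it explicitly instead of relying on iteration order. The following is a minimal sketch, assuming a small made-up DataFrame in place of the one from the question; `x_order` and `hue_order` are names introduced here for illustration, passed to the `order` and `hue_order` parameters of sns.barplot:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical toy data standing in for the question's DataFrame.
df = pd.DataFrame({
    "income": ["<=50K", "<=50K", "<=50K", ">50K", ">50K", ">50K"],
    "occupation": ["Sales", "Tech", "Sales", "Tech", "Tech", "Sales"],
})

# Percentage of each occupation within each income group.
occupation_counts = (df.groupby("income")["occupation"]
                     .value_counts(normalize=True)
                     .rename("percentage")
                     .mul(100)
                     .reset_index())

# Fix both orders explicitly so the plot is reproducible from run to run.
x_order = sorted(df["occupation"].unique())
hue_order = sorted(df["income"].unique())

ax = sns.barplot(x="occupation", y="percentage", hue="income",
                 data=occupation_counts, order=x_order, hue_order=hue_order)
```

Any other explicit list works just as well as the sorted one, for example an order taken from the overall frequency of each occupation.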


You could do this with sns.histplot by setting the following properties:

  • stat = 'density' (this will make the y-axis the density rather than count)
  • common_norm = False (this will normalize each density independently)

See the simple example below:

import numpy as np
import pandas as pd
import seaborn as sns
df = sns.load_dataset('titanic')

ax = sns.histplot(data=df, x='class', hue='survived', multiple='dodge',
                  stat='density', shrink=0.8, common_norm=False)
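Note that common_norm=False normalizes within each hue level (here, within each value of survived), which is not the same as normalizing within each x category. The two directions can be compared numerically with pd.crosstab. This is only a sketch on made-up data that stands in for the titanic columns; `per_hue` and `per_x` are names introduced here:

```python
import pandas as pd

# Hypothetical toy data standing in for the 'class' and 'survived' columns.
df = pd.DataFrame({
    "class": ["First", "First", "First",
              "Third", "Third", "Third", "Third", "Third"],
    "survived": [1, 1, 0, 0, 0, 0, 1, 0],
})

# Normalize within each hue level: every 'survived' column sums to 1.
per_hue = pd.crosstab(df["class"], df["survived"], normalize="columns")

# Normalize within each x category: every 'class' row sums to 1.
per_x = pd.crosstab(df["class"], df["survived"], normalize="index")

print(per_hue)
print(per_x)
```

Which of the two you want depends on the question being asked: "what share of the survivors was in first class" (per hue) versus "what share of first class survived" (per x category).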



With newer versions of seaborn you can do the following:

import numpy as np
import pandas as pd
import seaborn as sns
sns.set(color_codes=True)

df = sns.load_dataset('titanic')
df.head()

x,y = 'class', 'survived'

(df
.groupby(x)[y]
.value_counts(normalize=True)
.mul(100)
.rename('percent')
.reset_index()
.pipe((sns.catplot,'data'), x=x,y='percent',hue=y,kind='bar'))



Update: Also show percentages on top of barplots

If you also want the percentages shown on top of the bars, you can do the following:

import numpy as np
import pandas as pd
import seaborn as sns

df = sns.load_dataset('titanic')
df.head()

x,y = 'class', 'survived'

df1 = df.groupby(x)[y].value_counts(normalize=True)
df1 = df1.mul(100)
df1 = df1.rename('percent').reset_index()

g = sns.catplot(x=x,y='percent',hue=y,kind='bar',data=df1)
g.ax.set_ylim(0,100)

for p in g.ax.patches:
    txt = f'{p.get_height():.2f}%'  # bar height formatted as a percentage
    txt_x = p.get_x()               # left edge of the bar
    txt_y = p.get_height()          # top of the bar
    g.ax.text(txt_x, txt_y, txt)
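An alternative to placing the text by hand is Axes.bar_label, which centers a label over every bar in a container. This is a sketch that assumes matplotlib 3.4 or newer (where bar_label was added) and uses made-up data instead of the titanic dataset so that it runs offline:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical toy data standing in for the titanic columns.
df = pd.DataFrame({
    "class": ["First", "First", "First",
              "Third", "Third", "Third", "Third", "Third"],
    "survived": [1, 1, 0, 0, 0, 0, 1, 0],
})

x, y = "class", "survived"

# Percentage of each 'survived' value within each class.
df1 = (df.groupby(x)[y]
       .value_counts(normalize=True)
       .mul(100)
       .rename("percent")
       .reset_index())

g = sns.catplot(x=x, y="percent", hue=y, kind="bar", data=df1)
g.ax.set_ylim(0, 100)

# One container per hue level; label each bar with its height.
labels = []
for container in g.ax.containers:
    labels.extend(g.ax.bar_label(container, fmt="%.1f%%"))
```

Compared with the manual loop, the labels are centered horizontally over each bar instead of starting at its left edge.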
