Is there a parameter in matplotlib/pandas to show the Y axis of a histogram as a percentage?
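The density=True option (normed=True for matplotlib < 2.2.0) returns a histogram for which np.sum(pdf * np.diff(bins)) equals 1, where pdf is the array of bar heights. In other words, the total area of the bars is 1, not the sum of their heights. If you want the heights of the histogram to sum to 1 (or to 100 for percentages), you can use NumPy's histogram() and normalize the results yourself: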
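import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(30)
fig, ax = plt.subplots(1, 2, figsize=(10, 4))
# Left: density=True scales the bars so that their total area is 1
ax[0].hist(x, density=True, color='grey')
# Right: raw counts divided by their sum, so the bar heights add up to 1
hist, bins = np.histogram(x)
ax[1].bar(bins[:-1], hist / hist.sum(), width=np.diff(bins), align='edge', color='grey')
ax[0].set_title('density=True')
ax[1].set_title('hist = hist / hist.sum()')
plt.show()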
By the way, there is a strange plotting glitch at the first bin of the left plot.
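The difference between the two normalizations can also be checked numerically, without plotting. This is a minimal sketch; the seeded generator from np.random.default_rng is an assumption added only so the result is reproducible.

```python
import numpy as np

# Seeded generator (an assumption, not in the original) for reproducibility.
rng = np.random.default_rng(0)
x = rng.standard_normal(30)

# density=True: bar heights are a probability density, so the total AREA is 1.
pdf, bins = np.histogram(x, density=True)
area = np.sum(pdf * np.diff(bins))

# Raw counts divided by their total: the bar HEIGHTS sum to 1.
counts, _ = np.histogram(x, bins=bins)
fractions = counts / counts.sum()

print(f"area under density histogram: {area:.6f}")
print(f"sum of normalized counts:     {fractions.sum():.6f}")
print(f"as percentages:               {(100 * fractions).sum():.1f}%")
```

Since pdf equals counts / (n * bin_width), multiplying pdf by the bin widths gives exactly counts / n. With equal-width bins the two plots therefore differ only by a constant factor, the bin width.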
Pandas plotting methods pass any extra keyword arguments through to the corresponding matplotlib function. So, for completeness and based on the comments of others here, this is how one would do it:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100,2), columns=list('AB'))
df.hist(density=1)
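To see that density is forwarded to matplotlib's hist rather than handled by pandas itself, one can add up the bar areas in each subplot. A sketch, assuming the non-interactive Agg backend and a seeded generator (neither is part of the original):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption) so this runs without a display
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.default_rng(1).standard_normal((100, 2)), columns=list("AB"))

# density is not a pandas parameter; df.hist forwards it to Axes.hist per column.
axes = df.hist(density=True, bins=10)

# df.hist returns a 2-D array of Axes, one subplot per column.
for ax in axes.ravel():
    area = sum(p.get_height() * p.get_width() for p in ax.patches)
    print(ax.get_title(), round(area, 6))
```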
Also, for a direct comparison of the columns on a single set of axes, this may be a good way as well:
df.plot(kind='hist', density=1, bins=20, stacked=False, alpha=.5)
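In this overlaid form, pandas appears to compute one set of bin edges from all columns together, so the bars line up for comparison while each column's bars still have a total area of 1. A sketch to check this, assuming the Agg backend and a seeded generator (not in the original):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption)
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.default_rng(1).standard_normal((100, 2)), columns=list("AB"))

# A single Axes with both columns overlaid; density is forwarded to Axes.hist.
ax = df.plot(kind='hist', density=True, bins=20, stacked=False, alpha=.5)

# Each column contributes 20 Rectangle patches, added in column order.
patches = list(ax.patches)
per_column = [patches[i * 20:(i + 1) * 20] for i in range(len(df.columns))]

for name, bars in zip(df.columns, per_column):
    area = sum(p.get_height() * p.get_width() for p in bars)
    print(name, round(area, 6))

# Identical left edges mean the bin edges are shared between columns.
print([p.get_x() for p in per_column[0]] == [p.get_x() for p in per_column[1]])
```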
It looks like @CarstenKönig found the right way: give every observation a weight of 100 / n, so the bar heights are percentages that sum to 100:
df.hist(bins=20, weights=np.ones_like(df[df.columns[0]]) * 100. / len(df))
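The same weights trick can be checked, and combined with a percent-sign tick formatter, as below. This is a sketch under stated assumptions: the Agg backend and seeded data are added for reproducibility, and matplotlib.ticker.PercentFormatter is an optional extra that was not in the original answers.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption)
import numpy as np
import pandas as pd
from matplotlib.ticker import PercentFormatter

df = pd.DataFrame(np.random.default_rng(2).standard_normal((100, 2)), columns=list("AB"))

# Each row gets weight 100 / n, so each bar height is the percentage of rows in that bin.
weights = np.ones(len(df)) * 100.0 / len(df)
axes = df.hist(bins=20, weights=weights)

for ax in axes.ravel():
    # Optional extra: label ticks as "10%" etc. (heights are already on a 0-100 scale).
    ax.yaxis.set_major_formatter(PercentFormatter(xmax=100))
    heights = [p.get_height() for p in ax.patches]
    print(ax.get_title(), round(sum(heights), 6))
```

This assumes there are no missing values: df.hist drops NaNs per column before calling matplotlib, so a column with NaNs would no longer match the length of the weights array.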