How to plot a probability mass function in Python
I think my original terminology was off. I have an array of continuous values [0-1) which I want to discretize and use to plot a probability mass function. I thought this might be common enough to warrant a single method to do it.
Here's the code:
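import random
import numpy as np
import matplotlib.pyplot as plt

x = [random.random() for _ in range(1000)]
num_bins = 50
counts, bins = np.histogram(x, bins=num_bins)
width = bins[1] - bins[0]  # actual bin width; the edges span min(x)..max(x)
bins = bins[:-1] + width / 2  # shift the left edges to the bin centers
probs = counts / counts.sum()  # fraction of samples in each bin
print(probs.sum())  # 1.0, up to floating-point rounding
plt.bar(bins, probs, width)
plt.show()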
I think you are mistaking a sum for an integral. A proper PDF (probability density function) integrates to unity; if you simply sum the bar heights, you ignore the width of each rectangle.
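import numpy as np
import matplotlib.pyplot as plt
N = 10**5
X = np.random.normal(size=N)
# density=True rescales the counts so that the histogram has unit area
counts, bins = np.histogram(X, bins=50, density=True)
bins = bins[:-1] + (bins[1] - bins[0]) / 2  # bin centers
# np.trapz is deprecated since NumPy 2.0, where np.trapezoid replaces it
print(np.trapz(counts, bins))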
This gives 0.999985, which is close enough to unity.
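To make the sum-versus-integral point concrete, here is a minimal sketch that compares the plain sum of the density heights with the area obtained by weighting each height by its bin width. The seed and the names `density`, `edges`, `widths`, `area` and `probs` are my own choices for illustration, not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
X = rng.normal(size=10**5)

density, edges = np.histogram(X, bins=50, density=True)
widths = np.diff(edges)  # width of each bin

area = np.sum(density * widths)  # height times width, summed over all bins
probs = density * widths         # probability mass that falls in each bin

print(density.sum())  # generally not 1: the heights alone ignore the bin width
print(area)           # 1.0, up to floating-point rounding
```

Multiplying each height by its bin width turns the density back into per-bin probabilities, which is the quantity the bar chart in the question plots.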
EDIT: In response to the comment below:
If x=[.2, .2, .8] and I'm looking for a graph with two bars, one at .2 with height .66 because 66% of the values are at .2 and one bar at .8 with height .33, what would that graph be called and how do I generate it?
The following code:
from collections import Counter
x = [.2,.2,.8]
C = Counter(x)
total = float(sum(C.values()))
for key in C: C[key] /= total
This gives a dictionary-like object, C = Counter({0.2: 0.666666, 0.8: 0.333333}). From here one could construct a bar graph, but this only works if the distribution is discrete and takes a finite, fixed set of values that are well separated from each other.
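As a rough sketch of that last step, the normalized counts can be passed straight to a bar chart. The helper names `empirical_pmf` and `plot_pmf` and the bar width of 0.05 are hypothetical choices for illustration; the width only needs to be smaller than the gap between neighbouring values.

```python
from collections import Counter


def empirical_pmf(values):
    """Map each distinct value to the fraction of samples equal to it."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in sorted(counts.items())}


def plot_pmf(pmf, width=0.05):
    """Draw one bar per distinct value; the width is an arbitrary choice."""
    import matplotlib.pyplot as plt  # imported here so empirical_pmf works without it

    plt.bar(list(pmf.keys()), list(pmf.values()), width=width)
    plt.xlabel("value")
    plt.ylabel("probability")
    plt.show()


pmf = empirical_pmf([.2, .2, .8])
print(pmf)
# plot_pmf(pmf)  # uncomment to display the two bars
```

For the three-element example this should draw two bars, one at 0.2 and one at 0.8, with heights of roughly two thirds and one third.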