Weighted Gaussian kernel density estimation in `python`
Neither `sklearn.neighbors.KernelDensity` nor `statsmodels.nonparametric` seems to support weighted samples. I modified `scipy.stats.gaussian_kde` to allow for heterogeneous sampling weights and thought the results might be useful for others. An example is shown below.
An `ipython` notebook can be found here: http://nbviewer.ipython.org/gist/tillahoffmann/f844bce2ec264c1c8cb5
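For readers who want to see the idea without opening the notebook, here is a minimal one-dimensional sketch of a weighted Gaussian KDE in plain NumPy. It is not the modified `gaussian_kde` from the notebook; the function name `weighted_gaussian_kde_1d` and the fixed `bandwidth` argument are assumptions made for illustration. Each sample contributes a Gaussian bump scaled by its normalized weight.

```python
import numpy as np


def weighted_gaussian_kde_1d(points, samples, weights, bandwidth):
    """Evaluate a weighted Gaussian KDE at `points` (hypothetical helper).

    `bandwidth` is the kernel standard deviation, passed in directly
    rather than derived from a rule of thumb.
    """
    points = np.atleast_1d(np.asarray(points, dtype=float))
    samples = np.asarray(samples, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the density integrates to one

    # Standardized distance between every evaluation point and every sample.
    z = (points[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))

    # Weighted sum of the per-sample kernels.
    return kernels @ w


grid = np.linspace(0.0, 15.0, 301)
# Repeating a sample three times is equivalent to giving it weight 3.
repeated = weighted_gaussian_kde_1d(grid, [10.0, 10.0, 10.0, 5.0], [1, 1, 1, 1], 0.5)
weighted = weighted_gaussian_kde_1d(grid, [10.0, 5.0], [3.0, 1.0], 0.5)
```

The two curves coincide, which is the behaviour one would expect from any weighted KDE. SciPy 1.2.0 and later also accept a `weights` argument in `scipy.stats.gaussian_kde` directly.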
Implementation details
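For samples $x_i$ with weights $w_i$, the weighted arithmetic mean is
$$\bar{x} = \frac{\sum_i w_i x_i}{\sum_i w_i}.$$
The unbiased data covariance matrix (the standard estimator for reliability weights) is then given by
$$\Sigma = \frac{\sum_i w_i}{\left(\sum_i w_i\right)^2 - \sum_i w_i^2} \sum_i w_i \, (x_i - \bar{x})(x_i - \bar{x})^\top.$$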
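The weighted mean and unbiased weighted covariance can be sketched with NumPy as follows. This is an illustration under the assumption that the weights are reliability weights (not integer frequency counts); the helper name `weighted_mean_and_cov` and the `(d, n)` data layout are choices made here, not taken from the notebook.

```python
import numpy as np


def weighted_mean_and_cov(data, weights):
    """Weighted mean and unbiased covariance (hypothetical helper).

    data    : array of shape (d, n), one column per sample
    weights : array of shape (n,), non-negative reliability weights
    """
    data = np.atleast_2d(np.asarray(data, dtype=float))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to one

    mean = data @ w                      # shape (d,)
    residual = data - mean[:, None]      # center every sample
    cov = (residual * w) @ residual.T    # biased weighted covariance
    cov /= 1.0 - np.sum(w ** 2)          # bias correction for normalized weights
    return mean, cov


rng = np.random.default_rng(0)
samples = rng.normal(size=(2, 200))
weights = rng.uniform(0.5, 1.5, size=200)
mean, cov = weighted_mean_and_cov(samples, weights)
```

With equal weights the correction factor reduces to the familiar `n / (n - 1)`, and the result agrees with `np.cov(samples, aweights=weights)`.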
The bandwidth can be chosen by the `scott` or `silverman` rules as in `scipy`. However, the number of samples used to calculate the bandwidth is Kish's approximation for the effective sample size.
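As a rough sketch of the bandwidth step, Kish's effective sample size is $n_\text{eff} = (\sum_i w_i)^2 / \sum_i w_i^2$, and it takes the place of the sample count in the usual rule-of-thumb factors. The function names below are made up for this example; the factor formulas are the ones SciPy documents for `gaussian_kde`, with `d` the number of dimensions.

```python
import numpy as np


def kish_effective_sample_size(weights):
    """Kish's approximation: (sum w)^2 / sum(w^2)."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)


def scott_factor(n_eff, d):
    """Scott's rule with the effective sample size in place of n."""
    return n_eff ** (-1.0 / (d + 4))


def silverman_factor(n_eff, d):
    """Silverman's rule with the effective sample size in place of n."""
    return (n_eff * (d + 2) / 4.0) ** (-1.0 / (d + 4))


weights = np.array([3.0, 1.0])
n_eff = kish_effective_sample_size(weights)  # 16 / 10 = 1.6
bandwidth_factor = scott_factor(n_eff, d=1)
```

Equal weights give `n_eff` equal to the number of samples, so the unweighted behaviour is recovered; uneven weights give a smaller `n_eff` and therefore a wider kernel.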
Check out the packages PyQT-Fit and statistics for Python. They seem to have kernel density estimation with weighted observations.
For univariate distributions you can use `KDEUnivariate` from statsmodels. It is not well documented, but the `fit` method accepts a `weights` argument. Weights are not supported by the FFT-based estimator, so you have to pass `fft=False`. Here is an example:
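import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.kde import KDEUnivariate

# Unweighted KDE: the value 10 appears three times, the value 5 once.
kde1 = KDEUnivariate(np.array([10., 10., 10., 5.]))
kde1.fit(bw=0.5)
plt.plot(kde1.support, [kde1.evaluate(xi) for xi in kde1.support], 'x-')

# Weighted KDE: weights 3 and 1 stand in for the repeated samples.
# fft=False is required whenever weights are given.
kde2 = KDEUnivariate(np.array([10., 5.]))
kde2.fit(weights=np.array([3., 1.]), bw=0.5, fft=False)
plt.plot(kde2.support, [kde2.evaluate(xi) for xi in kde2.support], 'o-')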
which produces this figure: