Using scipy gaussian kernel density estimation to calculate CDF inverse
You can use a few Python tricks for a fast and memory-efficient estimate of the CDF (based on this answer):
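import numpy as np
from scipy.special import ndtr
# Kernel standard deviation; kde.factor alone is correct only for unit-variance data.
bandwidth = np.sqrt(kde.covariance[0, 0])
cdf = tuple(ndtr(np.ravel(item - kde.dataset) / bandwidth).mean()
            for item in x)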
It works as fast as this answer, but has linear (len(kde.dataset)) space complexity instead of the quadratic (actually, len(kde.dataset) * len(x)) one.
All that remains is to invert the CDF numerically, for instance with an inverse-approximation routine from statsmodels.
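As one possible way to carry out this inversion step, here is a minimal sketch that uses root finding with scipy.optimize.brentq rather than statsmodels. The sample data, the helper names kde_cdf and kde_ppf, and the 10-bandwidth search bracket are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import ndtr
from scipy.stats import gaussian_kde

# Illustrative data (assumption): any 1-D sample would do.
rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=3.0, size=500)
kde = gaussian_kde(sample)

# Standard deviation of each Gaussian kernel in the 1-D KDE.
bandwidth = np.sqrt(kde.covariance[0, 0])


def kde_cdf(point):
    """CDF of the 1-D Gaussian KDE at a single point (hypothetical helper)."""
    return ndtr((point - kde.dataset[0]) / bandwidth).mean()


def kde_ppf(q):
    """Invert kde_cdf at probability q, 0 < q < 1 (hypothetical helper)."""
    # The bracket reaches 10 bandwidths past the data, where the CDF is
    # numerically indistinguishable from 0 and 1.
    lo = sample.min() - 10 * bandwidth
    hi = sample.max() + 10 * bandwidth
    return brentq(lambda t: kde_cdf(t) - q, lo, hi)


median_estimate = kde_ppf(0.5)
```

Each call to kde_ppf evaluates the CDF a few dozen times at most, so memory use stays linear in len(kde.dataset). For many quantiles at once, tabulating the CDF on a grid and interpolating is likely to be faster.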
The method integrate_box_1d can be used to compute the CDF, but it is not vectorized; you'll need to loop over points. If memory is not an issue, rewriting its source code (which is essentially just a call to special.ndtr) in vector form may speed things up.
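For comparison, the loop-based approach described above might look like the following sketch. The sample data and the evaluation grid x are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Illustrative data (assumption): any 1-D sample would do.
rng = np.random.default_rng(0)
sample = rng.normal(size=200)
kde = gaussian_kde(sample)

# Points at which to evaluate the CDF.
x = np.linspace(sample.min() - 1.0, sample.max() + 1.0, 50)

# integrate_box_1d(low, high) integrates the estimated density over
# [low, high], so a lower limit of -inf gives the CDF at each point.
loop_cdf = np.array([kde.integrate_box_1d(-np.inf, xi) for xi in x])
```

The vectorized form replaces this Python-level loop with a single array operation: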
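import numpy as np
from matplotlib.pyplot import plot
from scipy.special import ndtr
# n is assumed to be the 1-D sample the KDE was built from; x holds the evaluation points.
stdev = np.sqrt(kde.covariance)[0, 0]  # kernel bandwidth (standard deviation)
pde_cdf = ndtr(np.subtract.outer(x, n) / stdev).mean(axis=1)
plot(x, pde_cdf)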
The plot of the inverse function would be plot(pde_cdf, x). If the goal is to compute the inverse function at a specific point, consider fitting an interpolating spline to the computed CDF values and inverting it.
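A minimal sketch of that interpolation idea follows. It uses scipy's PchipInterpolator (a monotone piecewise-cubic interpolant, chosen here instead of a general spline so the inverse stays monotone) with the roles of x and the CDF swapped. The sample data, grid size, and 4-bandwidth padding are assumptions made for illustration.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator
from scipy.special import ndtr
from scipy.stats import gaussian_kde

# Illustrative data (assumption): any 1-D sample would do.
rng = np.random.default_rng(1)
sample = rng.normal(size=300)
kde = gaussian_kde(sample)
stdev = np.sqrt(kde.covariance[0, 0])  # kernel standard deviation

# Tabulate the CDF on a grid that extends a few bandwidths past the data.
x = np.linspace(sample.min() - 4 * stdev, sample.max() + 4 * stdev, 400)
cdf = ndtr(np.subtract.outer(x, sample) / stdev).mean(axis=1)

# Swap the roles of x and cdf. Repeated CDF values are dropped so that the
# interpolation nodes are strictly increasing, as PchipInterpolator requires.
cdf_nodes, keep = np.unique(cdf, return_index=True)
inverse_cdf = PchipInterpolator(cdf_nodes, x[keep])

# Evaluate the approximate inverse CDF at specific probabilities.
quartiles = inverse_cdf([0.25, 0.5, 0.75])
```

The result is only as accurate as the grid is fine, and probabilities outside the tabulated range [cdf_nodes[0], cdf_nodes[-1]] are extrapolated, so the grid should cover the quantiles of interest.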