Finding a distribution to fit truncated data

Because your distribution is (at least) bimodal and the probability density is greater than zero at zero, you might consider nonparametric density estimation rather than a possibly poor approximation by a mixture of a small number of distributions. (However, if a mixture distribution works, then that's great.)

You have lots and lots of data, so you don't need to force a description in terms of a mixture of standard distributions.

Many times, just using SmoothKernelDistribution on the data gets you everything you need. In this case, because the density is definitely not zero at the boundary at zero, one needs a bit of (commonly used) trickery, but one still gets a legitimate and appropriate estimate of the probability density (see Silverman, page 20).

First one creates a dataset that includes a reflection of the data:

data2 = Flatten[{data, -data}];

Then SmoothKernelDistribution is applied, and the resulting distribution is truncated to lie between zero and $\infty$.

skd2 = SmoothKernelDistribution[data2];
skd = TruncatedDistribution[{0, ∞}, skd2];
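
The reflection step can be written out explicitly. As a sketch, assume a symmetric kernel $K$ (such as the Gaussian kernel) with bandwidth $h$ and $n$ nonnegative observations $X_1, \dots, X_n$; these symbols are introduced here for illustration, and the bandwidth that SmoothKernelDistribution selects automatically is not spelled out. The kernel estimate built from the $2n$ reflected points is

$$\hat g(x) = \frac{1}{2nh}\sum_{i=1}^{n}\left[K\!\left(\frac{x - X_i}{h}\right) + K\!\left(\frac{x + X_i}{h}\right)\right].$$

This function is symmetric about zero, so exactly half of its mass lies on $x \ge 0$. Truncating to $[0, \infty)$ therefore just rescales it by a factor of 2:

$$\hat f(x) = 2\,\hat g(x) = \frac{1}{nh}\sum_{i=1}^{n}\left[K\!\left(\frac{x - X_i}{h}\right) + K\!\left(\frac{x + X_i}{h}\right)\right], \qquad x \ge 0.$$

One side effect to keep in mind: with a smooth symmetric kernel, $\hat g$ is an even function, so the estimate has zero slope at $x = 0$.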

One then has access to the functions associated with a probability distribution: PDF, CDF, Expectation, etc.

Plot[PDF[skd, x], {x, 0, 0.4}]

[Figure: nonparametric density estimate]
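
To see why the reflection matters, here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration: the exponential sample stands in for the real data (it is nonnegative with a clearly positive density at zero), and the names sample, reflected, skdPlain, skdReflected, and x0 are hypothetical. It compares the plain and the reflected estimates just to the right of zero with the known true density.

```mathematica
(* Hypothetical stand-in for the real data: nonnegative values
   whose true density at zero is 5 *)
SeedRandom[1];
sample = RandomVariate[ExponentialDistribution[5], 10^4];

(* Plain kernel estimate: some kernel mass leaks onto negative x *)
skdPlain = SmoothKernelDistribution[sample];

(* Reflected estimate, built the same way as skd above *)
reflected = Join[sample, -sample];
skdReflected = TruncatedDistribution[{0, Infinity},
   SmoothKernelDistribution[reflected]];

(* Evaluate slightly to the right of zero, since the truncated PDF
   may be defined as zero exactly at the lower truncation point *)
x0 = 0.001;
{PDF[skdPlain, x0], PDF[skdReflected, x0],
 PDF[ExponentialDistribution[5], x0]}
```

Near the boundary, the plain estimate is expected to fall well below the true density, because a sizable share of the kernel mass there is placed on negative values; the reflected estimate folds that mass back onto the positive side.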

To repeat: when you've got lots of data you are not restricted to standard distributions. Really.

As you said in the comments, the data for your histograms did contain negative values, and you squared all of them. I suggest looking at your values before you square them.


Although I have no insight into the underlying process that created the data, one could assume that you have a mixture of four normal distributions here. It seems the left peak is itself divided into two distributions. Let us use this as a start and define a mixture distribution:

dist = MixtureDistribution[{a, b, c, d},
  {NormalDistribution[μ1, σ1],
   NormalDistribution[μ2, σ2],
   NormalDistribution[μ3, σ3],
   NormalDistribution[μ4, σ4]}]
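
For reference, the density this mixture represents can be written as follows, where $\varphi(x; \mu, \sigma)$ denotes the normal density (notation introduced here). As far as I know, MixtureDistribution normalizes the weights itself, so a, b, c, and d only need to be nonnegative and do not have to sum to 1 beforehand:

$$f(x) = \frac{a\,\varphi(x; \mu_1, \sigma_1) + b\,\varphi(x; \mu_2, \sigma_2) + c\,\varphi(x; \mu_3, \sigma_3) + d\,\varphi(x; \mu_4, \sigma_4)}{a + b + c + d}.$$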

Now we can estimate the parameters using FindDistributionParameters. I roughly estimated the initial values just by looking at where the peaks are and how high and wide they are:

sol = FindDistributionParameters[data, dist,
  {{a, 1}, {μ1, -.4}, {σ1, .1},
   {b, .3}, {μ2, .1}, {σ2, .05},
   {c, .5}, {μ3, .4}, {σ3, .1},
   {d, .4}, {μ4, -.2}, {σ4, .1}}]
(* {a -> 0.376702, b -> 0.125485, c -> 0.275036, d -> 0.222777,
    μ1 -> -0.395739, σ1 -> 0.0586256, μ2 -> 0.103749, σ2 -> 0.0496838,
    μ3 -> 0.337538, σ3 -> 0.0866439, μ4 -> -0.217508, σ4 -> 0.0911891} *)

With 250,000 data values, Mathematica has no problem finding a good fit. Plotting the fitted density over the histogram shows:

pdf1 = PDF[dist, x] /. sol;

Show[
 Histogram[data, {0.03}, "PDF"],
 Plot[pdf1, {x, -.6, .6}]
 ]


Now, since you squared all your values, we need to transform the fitted mixture distribution in the same way. The result is a distribution that is supported only on positive x and fits your data very nicely:

pdf = PDF[TransformedDistribution[u^2, u \[Distributed] dist], x] /. sol;
Show[
 Histogram[data^2, {0.007}, "PDF"],
 Plot[pdf, {x, 0, .6}]
 ]

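
The transformation can also be written down by hand, which helps when checking the result. Writing $f_U$ for the fitted mixture density of the unsquared values and $X = U^2$ (symbols introduced here for illustration), both $\sqrt{x}$ and $-\sqrt{x}$ map to the same $x$, so the change-of-variables formula gives

$$f_X(x) = \frac{f_U(\sqrt{x}) + f_U(-\sqrt{x})}{2\sqrt{x}}, \qquad x > 0.$$

Because of the $1/\sqrt{x}$ factor, this density grows without bound as $x \to 0$ whenever $f_U(0) > 0$, although it remains integrable.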
