How to fit a boundary to a scatter plot

If the objective is to determine an envelope in the following form

$$f(t)=A\pm \sqrt{D t}+B t$$

that contains a desired expected proportion ($1-\alpha$) of the values at each time step, then for the time series mentioned the coefficients are $A=1$, $B=0$, and $D=(\Phi^{-1}(1-\alpha/2))^2$ where $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function. No data is needed.

(* Generate several time series *)
nSim = 200
n = 100
ManyRandomWalks = Table[RandomWalkData = RandomVariate[NormalDistribution[0, 1], n];
   RandomWalk = 1 + Accumulate[RandomWalkData];
   RandomWalk, {i, 1, nSim}];

(* Set parameters associated with this particular method of generating a time series *)
(* No need to estimate those from the data *)
α = 0.01;  (* Proportion of observations expected to be outside the envelope *)
{a, d, b} = {1, 
  InverseCDF[NormalDistribution[0, 1], 1 - α/2]^2, 0}

(* Plot the time series and an envelope containing the central 100(1-α)% of the values
   for each time step *)
Show[ListPlot[ManyRandomWalks, Joined -> True, PlotStyle -> Thin],
 Plot[{a + Sqrt[d t] + b t, a - Sqrt[d t] + b t}, {t, 0, n}, PlotStyle -> {{Thickness[0.01], Red}}]]

Multiple time series and envelope containing 99% of the distribution for each time step

I've connected the points as a reminder that there are not 100*200 independent points for which to subject to a regression.


Suppose one had a bunch of time series with no specifics about the data generation mechanism but did know that each time series was generated independently from the others. (Yes, that's using a somewhat loose interpretation of independence.)

Further one wants to estimate an envelope that would contain the central $100(1-\alpha)$% of the observations for each time step such that the envelope has the following functional form:

$$f(t)=A\pm \sqrt{D t}+B t$$

One could find the sample estimates of the $1-\alpha/2$% and $\alpha/2$% quantiles for each time step and then fit a regression to get estimates of the parameters $A$, $B$, and $D$.

Using the OP's simulated data (ManyRandomWalks) we create a dataset that has the time step and a $-1$ associated with the lower quantiles and $+1$ associated with the upper quantiles and then run NonlinearModelFit.

α = 0.05;
lower = Quantile[#, α/2] & /@ Transpose[ManyRandomWalks];
upper = Quantile[#, 1 - α/2] & /@ Transpose[ManyRandomWalks];
data = Transpose[{Join[Range[n], Range[n]],
    Join[ConstantArray[-1, n], ConstantArray[1, n]],
    Join[lower, upper]}];
nlm = NonlinearModelFit[data, a + p Sqrt[d t] + b t, {a, b, d}, {t, p}];
nlm["BestFitParameters"]
(* {a -> 0.227842, b -> 0.0196022, d -> 4.00303} *)

Now plot everything:

Show[ListPlot[ManyRandomWalks, Joined -> True, PlotStyle -> Thin],
 Plot[{a + Sqrt[d t] + b t, a - Sqrt[d t] + b t} /. nlm["BestFitParameters"], {t, 0, n}, 
  PlotStyle -> {{Thickness[0.01], Red}}],
 ListPlot[data[[All, {1, 3}]], PlotStyle -> {{Black, PointSize[0.01]}}]]

Time series, quantiles, and fit

Tags:

Fitting