Simultaneously fitting multiple datasets
This is an extension of Heike's answer to address the question of error estimates. I'll follow the book Data Analysis: A Bayesian Tutorial by D.S. Sivia and J. Skilling (Oxford University Press).
Basically, any error estimate depends on the assumptions you make. The previous answers implicitly assume independent Gaussian noise with the same standard deviation at every point: $\epsilon \sim N(0, \sigma)$. If you know $\sigma$, the error estimate is straightforward.
With the same definitions:
f[x_, amplitude_, centroid_, sigma_] := amplitude Exp[-((x - centroid)^2/sigma^2)] (* f must come first, since the data tables evaluate it *)
data1 = Table[{x, RandomReal[{-.1, .1}] + f[x, 1, 1, 1]}, {x, -4, 6, 0.25}];
data2 = Table[{x, RandomReal[{-.1, .1}] + f[x, .5, 1, 2]}, {x, -8, 10, 0.5}];
Collect the fit parameters:
vars = {mu, au1, s1, au2, s2};
The variance of the noise follows analytically from the uniform distribution above (note the density factor $1/0.2$):
noiseVariance = Integrate[x^2/0.2, {x, -0.1, 0.1}]; (* = 0.1^2/3 *)
The log-likelihood of the model is:
logModel = -Total[(data1[[All, 2]] - (f[#, au1, mu, s1] & /@ data1[[All, 1]]))^2/noiseVariance]/2 -
   Total[(data2[[All, 2]] - (f[#, au2, mu, s2] & /@ data2[[All, 1]]))^2/noiseVariance]/2;
Optimize the log-likelihood (note the change of sign: we maximize the log-likelihood instead of minimizing a sum of squares):
fit = FindMaximum[logModel, vars]
The fitted parameters are the same as in the other answers, since a constant noise variance only rescales the log-likelihood without moving its maximum, so I won't repeat them here.
For the error estimates, the covariance matrix is minus the inverse of the Hessian of the log-likelihood, evaluated at the maximum (DA p. 50):
$$ \sigma_{ij}^2 = -[\nabla \nabla L]^{-1}_{ij} $$
hessianL = D[logModel, {vars, 2}] /. fit[[2]]; (* Hessian at the maximum *)
parameterStdDeviations = Sqrt[- Diagonal@Inverse@hessianL];
{vars, #1 \[PlusMinus] #2 & @@@
    ({vars /. fit[[2]], parameterStdDeviations}\[Transpose])}\[Transpose] // TableForm
If $\sigma$ is unknown the analysis is slightly trickier, but the result is easy to implement. If the error is additive Gaussian noise of unknown variance, the appropriate estimator is (DA p. 67):
$$ s^2 = \frac{1}{N-1} \sum_{k=1}^N \left(\mathrm{data}_k - f(x_k; \mathrm{model})\right)^2 $$
(* residual variances, evaluated at the best-fit parameters *)
estimatedVariance1 = Total[(data1[[All, 2]] - (f[#, au1, mu, s1] & /@ data1[[All, 1]]))^2]/(Length@data1 - 1) /. fit[[2]];
estimatedVariance2 = Total[(data2[[All, 2]] - (f[#, au2, mu, s2] & /@ data2[[All, 1]]))^2]/(Length@data2 - 1) /. fit[[2]];
As stated above, the magnitude of the variance doesn't affect the point estimates, so we can reuse the code above and simply inject the newly estimated variances into the log-likelihood before computing the Hessian. This is essentially the default behaviour of NonlinearModelFit.
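For instance, a minimal sketch of that injection (the names logModel2 and hessianL2 are just for illustration; fit is the FindMaximum result from above):
logModel2 = -Total[(data1[[All, 2]] - (f[#, au1, mu, s1] & /@ data1[[All, 1]]))^2/estimatedVariance1]/2 -
   Total[(data2[[All, 2]] - (f[#, au2, mu, s2] & /@ data2[[All, 1]]))^2/estimatedVariance2]/2;
hessianL2 = D[logModel2, {vars, 2}] /. fit[[2]]; (* Hessian with the estimated variances *)
Sqrt[-Diagonal@Inverse@hessianL2] (* updated parameter standard deviations *)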
As you seem to indicate that you are fitting spectra from a counting experiment, you might have better performance if you assume Poisson counting noise instead; the variance for each channel is then estimated by the number of counts in that channel: $$ \sigma^2_k \approx \mathrm{data}_k $$ You might also want to consider adding a background model (a constant background is a simple extension of the above; see the sketch below), depending on the noise level.
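Here is a rough sketch of such a weighted fit under those assumptions, for a single hypothetical spectrum of counts; counts1, the constant background bg, and the starting values are illustrative, and f is the Gaussian defined above:
counts1 = Table[{x, RandomVariate[PoissonDistribution[5 + 100 f[x, 1, 1, 1]]]}, {x, -4, 6, 0.25}];
weights = 1/Clip[counts1[[All, 2]], {1, Infinity}]; (* sigma_k^2 ~ counts, floored at 1 to avoid division by zero *)
logModelCounts = -Total[weights (counts1[[All, 2]] - ((bg + f[#, au1, mu, s1]) & /@ counts1[[All, 1]]))^2]/2;
FindMaximum[logModelCounts, {{au1, 100}, {mu, 1}, {s1, 1}, {bg, 5}}]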
I once had to do this to fit some spectroscopic data. This was my solution...
Here is a simple model of the intensity and phase of a laser after passing through a medium with a complex refractive index:
n[den_, det_] := 1 + den ((I - 2 det)/(1 + 4 det^2));
int[den_, det_] := Exp[-2 Arg[n[den, det]] den];
phase[den_, det_] := Re[n[den, det]] den;
Some noisy data:
d1 = Table[{x, int[1.34, x] + RandomReal[{-.1, .1}]}, {x, -20, 20}];
d2 = Table[{x, phase[1.34, x] + RandomReal[{-.1, .1}]}, {x, -20, 20}];
Define a dummy variable labelling the data sets, and join the data into one list:
dat = Join[d1 /. {x_, y_} -> {1, x, y}, d2 /. {x_, y_} -> {2, x, y}];
Define a fit function that depends on the value of the "set" variable:
fitmodel[set_, den_, det_] :=
 Which[set == 1, Evaluate@int[den, det],
  set == 2, Evaluate@phase[den, det]]
Now NonlinearModelFit fits both datasets simultaneously:
fit = NonlinearModelFit[dat, fitmodel[set, den, det],
   {{den, 2}}, {set, det}];
fitparams = fit["BestFitParameters"]
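If you also need error estimates with this approach, the fitted model object can report parameter uncertainties directly (computed under its default constant-variance error model):
fit["ParameterTable"] (* best-fit values with standard errors *)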
dd = {d1, d2};
mm = {int[den, det], phase[den, det]};
Show[
ListPlot[dd, PlotRange -> All],
Plot[Evaluate[mm /. fitparams], {det, -20, 20}]
]
You could use NMinimize to fit the two models. For example, with
f[x_, amplitude_, centroid_, sigma_] := amplitude Exp[-((x - centroid)^2/sigma^2)] (* f must come first, since the data tables evaluate it *)
data1 = Table[{x, RandomReal[{-.1, .1}] + f[x, 1, 1, 1]}, {x, -4, 6, 0.25}];
data2 = Table[{x, RandomReal[{-.1, .1}] + f[x, .5, 1, 2]}, {x, -8, 10, 0.5}];
We could find a least squares solution by doing something like
min = NMinimize[
  Total[(#.#) & /@ {
     data1[[All, 2]] - (f[#, a1, mu, s1] & /@ data1[[All, 1]]),
     data2[[All, 2]] - (f[#, a2, mu, s2] & /@ data2[[All, 1]])}],
  {a1, a2, s1, s2, mu}]
(*
==> {0.253998, {a1 -> 0.984464, a2 -> 0.451312, s1 -> 0.980629,
s2 -> -2.07535, mu -> 0.988739}}
*)
To compare the found fit with the data (the negative value of s2 is harmless, since sigma only enters the model squared):
datpl = ListPlot[{data1, data2}, Joined -> True,
PlotRange -> {{-10, 10}, All}, Frame -> True,
PlotStyle -> {Black, Red}, Axes -> False,
InterpolationOrder -> 0];
Show[datpl, Plot[Evaluate[{f[x, a1, mu, s1], f[x, a2, mu, s2]} /. min[[2]]],
{x, -10, 10},
PlotRange -> All, PlotStyle -> {Black, Red}]]