Can one force a linear regression fit to go through an arbitrary point?

Use IncludeConstantBasis -> False to go through $(100,0)$:

Normal[LinearModelFit[data, x - 100, x, IncludeConstantBasis -> False]]
(*  -0.679766 (-100 + x)  *)

Or LinearOffsetFunction -> (100 &) for $(0, 100)$:

Normal[LinearModelFit[data, x, x, IncludeConstantBasis -> False, 
  LinearOffsetFunction -> (100 &)]]
(*  100 - 4.05 x  *)

Or both for an arbitrary point:

SeedRandom[0];
pt = RandomChoice[data]  (* pick a data point *)
(*  {15, 41}  *)

Normal[LinearModelFit[data, x - pt[[1]], x, 
  IncludeConstantBasis -> False, LinearOffsetFunction -> (pt[[2]] &)]]
(*  41 - 4.02727 (-15 + x)  *)
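
For repeated use, the two options combine naturally into a small helper. A minimal sketch (fitThroughPoint is just a made-up name; data is again the list of $\{x,y\}$ pairs from the question):

fitThroughPoint[data_, {x0_, y0_}] :=  (* force the fitted line through {x0, y0} *)
 Normal@LinearModelFit[data, x - x0, x,
   IncludeConstantBasis -> False, LinearOffsetFunction -> (y0 &)]

fitThroughPoint[data, pt]  (* should reproduce the result above *)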

tl;dr

Shift your data by subtracting the required point so the fit passes through $(0,0)$, fit a line through the origin, then add the required point back:

datap = # - {0, 100} & /@ data;
Fit[ datap, {x}, x] + 100

Annoying maths:

You have to include the condition as an external constraint, and the standard Fit[...] does not allow for that. A linear regression simply minimizes the expression (assuming identical uncertainties $\sigma_k$ for ease of notation)

$$\sum_k (a x_k + b - y_k)^2 $$

wrt the parameters $a$ and $b$. Your external constraint has the form

$$x_0 a + b - y_0 = 0$$

with $x_0=0$ and $y_0 = 100$ in your special case. The constraint, being zero, can be added to the above expression with a Lagrange multiplier $\lambda$, so the quantity to minimize becomes

$$\sum_k (a x_k + b - y_k)^2 - \lambda ( a x_0 + b - y_0 )$$

This addition will couple the (otherwise independent) derivatives wrt $a$ and $b$

$$\sum_k 2 x_k (a x_k + b - y_k) = \lambda x_0$$ $$\sum_k 2 (a x_k + b - y_k) = \lambda$$

which, together with the constraint, can be easily (and analytically!) solved for $a$ and $b$. Use the constraint to express $b$

$$b = -a x_0 + y_0$$

which leads to

$$\sum_k 2 x_k (a (x_k-x_0) - (y_k-y_0) ) = \lambda x_0$$ $$\sum_k 2 (a (x_k-x_0) - (y_k-y_0) ) = \lambda$$

or, subtracting $x_0$ times the second equation from the first,

$$\sum_k 2 (x_k-x_0) (a (x_k-x_0) - (y_k-y_0) ) = 0$$

which is, somewhat unsurprisingly, exactly the equation you would have gotten if you had started with the dataset $x'_k = x_k-x_0, y'_k = y_k-y_0$ and the model $$y= a x$$

So you should get the required result using

datap = # - {0, 100} & /@ data;
Fit[ datap, {x}, x] + 100

which indeed results in

100 - 4.05 x
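
As a cross-check, the last equation gives the slope in closed form, $$a = \frac{\sum_k (x_k-x_0)(y_k-y_0)}{\sum_k (x_k-x_0)^2},$$ which is easy to evaluate directly. A minimal sketch, again assuming data holds the $\{x,y\}$ pairs and the required point is $(0,100)$:

x0 = 0; y0 = 100;
a = Total[(#[[1]] - x0) (#[[2]] - y0) & /@ data]/
   Total[(#[[1]] - x0)^2 & /@ data];   (* slope from the normal equation *)
y0 + a (x - x0)   (* should again give 100 - 4.05 x *)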

Use NonlinearModelFit (everything that is linear is also nonlinear, you know...):

model = m x + b;
model = model /. First@Solve[m x + b == 0 /. x -> 100, b]; (* eliminate b using the constraint at x = 100 *)
f = NonlinearModelFit[data, model, {m}, x]
f["BestFitParameters"]

{m -> -0.679766}

Show[
 ListLinePlot[data],
 Plot[f[x], {x, 0, 100}, PlotStyle -> ColorData[97][2]],
 PlotRange -> All
 ]

[Plot: the data (ListLinePlot) with the constrained fit line overlaid]
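
The same trick handles an arbitrary point: solve the constraint for b at the desired $x$ value before fitting. A minimal sketch, reusing the point pt = {15, 41} picked in the first answer (model2 is just a throwaway name):

model2 = m x + b /. First@Solve[(m x + b /. x -> pt[[1]]) == pt[[2]], b];
Normal[NonlinearModelFit[data, model2, {m}, x]]
(*  should agree with 41 - 4.02727 (-15 + x) from above  *)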

I suppose you meant $(0,100)$ as the point to go through, but you only get what you ask for ;)
