Automatic data evaluation and elimination
Simplified your code a bit:
data1 = {{71.6`, 0.41`}, {27.2`, 4.96`}, {59.3`, 0.18`}, {46.`, 2.72`}, {42.2`, 1.06`}, {89.1`, 3.75`}, {88.6`, 1.9`}, {62.3`, 1.8`}, {35.5`, 1.84`}}
lm = Fit[data1, {1, x}, x]
(* 2.87217 - 0.0138549 x *)
Show[
ListPlot[data1, PlotStyle -> PointSize[.02], PlotRange -> All],
Plot[lm, {x, 25, 90}],
Axes -> True, GridLines -> Automatic, ImageSize -> 600,
LabelStyle -> Medium, Frame -> True,
FrameLabel -> {{"y", ""}, {"x ", ""}}
]
Using a 10% cutoff will eliminate all of your data in the first round. I'm pretty sure it's not a good idea, but if you relax the threshold to 20%, you can see some iterative point elimination:
olddata = {};
newdata = data1;
While[Length[olddata] != Length[newdata],
{
olddata = newdata;
lm = Fit[olddata, {1, x}, x];
newdata =
Select[data1, Abs[#[[2]] - (lm /. x -> #[[1]])]/#[[2]] < .2 &];
}]
Resulting in
lm
(* 3.60545 - 0.0192488 x *)
newdata
(* {{46., 2.72}, {88.6, 1.9}} *)
Or, to just remove the two points with the largest relative difference from the model:
Calculated y's:
lm = Fit[data1, {1, x}, x];
modely = lm /. x -> data1[[All, 1]]
(* {1.88015, 2.49531, 2.05057, 2.23484, 2.28749, 1.63769, 1.64462, 2.009, 2.38032} *)
Relative difference between experimental and calculated y's:
diffs = Abs[data1[[All, 2]] - modely]/data1[[All, 2]]
(* {3.58574, 0.496913, 10.3921, 0.178368, 1.15801, 0.563282, 0.13441, 0.116114, 0.293651} *)
Keep all but the worst two (this reorders the points):
Sort[Transpose[{data1, diffs}], #1[[2]] > #2[[2]] &][[3 ;;]][[All, 1]]
(* {{42.2, 1.06}, {89.1, 3.75}, {27.2, 4.96}, {35.5, 1.84}, {46., 2.72}, {88.6, 1.9}, {62.3, 1.8}} *)
As others have said, the original data is too far off a linear fit to have any points survive your constraint. I think this function will do what you want:
TrimDataWithLinearFit[{data_, relativeAbsDifferenceLimit_,
absoluteDifferenceLimit_}] :=
Block[{linearFit = Fit[data, {1, x}, x], newData},
newData = Append[#, linearFit /. x -> #[[1]]] & /@ data;
{Take[#, 2] & /@
Select[newData, (Abs[#[[2]] - #[[3]]] <
absoluteDifferenceLimit) && (Abs[#[[2]]/#[[3]] - 1] <
relativeAbsDifferenceLimit) &], relativeAbsDifferenceLimit,
absoluteDifferenceLimit}
]
Now we may use this to iterate until the criteria is met (using 20% and 1.5 as the criteria):
NestWhile[TrimDataWithLinearFit, {data1, .2, 1.5}, Equal, 2]
The first value returned is the list that survives repeated applications of the criteria.
EDIT:
I guess I wasn't clear that you have to take the first item from the NetwWhile operation. Here is an example along with a plot:
data2 = NestWhile[TrimDataWithLinearFit, {data1, 0.02, 0.02}, Equal,
2];
linearFit = Fit[data2[[1]], {1, x, x^2, x^3}, x]
(* -8.97342 + 0.314186 x + 0.632409 x^2 - 0.0366662 x^3 *)
Show[ListPlot[data2[[1]], PlotStyle -> PointSize[.02],
PlotRange -> All], Plot[linearFit, {x, 0, 10}], Axes -> True,
GridLines -> Automatic, ImageSize -> 600, LabelStyle -> Medium,
Frame -> True, FrameLabel -> {{"y", ""}, {"x ", ""}}]
LinearModelFit - Influence Measures
You can use LinearModelFit
and make use of the properties "CookDistances"
, "FitDifferences"
or "SingleDeletionVariances"
to identify influential observations.
(See LinearModelFit >> Scope >> Properties >> Influence Measures
.)
data = {{71.6, 0.41}, {27.2, 4.96}, {59.3, 0.18}, {46., 2.72}, {42.2,
1.06}, {89.1, 3.75}, {88.6, 1.9}, {62.3, 1.8}, {35.5, 1.84}};
lm = LinearModelFit[data, x, x];
threshold = .1;
Two observations in data
have "CookDistances" that exceed threshold
:
removed = data[[Flatten @ Position[UnitStep[lm["CookDistances"] - threshold ], 1]]]
{{27.2, 4.96}, {89.1, 3.75}}
Row[{ListPlot[data,
PlotStyle -> Directive[PointSize[Medium], Opacity[1, Black]],
Epilog -> {First@Plot[lm[x], {x, 0, 100}], Opacity[.5], Red,
PointSize[Large], Point[removed]},
Frame -> True, ImageSize -> 400, PlotLabel -> lm[x]],
ListPlot[lm["CookDistances"], Frame -> True, PlotRange -> All,
ImageSize -> 400, PlotLabel -> "CookDistances", Filling -> 0]},
Spacer[10]]
Row[ListPlot[lm[#], Frame -> True, PlotRange -> All, ImageSize -> 300,
PlotLabel -> #, Filling -> 0] & /@ {"CookDistances",
"FitDifferences", "SingleDeletionVariances"}, Spacer[10]]
Alternative influence measures:
Row[ListPlot[lm[#], Frame -> True, PlotRange -> All, ImageSize -> 300,
PlotLabel -> #, Filling -> 0] & /@
{"CookDistances", "FitDifferences", "SingleDeletionVariances"}, Spacer[10]]
Successive removal of influential observations:
A function that removes elements with specified influence measure above the specified threshold:
triM[measure_: "CookDistances", threshold_: .1] := #[[Flatten[
Position[UnitStep[LinearModelFit[#, x, x][measure] - threshold], 0]]]] &
triM["relativeError", threshold_: .1] := #[[Flatten[
Position[UnitStep[LinearModelFit[#, x, x]["FitResiduals"]/#[[All, -1]] -
threshold], 0]]]] &
Use triM
with NestWhileList
until triM
cannot eliminate additional elements or the length of the input data is 3:
remainingData = NestWhileList[triM["relativeError"], data,
(UnsameQ[##] && Length[#2] > 3) &, 2];
fits = LinearModelFit[#, x, x] & /@ remainingData;
Show[Plot[Evaluate[Through[fits[x]]], {x, 20, 100}, Frame -> True,
Axes -> False, ImageSize -> Large, PlotStyle -> {Red, Blue, Green},
PlotLegends -> MapThread[Row[{##}, " : "] &, {{"step 1", "step 2", "step 3"},
Through[fits[x]]}]],
ListPlot[BlockMap[Apply[Complement], Append[remainingData, {}], 2, 1],
PlotStyle -> Thread[Directive[AbsolutePointSize[7], {Red, Blue, Green}]],
PlotLegends -> {"removed in step 1", "removed in step 2", "remains"},
BaseStyle -> Opacity[1]],
PlotRange -> All]
If we replace triM["relativeError"]
with triM[]
in the above code to get the results for "CookDistances"
: