Automatic data evaluation and elimination

Simplified your code a bit:

data1 = {{71.6`, 0.41`}, {27.2`, 4.96`}, {59.3`, 0.18`}, {46.`, 2.72`}, {42.2`, 1.06`}, {89.1`, 3.75`}, {88.6`, 1.9`}, {62.3`, 1.8`}, {35.5`, 1.84`}}

lm = Fit[data1, {1, x}, x]
 (* 2.87217 - 0.0138549 x *)

Show[
 ListPlot[data1, PlotStyle -> PointSize[.02], PlotRange -> All],
 Plot[lm, {x, 25, 90}],
 Axes -> True, GridLines -> Automatic, ImageSize -> 600, 
 LabelStyle -> Medium, Frame -> True, 
 FrameLabel -> {{"y", ""}, {"x ", ""}}
 ]

enter image description here

Using a 10% cutoff will eliminate all of your data in the first round. I'm pretty sure it's not a good idea, but if you relax the threshold to 20%, you can see some iterative point elimination:

olddata = {};
newdata = data1;
While[Length[olddata] != Length[newdata],
 {
  olddata = newdata;
  lm = Fit[olddata, {1, x}, x];
  newdata = 
   Select[data1, Abs[#[[2]] - (lm /. x -> #[[1]])]/#[[2]] < .2 &];
  }]

Resulting in

lm
 (* 3.60545 - 0.0192488 x *)

newdata
 (* {{46., 2.72}, {88.6, 1.9}} *)

Or, to just remove the two points with the largest relative difference from the model:

Calculated y's:

lm = Fit[data1, {1, x}, x];
modely = lm /. x -> data1[[All, 1]]
 (* {1.88015, 2.49531, 2.05057, 2.23484, 2.28749, 1.63769, 1.64462, 2.009, 2.38032} *)

Relative difference between experimental and calculated y's:

diffs = Abs[data1[[All, 2]] - modely]/data1[[All, 2]]
 (* {3.58574, 0.496913, 10.3921, 0.178368, 1.15801, 0.563282, 0.13441, 0.116114, 0.293651} *)

Keep all but the worst two (this reorders the points):

Sort[Transpose[{data1, diffs}], #1[[2]] > #2[[2]] &][[3 ;;]][[All, 1]]
 (* {{42.2, 1.06}, {89.1, 3.75}, {27.2, 4.96}, {35.5, 1.84}, {46., 2.72}, {88.6, 1.9}, {62.3, 1.8}} *)

As others have said, the original data is too far off a linear fit to have any points survive your constraint. I think this function will do what you want:

TrimDataWithLinearFit[{data_, relativeAbsDifferenceLimit_, 
   absoluteDifferenceLimit_}] := 
 Block[{linearFit = Fit[data, {1, x}, x], newData}, 
  newData = Append[#, linearFit /. x -> #[[1]]] & /@ data;
  {Take[#, 2] & /@ 
    Select[newData, (Abs[#[[2]] - #[[3]]] < 
         absoluteDifferenceLimit) && (Abs[#[[2]]/#[[3]] - 1] < 
         relativeAbsDifferenceLimit) &], relativeAbsDifferenceLimit, 
   absoluteDifferenceLimit}
  ]

Now we may use this to iterate until the criteria is met (using 20% and 1.5 as the criteria):

NestWhile[TrimDataWithLinearFit, {data1, .2, 1.5}, Equal, 2]

The first value returned is the list that survives repeated applications of the criteria.

EDIT:
I guess I wasn't clear that you have to take the first item from the NetwWhile operation. Here is an example along with a plot:

data2 = NestWhile[TrimDataWithLinearFit, {data1, 0.02, 0.02}, Equal, 
  2];
linearFit = Fit[data2[[1]], {1, x, x^2, x^3}, x]
(* -8.97342 + 0.314186 x + 0.632409 x^2 - 0.0366662 x^3 *)

Show[ListPlot[data2[[1]], PlotStyle -> PointSize[.02], 
  PlotRange -> All], Plot[linearFit, {x, 0, 10}], Axes -> True, 
 GridLines -> Automatic, ImageSize -> 600, LabelStyle -> Medium, 
 Frame -> True, FrameLabel -> {{"y", ""}, {"x ", ""}}]

enter image description here

LinearModelFit - Influence Measures

You can use LinearModelFit and make use of the properties "CookDistances", "FitDifferences" or "SingleDeletionVariances" to identify influential observations.

(See LinearModelFit >> Scope >> Properties >> Influence Measures.)

data = {{71.6, 0.41}, {27.2, 4.96}, {59.3, 0.18}, {46., 2.72}, {42.2, 
    1.06}, {89.1, 3.75}, {88.6, 1.9}, {62.3, 1.8}, {35.5, 1.84}};
lm = LinearModelFit[data, x, x];

threshold = .1;

Two observations in data have "CookDistances" that exceed threshold:

removed = data[[Flatten @ Position[UnitStep[lm["CookDistances"] - threshold ], 1]]]

{{27.2, 4.96}, {89.1, 3.75}}

Row[{ListPlot[data, 
   PlotStyle -> Directive[PointSize[Medium], Opacity[1, Black]], 
   Epilog -> {First@Plot[lm[x], {x, 0, 100}], Opacity[.5], Red, 
     PointSize[Large], Point[removed]}, 
   Frame -> True, ImageSize -> 400, PlotLabel -> lm[x]], 
  ListPlot[lm["CookDistances"], Frame -> True, PlotRange -> All, 
   ImageSize -> 400, PlotLabel -> "CookDistances", Filling -> 0]}, 
 Spacer[10]]

enter image description here

Row[ListPlot[lm[#], Frame -> True, PlotRange -> All, ImageSize -> 300,
     PlotLabel -> #, Filling -> 0] & /@ {"CookDistances", 
   "FitDifferences", "SingleDeletionVariances"}, Spacer[10]]

Alternative influence measures:

Row[ListPlot[lm[#], Frame -> True, PlotRange -> All, ImageSize -> 300,
     PlotLabel -> #, Filling -> 0] & /@ 
  {"CookDistances", "FitDifferences", "SingleDeletionVariances"}, Spacer[10]]

enter image description here

Successive removal of influential observations:

A function that removes elements with specified influence measure above the specified threshold:

triM[measure_: "CookDistances", threshold_: .1] := #[[Flatten[
    Position[UnitStep[LinearModelFit[#, x, x][measure] - threshold], 0]]]] &

triM["relativeError", threshold_: .1] := #[[Flatten[
    Position[UnitStep[LinearModelFit[#, x, x]["FitResiduals"]/#[[All, -1]] - 
       threshold], 0]]]] &

Use triM with NestWhileList until triM cannot eliminate additional elements or the length of the input data is 3:

remainingData = NestWhileList[triM["relativeError"], data,
   (UnsameQ[##] && Length[#2] > 3) &, 2];

fits = LinearModelFit[#, x, x] & /@ remainingData;

Show[Plot[Evaluate[Through[fits[x]]], {x, 20, 100}, Frame -> True, 
  Axes -> False, ImageSize -> Large, PlotStyle -> {Red, Blue, Green}, 
  PlotLegends -> MapThread[Row[{##}, " :  "] &, {{"step 1", "step 2", "step 3"}, 
     Through[fits[x]]}]], 
 ListPlot[BlockMap[Apply[Complement], Append[remainingData, {}], 2, 1],
   PlotStyle -> Thread[Directive[AbsolutePointSize[7], {Red, Blue, Green}]], 
   PlotLegends -> {"removed in step 1", "removed in step 2", "remains"},
   BaseStyle -> Opacity[1]],
  PlotRange -> All]

enter image description here

If we replace triM["relativeError"] with triM[] in the above code to get the results for "CookDistances":

enter image description here

Automatic data evaluation and elimination

LinearModelFit - Influence Measures

Successive removal of influential observations:

Tags:

List Manipulation

Related

Recent Posts