Curve fitting by running through hundreds of models and returning the one with the best fit.
vars = {w, x, y, z};
terms = MonomialList[(Plus @@ vars)^3] /. _Integer x_ :> x;
cols = Join @@ {vars, terms}
(* {w,x,y,z,w^3,w^2 x,w^2 y,w^2 z,w x^2,w x y,w x z,w y^2,
w y z,w z^2,x^3,x^2 y,x^2 z,x y^2,x y z,x z^2,y^3,y^2 z,y z^2,z^3} *)
For the data
dt = Table[Join[RandomInteger[10, 4], {RandomReal[]}], {100}];
evaluate all models with up to three covariates from the set cols,
and get the goodness-of-fit measures "AIC", "BIC", "AdjustedRSquared", "AICc", and "RSquared"
for each model:
models = Table[Join[{j}, LinearModelFit[dt, j, vars][{"BestFit", "AIC", "BIC",
     "AdjustedRSquared", "AICc", "RSquared"}]],
   {j, Subsets[cols, 3]}];
Length@models
(* 2325 *)
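The count checks out: Subsets[cols, 3] includes every subset of size at most three (the empty set included), so with 24 candidate terms:

```mathematica
(* all subsets of size <= 3 drawn from 24 candidate terms *)
Sum[Binomial[24, k], {k, 0, 3}]
(* 2325 *)
```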
Display the top 10:
Grid[{{"Model", "BestFit", "AIC", "BIC", "AdjustedRSquared", "AICc",
"RSquared"}, ## & @@ SortBy[models, #[[3]] &][[;; 10]]},
Dividers -> All]
See also: LogitModelFit >> Scope >> Properties >> Goodness-of-Fit Measures for a great example.
Update: A single function combining the necessary steps:
modelsF = Table[Join[{j}, LinearModelFit[#, j, #2][{"BestFit", "AIC",
"BIC", "AdjustedRSquared", "AICc", "RSquared"}]], {j, Subsets[#3, #4]}] &;
and another function for showing the results:
resultsF = Grid[{{"Model", "BestFit", "AIC", "BIC",
     "AdjustedRSquared", "AICc", "RSquared"},
    ## & @@ SortBy[#, #[[3]] &][[;; #2]]}, Dividers -> All] &;
Using the OP's example data to find the best 10 models with 3 covariates:
data = {{0, 0}, {1.5, 10.47}, {4.8, 16.31}, {9, 20.75}, {14.1, 23.81}, {22.6, 26.28},
{32.1, 27.96}, {41.3, 29.94}, {53.8, 34.68}, {64.8, 40.22},
{75, 47.04}, {82, 53.48}, {87.8, 60.15}, {91.8, 67.75}, {95.1, 76.09},
{97, 83.97}, {98, 90}, {99, 100}};
cols = {1, x, x^2, x^3, x^4, Sin[x], Cos[x], 1/(.001 + x), 1/(.001 + x^2),
1/(.001 + x^3), IntegerPart[x], PrimePi[Round[x]]};
models = modelsF[data, {x}, cols, {3}];
resultsF[models, 10]
Note: a better way to generate a richer set of covariates would be
cols = MonomialList[(1 + Plus @@ vars)^3] /. _Integer x_ :> x; (*thanks: @BobHanlon *)
Another note: all of the above is essentially a brute-force approach. More rigorous and smarter methods for model selection exist.
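For instance, here is a rough sketch of greedy forward selection by AIC (forwardSelect is my own helper, not a built-in): instead of enumerating every subset, it adds one covariate at a time.

```mathematica
(* Greedy forward selection: at each step add the covariate that
   lowers the AIC the most. Far cheaper than full enumeration,
   though it can miss the globally best subset. *)
forwardSelect[data_, vars_, cols_, kMax_] :=
  Module[{chosen = {}, rest = cols, best},
    Do[
      best = First@SortBy[rest,
        LinearModelFit[data, Append[chosen, #], vars]["AIC"] &];
      AppendTo[chosen, best];
      rest = DeleteCases[rest, best],
      {kMax}];
    chosen]

forwardSelect[dt, vars, cols, 3]
```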
In version 10.2 there is a new experimental function which might be what you are looking for: FindFormula.
I suspect that a genetic programming algorithm (symbolic regression) is behind this new feature.
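A minimal usage sketch (FindFormula is experimental, so the exact output may vary between versions):

```mathematica
data = Table[{x, 2 x^2 + 3}, {x, 0, 5, .1}];
FindFormula[data, x]      (* best symbolic model found *)
FindFormula[data, x, 3]   (* up to three candidate formulas *)
```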
See also my question here: What is behind experimental function: FindFormula?