Curve fitting by running through hundreds of models and returning the one with the best fit.
vars = {w, x, y, z};
terms = MonomialList[(Plus @@ vars)^3] /. _Integer x_ :> x;
cols = Join @@ {vars, terms}
(* {w,x,y,z,w^3,w^2 x,w^2 y,w^2 z,w x^2,w x y,w x z,w y^2,
w y z,w z^2,x^3,x^2 y,x^2 z,x y^2,x y z,x z^2,y^3,y^2 z,y z^2,z^3} *)
For the data
dt = Table[Join[RandomInteger[10, 4], {RandomReal[]}], {100}];
evaluate all models with up to three covariates from the set cols,
and get the goodness-of-fit measures "AIC", "BIC", "AdjustedRSquared", "AICc", and "RSquared"
for each model:
models = Table[Join[{j}, LinearModelFit[dt, j, vars][{"BestFit", "AIC", "BIC",
     "AdjustedRSquared", "AICc", "RSquared"}]],
   {j, Subsets[cols, 3]}];
Length@models
(* 2325 *)
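The count checks out: Subsets[cols, 3] includes every subset of size at most three (the empty set included), so with 24 candidate terms:

```mathematica
(* all subsets of size <= 3 drawn from 24 candidate terms *)
Sum[Binomial[24, k], {k, 0, 3}]
(* 2325 *)
```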
Display the top 10:
Grid[{{"Model", "BestFit", "AIC", "BIC", "AdjustedRSquared", "AICc",
"RSquared"}, ## & @@ SortBy[models, #[[3]] &][[;; 10]]},
Dividers -> All]
See also: LogitModelFit >> Scope >> Properties >> Goodness-of-Fit Measures for a great example.
Update: A single function combining the necessary steps:
modelsF = Table[Join[{j}, LinearModelFit[#, j, #2][{"BestFit", "AIC",
"BIC", "AdjustedRSquared", "AICc", "RSquared"}]], {j, Subsets[#3, #4]}] &;
and another function for showing the results:
resultsF = Grid[{{"Model", "BestFit", "AIC", "BIC",
     "AdjustedRSquared", "AICc", "RSquared"},
    ## & @@ SortBy[#, #[[3]] &][[;; #2]]}, Dividers -> All] &;
Using the OP's example data to find the best 10 models with 3 covariates:
data = {{0, 0}, {1.5, 10.47}, {4.8, 16.31}, {9, 20.75}, {14.1, 23.81}, {22.6, 26.28},
{32.1, 27.96}, {41.3, 29.94}, {53.8, 34.68}, {64.8, 40.22},
{75, 47.04}, {82, 53.48}, {87.8, 60.15}, {91.8, 67.75}, {95.1, 76.09},
{97, 83.97}, {98, 90}, {99, 100}};
cols = {1, x, x^2, x^3, x^4, Sin[x], Cos[x], 1/(.001 + x), 1/(.001 + x^2),
1/(.001 + x^3), IntegerPart[x], PrimePi[Round[x]]};
models = modelsF[data, {x}, cols, {3}];
resultsF[models, 10]
Note: a better way to generate a richer set of covariates would be
cols = MonomialList[(1 + Plus @@ vars)^3] /. _Integer x_ :> x; (*thanks: @BobHanlon *)
Another note: all of the above is essentially a brute-force approach. More rigorous and smarter methods for model selection exist.
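For instance, here is a rough sketch of greedy forward selection by AIC (forwardSelect is my own helper, not a built-in): instead of enumerating every subset, it adds one covariate at a time.

```mathematica
(* Greedy forward selection: at each step add the covariate that
   lowers the AIC the most. Far cheaper than full enumeration,
   though it can miss the globally best subset. *)
forwardSelect[data_, vars_, cols_, kMax_] :=
  Module[{chosen = {}, rest = cols, best},
    Do[
      best = First@SortBy[rest,
        LinearModelFit[data, Append[chosen, #], vars]["AIC"] &];
      AppendTo[chosen, best];
      rest = DeleteCases[rest, best],
      {kMax}];
    chosen]

forwardSelect[dt, vars, cols, 3]
```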
In version 10.2 there is a new experimental function which might be what you are looking for: FindFormula.
I suspect that a genetic programming algorithm (symbolic regression) is behind this new feature.
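A minimal usage sketch (FindFormula is experimental, so the exact output may vary between versions):

```mathematica
data = Table[{x, 2 x^2 + 3}, {x, 0, 5, .1}];
FindFormula[data, x]      (* best symbolic model found *)
FindFormula[data, x, 3]   (* up to three candidate formulas *)
```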
See also my question here: What is behind experimental function: FindFormula?