Is there a simple command to do leave-one-out cross validation with the lm() function?
Another solution is to use the caret package:
library(caret)
# x must exist before it is used to build y; data.frame() cannot see its own columns
x <- rnorm(1000, 3, 2)
data <- data.frame(x = x, y = 2*x + rnorm(1000))
train(y ~ x, method = "lm", data = data, trControl = trainControl(method = "LOOCV"))
Linear Regression

1000 samples
   1 predictor

No pre-processing
Resampling: Leave-One-Out Cross-Validation
Summary of sample sizes: 999, 999, 999, 999, 999, 999, ...
Resampling results:

  RMSE      Rsquared  MAE
  1.050268  0.940619  0.836808

Tuning parameter 'intercept' was held constant at a value of TRUE
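If you need the numbers rather than the printed summary, the object returned by train() keeps them in its results element. A minimal sketch, assuming caret is installed; the names dat and cv_fit, the seed, and the sample size of 100 are only illustrative:

```r
library(caret)

set.seed(1)                          # arbitrary seed, for reproducibility only
x <- rnorm(100, 3, 2)
dat <- data.frame(x = x, y = 2 * x + rnorm(100))

# Same call as above, on a smaller data set so it runs quickly
cv_fit <- train(y ~ x, data = dat, method = "lm",
                trControl = trainControl(method = "LOOCV"))

cv_fit$results          # data frame holding the RMSE, Rsquared and MAE columns
cv_fit$results$RMSE     # leave-one-out RMSE as a plain number
```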
You can use a custom function that relies on a statistical shortcut, so you never have to fit all N models:
loocv <- function(fit) {
  h <- lm.influence(fit)$hat          # leverages (diagonal of the hat matrix)
  mean((residuals(fit) / (1 - h))^2)  # mean squared leave-one-out residual
}
This is explained here: https://gerardnico.com/wiki/lang/r/cross_validation
It only works with linear models. The function returns a mean squared error, so you might want to take the square root of the mean to get an RMSE.
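To convince yourself that the shortcut is right, you can compare it with a brute-force loop that really refits the model N times. For a linear model, the leave-one-out residual for row i equals the ordinary residual divided by (1 - h_i), where h_i is the leverage of that row. The sketch below uses simulated data; the names dat, loo_err, mse_shortcut and mse_brute are only illustrative:

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 50
x <- rnorm(n, 3, 2)
dat <- data.frame(x = x, y = 2 * x + rnorm(n))

fit <- lm(y ~ x, data = dat)

# Shortcut: leave-one-out residual = ordinary residual / (1 - leverage)
loocv <- function(fit) {
  h <- lm.influence(fit)$hat
  mean((residuals(fit) / (1 - h))^2)
}

# Brute force: refit n times, each time predicting the single held-out row
loo_err <- vapply(seq_len(n), function(i) {
  fit_i <- lm(y ~ x, data = dat[-i, ])
  dat$y[i] - predict(fit_i, newdata = dat[i, , drop = FALSE])
}, numeric(1))

mse_shortcut <- loocv(fit)
mse_brute <- mean(loo_err^2)

all.equal(mse_shortcut, mse_brute)  # compares the two up to numerical tolerance
sqrt(mse_shortcut)                  # RMSE, on the same scale as y
```

The loop costs N model fits while the shortcut costs one, which is why the shortcut is worth having for large N.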
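You can try cv.lm from the DAAG package. It does m-fold cross-validation, so for leave-one-out you would set m to the number of rows in the data. Its usage: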
cv.lm(data = DAAG::houseprices, form.lm = formula(sale.price ~ area),
      m = 3, dots = FALSE, seed = 29, plotit = c("Observed", "Residual"),
      main = "Small symbols show cross-validation predicted values",
      legend.pos = "topleft", printit = TRUE)
Arguments:
data        a data frame
form.lm     a formula or lm call or lm object
m           the number of folds
dots        uses pch=16 for the plotting character
seed        random number generator seed
plotit      one of the text strings "Observed" or "Residual", or a logical value. The logical TRUE is equivalent to "Observed", while FALSE is equivalent to "" (no plot)
main        main title for graph
legend.pos  position of legend: one of "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right", "center"
printit     if TRUE, output is printed to the screen