What is the difference between lm(offense$R ~ offense$OBP) and lm(R ~ OBP)?
In the first case, you get this if you print the model:
Call:
lm(formula = offense$R ~ offense$OBP)
Coefficients:
(Intercept) offense$OBP
-0.1102 0.5276
But in the second, you get this:
Call:
lm(formula = R ~ OBP)
Coefficients:
(Intercept) OBP
-0.1102 0.5276
Look at the names of the coefficients. When you create your new data with newdata=data.frame(OBP=0.5), that does not really make sense for the first model: predict() looks for a variable literally named offense$OBP, finds none in newdata, and falls back to the original vector. So newdata is effectively ignored and you only get the predicted values for the training data. When you use offense$R ~ offense$OBP, the formula just has a vector on each side, with no names tied to a data.frame.
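If you want to see this for yourself, here is a minimal sketch. I don't have your offense data, so the data frame below is a made-up stand-in (the numbers and the object names fit_dollar and pred_dollar are my own assumptions, not from your code):

```r
# Hypothetical stand-in for the offense data frame; the values are made up.
set.seed(1)
offense <- data.frame(OBP = runif(30, min = 0.28, max = 0.38))
offense$R <- -0.1 + 0.5 * offense$OBP + rnorm(30, sd = 0.005)

# Model fitted with `$` inside the formula.
fit_dollar <- lm(offense$R ~ offense$OBP)

# The predictor is stored under the literal name "offense$OBP".
names(coef(fit_dollar))

# newdata has no column called "offense$OBP", so predict() falls back to the
# original vector and returns one value per training row. R warns about the
# row mismatch; the warning is suppressed here only to keep the output short.
pred_dollar <- suppressWarnings(
  predict(fit_dollar, newdata = data.frame(OBP = 0.5))
)
length(pred_dollar)   # equals nrow(offense), not 1
```

Getting one prediction per training row instead of a single value is the sign that newdata was not used.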
The best way to do it is:
obp = lm(R ~ OBP, data=offense)
predict(obp, newdata=data.frame(OBP=0.5), interval="prediction")
And you'll get the proper result, the prediction for OBP=0.5.
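As a rough check that this really uses your new value, you can compare the result against the fitted line computed by hand. Again, the offense data frame here is a made-up stand-in and new_point, pred and by_hand are names I picked for illustration:

```r
# Hypothetical stand-in for the offense data frame; the values are made up.
set.seed(1)
offense <- data.frame(OBP = runif(30, min = 0.28, max = 0.38))
offense$R <- -0.1 + 0.5 * offense$OBP + rnorm(30, sd = 0.005)

# Fit with bare column names and the data argument.
obp <- lm(R ~ OBP, data = offense)

# The column name in newdata matches the predictor name in the formula.
new_point <- data.frame(OBP = 0.5)
pred <- predict(obp, newdata = new_point, interval = "prediction")

dim(pred)        # one row (one new point), three columns
colnames(pred)   # point estimate plus lower and upper interval bounds

# The point prediction agrees with intercept + slope * 0.5.
by_hand <- unname(coef(obp)["(Intercept)"] + coef(obp)["OBP"] * 0.5)
all.equal(unname(pred[1, "fit"]), by_hand)
```

Note that 0.5 lies outside the range of the made-up OBP values, so the interval in this sketch is an extrapolation and will be wider than for a value inside the training range.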