Predict() - Maybe I'm not understanding it
First, you want to use
model <- lm(Total ~ Coupon, data=df)
not model <-lm(df$Total ~ df$Coupon, data=df)
.
Second, by saying lm(Total ~ Coupon)
, you are fitting a model that uses Total
as the response variable, with Coupon
as the predictor. That is, your model is of the form Total = a + b*Coupon
, with a
and b
the coefficients to be estimated. Note that the response goes on the left side of the ~
, and the predictor(s) on the right.
Because of this, when you ask R to give you predicted values for the model, you have to provide a set of new predictor values, ie new values of Coupon
, not Total
.
Third, judging by your specification of newdata
, it looks like you're actually after a model to fit Coupon
as a function of Total
, not the other way around. To do this:
model <- lm(Coupon ~ Total, data=df)
new.df <- data.frame(Total=c(79037022, 83100656, 104299800))
predict(model, new.df)
Thanks Hong, that was exactly the problem I was running into. The error you get suggests that the number of rows is wrong, but the problem is actually that the model has been trained using a command that ends up with the wrong names for parameters.
This is really a critical detail that is entirely non-obvious for lm and so on. Some of the tutorial make reference to doing lines like lm(olive$Area@olive$Palmitic)
- ending up with variable names of olive$Area NOT Area, so creating an entry using anewdata<-data.frame(Palmitic=2)
can't then be used. If you use lm(Area@Palmitic,data=olive)
then the variable names are right and prediction works.
The real problem is that the error message does not indicate the problem at all:
Warning message: 'anewdata' had 1 rows but variable(s) found to have X rows
instead of newdata you are using newdate in your predict code, verify once. and just use Coupon$estimate <- predict(model, Coupon)
It will work.