ddply with lm() function
What Ramnath explained is exactly right, but I'll elaborate a bit.
ddply expects a data frame in and returns a data frame out. The lm() function takes a data frame as input but returns a linear model object. You can see that by looking at the docs for lm via ?lm:
Value
lm returns an object of class "lm" or for multiple responses of class c("mlm", "lm").
So you can't just shove the lm objects into a data frame. Your choices are either to coerce the output of lm into a data frame, or to shove the lm objects into a list instead of a data frame.
So to illustrate both options:
Here's how to shove the lm objects into a list (very much like what Ramnath illustrated):
outlist <- dlply(mydf, "x3", function(df) lm(y ~ x1 + x2, data=df))
On the flip side, if you want to extract only the coefficients, you can create a function that runs the regression and returns only the coefficients as a one-row data frame, like this:
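myLm <- function(formula, df) {
    # Fit the model on this group's rows
    lmList <- lm(formula, data = df)
    # Transpose the coefficient vector into a one-row data frame
    lmOut <- data.frame(t(lmList$coefficients))
    # Hard-coded names: assumes exactly an intercept plus x1 and x2
    names(lmOut) <- c("intercept", "x1coef", "x2coef")
    return(lmOut)
}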
outDf <- ddply(mydf, "x3", function(df) myLm(y ~ x1 + x2, df))
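For a self-contained run of both options, here is a minimal sketch. The data frame mydf below is simulated purely for illustration (it is not the data from the question), and coefDf is a hypothetical helper name. Unlike myLm above, it keeps the coefficient names that lm produces instead of hard-coding them.

```r
library(plyr)

# Hypothetical data: two groups in x3, ten rows each (not the question's data)
set.seed(1)
mydf <- data.frame(
  x1 = rnorm(20),
  x2 = rnorm(20),
  x3 = rep(1:2, each = 10)
)
mydf$y <- 1 + 2 * mydf$x1 - mydf$x2 + rnorm(20)

# Option 1: keep the full lm objects, one list element per level of x3
outlist <- dlply(mydf, "x3", function(df) lm(y ~ x1 + x2, data = df))
class(outlist[["1"]])

# Option 2: return a one-row data frame per group so ddply can stack them
coefDf <- function(df) {
  fit <- lm(y ~ x1 + x2, data = df)
  as.data.frame(t(coef(fit)))
}
outDf <- ddply(mydf, "x3", coefDf)
outDf
```

The list version is the one to keep if you later need residuals, predictions, or summary() for each group; the data frame version is handier when the coefficients are all you want.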
Here is what you need to do.
mods = dlply(mydf, .(x3), lm, formula = y ~ x1 + x2)
mods is a list of two objects containing the regression results. You can extract what you need from mods. For example, if you want to extract the coefficients, you could write
coefs = ldply(mods, coef)
This gives you
  x3 (Intercept)         x1 x2
1  1    11.71015 -0.3193146 NA
2  2    21.83969 -1.4677690 NA
EDIT. If you want ANOVA, then you can just do
ldply(mods, anova)
  x3 Df    Sum Sq   Mean Sq   F value     Pr(>F)
1  1  1  2.039237  2.039237 0.4450663 0.52345980
2  1  8 36.654982  4.581873        NA         NA
3  2  1 43.086916 43.086916 4.4273907 0.06849533
4  2  8 77.855187  9.731898        NA         NA
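The same pattern should work for any function that returns a vector or a data frame per model. As a rough sketch, again on simulated data (mydf here is made up, and fitStats is a hypothetical helper name), this pulls the R-squared and residual standard error for each group:

```r
library(plyr)

# Hypothetical data: two groups in x3, ten rows each (not the question's data)
set.seed(42)
mydf <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rep(1:2, each = 10))
mydf$y <- 3 + 0.5 * mydf$x1 + rnorm(20)

# One lm fit per level of x3, stored in a list
mods <- dlply(mydf, .(x3), lm, formula = y ~ x1 + x2)

# Return a one-row data frame of fit statistics for a single model
fitStats <- function(m) {
  s <- summary(m)
  data.frame(r.squared = s$r.squared, sigma = s$sigma)
}

# ldply stacks the per-model rows and adds the x3 label column
stats <- ldply(mods, fitStats)
stats
```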
Use this
mods <- dlply(mydf, .(x3), lm, formula = y ~ x1 + x2)
coefs <- llply(mods, coef)
$`1`
(Intercept)          x1          x2
 11.7101519  -0.3193146          NA

$`2`
(Intercept)          x1          x2
  21.839687   -1.467769          NA

anovas <- llply(mods, anova)
$`1`
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value Pr(>F)
x1         1  2.039  2.0392  0.4451 0.5235
Residuals  8 36.655  4.5819

$`2`
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value Pr(>F)
x1         1 43.087  43.087  4.4274 0.0685 .
Residuals  8 77.855   9.732
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1