All Levels of a Factor in a Model Matrix in R
caret
implemented a nice function dummyVars
to achieve this with 2 lines:
library(caret)
dmy <- dummyVars(" ~ .", data = testFrame)
testFrame2 <- data.frame(predict(dmy, newdata = testFrame))
Checking the final columns:
colnames(testFrame2)
"First" "Second" "Third" "Fourth.Alice" "Fourth.Bob" "Fourth.Charlie" "Fourth.David" "Fifth.Edward" "Fifth.Frank" "Fifth.Georgia" "Fifth.Hank" "Fifth.Isaac"
The nicest point here is you get the original data frame, plus the dummy variables having excluded the original ones used for the transformation.
More info: http://amunategui.github.io/dummyVar-Walkthrough/
(Trying to redeem myself...) In response to Jared's comment on @Fabians answer about automating it, note that all you need to supply is a named list of contrast matrices. contrasts()
takes a vector/factor and produces the contrasts matrix from it. For this then we can use lapply()
to run contrasts()
on each factor in our data set, e.g. for the testFrame
example provided:
> lapply(testFrame[,4:5], contrasts, contrasts = FALSE)
$Fourth
Alice Bob Charlie David
Alice 1 0 0 0
Bob 0 1 0 0
Charlie 0 0 1 0
David 0 0 0 1
$Fifth
Edward Frank Georgia Hank Isaac
Edward 1 0 0 0 0
Frank 0 1 0 0 0
Georgia 0 0 1 0 0
Hank 0 0 0 1 0
Isaac 0 0 0 0 1
Which slots nicely into @fabians answer:
model.matrix(~ ., data=testFrame,
contrasts.arg = lapply(testFrame[,4:5], contrasts, contrasts=FALSE))
You need to reset the contrasts
for the factor variables:
model.matrix(~ Fourth + Fifth, data=testFrame,
contrasts.arg=list(Fourth=contrasts(testFrame$Fourth, contrasts=F),
Fifth=contrasts(testFrame$Fifth, contrasts=F)))
or, with a little less typing and without the proper names:
model.matrix(~ Fourth + Fifth, data=testFrame,
contrasts.arg=list(Fourth=diag(nlevels(testFrame$Fourth)),
Fifth=diag(nlevels(testFrame$Fifth))))
dummyVars
from caret
could also be used. http://caret.r-forge.r-project.org/preprocess.html