Create new dummy variable columns from categorical variable

Drew, this is much faster and shouldn't cause any crashes.

> binom <- data.frame(data=runif(1e5),type=sample(0:4,1e5,TRUE))
> for(t in unique(binom$type)) {
+   binom[paste("type",t,sep="")] <- ifelse(binom$type==t,1,0)
+ }
> head(binom)
        data type type2 type4 type1 type3 type0
1 0.11787309    2     1     0     0     0     0
2 0.11884046    4     0     1     0     0     0
3 0.92234950    4     0     1     0     0     0
4 0.44759259    1     0     0     1     0     0
5 0.01669651    2     1     0     0     0     0
6 0.33966184    3     0     0     0     1     0
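If you need this more than once, the loop can be wrapped in a function. Here is a minimal sketch. `add_dummies` is a hypothetical helper, not part of any package. It iterates over the sorted distinct values, so the new columns come out as type0 ... type4 instead of in order of first appearance. It also uses `as.integer()` on the comparison instead of `ifelse()`, which yields the same 0/1 values.

```r
# Hypothetical helper: adds one 0/1 integer column per distinct value of df[[col]].
add_dummies <- function(df, col, prefix = col) {
  # sort() so the dummy columns are created in level order
  for (lev in sort(unique(df[[col]]))) {
    # TRUE/FALSE comparison converted to 1/0
    df[[paste0(prefix, lev)]] <- as.integer(df[[col]] == lev)
  }
  df
}

# Small toy data frame (values made up for illustration)
binom <- data.frame(data = runif(10), type = c(0:4, 4:0))
binom <- add_dummies(binom, "type")
head(binom)
```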

R has a "sub-language" for translating formulas into a design matrix, and in the spirit of the language you can take advantage of it. It's fast and concise. For example, suppose you have a numeric predictor x, a categorical predictor catVar (a factor), and a response y.

> binom <- data.frame(y=runif(1e5), x=runif(1e5), catVar=as.factor(sample(0:4,1e5,TRUE)))
> head(binom)
          y          x catVar
1 0.5051653 0.34888390      2
2 0.4868774 0.85005067      2
3 0.3324482 0.58467798      2
4 0.2966733 0.05510749      3
5 0.5695851 0.96237936      1
6 0.8358417 0.06367418      2

You just do

> A <- model.matrix(y ~ x + catVar, binom)
> head(A)
  (Intercept)          x catVar1 catVar2 catVar3 catVar4
1           1 0.34888390       0       1       0       0
2           1 0.85005067       0       1       0       0
3           1 0.58467798       0       1       0       0
4           1 0.05510749       0       0       1       0
5           1 0.96237936       1       0       0       0
6           1 0.06367418       0       1       0       0

Done.
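The output above has no catVar0 column. With an intercept in the model, R uses treatment contrasts for unordered factors by default, so the first level (0) is the reference level and is encoded as all dummy columns being 0. The sketch below uses a small made-up 10-row data frame. It shows how dropping the intercept with `- 1` yields one indicator column per level, and how to bind those columns back onto the data:

```r
# Small toy data frame (values made up for illustration); every level 0-4 appears
binom <- data.frame(y = runif(10), x = runif(10), catVar = factor(rep(0:4, 2)))

# Default: intercept present, level "0" is the reference and gets no column
A <- model.matrix(y ~ x + catVar, binom)
colnames(A)
# "(Intercept)" "x" "catVar1" "catVar2" "catVar3" "catVar4"

# Without the intercept, catVar gets a full set of indicator columns
B <- model.matrix(y ~ x + catVar - 1, binom)
colnames(B)
# "x" "catVar0" "catVar1" "catVar2" "catVar3" "catVar4"

# Attach the dummy columns (everything except x) to the original data frame
binom <- cbind(binom, B[, -1])
head(binom)
```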

Tags:

R