How to factorize specific columns in a data.frame in R using apply
The result of apply is a vector, array, or list of values (see ?apply).
For your problem, you should use lapply instead:
data(iris)
iris[, 2:3] <- lapply(iris[, 2:3], as.factor)
str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : Factor w/ 23 levels "2","2.2","2.3",..: 15 10 12 11 16 19 14 14 9 11 ...
$ Petal.Length: Factor w/ 43 levels "1","1.1","1.2",..: 5 5 4 6 5 8 5 6 5 6 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
Notice that this is one place where lapply will be much faster than a for loop. In general a loop and lapply will have similar performance, but the [<-.data.frame operation is very slow. By using lapply one avoids the [<- operation in each iteration and replaces it with a single assignment, which is much faster.
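As a rough sketch of the difference, here are the two forms side by side (the copies df_loop and df_lapply are names made up for this illustration). The loop assigns into the data frame once per column, while the lapply version assigns once in total:

```r
data(iris)

# Loop version: one data frame assignment per column
df_loop <- iris
for (j in 2:3) {
  df_loop[, j] <- as.factor(df_loop[, j])
}

# lapply version: convert the columns first, then assign once
df_lapply <- iris
df_lapply[, 2:3] <- lapply(df_lapply[, 2:3], as.factor)

# Check that both approaches give the same columns
all.equal(df_loop, df_lapply)
```

With only two columns the timing difference will hardly be noticeable; it should matter mostly when many columns are converted.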
That is because apply() works completely differently. It first carries out the function as.factor in a local environment, collects the results, and then tries to merge them into an array, not a data frame. In your case this array is a matrix. R encounters different factors and has no other way to combine them than to convert them to character first. That character matrix is then used to fill your data frame.
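A small sketch on the built-in iris data shows the effect (res and df are names chosen only for this example):

```r
data(iris)

# apply() works on a matrix and returns a matrix, so the factors are lost
res <- apply(iris[, 2:3], 2, as.factor)
is.matrix(res)   # a matrix, not a data frame
typeof(res)      # storage type of that matrix

# Assigning the matrix back fills the columns with its contents
df <- iris
df[, 2:3] <- res
sapply(df[, 2:3], class)
```

The columns end up holding whatever type the matrix has, not factors.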
You can use lapply for that (see Andrie's answer) or colwise from the plyr package.
library(plyr)
# Df stands for your data frame, ids for the columns to convert
Df[,ids] <- colwise(as.factor)(Df[,ids])