Convert type of multiple columns of a dataframe at once
I run into this a lot as well. It comes down to how you import the data. All of the read...() functions have an option (stringsAsFactors = FALSE, or as.is = TRUE) to stop character strings from being converted to factors, so text stays character and things that look like numbers stay numeric. A problem arises when you have elements that are empty rather than NA, but na.strings = c("", ...) should solve that as well. I'd start by taking a hard look at your import process and adjusting it accordingly.
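As a rough sketch of what that looks like: the csv_lines vector below is made-up data standing in for a real file (its x/y/z columns are only there to mirror the example in this thread), and the options are set directly in the read call.

```r
# Hypothetical inline data in place of a real file; row 2 has an empty y field.
csv_lines <- c(
  "x,y,z",
  "1,red,15254",
  "2,,15255",
  "3,blue,15256"
)

foo <- read.csv(
  text = csv_lines,
  stringsAsFactors = FALSE,      # keep text as character, not factor
  na.strings = c("", "NA"),      # treat empty fields as missing
  colClasses = c("character", "character", "numeric")  # one class per column
)

str(foo)
```

Setting colClasses at import time is usually the least surprising route, since nothing has to be converted afterwards.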
But you could always write a function and push your data frame through it.
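convert.magic <- function(x, y = NA) {
  # x: data frame; y: target type for each column ("numeric" or "character")
  for (i in seq_along(y)) {
    if (is.na(y[i])) next  # no type given: leave the column unchanged
    if (y[i] == "numeric") {
      x[[i]] <- as.numeric(x[[i]])
    }
    if (y[i] == "character") {
      x[[i]] <- as.character(x[[i]])
    }
  }
  return(x)
}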
foo <- convert.magic(foo, c("character", "character", "numeric"))
> str(foo)
'data.frame': 10 obs. of 3 variables:
$ x: chr "1" "2" "3" "4" ...
$ y: chr "red" "red" "red" "blue" ...
$ z: num 15254 15255 15256 15257 15258 ...
Edit: See this related question for some simplifications and extensions of this basic idea.
My comment on Brandon's answer, using switch:
convert.magic <- function(obj, types) {
  for (i in seq_along(obj)) {
    # Look up the coercion function that matches the requested type
    FUN <- switch(types[i],
                  character = as.character,
                  numeric = as.numeric,
                  factor = as.factor)
    obj[, i] <- FUN(obj[, i])
  }
  obj
}
out <- convert.magic(foo,c('character','character','numeric'))
> str(out)
'data.frame': 10 obs. of 3 variables:
$ x: chr "1" "2" "3" "4" ...
$ y: chr "red" "red" "red" "blue" ...
$ z: num 15254 15255 15256 15257 15258 ...
For truly large data frames you may want to use lapply instead of the for loop:
convert.magic1 <- function(obj, types) {
  out <- lapply(seq_along(obj), FUN = function(i) {
    FUN1 <- switch(types[i], character = as.character,
                   numeric = as.numeric, factor = as.factor)
    FUN1(obj[, i])
  })
  names(out) <- colnames(obj)
  as.data.frame(out, stringsAsFactors = FALSE)
}
When doing this, be aware of some of the intricacies of coercing data in R. For example, converting from factor to numeric usually requires as.numeric(as.character(...)), because as.numeric() on a factor returns the internal level codes rather than the values. Also, be aware that data.frame() and as.data.frame() convert character to factor by default in R versions before 4.0.0.
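To make the factor pitfall concrete, here is a small sketch with a made-up factor f (not taken from the data in this thread):

```r
# A factor whose labels look like numbers
f <- factor(c("10", "20", "10", "35"))

codes  <- as.numeric(f)                # internal level codes: 1 2 1 3
values <- as.numeric(as.character(f))  # the actual values: 10 20 10 35

print(codes)
print(values)
```

The levels are sorted as "10", "20", "35", so the codes are just positions in that list, which is why the direct conversion silently gives the wrong numbers.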
I know I am quite late to answer, but using a loop along with the attributes function is a simple solution to your problem.
names <- c("x", "y", "z")
chclass <- c("character", "character", "numeric")
for (i in seq_along(names)) {
  attributes(foo[, names[i]])$class <- chclass[i]
}
If you want to detect each column's data type automatically rather than specify it manually (e.g. after data tidying), the function type.convert() may help.
The function type.convert() takes a character vector and attempts to determine the most appropriate type for all of its elements, so it has to be applied once per column.
# as.is = TRUE keeps strings as character; R >= 4.0.0 warns if as.is is omitted
df[] <- lapply(df, function(x) type.convert(as.character(x), as.is = TRUE))
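For illustration, here is how that plays out on a small made-up data frame (the column names a, b and c are hypothetical) in which every column starts out as character:

```r
# All three columns are read in as text
df <- data.frame(
  a = c("1", "2", "3"),         # looks like integers
  b = c("1.5", "2", "x"),       # one non-numeric entry, so it stays text
  c = c("TRUE", "FALSE", "NA"), # looks like logicals, with a missing value
  stringsAsFactors = FALSE
)

# Apply type.convert() column by column; df[] keeps the data frame structure
df[] <- lapply(df, function(col) type.convert(as.character(col), as.is = TRUE))

col_classes <- sapply(df, class)
print(col_classes)
```

A single entry that does not parse as a number keeps the whole column as character, so it is worth checking the resulting classes after the conversion.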
Since I love dplyr, I prefer:
library(dplyr)
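# across() replaces the deprecated mutate_all(funs(...)) idiom (dplyr >= 1.0.0)
df <- df %>% mutate(across(everything(), ~ type.convert(as.character(.x), as.is = TRUE)))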