Convert all data frame character columns to factors
As @Raf Z commented on this question, dplyr now has mutate_if. Super useful, simple and readable.
> str(df)
'data.frame': 5 obs. of 5 variables:
$ A: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
$ B: int 1 2 3 4 5
$ C: logi TRUE TRUE FALSE FALSE TRUE
$ D: chr "a" "b" "c" "d" ...
$ E: chr "A a" "B b" "C c" "D d" ...
> df <- df %>% mutate_if(is.character,as.factor)
> str(df)
'data.frame': 5 obs. of 5 variables:
$ A: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
$ B: int 1 2 3 4 5
$ C: logi TRUE TRUE FALSE FALSE TRUE
$ D: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
$ E: Factor w/ 5 levels "A a","B b","C c",..: 1 2 3 4 5
Roland's answer is great for this specific problem, but I thought I would share a more generalized approach.
DF <- data.frame(x = letters[1:5], y = 1:5, z = LETTERS[1:5],
stringsAsFactors=FALSE)
str(DF)
# 'data.frame': 5 obs. of 3 variables:
# $ x: chr "a" "b" "c" "d" ...
# $ y: int 1 2 3 4 5
# $ z: chr "A" "B" "C" "D" ...
## The conversion
DF[sapply(DF, is.character)] <- lapply(DF[sapply(DF, is.character)],
as.factor)
str(DF)
# 'data.frame': 5 obs. of 3 variables:
# $ x: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
# $ y: int 1 2 3 4 5
# $ z: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
For the conversion, the left hand side of the assign (DF[sapply(DF, is.character)]
) subsets the columns that are character. In the right hand side, for that subset, you use lapply
to perform whatever conversion you need to do. R is smart enough to replace the original columns with the results.
The handy thing about this is if you wanted to go the other way or do other conversions, it's as simple as changing what you're looking for on the left and specifying what you want to change it to on the right.
DF <- data.frame(x=letters[1:5], y=1:5, stringsAsFactors=FALSE)
str(DF)
#'data.frame': 5 obs. of 2 variables:
# $ x: chr "a" "b" "c" "d" ...
# $ y: int 1 2 3 4 5
You can use as.data.frame
to turn all character columns into factor columns:
DF <- as.data.frame(unclass(DF),stringsAsFactors=TRUE)
str(DF)
#'data.frame': 5 obs. of 2 variables:
# $ x: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
# $ y: int 1 2 3 4 5
Working with dplyr
library(dplyr)
df <- data.frame(A = factor(LETTERS[1:5]),
B = 1:5, C = as.logical(c(1, 1, 0, 0, 1)),
D = letters[1:5],
E = paste(LETTERS[1:5], letters[1:5]),
stringsAsFactors = FALSE)
str(df)
we get:
'data.frame': 5 obs. of 5 variables:
$ A: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
$ B: int 1 2 3 4 5
$ C: logi TRUE TRUE FALSE FALSE TRUE
$ D: chr "a" "b" "c" "d" ...
$ E: chr "A a" "B b" "C c" "D d" ...
Now, we can convert all chr
to factors
:
df <- df%>%mutate_if(is.character, as.factor)
str(df)
And we get:
'data.frame': 5 obs. of 5 variables:
$ A: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
$ B: int 1 2 3 4 5
$ C: logi TRUE TRUE FALSE FALSE TRUE
$ D: chr "a" "b" "c" "d" ...
$ E: chr "A a" "B b" "C c" "D d" ...
Let's provide also other solutions:
With base package:
df[sapply(df, is.character)] <- lapply(df[sapply(df, is.character)],
as.factor)
With dplyr
1.0.0
df <- df%>%mutate(across(where(is.factor), as.character))
With purrr
package:
library(purrr)
df <- df%>% modify_if(is.factor, as.character)