Convert data.frame columns from factors to characters
To replace only factors:
i <- sapply(bob, is.factor)
bob[i] <- lapply(bob[i], as.character)
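As a quick check of the two lines above, here is a minimal sketch on a made-up data frame. The columns of `bob` are invented for illustration, since the thread does not show the real data.

```r
# Hypothetical example data; the real `bob` is not shown in the thread.
bob <- data.frame(
  id    = 1:3,
  name  = factor(c("x", "y", "x")),
  score = c(1.5, 2.5, 3.5)
)

i <- sapply(bob, is.factor)             # logical vector marking the factor columns
bob[i] <- lapply(bob[i], as.character)  # convert only those columns

classes <- sapply(bob, class)           # column classes after the conversion
print(classes)
```

Only `name` should change class here; the integer and numeric columns are left alone.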
Version 0.5.0 of the dplyr package introduced a new function, mutate_if:
library(dplyr)
bob %>% mutate_if(is.factor, as.character) -> bob
...and in version 1.0.0 it was superseded by across:
library(dplyr)
bob %>% mutate(across(where(is.factor), as.character)) -> bob
Package purrr from RStudio gives another alternative:
library(purrr)
bob %>% modify_if(is.factor, as.character) -> bob
Just following on Matt and Dirk. If you want to recreate your existing data frame without changing the global option, you can recreate it with an apply statement:
bob <- data.frame(lapply(bob, as.character), stringsAsFactors=FALSE)
This will convert all variables to class "character". If you only want to convert factors, see Marek's solution below.
As @hadley points out, the following is more concise.
bob[] <- lapply(bob, as.character)
In both cases, lapply outputs a list. In the second case, however, bob[] <- assigns the results into the existing columns instead of replacing the object, so bob keeps its data.frame class. That eliminates the need to convert back to a data.frame using as.data.frame with the argument stringsAsFactors = FALSE.
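The difference between the two forms can be seen in a small sketch. The data frame below is invented for illustration.

```r
# Hypothetical example data
bob <- data.frame(name = factor(c("x", "y")), grp = factor(c("a", "b")))

as_list <- lapply(bob, as.character)  # a plain list; the data.frame class is lost
print(class(as_list))

bob[] <- lapply(bob, as.character)    # assigns into the existing columns of bob
print(class(bob))
```

The first result is a bare list, while `bob` is still a data frame whose columns are now character vectors.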
If you understand how factors are stored, you can avoid using apply-based functions to accomplish this. That is not to imply that the apply solutions don't work well.
Factors are stored as integer codes that index into a vector of 'levels'. This can be seen if you convert a factor to numeric. So:
> fact <- as.factor(c("a","b","a","d"))
> fact
[1] a b a d
Levels: a b d
> as.numeric(fact)
[1] 1 2 1 3
The numbers returned in the last line correspond to the levels of the factor.
> levels(fact)
[1] "a" "b" "d"
Notice that levels() returns a character vector. You can use this fact to easily and compactly convert factors to strings or numerics like this:
> fact_character <- levels(fact)[as.numeric(fact)]
> fact_character
[1] "a" "b" "a" "d"
This also works for numeric values, provided you wrap your expression in as.numeric().
> num_fact <- factor(c(1,2,3,6,5,4))
> num_fact
[1] 1 2 3 6 5 4
Levels: 1 2 3 4 5 6
> num_num <- as.numeric(levels(num_fact)[as.numeric(num_fact)])
> num_num
[1] 1 2 3 6 5 4
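In the example above the values happen to equal their integer codes, which hides the reason the levels() lookup is needed. A small sketch with different values (chosen here purely for illustration) makes the difference visible:

```r
num_fact <- factor(c(10, 20, 30, 60, 50, 40))

# Calling as.numeric() directly returns the internal integer codes, not the values
codes <- as.numeric(num_fact)

# Convert the levels once, then index them by the factor
by_lev <- as.numeric(levels(num_fact))[num_fact]

# Equivalent result via as.character(); converts every element, not just the levels
by_chr <- as.numeric(as.character(num_fact))

print(codes)
print(by_lev)
```

Here `codes` holds the positions 1 to 6 in level order, while `by_lev` and `by_chr` recover the original numbers.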
The global option
stringsAsFactors: The default setting for arguments of data.frame and read.table.
may be something you want to set to FALSE in your startup files (e.g. ~/.Rprofile). Please see help(options). Note that since R 4.0.0 the default is already FALSE.
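If you would rather not depend on the global option at all, the same argument can be passed per call. A minimal sketch, with invented example data:

```r
# Passing the argument explicitly works regardless of the global option
df <- data.frame(name = c("x", "y"), stringsAsFactors = FALSE)

# read.table accepts the same argument; `text =` reads from a string instead of a file
tbl <- read.table(text = "name value\nx 1\ny 2",
                  header = TRUE, stringsAsFactors = FALSE)

print(class(df$name))
print(class(tbl$name))
```

Both `name` columns should come back as character vectors rather than factors.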