How to select all factor variables in R
This appears to be (almost) the perfect time to use the seldom-used function rapply:
rapply(insurance, classes = "factor", f = levels, how = "list")
Or
Filter(Negate(is.null), rapply(insurance, classes = "factor", f = levels, how = "list"))
to remove the NULL elements (the columns that weren't factors).
Or simply
lapply(Filter(is.factor, insurance), levels)
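To see that the rapply and Filter variants give the same kind of result, here is a minimal self-contained sketch. The data frame `toy` and its columns are made up for illustration, since the real `insurance` data is not shown here.

```r
# Hypothetical toy data standing in for `insurance`
toy <- data.frame(
  age    = c(23, 45, 31),
  sex    = factor(c("f", "m", "f")),
  region = factor(c("north", "south", "north"))
)

# rapply with how = "list" leaves a NULL entry for every non-factor column,
# so the NULLs are filtered out afterwards
via_rapply <- Filter(
  Negate(is.null),
  rapply(toy, f = levels, classes = "factor", how = "list")
)

# Filter keeps only the factor columns first, then levels() is applied to each
via_filter <- lapply(Filter(is.factor, toy), levels)

names(via_rapply)
names(via_filter)
```

Both objects should be named lists with one character vector of levels per factor column. The Filter version is arguably easier to read because no NULL clean-up is needed.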
insurance %>% select_if(is.factor)  # requires library(dplyr)
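One caveat when testing for factors by class: an ordered factor carries two classes, so comparing class(.) against 'factor' yields a length-two logical vector rather than a single TRUE, while is.factor and inherits handle it cleanly. A small sketch, using a made-up vector `x`:

```r
# Hypothetical ordered factor for illustration
x <- factor(c("low", "high", "low"), levels = c("low", "high"), ordered = TRUE)

class(x)               # two classes: "ordered" and "factor"
class(x) == "factor"   # length-two result, not usable as a single predicate

is.factor(x)           # single TRUE
inherits(x, "factor")  # single TRUE
```

For that reason a predicate such as is.factor is usually the safer choice inside select_if or where().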
I would suggest using dplyr and purrr here. First select the factor columns and then use purrr::map to show the factor levels for each column.
library(tidyverse)
insurance %>%
select(where(is.factor)) %>%
map(levels)
Some data:
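insurance <- data.frame(
  int = 1:5,
  fact1 = letters[1:5],
  fact2 = factor(1:5),
  fact3 = LETTERS[3:7],
  stringsAsFactors = TRUE  # the default before R 4.0.0; needed now so fact1 and fact3 become factors
)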
I would use sapply like you did, but combined with is.factor to return a logical vector:
is.fact <- sapply(insurance, is.factor)
#   int fact1 fact2 fact3
# FALSE  TRUE  TRUE  TRUE
Then use [ to extract these columns:
factors.df <- insurance[, is.fact]
#   fact1 fact2 fact3
# 1     a     1     C
# 2     b     2     D
# 3     c     3     E
# 4     d     4     F
# 5     e     5     G
Finally, to get the levels, use lapply:
lapply(factors.df, levels)
# $fact1
# [1] "a" "b" "c" "d" "e"
#
# $fact2
# [1] "1" "2" "3" "4" "5"
#
# $fact3
# [1] "C" "D" "E" "F" "G"
You might also find str(insurance) interesting as a short summary.
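One edge case of the [ step is worth a sketch: when exactly one column is a factor, the default drop = TRUE collapses the result to a bare factor vector instead of a data frame. The data frame `one_fact` below is hypothetical and only serves to show the difference.

```r
# Hypothetical data with a single factor column
one_fact <- data.frame(
  id  = 1:3,
  grp = factor(c("a", "b", "a"))
)

is.fact <- sapply(one_fact, is.factor)

# With the default drop = TRUE a single selected column becomes a plain factor
dropped <- one_fact[, is.fact]
class(dropped)

# drop = FALSE keeps the data frame, so lapply still works column by column
factors.df <- one_fact[, is.fact, drop = FALSE]
lapply(factors.df, levels)
```

Adding drop = FALSE costs nothing when several columns are selected, so it is a reasonable habit whenever the number of factor columns is not known in advance.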