Levels function returning NULL
You can only run levels() on a factor vector, not on a data frame.
Example below:
> df <- data.frame(a = factor(c('a','b','c'), levels = c('a','b','c','d','e')),
+ b = factor(c('a','b','c')),
+ c = factor(c('a','a','c')))
> levels(df)
NULL
To see the levels of every column in your data frame, you can use lapply():
> lapply(df, levels)
$a
[1] "a" "b" "c" "d" "e"
$b
[1] "a" "b" "c"
$c
[1] "a" "c"
If you want the levels of a specific column, you can specify that instead:
> levels(df[, 2])
[1] "a" "b" "c"
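If the data frame also contains non-factor columns, lapply(df, levels) returns NULL for those. A minimal sketch of restricting the call to factor columns, assuming a hypothetical data frame df2 (not part of the original question) with one factor, one numeric and one character column:

```r
# Hypothetical data frame mixing factor and non-factor columns
df2 <- data.frame(
  a = factor(c("a", "b", "c"), levels = c("a", "b", "c", "d", "e")),
  n = c(1, 2, 3),
  s = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# Keep only the factor columns, then collect their levels
factor_levels <- lapply(Filter(is.factor, df2), levels)
print(factor_levels)
```

Filter(is.factor, df2) drops the columns n and s, so the result here should only have an entry for a.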
EDIT: To answer the question below on why apply(df, 2, levels) returns NULL.
Note the following from the documentation for apply():
In all cases the result is coerced by as.vector to one of the basic vector types before the dimensions are set, so that (for example) factor results will be coerced to a character array.
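apply() also converts a data frame to a matrix with as.matrix() before calling the function, so each factor column reaches the function as a character vector. You can see this behavior when you check the class of each column and try a few other functions.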
> apply(df, 2, levels)
NULL
> apply(df, 2, class)
a b c
"character" "character" "character"
> apply(df, 2, function(i) levels(i))
NULL
> apply(df, 2, function(i) levels(factor(i)))
$`a`
[1] "a" "b" "c"
$b
[1] "a" "b" "c"
$c
[1] "a" "c"
Note that even though we can force apply() to treat the columns as factors, we lose the prior ordering/levels that were set for df when it was originally created (see column `a`, where the unused levels "d" and "e" are gone). This is because it has been coerced into a character vector.
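The coercion can be checked directly. The following is a rough sketch that rebuilds the same df as above and compares the two approaches; the names m, via_lapply and via_apply are only illustrative:

```r
df <- data.frame(
  a = factor(c("a", "b", "c"), levels = c("a", "b", "c", "d", "e")),
  b = factor(c("a", "b", "c")),
  c = factor(c("a", "a", "c"))
)

# apply() works on a matrix, so the data frame is converted first
m <- as.matrix(df)
print(is.character(m))    # the factor columns have become strings
print(levels(m[, "a"]))   # no levels attribute survives the conversion

# lapply() hands each column over unchanged, so unused levels are kept
via_lapply <- lapply(df, levels)

# Rebuilding factors from the strings keeps only the values that occur
via_apply <- apply(df, 2, function(i) levels(factor(i)))

print(via_lapply$a)
print(via_apply$a)
```

If the original levels matter, lapply() (or sapply()) on the data frame itself is probably the safer choice, since it never goes through the matrix conversion.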
When creating the data frame, pass stringsAsFactors = TRUE,
e.g. dataFrame <- read.csv(file.choose(), stringsAsFactors = TRUE)
This makes R treat string values as factors (since R 4.0.0 the default is FALSE, so strings stay character and levels() returns NULL on them). Hope this helps.
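To try this without picking a file, here is a small sketch that reads a made-up CSV from a string through the text argument of read.csv(); the column names and values are invented for illustration:

```r
# Made-up CSV content, read from a string instead of a file
csv_text <- "name,group\nann,x\nbob,y\ncat,x"

raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
fac <- read.csv(text = csv_text, stringsAsFactors = TRUE)

print(levels(raw$group))  # NULL, the column is a character vector
print(levels(fac$group))  # "x" "y", the column is a factor
```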