dplyr join warning: joining factors with different levels

If the data comes from a database, remember to pass stringsAsFactors=FALSE when fetching it, so that character columns are not turned into factors. In many cases that avoids this warning. (That was my case.)

sqlExecute(my_database_channel, data=myparam, stringsAsFactors=FALSE )
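If you cannot control how the data was imported, one alternative is to convert the factor columns to character after the fact, before joining. This is a minimal sketch only: the helper name factors_to_character and the example tables orders and customers are made up for illustration and do not come from any package.

```r
library(dplyr)

# Hypothetical helper: convert every factor column of a data frame to character.
factors_to_character <- function(df) {
  is_fct <- vapply(df, is.factor, logical(1))
  df[is_fct] <- lapply(df[is_fct], as.character)
  df
}

# Made-up tables whose key column `id` is a factor with different levels
orders <- data.frame(id = factor(c("a", "b", "c")), qty = 1:3)
customers <- data.frame(id = factor(c("b", "c", "d")),
                        name = c("B", "C", "D"),
                        stringsAsFactors = FALSE)

orders_chr <- factors_to_character(orders)
customers_chr <- factors_to_character(customers)

# Both key columns are now plain character vectors
joined <- left_join(orders_chr, customers_chr, by = "id")
class(joined$id)
```

The result is the same character column the warning would have produced anyway, but the conversion is now explicit.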

This warning message will also appear if the join columns in the two tables have the same levels but in a different order:

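library(dplyr)
library(forcats)

# tibble() replaces the deprecated data_frame()
tb1 <- tibble(a = c("a","b","c")) %>% mutate(a = as.factor(a))
# Change the level order of tb2's column a by moving "c" to the front
tb2 <- tb1 %>% mutate(a = fct_relevel(a, "c"))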

# Check both still factors
tb1$a %>% class()
[1] "factor"
tb2$a %>% class()
[1] "factor"

# Check level order
tb1$a %>% levels()
[1] "a" "b" "c"
tb2$a %>% levels()
[1] "c" "a" "b"

# Try joining
tb1 %>% left_join(tb2)
Joining, by = "a"
Column `a` joining factors with different levels, coercing to character vector
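When only the order differs, one possible fix is to rebuild one factor using the other table's level order before joining. A rough sketch, assuming the same tb1 and tb2 as above (recreated here so the snippet runs on its own); tb2_aligned is just an illustrative name:

```r
library(dplyr)

tb1 <- tibble(a = factor(c("a", "b", "c")))
tb2 <- tibble(a = factor(c("a", "b", "c"), levels = c("c", "a", "b")))

# Same set of levels, but not the same order
setequal(levels(tb1$a), levels(tb2$a))   # TRUE
identical(levels(tb1$a), levels(tb2$a))  # FALSE

# Rebuild tb2's factor with tb1's level order
tb2_aligned <- tb2 %>% mutate(a = factor(a, levels = levels(tb1$a)))
identical(levels(tb1$a), levels(tb2_aligned$a))  # TRUE

joined <- left_join(tb1, tb2_aligned, by = "a")
class(joined$a)
```

With identical levels on both sides, the join column stays a factor.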

That's not an error, that's a warning. It's telling you that one of the columns you used in your join was a factor, and that the factor had different levels in the two datasets. So as not to lose any information, the factors were converted to character values. For example:

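library(dplyr)
# stringsAsFactors = TRUE reproduces the default from before R 4.0.0,
# so that column a is a factor
x <- data.frame(a = letters[1:7], stringsAsFactors = TRUE)
y <- data.frame(a = letters[4:10], stringsAsFactors = TRUE)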

class(x$a) 
# [1] "factor"

# NOTE these are different
levels(x$a)
# [1] "a" "b" "c" "d" "e" "f" "g"
levels(y$a)
# [1] "d" "e" "f" "g" "h" "i" "j"

m <- left_join(x,y)
# Joining by: "a"
# Warning message:
# joining factors with different levels, coercing to character vector 

class(m$a)
# [1] "character"
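To see exactly which levels differ between the two tables, something like the following should work. It is only a sketch using base R set functions on the same x and y as above; the variable names only_in_x, only_in_y and shared are illustrative.

```r
x <- data.frame(a = letters[1:7], stringsAsFactors = TRUE)
y <- data.frame(a = letters[4:10], stringsAsFactors = TRUE)

# Levels present in one table but not the other
only_in_x <- setdiff(levels(x$a), levels(y$a))
only_in_y <- setdiff(levels(y$a), levels(x$a))

# Levels both tables have in common
shared <- intersect(levels(x$a), levels(y$a))

only_in_x
# [1] "a" "b" "c"
only_in_y
# [1] "h" "i" "j"
shared
# [1] "d" "e" "f" "g"
```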

You can make sure that both factors have the same levels before merging:

combined <- sort(union(levels(x$a), levels(y$a)))
n <- left_join(mutate(x, a=factor(a, levels=combined)),
    mutate(y, a=factor(a, levels=combined)))
# Joining by: "a"
class(n$a)
# [1] "factor"
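If you need this in several places, the same idea can be wrapped in a small function. This is a sketch, not part of dplyr: unify_levels is a hypothetical helper, it assumes the named column is already a factor in both data frames, and the extra column b is made up so the join has something to bring over.

```r
library(dplyr)

# Hypothetical helper: give column `col` the same levels in both data frames.
# Assumes `col` is a factor in both `left` and `right`.
unify_levels <- function(left, right, col) {
  combined <- sort(union(levels(left[[col]]), levels(right[[col]])))
  left[[col]] <- factor(left[[col]], levels = combined)
  right[[col]] <- factor(right[[col]], levels = combined)
  list(left = left, right = right)
}

x <- data.frame(a = letters[1:7], stringsAsFactors = TRUE)
y <- data.frame(a = letters[4:10], b = 4:10, stringsAsFactors = TRUE)

fixed <- unify_levels(x, y, "a")
n <- left_join(fixed$left, fixed$right, by = "a")
class(n$a)
```

Returning both data frames in a list keeps the helper free of side effects; the caller decides which join to run afterwards.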
