Left Join in R (dplyr) - Too many observations?
It's hard to know without seeing your original data, but if data frame B does not contain unique values on the join columns, you will get repeated rows from data frame A whenever this happens. You could try:
data_frame_b %>% count(join_col_1, join_col_2)
This will let you know if there are non-unique combinations of the two variables: any combination with n greater than 1 appears more than once in B.
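As a minimal sketch of that check, the following lists only the repeated key combinations and then shows one possible way to deduplicate B before joining. The data and the column names (join_col_1, join_col_2, value) are made up for illustration, and keeping the first row per key is an assumption; it may not be the right rule for your data.

```r
library(dplyr)

# Hypothetical data: B has two rows for the key (1, "x")
data_frame_b <- data.frame(
  join_col_1 = c(1, 1, 2),
  join_col_2 = c("x", "x", "y"),
  value      = c(10, 11, 20)
)

# Key combinations that occur more than once in B
dup_keys <- data_frame_b %>%
  count(join_col_1, join_col_2) %>%
  filter(n > 1)

# One possible fix (an assumption, not the only option):
# keep only the first row for each key combination
b_unique <- data_frame_b %>%
  distinct(join_col_1, join_col_2, .keep_all = TRUE)
```

If dup_keys has zero rows, duplicates in B are not the cause of the extra observations.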
With left_join(A, B), new rows will be added wherever there are multiple rows in B for which the key columns (same-name columns by default) match the same, single row in A. For example:
library(dplyr)
df1 <- data.frame(col1 = LETTERS[1:4],
                  col2 = 1:4)
df2 <- data.frame(col1 = rep(LETTERS[1:2], 2),
                  col3 = 4:1)
left_join(df1, df2) # has 6 rows rather than 4
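To see which rows were multiplied, you can compare row counts before and after the join and count the keys in the result. This is only a sketch built on the same df1 and df2; the names joined, extra_rows, and multiplied are introduced here for illustration.

```r
library(dplyr)

df1 <- data.frame(col1 = LETTERS[1:4], col2 = 1:4)
df2 <- data.frame(col1 = rep(LETTERS[1:2], 2), col3 = 4:1)

# Name the key explicitly rather than relying on same-name matching
joined <- left_join(df1, df2, by = "col1")

# Rows gained by the join; 0 would mean each row of df1 matched
# at most one row of df2
extra_rows <- nrow(joined) - nrow(df1)

# Keys that appear more than once in the result
# (col1 is unique in df1, so any repeat comes from df2)
multiplied <- joined %>%
  count(col1) %>%
  filter(n > 1)
```

Note that counting keys in the result only isolates the effect of B when the keys in A are themselves unique, as they are here.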
More rows may also appear if you have NA values in the join columns of both A and B, because dplyr treats NA as matching NA by default. So make sure you exclude those.
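A small sketch of the NA case, using made-up data frames a and b. It assumes a dplyr version in which left_join() accepts the na_matches argument, so check the documentation for your installed version.

```r
library(dplyr)

# Hypothetical data: NA appears in the key column of both tables
a <- data.frame(key = c("x", NA), a_val = 1:2)
b <- data.frame(key = c("x", NA, NA), b_val = 1:3)

# Default behaviour: the NA key in a matches both NA keys in b,
# so the result has more rows than a
default_join <- left_join(a, b, by = "key")

# Option 1: ask dplyr not to match NA keys to each other
never_join <- left_join(a, b, by = "key", na_matches = "never")

# Option 2: drop the NA keys from b before joining
filtered_join <- left_join(a, filter(b, !is.na(key)), by = "key")
```

Both options keep the row with the NA key from a, but its b_val is left as NA instead of being matched.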