Simultaneously merge multiple data.frames in a list
You can do it using merge_all
in the reshape
package. You can pass parameters to merge
using the ...
argument
reshape::merge_all(list_of_dataframes, ...)
Here is an excellent resource on different methods to merge data frames.
Another question asked specifically how to perform multiple left joins using dplyr in R . The question was marked as a duplicate of this one so I answer here, using the 3 sample data frames below:
x <- data.frame(i = c("a","b","c"), j = 1:3, stringsAsFactors=FALSE)
y <- data.frame(i = c("b","c","d"), k = 4:6, stringsAsFactors=FALSE)
z <- data.frame(i = c("c","d","a"), l = 7:9, stringsAsFactors=FALSE)
Update June 2018: I divided the answer in three sections representing three different ways to perform the merge. You probably want to use the purrr
way if you are already using the tidyverse packages. For comparison purposes below, you'll find a base R version using the same sample dataset.
1) Join them with reduce
from the purrr
package:
The purrr
package provides a reduce
function which has a concise syntax:
library(tidyverse)
list(x, y, z) %>% reduce(left_join, by = "i")
# A tibble: 3 x 4
# i j k l
# <chr> <int> <int> <int>
# 1 a 1 NA 9
# 2 b 2 4 NA
# 3 c 3 5 7
You can also perform other joins, such as a full_join
or inner_join
:
list(x, y, z) %>% reduce(full_join, by = "i")
# A tibble: 4 x 4
# i j k l
# <chr> <int> <int> <int>
# 1 a 1 NA 9
# 2 b 2 4 NA
# 3 c 3 5 7
# 4 d NA 6 8
list(x, y, z) %>% reduce(inner_join, by = "i")
# A tibble: 1 x 4
# i j k l
# <chr> <int> <int> <int>
# 1 c 3 5 7
2) dplyr::left_join()
with base R Reduce()
:
list(x,y,z) %>%
Reduce(function(dtf1,dtf2) left_join(dtf1,dtf2,by="i"), .)
# i j k l
# 1 a 1 NA 9
# 2 b 2 4 NA
# 3 c 3 5 7
3) Base R merge()
with base R Reduce()
:
And for comparison purposes, here is a base R version of the left join based on Charles's answer.
Reduce(function(dtf1, dtf2) merge(dtf1, dtf2, by = "i", all.x = TRUE),
list(x,y,z))
# i j k l
# 1 a 1 NA 9
# 2 b 2 4 NA
# 3 c 3 5 7
Reduce makes this fairly easy:
merged.data.frame = Reduce(function(...) merge(..., all=T), list.of.data.frames)
Here's a fully example using some mock data:
set.seed(1)
list.of.data.frames = list(data.frame(x=1:10, a=1:10), data.frame(x=5:14, b=11:20), data.frame(x=sample(20, 10), y=runif(10)))
merged.data.frame = Reduce(function(...) merge(..., all=T), list.of.data.frames)
tail(merged.data.frame)
# x a b y
#12 12 NA 18 NA
#13 13 NA 19 NA
#14 14 NA 20 0.4976992
#15 15 NA NA 0.7176185
#16 16 NA NA 0.3841037
#17 19 NA NA 0.3800352
And here's an example using these data to replicate my.list
:
merged.data.frame = Reduce(function(...) merge(..., by=match.by, all=T), my.list)
merged.data.frame[, 1:12]
# matchname party st district chamber senate1993 name.x v2.x v3.x v4.x senate1994 name.y
#1 ALGIERE 200 RI 026 S NA <NA> NA NA NA NA <NA>
#2 ALVES 100 RI 019 S NA <NA> NA NA NA NA <NA>
#3 BADEAU 100 RI 032 S NA <NA> NA NA NA NA <NA>
Note: It looks like this is arguably a bug in merge
. The problem is there is no check that adding the suffixes (to handle overlapping non-matching names) actually makes them unique. At a certain point it uses [.data.frame
which does make.unique
the names, causing the rbind
to fail.
# first merge will end up with 'name.x' & 'name.y'
merge(my.list[[1]], my.list[[2]], by=match.by, all=T)
# [1] matchname party st district chamber senate1993 name.x
# [8] votes.year.x senate1994 name.y votes.year.y
#<0 rows> (or 0-length row.names)
# as there is no clash, we retain 'name.x' & 'name.y' and get 'name' again
merge(merge(my.list[[1]], my.list[[2]], by=match.by, all=T), my.list[[3]], by=match.by, all=T)
# [1] matchname party st district chamber senate1993 name.x
# [8] votes.year.x senate1994 name.y votes.year.y senate1995 name votes.year
#<0 rows> (or 0-length row.names)
# the next merge will fail as 'name' will get renamed to a pre-existing field.
Easiest way to fix is to not leave the field renaming for duplicates fields (of which there are many here) up to merge
. Eg:
my.list2 = Map(function(x, i) setNames(x, ifelse(names(x) %in% match.by,
names(x), sprintf('%s.%d', names(x), i))), my.list, seq_along(my.list))
The merge
/Reduce
will then work fine.