Is there a dplyr or data.table equivalent to plyr::join_all? Joining by a list of data frames?

Combining @SimonOHanlon's data.table method with @Jaap's Reduce and merge techniques appears to yield the most performant results:

library(data.table)
setDT(df)
count_x_dt <- function(dt) dt[, list(count_x = .N), keyby = x]
sum_y_dt   <- function(dt) dt[, list(sum_y = sum(y)), keyby = x]
mean_y_dt  <- function(dt) dt[, list(mean_y = mean(y)), keyby = x]

Reduce(function(...) merge(..., all = TRUE, by = c("x")), 
       list(count_x_dt(df), sum_y_dt(df), mean_y_dt(df)))

Updating to include a tidyverse / purrr (purrr::reduce) approach:

library(tidyverse)
list(count_x(df), sum_y(df), mean_y(df)) %>% 
  reduce(left_join)

In data.table parlance this would be the equivalent of having a sorted keyed data.table and using the key to join the various data.tables.

e.g.

require(data.table)
setDT(df)  #df is now a data.table
df_count <- df[ , list(count_x=.N),by=x]
df_sum <- df[ , list(sum_y = sum(y)),by=x]
#  merge.data.table executes a fast join on the shared key
merge(df_count,df_sum)
#   x count_x sum_y
#1: 1       2   129
#2: 2       2   128
#3: 3       2   154
#4: 4       2   182
#5: 5       2   151

In your example you might write something like this:

count_x <- function(dt) dt[ , list(N = .N) , keyby=x ]
sum_y <- function(dt) dt[ , list(Sum=sum(y)),keyby=x]

#  Then merge...
merge(sum_y(df),count_x(df))
#   x Sum N
#1: 1 129 2
#2: 2 128 2
#3: 3 154 2
#4: 4 182 2
#5: 5 151 2

Is there a dplyr or data.table equivalent to plyr::join_all? Joining by a list of data frames?

Tags:

R

Plyr

Data.Table

Dplyr

Related

Recent Posts