Is there a dplyr or data.table equivalent to plyr::join_all? Joining by a list of data frames?
Combining @SimonOHanlon's data.table
method with @Jaap's Reduce
and merge
techniques appears to yield the most performant results:
library(data.table)
setDT(df)
count_x_dt <- function(dt) dt[, list(count_x = .N), keyby = x]
sum_y_dt <- function(dt) dt[, list(sum_y = sum(y)), keyby = x]
mean_y_dt <- function(dt) dt[, list(mean_y = mean(y)), keyby = x]
Reduce(function(...) merge(..., all = TRUE, by = c("x")),
list(count_x_dt(df), sum_y_dt(df), mean_y_dt(df)))
Updating to include a tidyverse
/ purrr
(purrr::reduce
) approach:
library(tidyverse)
list(count_x(df), sum_y(df), mean_y(df)) %>%
reduce(left_join)
In data.table
parlance this would be the equivalent of having a sorted keyed data.table and using the key to join the various data.tables.
e.g.
require(data.table)
setDT(df) #df is now a data.table
df_count <- df[ , list(count_x=.N),by=x]
df_sum <- df[ , list(sum_y = sum(y)),by=x]
# merge.data.table executes a fast join on the shared key
merge(df_count,df_sum)
# x count_x sum_y
#1: 1 2 129
#2: 2 2 128
#3: 3 2 154
#4: 4 2 182
#5: 5 2 151
In your example you might write something like this:
count_x <- function(dt) dt[ , list(N = .N) , keyby=x ]
sum_y <- function(dt) dt[ , list(Sum=sum(y)),keyby=x]
# Then merge...
merge(sum_y(df),count_x(df))
# x Sum N
#1: 1 129 2
#2: 2 128 2
#3: 3 154 2
#4: 4 182 2
#5: 5 151 2