Apply multiple functions to multiple columns in data.table
I'd normally do this:
my.summary = function(x) list(mean = mean(x), median = median(x))
DT[, unlist(lapply(.SD, my.summary)), .SDcols = c('a', 'b')]
#  a.mean a.median   b.mean b.median
#       3        3        4        4
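The output above has no row index, which suggests the call returns a named numeric vector rather than a data.table. If a one-row data.table is wanted instead, one option is to wrap the result in as.list. A minimal sketch, assuming DT has columns a = 1:5, b = 2:6 and c = 3:7 (an assumption chosen to match the outputs shown in this thread, not something stated in the answer):

```r
library(data.table)

# Assumed example data, chosen to be consistent with the outputs shown in this thread.
DT <- data.table(a = 1:5, b = 2:6, c = 3:7)

my.summary <- function(x) list(mean = mean(x), median = median(x))

# unlist() flattens the nested list into a named numeric vector;
# as.list() then turns each named element into one column of a one-row data.table.
res <- DT[, as.list(unlist(lapply(.SD, my.summary))), .SDcols = c("a", "b")]
print(res)
```

The same j expression should also work with a by argument, giving one row per group.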
This is a little bit clumsy, but it does the job with data.table:
funcs = c('median', 'mean', 'sum')
m = DT[, lapply(.SD, function(u) {
  # apply each function named in funcs to the column u
  sapply(funcs, function(f) do.call(f, list(u)))
})][, t(.SD)]
colnames(m) = funcs
#   median mean sum
# a      3    3  15
# b      4    4  20
# c      5    5  25
Other answers show how to do it, but no one bothered to explain the basic principle. The basic rule is that elements of lists returned by j expressions form the columns of the resulting data.table. Any j expression that produces a list, each element of which corresponds to a desired column in the result, will work. With this in mind we can use
DT[, c(mean = lapply(.SD, mean),
       median = lapply(.SD, median)),
   .SDcols = c('a', 'b')]
##    mean.a mean.b median.a median.b
## 1:      3      4        3        4
or
DT[, unlist(lapply(.SD,
                   function(x) list(mean = mean(x),
                                    median = median(x))),
            recursive = FALSE),
   .SDcols = c('a', 'b')]
##    a.mean a.median b.mean b.median
## 1:      3        3      4        4
depending on the desired order.
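To see the rule in isolation, here is a minimal sketch with a hand-written list in j. It assumes DT has columns a = 1:5, b = 2:6 and c = 3:7 (chosen to match the outputs shown here); the column names mean_a, max_b and stat are made up for illustration.

```r
library(data.table)

# Assumed example data, consistent with the outputs shown in this thread.
DT <- data.table(a = 1:5, b = 2:6, c = 3:7)

# Each named element of the list returned by j becomes one column.
res1 <- DT[, list(mean_a = mean(a), max_b = max(b))]

# Elements longer than one value become columns with several rows.
res2 <- DT[, list(stat = c("mean", "median"),
                  a = c(mean(a), median(a)))]

# c() on two named lists prefixes the inner names with the outer ones,
# which is where names such as mean.a and median.b come from.
nms <- names(c(mean = list(a = 3, b = 4), median = list(a = 3, b = 4)))
```

Here res1 has one row with columns mean_a and max_b, res2 has two rows with columns stat and a, and nms holds the four combined names.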
Importantly, we can use any method we want to produce the desired result, provided only that we arrange the result into a list as described above. For example,
library(matrixStats)
DT[, c(mean = as.list(colMeans(.SD)),
       median = setNames(as.list(colMedians(as.matrix(.SD))), names(.SD))),
   .SDcols = c('a', 'b')]
##    mean.a mean.b median.a median.b
## 1:      3      4        3        4
also works.
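The same principle extends to an arbitrary set of functions. The following is only a sketch, not something from the answers above: it assumes DT has columns a = 1:5, b = 2:6 and c = 3:7, and funs is a hypothetical named list of summary functions introduced for illustration.

```r
library(data.table)

# Assumed example data, consistent with the outputs shown in this thread.
DT <- data.table(a = 1:5, b = 2:6, c = 3:7)

# Hypothetical named list of functions; the names become column-name prefixes.
funs <- list(mean = mean, median = median, sd = sd)

# The outer lapply() runs over the functions, the inner one over the columns in .SD.
# unlist(recursive = FALSE) flattens one level, leaving a flat named list
# with one element per (function, column) pair, which j turns into columns.
res <- DT[, unlist(lapply(funs, function(f) lapply(.SD, f)), recursive = FALSE),
          .SDcols = c("a", "b")]
print(res)
```

Swapping the two lapply() calls should group the result by column instead (a.mean, a.median, ...), mirroring the two orderings shown earlier.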