Apply multiple functions to multiple columns in data.table

I'd normally do this:

my.summary = function(x) list(mean = mean(x), median = median(x))

DT[, unlist(lapply(.SD, my.summary)), .SDcols = c('a', 'b')]
#a.mean a.median   b.mean b.median 
#     3        3        4        4 
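
The thread never defines DT. The outputs shown are consistent with a small table like the one below, but that definition is an assumption, not something taken from the original post. Note also that the call above returns a named numeric vector rather than a data.table. Wrapping the result in as.list(), as in this sketch, should give a one-row data.table instead.

```r
library(data.table)

# Assumed example data (not given in the thread); chosen to match the outputs shown.
DT <- data.table(a = 1:5, b = 2:6, c = 3:7)

my.summary <- function(x) list(mean = mean(x), median = median(x))

# As written above: j evaluates to a named vector, so a vector is returned.
v <- DT[, unlist(lapply(.SD, my.summary)), .SDcols = c("a", "b")]

# Wrapping in as.list() makes j a list, so each element becomes a column.
res <- DT[, as.list(unlist(lapply(.SD, my.summary))), .SDcols = c("a", "b")]
```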

This is a bit clumsy, but it does the job with data.table:

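funcs = c('median', 'mean', 'sum')

# Apply every function named in funcs to each column (one row per function),
# then transpose so that rows are the columns of DT and columns are the functions.
m = DT[, lapply(.SD, function(u) {
        sapply(funcs, function(f) do.call(f, list(u)))
     })][, t(.SD)]
colnames(m) = funcs  # the transposed matrix has no column names yet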

#  median mean sum
#a      3    3  15
#b      4    4  20
#c      5    5  25
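
For comparison, here is a sketch that builds the same summary as a data.table rather than a matrix, which avoids the transpose and the manual colnames() step. The DT definition and the column name variable are assumptions made for illustration. They are not part of the original answer.

```r
library(data.table)

DT <- data.table(a = 1:5, b = 2:6, c = 3:7)  # assumed data, not from the thread
funcs <- c("median", "mean", "sum")

# One result column per function: each list element holds that function
# applied to every column of .SD.
res <- DT[, lapply(setNames(funcs, funcs),
                   function(f) sapply(.SD, match.fun(f)))]

# Record which source column each row describes ("variable" is a hypothetical name).
res[, variable := names(DT)]
setcolorder(res, "variable")
```

Whether the matrix or the data.table form is preferable depends on what happens next. The data.table form keeps the source column names as ordinary data, which tends to be easier to join or filter on.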

Other answers show how to do it, but none explains the underlying principle: the elements of the list returned by a j expression become the columns of the resulting data.table. Any j expression that produces a list in which each element corresponds to a desired column will therefore work. With this in mind, we can use

DT[, c(mean = lapply(.SD, mean),
       median = lapply(.SD, median)),
  .SDcols = c('a', 'b')]
##    mean.a mean.b median.a median.b
## 1:      3      4        3        4

or

DT[, unlist(lapply(.SD,
                   function(x) list(mean = mean(x),
                                    median = median(x))),
            recursive = FALSE),
   .SDcols = c('a', 'b')]
##    a.mean a.median b.mean b.median
## 1:      3        3      4        4

depending on the desired order.
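
The same rule also covers list elements longer than one: each element is still a column, and its length determines the number of rows. As a minimal sketch (assuming a DT like the one implied by the outputs, and with stat as a hypothetical column name), the functions can be laid out as rows instead of being encoded in the column names:

```r
library(data.table)

DT <- data.table(a = 1:5, b = 2:6, c = 3:7)  # assumed data, not from the thread

# Each list element is one column; here every element has length 2,
# so the result has one row per summary function.
res <- DT[, list(stat = c("mean", "median"),
                 a = c(mean(a), median(a)),
                 b = c(mean(b), median(b)))]
```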

Importantly, we can use any method we like to compute the values, provided that we arrange the result into a list as described above. For example,

library(matrixStats)
DT[, c(mean = as.list(colMeans(.SD)),
       median = setNames(as.list(colMedians(as.matrix(.SD))), names(.SD))),
   .SDcols = c('a', 'b')]
##    mean.a mean.b median.a median.b
## 1:      3      4        3        4

also works.
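
If adding matrixStats as a dependency is undesirable, the same list can be assembled with base R only. This is a sketch under the same assumed DT as implied by the outputs, not a drop-in replacement benchmarked against colMedians:

```r
library(data.table)

DT <- data.table(a = 1:5, b = 2:6, c = 3:7)  # assumed data, not from the thread

# vapply() returns a named numeric vector (one value per column of .SD);
# as.list() turns it into the named list that j needs.
res <- DT[, c(mean = as.list(colMeans(.SD)),
              median = as.list(vapply(.SD, median, numeric(1)))),
          .SDcols = c("a", "b")]
```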
