R - How to run average & max on different data.table columns based on multiple factors & return original colnames
In a similar fashion as @David Arenburg, but using .SDcols
in order to simplify the notation. Also I show the code until the merge.
DTaves <- DT[, lapply(.SD, mean), .SDcols = VarToAve, by = c(GrpVar1, GrpVar2)]
DTmaxs <- DT[, lapply(.SD, max), .SDcols = VarToMax, by = c(GrpVar1, GrpVar2)]
merge(DTmaxs, DTaves)
## a e b c d
## 1: A Y 6 4 0.2230091
## 2: B Z 7 6 0.5909434
## 3: C Z 8 7 -0.4828223
## 4: D Z 9 8 -1.3591240
Alternatively, you can do this in one go by subsetting the .SD
using the ..
notation to look for VarToAve
in the parent frame of .SD
(as opposed to a column named VarToAve
)
DT[, c(lapply(.SD[, ..VarToAve], mean),
lapply(.SD[, ..VarToMax], max)),
by = c(GrpVar1, GrpVar2)]
## a e c d b
## 1: A Y 4 0.2230091 6
## 2: B Z 6 0.5909434 7
## 3: C Z 7 -0.4828223 8
## 4: D Z 8 -1.3591240 9
Here's my humble attempt
DT[, as.list(c(setNames(max(get(VarToMax)), VarToMax),
lapply(.SD[, ..VarToAve], mean))),
c(GrpVar1, GrpVar2)]
# a e b c d
# 1: A Y 6 4 -0.8000173
# 2: B Z 7 6 0.2508633
# 3: C Z 8 7 1.1966517
# 4: D Z 9 8 1.7291615
Or, for maximum efficiency you could use colMeans
and eval(as.name())
combination instead of lapply
and get
DT[, as.list(c(setNames(max(eval(as.name(VarToMax))), VarToMax),
colMeans(.SD[, ..VarToAve]))),
c(GrpVar1, GrpVar2)]
# a e b c d
# 1: A Y 6 4 -0.8000173
# 2: B Z 7 6 0.2508633
# 3: C Z 8 7 1.1966517
# 4: D Z 9 8 1.7291615