How to create mean and s.d. columns in data.table
.SD is itself a data.table. Thus, when you take mean(.SD), you are (attempting) to take the mean of an entire data.table. The function mean() does not know what to do with a data.table and returns NA.
Have a look:
## the .SD in your question is the same as
test[, c('A','B','C','D')]
## try taking its mean
mean(test[, c('A','B','C','D')])
# Warning in mean.default(test[, c("A", "B", "C", "D")]) :
# argument is not numeric or logical: returning NA
# [1] NA
Try this instead: use lapply(.SD, mean) for column-wise means, or apply(.SD, 1, mean) for row-wise means.
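As a minimal sketch of the difference between the two calls: the table below is rebuilt from the output printed in this answer (the original definition of test is not shown, so treat it as an assumption), and the names cols, col_means and row_means are only illustrative.

```r
library(data.table)

# Assumed reconstruction of `test` from the printed output in this answer
test <- data.table(
  id = 1:5,
  A = c(2, 3.75, 5.5, 7.25, 9),
  B = c(3, 4.5, 6, 7.5, 9),
  C = c(4, 5.25, 6.5, 7.75, 9),
  D = c(5, 6, 7, 8, 9)
)
cols <- c("A", "B", "C", "D")

# Column-wise: one mean per column, returned as a one-row data.table
col_means <- test[, lapply(.SD, mean), .SDcols = cols]

# Row-wise: one mean per row, returned as a numeric vector
row_means <- test[, apply(.SD, 1, mean), .SDcols = cols]
```

Note that apply() converts .SD to a matrix first, which is fine for all-numeric columns like these but can be slow on large tables.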
For the row-wise mean you can use rowMeans instead, and thus avoid apply (similar to the linked question):
## note: sd(.SD) relies on by=id leaving a single row per group
test[, `:=`(mean_test = rowMeans(.SD),
            sd_test   = sd(.SD)),
     by = id, .SDcols = c('A','B','C','D')]
test
# id A B C D mean_test sd_test
# 1: 1 2.00 3.0 4.00 5 3.500 1.2909944
# 2: 2 3.75 4.5 5.25 6 4.875 0.9682458
# 3: 3 5.50 6.0 6.50 7 6.250 0.6454972
# 4: 4 7.25 7.5 7.75 8 7.625 0.3227486
# 5: 5 9.00 9.0 9.00 9 9.000 0.0000000
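The call above groups by id so that each .SD is a single row. If id were not unique, a variant without by may be safer. The following is a sketch under that assumption, not the method from the linked question: it computes the row-wise mean with rowMeans() and the row-wise standard deviation with apply(.SD, 1, sd). The table is again rebuilt from the printed output.

```r
library(data.table)

# Assumed reconstruction of `test` from the printed output in this answer
test <- data.table(
  id = 1:5,
  A = c(2, 3.75, 5.5, 7.25, 9),
  B = c(3, 4.5, 6, 7.5, 9),
  C = c(4, 5.25, 6.5, 7.75, 9),
  D = c(5, 6, 7, 8, 9)
)

# No grouping: .SD holds all rows of the selected columns at once
test[, `:=`(mean_test = rowMeans(.SD),
            sd_test   = apply(.SD, 1, sd)),  # sd of each row's values
     .SDcols = c("A", "B", "C", "D")]
```

On this data the new columns should match the values printed above.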
As a fun fact, you can also combine the columns into a single vector and pass that to mean() and sd():
test[, `:=` (mean = mean(c(A,B,C,D)),
sd = sd(c(A,B,C,D))), by=id]
test
# id A B C D mean sd
# 1: 1 2.00 3.0 4.00 5 3.500 1.2909944
# 2: 2 3.75 4.5 5.25 6 4.875 0.9682458
# 3: 3 5.50 6.0 6.50 7 6.250 0.6454972
# 4: 4 7.25 7.5 7.75 8 7.625 0.3227486
# 5: 5 9.00 9.0 9.00 9 9.000 0.0000000
And you can also use quote() and eval():
cols <- quote(c(A,B,C,D))
test[, `:=`(mean = mean(eval(cols)),
            sd   = sd(eval(cols))), by = id]
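If the column names are held in a character vector rather than a quoted expression, one possible variant is to pass them through .SDcols and flatten each group's .SD with unlist(). This is a sketch under that assumption, with test rebuilt from the printed output:

```r
library(data.table)

# Assumed reconstruction of `test` from the printed output in this answer
test <- data.table(
  id = 1:5,
  A = c(2, 3.75, 5.5, 7.25, 9),
  B = c(3, 4.5, 6, 7.5, 9),
  C = c(4, 5.25, 6.5, 7.75, 9),
  D = c(5, 6, 7, 8, 9)
)

cols <- c("A", "B", "C", "D")  # column names as strings

# unlist(.SD) pools all selected values in the group into one numeric vector
test[, `:=`(mean = mean(unlist(.SD)),
            sd   = sd(unlist(.SD))),
     by = id, .SDcols = cols]
```

Like c(A,B,C,D), unlist(.SD) pools every value in the group, so with several rows per id the result is a group-level statistic rather than a per-row one.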