data.table operations by column name
This is similar to :
How to generate a linear combination of variables and update table using data.table in a loop call?
but you want to combine this with by=
too, so set()
isn't flexible enough. That's a deliberate design design and set()
is unlikely to change in that regard.
I sometimes use the EVAL
helper at the end of that answer.
https://stackoverflow.com/a/20808573/403310
Some wince at that approach but I just think of it like constructing a dynamic SQL statement, which is quite common practice. The EVAL
approach gives ultimate flexibility without head scratching about eval()
and quote()
. To see the dynamic query that's been constructed (to check it) you can add a print
inside your EVAL
helper function.
However, in this simple example you can wrap the LHS of :=
with brackets to tell data.table
to lookup the value (clearer than with=FALSE
), and the RHS needs a get()
.
for (n in c("a","b")) {
s <- paste0("s",n)
a[, (s) := sum(get(n)), by="id"]
}
Edit 2020-02-15 about ..
data.table
also supports the ..
syntax to "look up a level", obviating the need for with=FALSE
in most cases, e.g. dt[ , ..n1]
and dt[ , ..n2]
in the below
have a look at with
in ? data.table
:
dt <- data.table(id=1:5,a=21:25,b=11:15,key="id")
dt[, n3 := dt[ , n1, with = FALSE ] * dt[ , n2, with = FALSE ], with = FALSE ]
EDIT:
Or you just change the colnames forth and back:
dt <- data.table(id=1:5,a=21:25,b=11:15,key="id")
dt[ , dt.names["n3"] := 1L, with = FALSE ]
dt.names <- c( n1 = "a", n2 = "b", n3 = "c" )
setnames( dt, dt.names, names(dt.names) )
dt[ , n3 := n1 * n2, by = "id" ]
setnames( dt, names(dt.names), dt.names )
which works together with by.
You can also do this:
a <- data.table(id=c(1,1,2,2,3),a=21:25,b=11:15,key="id")
a[, c("sa", "sb") := lapply(.SD, sum), by = id]
Or slightly more generally:
cols.to.sum = c("a", "b")
a[, paste0("s", cols.to.sum) := lapply(.SD, sum), by = id, .SDcols = cols.to.sum]