Referring to data.table columns by names saved in variables

If you are going to be doing complicated operations inside your j expressions, you should probably use eval and quote. One problem with that in current version of data.table is that the environment of eval is not always correctly processed - eval and quote in data.table (Note: There has been an update to that answer based on an update to the package.) - and the current fix for that is to add .SD to eval. As far as I can tell from a few tests that I've run this doesn't affect speed (the way e.g. having .SD[1] in j would).

Interestingly this issue only plagues the j and you'll be fine using eval normally in i (where .SD is not available anyway).

The other problem is assignment, and there you have to have strings. I know one way to extract the string name from a quoted expression - it's not pretty, but it works. Here's an example combining everything together:

x = data.table(dist = c(1:10), val = c(1:10))
distcol = quote(dist)
valcol = quote(val)

x[eval(valcol) < 5,
  capture.output(str(distcol, give.head = F)) := eval(distcol)*sum(eval(distcol, .SD))]

Note how I was ok not adding .SD in one eval(distcol), but won't be if I take it out of the other eval.

Another option is to use get:

diststr = "dist"
valstr = "val"

x[get(valstr) < 5, c(diststr) := get(diststr)*sum(get(diststr))]

Maybe you know about this solution already?

DT[[colname]]

This is inspired by @eddi's solution in the comments below, using the OP's example:

set.seed(1)
x = data.table(a = 1:10, b=rnorm(10))
colstr="b"
col <- eval(parse(text=paste("quote(",colstr,")",sep="")))
x[eval(col)<0]
x[eval(col)<0,c(colstr):=-100]

Say you have the column name in variable x, you could do

colname = as.name(x)

you can then use colname in the subset function

Tags:

R

Data.Table