Referring to data.table columns by names saved in variables
If you are going to be doing complicated operations inside your j expressions, you should probably use eval and quote. One problem with that in the current version of data.table is that the environment of eval is not always processed correctly (see "eval and quote in data.table"; note that that answer has since been updated following an update to the package), and the current fix is to add .SD to eval. As far as I can tell from a few tests that I've run, this doesn't affect speed (the way that, e.g., having .SD[1] in j would).
Interestingly, this issue only plagues j; you'll be fine using eval normally in i (where .SD is not available anyway).
The other problem is assignment, and there you have to have strings. I know one way to extract the string name from a quoted expression - it's not pretty, but it works. Here's an example combining everything together:
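library(data.table)
x = data.table(dist = 1:10, val = 1:10)
distcol = quote(dist)   # unevaluated column symbols
valcol = quote(val)
# i: plain eval works; j: the eval inside sum() needs .SD as its environment
x[eval(valcol) < 5,
  capture.output(str(distcol, give.head = FALSE)) := eval(distcol)*sum(eval(distcol, .SD))]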
Note how I was OK not adding .SD in one eval(distcol), but I won't be if I take it out of the other eval.
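If the capture.output(str(...)) trick feels too fragile, base R's deparse (or as.character) also turns a quoted symbol into its name as a string. The following is only a minimal sketch: distname is a helper variable introduced here for illustration, and the column is referenced directly in j to keep the focus on the name extraction.

```r
library(data.table)

x <- data.table(dist = 1:10, val = 1:10)
distcol <- quote(dist)

# deparse() converts the quoted symbol back into the string "dist"
distname <- deparse(distcol)

# Wrapping the variable in parentheses tells := to use its value as the column name
x[val < 5, (distname) := dist * sum(dist)]
```

Only the rows selected in i are updated; the remaining rows keep their original values.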
Another option is to use get:
diststr = "dist"
valstr = "val"
x[get(valstr) < 5, c(diststr) := get(diststr)*sum(get(diststr))]
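When several column names are stored in a character vector, data.table's .SDcols argument may be more convenient than calling get once per column. A small sketch, where cols and totals are names made up for this illustration:

```r
library(data.table)

x <- data.table(dist = 1:10, val = 1:10)
cols <- c("dist", "val")   # column names held in a variable

# .SD contains only the columns listed in .SDcols, restricted to the rows kept by i
totals <- x[val < 5, lapply(.SD, sum), .SDcols = cols]
```

This is a read-only summary, so it complements rather than replaces the := assignment shown with get.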
Maybe you know about this solution already?
DT[[colname]]
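As a rough illustration of how that could look in practice (the table DT and its values are invented for this sketch), [[ extracts the column as a plain vector, which can then drive a row filter or an update by reference via set:

```r
library(data.table)

DT <- data.table(a = 1:5, b = c(-1, 2, -3, 4, -5))
colname <- "b"

vals <- DT[[colname]]          # the column as an ordinary vector
neg  <- DT[DT[[colname]] < 0]  # rows where that column is negative

# set() accepts the column name as a string, so no eval/get is needed
set(DT, i = which(DT[[colname]] < 0), j = colname, value = -100)
```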
This is inspired by @eddi's solution in the comments below, using the OP's example:
set.seed(1)
x = data.table(a = 1:10, b=rnorm(10))
colstr="b"
col <- eval(parse(text=paste("quote(",colstr,")",sep="")))
x[eval(col)<0]
x[eval(col)<0,c(colstr):=-100]
Say you have the column name in a variable x; you could do
colname = as.name(x)
You can then use colname in the subset function (wrapped in eval, since colname holds a symbol rather than the column's values).
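A hedged sketch of what that might look like with a plain data frame (df and its values are made up here). Because colname is a symbol, it has to be evaluated inside the data frame, either with eval or by splicing it into the call with bquote:

```r
df <- data.frame(a = 1:5, b = c(-1, 2, -3, 4, -5))
x <- "b"                 # column name stored as a string
colname <- as.name(x)    # the symbol `b`

# Option 1: evaluate the symbol inside subset's condition
res_eval <- subset(df, eval(colname) < 0)

# Option 2: build the call subset(df, b < 0) with bquote, then evaluate it
res_bquote <- eval(bquote(subset(df, .(colname) < 0)))
```

The bquote variant avoids relying on how eval resolves its environment inside subset, at the cost of being a little harder to read.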