Why does `substitute` work in multiple lines, but not in a single line?
The documentation for `substitute` explains how it decides what to substitute. It also says that, by default, it searches the environment where it is called.
If you call `substitute` inside the data.table frame (i.e. inside `[]`), it won't be able to find the symbols. They are not present in the data.table evaluation environment; they are in the environment where `[` was called.
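Here is a minimal base-R sketch of this. It rests on a simplifying assumption: the environment `[` evaluates `j` in behaves like a fresh environment that holds the columns, with the calling function's frame as its parent. This is not data.table's actual code, and `capture_both` is a hypothetical name.

```r
# capture_both is a hypothetical function for illustration only.
capture_both <- function(data, var) {
  # Simplified stand-in (an assumption) for the environment data.table builds
  # for j: the columns live in a fresh environment whose parent is this frame.
  data_env <- list2env(as.list(data), parent = environment())
  list(
    here   = substitute(var),                        # looks in capture_both's frame
    inside = eval(quote(substitute(var)), data_env)  # looks only in data_env
  )
}

res <- capture_both(mtcars, mpg)
res$here
#> mpg
res$inside
#> var
```

`substitute()` only replaces symbols that are bound in the environment it is given. Inside the data environment `var` is not bound, so it comes back unchanged.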
You can "invert" the order in which the functions are called in order to get the behavior you want:
```r
library(data.table)
foo <- function(dt, group, var) {
  eval(substitute(dt[, sum(var), by = group]))
}
foo(as.data.table(mtcars), cyl, mpg)
#>    cyl    V1
#> 1:   6 138.2
#> 2:   4 293.3
#> 3:   8 211.4
```
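To see why inverting the order works, return the substituted call instead of evaluating it. In this sketch `foo_expr` is a hypothetical variant of `foo` that stops before `eval()`. It only builds the call, so it runs even without data.table attached.

```r
# foo_expr is hypothetical: same body as foo, minus the eval().
foo_expr <- function(dt, group, var) {
  # Every argument symbol is replaced by the caller's expression here,
  # in foo_expr's own frame, before `[` is ever called.
  substitute(dt[, sum(var), by = group])
}

foo_expr(as.data.table(mtcars), cyl, mpg)
#> as.data.table(mtcars)[, sum(mpg), by = cyl]
```

When the call is then evaluated, data.table only ever sees the column names themselves, so it never needs to resolve `var` or `group`.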
It seems that `substitute` does not work inside data.table's `[` the way one might expect from other contexts. You can, however, use `enexpr` from the rlang package in place of `substitute`:
```r
library(data.table)
library(rlang)
mt = as.data.table(mtcars)  # assumed: the `mt` table from the question
gregor_rlang = function(data, var, group) {
  data[, sum(eval(enexpr(var))), by = .(group = eval(enexpr(group)))]
}
gregor_rlang(mt, mpg, cyl)
##    group    V1
## 1:     6 138.2
## 2:     4 293.3
## 3:     8 211.4
```
Environments
The problem seems to be related to environments. This version works because we explicitly tell `substitute` which environment to use:
```r
gregor_pf = function(data, val, group) {
  data[, sum(eval(substitute(val, parent.env(environment())))),
       by = c(deparse(substitute(group)))]
}
gregor_pf(mt, mpg, cyl)
##    cyl    V1
## 1:   6 138.2
## 2:   4 293.3
## 3:   8 211.4
```
data.table uses non-standard evaluation (NSE) because it needs to analyse and manipulate the `by` argument before deciding whether to evaluate it. For example, if you give it a bare symbol, it won't evaluate it.
A consequence is that, if the argument does need to be evaluated, it must be evaluated in the right environment, and that is the function's responsibility. data.table evaluates its `by` argument in the data, not in the calling environment.
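A hypothetical NSE function can illustrate this split of responsibilities. It inspects its argument unevaluated and uses a bare symbol by name without evaluating it. Anything else it evaluates in the caller's environment via `parent.frame()`. `pick_by` and `g` are illustrative names, not data.table code, and data.table's real `by` handling is considerably more involved.

```r
# pick_by is hypothetical, for illustration only (not data.table's code).
pick_by <- function(by) {
  by_expr <- substitute(by)
  if (is.symbol(by_expr)) {
    # A bare symbol is used by name and never evaluated.
    deparse(by_expr)
  } else {
    # Anything else must be evaluated, and choosing the environment is our job:
    # parent.frame() is the frame pick_by() was called from.
    eval(by_expr, parent.frame())
  }
}

pick_by(cyl)
#> [1] "cyl"

g <- function() {
  my_cols <- c("gear", "am")
  pick_by(my_cols[2])  # my_cols exists only in g's frame
}
g()
#> [1] "am"
```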
In most cases you don't see the issue, because a symbol that isn't found in the data is looked up in the parent environment. `substitute()` is more sensitive: it only looks in the environment it is given and does not fall back to the parent.
See the example below:
```r
fun <- function(x){
  standard_eval(x)
  non_standard_eval_safe(x)
  non_standard_eval_not_safe(x)
}
standard_eval <- function(expr) print(expr)
non_standard_eval_safe <- function(expr) {
  expr <- bquote(print(.(substitute(expr)))) # will be quote(print(x)) in our example
  eval.parent(expr)
}
non_standard_eval_not_safe <- function(expr) {
  expr <- bquote(print(.(substitute(expr)))) # will be quote(print(x)) in our example
  eval(expr)
}
standard_eval(1+1)
#> [1] 2
non_standard_eval_safe(1+1)
#> [1] 2
non_standard_eval_not_safe(1+1)
#> [1] 2
fun(1+1)
#> [1] 2
#> [1] 2
#> Error in print(x): object 'x' not found
```
Created on 2020-02-20 by the reprex package (v0.3.0)
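You can check directly that ordinary lookup falls back to the parent environment while `substitute()` does not. This minimal sketch again assumes a simplified data environment whose parent is the function's frame. `lookup_vs_substitute` is a hypothetical name.

```r
# lookup_vs_substitute is hypothetical; data_env is a simplified stand-in
# (an assumption) for a data environment whose parent is this function's frame.
lookup_vs_substitute <- function(x) {
  data_env <- list2env(list(y = 1), parent = environment())
  list(
    value = eval(quote(x), data_env),             # not in data_env, found in the parent
    expr  = eval(quote(substitute(x)), data_env)  # substitute() only checks data_env
  )
}

res <- lookup_vs_substitute(1 + 2)
res$value
#> [1] 3
res$expr
#> x
```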