Non-standard evaluation of subset argument with mapply in R
@AllanCameron alluded to the possibility of a purrr::map
solution. Here are a few options:
- Since we know we want to subset by the
breaks
column, we only need to provide the cutoff values and therefore don't have to worry about delaying evaluation of an expression. Here and in the other examples, we name the elements of the breaks list so that the output will also have names telling us whatbreaks
cutoff value was used. Also, in all examples we take advantage of thedplyr::filter
function to filter the data in thedata
argument, rather than thesubset
argument:
library(tidyverse)
map2(list(breaks.lt.15=15,
breaks.lt.20=20),
list(~ wool,
~ wool + tension),
~ xtabs(.y, data=filter(warpbreaks, breaks < .x))
)
#> $breaks.lt.15
#> wool
#> A B
#> 2 2
#>
#> $breaks.lt.20
#> tension
#> wool L M H
#> A 0 4 3
#> B 2 2 5
- Similar to above, but we supply the whole filter expression and wrap the filter expressions in
quos
to delay evaluation.!!.x
evaluates the expressions at the point where we filter thewarpbreaks
data frame insidextabs
.
map2(quos(breaks.lt.15=breaks < 15,
breaks.lt.20=breaks < 20),
list(~ wool,
~ wool + tension),
~ xtabs(.y, data=filter(warpbreaks, !!.x))
)
#> $breaks.lt.15
#> wool
#> A B
#> 2 2
#>
#> $breaks.lt.20
#> tension
#> wool L M H
#> A 0 4 3
#> B 2 2 5
- In case you want all combinations of filters and xtabs formulas, you can use the
crossing
function to generate the combinations and then pass that topmap
("parallel map"), which can take any number of arguments, all contained in a single list. In this case we userlang::exprs
instead ofquos
to delay evaluation.rlang::exprs
would also have worked above, butquos
doesn't work here. I'm not sure I really understand why, but it has to do with capturing both the expression and its environment (quos
) vs. capturing just the expression (exprs
), as discussed here.
# map over all four combinations of breaks and xtabs formulas
crossing(
rlang::exprs(breaks.lt.15=breaks < 15,
breaks.lt.20=breaks < 20),
list(~ wool,
~ wool + tension)
) %>%
pmap(~ xtabs(.y, data=filter(warpbreaks, !!.x)))
#> $breaks.lt.15
#> wool
#> A B
#> 2 2
#>
#> $breaks.lt.15
#> tension
#> wool L M H
#> A 0 1 1
#> B 1 0 1
#>
#> $breaks.lt.20
#> wool
#> A B
#> 7 9
#>
#> $breaks.lt.20
#> tension
#> wool L M H
#> A 0 4 3
#> B 2 2 5
You could also go with tidyverse functions for the summary instead of xtabs
and return a data frame. For example:
map2_df(c(15,20),
list("wool",
c("wool", "tension")),
~ warpbreaks %>%
filter(breaks < .x) %>%
group_by_at(.y) %>%
tally() %>%
bind_cols(max.breaks=.x - 1)
) %>%
mutate_if(is.factor, ~replace_na(fct_expand(., "All"), "All")) %>%
select(is.factor, everything()) # Using select this way requires development version of dplyr, soon to be released on CRAN as version 1.0.0
#> # A tibble: 7 x 4
#> wool tension n max.breaks
#> <fct> <fct> <int> <dbl>
#> 1 A All 2 14
#> 2 B All 2 14
#> 3 A M 4 19
#> 4 A H 3 19
#> 5 B L 2 19
#> 6 B M 2 19
#> 7 B H 5 19
If you wanted to include marginal counts, you could do:
crossing(
c(Inf,15,20),
list(NULL, "wool", "tension", c("wool", "tension"))
) %>%
pmap_df(
~ warpbreaks %>%
filter(breaks < .x) %>%
group_by_at(.y) %>%
tally() %>%
bind_cols(max.breaks=.x - 1)
) %>%
mutate_if(is.factor, ~replace_na(fct_expand(., "All"), "All")) %>%
select(is.factor, everything())
#> wool tension n max.breaks
#> 1 All All 4 14
#> 2 A All 2 14
#> 3 B All 2 14
#> 4 All L 1 14
#> 5 All M 1 14
#> 6 All H 2 14
#> 7 A M 1 14
#> 8 A H 1 14
#> 9 B L 1 14
#> 10 B H 1 14
#> 11 All All 16 19
#> 12 A All 7 19
#> 13 B All 9 19
#> 14 All L 2 19
#> 15 All M 6 19
#> 16 All H 8 19
#> 17 A M 4 19
#> 18 A H 3 19
#> 19 B L 2 19
#> 20 B M 2 19
#> 21 B H 5 19
#> 22 All All 54 Inf
#> 23 A All 27 Inf
#> 24 B All 27 Inf
#> 25 All L 18 Inf
#> 26 All M 18 Inf
#> 27 All H 18 Inf
#> 28 A L 9 Inf
#> 29 A M 9 Inf
#> 30 A H 9 Inf
#> 31 B L 9 Inf
#> 32 B M 9 Inf
#> 33 B H 9 Inf
And if we add a pivot_wider
onto the end of the previous chain, we can get:
pivot_wider(names_from=max.breaks, values_from=n,
names_prefix="breaks<=", values_fill=list(n=0))
wool tension `breaks<=14` `breaks<=19` `breaks<=Inf` 1 All All 4 16 54 2 A All 2 7 27 3 B All 2 9 27 4 All L 1 2 18 5 All M 1 6 18 6 All H 2 8 18 7 A M 1 4 9 8 A H 1 3 9 9 B L 1 2 9 10 B H 1 5 9 11 B M 0 2 9 12 A L 0 0 9
It's an issue of NSE. One way is to inject the subset conditions in the call directly so they can be applied in the relevant context (the data, where breaks
exist).
It can be done by using alist()
instead of list()
, to have a list of quoted expressions,
then build the correct call, (using bquote()
is the easiest way) and evaluate it.
mapply(
FUN = function(formula, data, subset)
eval(bquote(xtabs(formula, data, .(subset)))),
formula = list(~ wool,
~ wool + tension),
subset = alist(breaks < 15,
breaks < 20),
MoreArgs = list(data = warpbreaks))
#> [[1]]
#> wool
#> A B
#> 2 2
#>
#> [[2]]
#> tension
#> wool L M H
#> A 0 4 3
#> B 2 2 5
mapply(FUN = function(formula, data, FUN, subset)
eval(bquote(aggregate(formula, data, FUN, subset = .(subset)))),
formula = list(breaks ~ wool,
breaks ~ wool + tension),
subset = alist(breaks < 15,
breaks < 20),
MoreArgs = list(data = warpbreaks,
FUN = length))
#> [[1]]
#> wool breaks
#> 1 A 2
#> 2 B 2
#>
#> [[2]]
#> wool tension breaks
#> 1 B L 2
#> 2 A M 4
#> 3 B M 2
#> 4 A H 3
#> 5 B H 5
You don't really need MoreArgs
anymore as you can use the arguments directly in the call, so you might want to simplify it as follows :
mapply(
FUN = function(formula, subset)
eval(bquote(xtabs(formula, warpbreaks, subset = .(subset)))),
formula = list(~ wool,
~ wool + tension),
subset = alist(breaks < 15,
breaks < 20))
#> [[1]]
#> wool
#> A B
#> 2 2
#>
#> [[2]]
#> tension
#> wool L M H
#> A 0 4 3
#> B 2 2 5
mapply(FUN = function(formula, subset)
eval(bquote(aggregate(formula, warpbreaks, length, subset = .(subset)))),
formula = list(breaks ~ wool,
breaks ~ wool + tension),
subset = alist(breaks < 15,
breaks < 20))
#> [[1]]
#> wool breaks
#> 1 A 2
#> 2 B 2
#>
#> [[2]]
#> wool tension breaks
#> 1 B L 2
#> 2 A M 4
#> 3 B M 2
#> 4 A H 3
#> 5 B H 5
You could also avoid the call manipulation and the adhoc FUN
argument by building datasets to loop on using lapply :
mapply(
FUN = xtabs,
formula = list(~ wool,
~ wool + tension),
data = lapply(c(15, 20), function(x) subset(warpbreaks, breaks < x)))
#> [[1]]
#> wool
#> A B
#> 2 2
#>
#> [[2]]
#> tension
#> wool L M H
#> A 0 4 3
#> B 2 2 5
mapply(
FUN = aggregate,
formula = list(breaks ~ wool,
breaks ~ wool + tension),
data = lapply(c(15, 20), function(x) subset(warpbreaks, breaks < x)),
MoreArgs = list(FUN = length))
#> [[1]]
#> wool breaks
#> 1 A 2
#> 2 B 2
#>
#> [[2]]
#> wool tension breaks
#> 1 B L 2
#> 2 A M 4
#> 3 B M 2
#> 4 A H 3
#> 5 B H 5
The short answer is that when you create a list to pass as an argument to a function, it is evaluated at the point of creation. The error you are getting is because R tries to create the list you want to pass in the calling environment.
To see this more clearly, suppose you try creating the arguments you want to pass ahead of calling mapply
:
f_list <- list(~ wool, ~ wool + tension)
d_list <- list(data = warpbreaks)
mapply(FUN = xtabs, formula = f_list, MoreArgs = d_list)
#> [[1]]
#> wool
#> A B
#> 27 27
#>
#> [[2]]
#> tension
#> wool L M H
#> A 9 9 9
#> B 9 9 9
There is no problem with creating a list of formulas, because these are not evaluated until needed, and of course warpbreaks
is accessible from the global environment, hence this call to mapply
works.
Of course, if you try to create the following list ahead of the mapply
call:
subset_list <- list(breaks < 15, breaks < 20)
Then R will tell you that breaks
isn't found.
However, if you create the list with warpbreaks
in the search path, then you won't have a problem:
subset_list <- with(warpbreaks, list(breaks < 15, breaks < 20))
subset_list
#> [[1]]
#> [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [14] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
#> [27] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [40] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
#> [53] FALSE FALSE
#>
#> [[2]]
#> [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
#> [14] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE TRUE
#> [27] FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
#> [40] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
#> [53] TRUE FALSE
so you would think that we could just pass this to mapply
and everything would be fine, but now we get a new error:
mapply(FUN = xtabs, formula = f_list, subset = subset_list, MoreArgs = d_list)
#> Error in eval(substitute(subset), data, env) : object 'dots' not found
So why are we getting this?
The problem lies in any functions passed to mapply
that call eval
, or that themselves call a function that uses eval
.
If you look at the source code for mapply
you will see that it takes the extra arguments you have passed and puts them in a list called dots
, which it will then pass to an internal mapply
call:
mapply
#> function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)
#> {
#> FUN <- match.fun(FUN)
#> dots <- list(...)
#> answer <- .Internal(mapply(FUN, dots, MoreArgs))
#> ...
If your FUN
itself calls another function that calls eval
on any of its arguments, it will therefore try to eval
the object dots
, which won't exist in the environment in which the eval
is called. This is easy to see by doing an mapply
on a match.call
wrapper:
mapply(function(x) match.call(), x = list(1))
[[1]]
(function(x) match.call())(x = dots[[1L]][[1L]])
So a minimal reproducible example of our error is
mapply(function(x) eval(substitute(x)), x = list(1))
#> Error in eval(substitute(x)) : object 'dots' not found
So what's the solution? It seems like you have already hit on a perfectly good one, that is, manually subsetting the data frame you wish to pass. Others may suggest that you explore purrr::map
to get a more elegant solution.
However, it is possible to get mapply
to do what you want, and the secret is just to modify FUN
to turn it into an anonymous wrapper of xtabs
that subsets on the fly:
mapply(FUN = function(formula, subset, data) xtabs(formula, data[subset,]),
formula = list(~ wool, ~ wool + tension),
subset = with(warpbreaks, list(breaks < 15, breaks < 20)),
MoreArgs = list(data = warpbreaks))
#> [[1]]
#> wool
#> A B
#> 2 2
#>
#> [[2]]
#> tension
#> wool L M H
#> A 0 4 3
#> B 2 2 5