Creating a function with an argument passed to dplyr::filter: what is the best way to work around NSE?
The answer from @eddi is correct about what's going on here.
I'm writing another answer that addresses the larger request of how to write functions using dplyr verbs. You'll note that, ultimately, it uses something like nrowspecies2 to avoid the species == species tautology.
To write a function wrapping dplyr verb(s) that will work with NSE, write two functions:
First, write a version that requires quoted inputs, using lazyeval and an SE version of the dplyr verb. So in this case, filter_.
nrowspecies_robust_ <- function(data, species){
  species_ <- lazyeval::as.lazy(species)
  condition <- ~ species == species_ # *
  tmp <- dplyr::filter_(data, condition) # **
  nrow(tmp)
}
nrowspecies_robust_(iris, ~versicolor)
Second, make a version that uses NSE:
nrowspecies_robust <- function(data, species) {
  species <- lazyeval::lazy(species)
  nrowspecies_robust_(data, species)
}
nrowspecies_robust(iris, versicolor)
* = if you want to do something more complex, you may need to use lazyeval::interp here, as in the tips linked below
** = also, if you need to change output names, see the .dots argument
For the above, I followed some tips from Hadley
Another good resource is the dplyr vignette on NSE, which illustrates .dots, interp, and other functions from the lazyeval package
For even more details on lazyeval, see its vignette
For a thorough discussion of the base R tools for working with NSE (many of which lazyeval helps you avoid), see the chapter on NSE in Advanced R
This question has absolutely nothing to do with non-standard evaluation. Let me rewrite your initial function to make that clear:
nrowspecies4 <- function(dtf, boo){
  dtf %>%
    filter(boo == boo) %>%
    nrow()
}
nrowspecies4(iris, boo = "versicolor")
#150
The expression inside your filter always evaluates to TRUE (almost always; see the example below). That's why it doesn't work, not because of some NSE magic.
Your nrowspecies2 is the way to go.
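The nrowspecies2 function from the question is not reproduced here, so the following is only a minimal sketch of the idea it stands for: give the argument a name that does not match any column, so the comparison is column against argument rather than a value against itself. The function name count_species and the argument name target are hypothetical, and the sketch assumes the built-in iris data, where the column is called Species.

```r
library(dplyr)

# Hypothetical helper (not from the question): `target` deliberately differs
# from every column name, so filter() compares the Species column with the
# value passed in instead of comparing a name with itself.
count_species <- function(dtf, target) {
  dtf %>%
    filter(Species == target) %>%
    nrow()
}

count_species(iris, "versicolor")
# 50
```

Inside filter, Species is looked up in the data first, while target is not a column and so falls through to the function's argument.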
Fwiw, species in your nrowspecies0 is indeed evaluated as a column, not as the input variable species, and you can check that by comparing nrowspecies0(iris, NA) to nrowspecies4(iris, NA).
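The NA comparison works as a check because of how R treats missing values, which can be seen in base R without dplyr. This is a small sketch, with variable names chosen only for illustration: a value compared with itself is TRUE, unless that value is NA, in which case the result is NA.

```r
# A non-missing value always equals itself.
boo <- "versicolor"
same_string <- boo == boo   # TRUE

# A comparison involving NA yields NA, not TRUE.
boo <- NA
same_na <- boo == boo       # NA

isTRUE(same_string)  # TRUE
isTRUE(same_na)      # FALSE
```

Since filter keeps only rows where the condition is TRUE and drops rows where it is NA, a function that really compares the input with itself returns no rows for NA, while one that compares a complete column with itself still returns every row.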