dplyr::n() returns "Error: This function should not be called directly"
So, I do not really have a problem, since I can just avoid writing dplyr::n(), but I'm curious about why it even happens.
Here's the source code for dplyr::n in dplyr 0.5.0:
function () {
  stop("This function should not be called directly")
}
That's why the fully qualified form raises this error: the function always throws an error. (My guess is that the error-throwing function dplyr::n exists so that n() could have a typical documentation page with examples.)
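To make that behaviour concrete, here is a minimal sketch in base R. The name n_stub is a hypothetical stand-in I made up for illustration; it only mirrors the body shown above and is not dplyr's own object.

```r
# Hypothetical stand-in for the exported placeholder (not dplyr's own code).
n_stub <- function() {
  stop("This function should not be called directly")
}

# Calling it directly always signals an error; tryCatch captures the message.
msg <- tryCatch(n_stub(), error = function(e) conditionMessage(e))
print(msg)
```

Whatever the calling context, a direct call can only ever hit the stop(), so any useful result has to come from somewhere else.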
Inside of filter/mutate/summarise statements, n() is not calling this function. Instead, some internal function calculates the group sizes for the expression n(). That's why the following works when dplyr is not attached with library():
n()
#> Error: could not find function "n"
library(magrittr)
iris %>%
dplyr::group_by(Species) %>%
dplyr::summarise(n = n())
#> # A tibble: 3 × 2
#> Species n
#> <fctr> <int>
#> 1 setosa 50
#> 2 versicolor 50
#> 3 virginica 50
Here, the bare n() at the top cannot be mapped to a function, so we get an error. But when it is used inside of a dplyr verb, n() does map to something and returns group sizes.
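As a rough illustration of how a verb can give meaning to n() without any global function of that name, here is a toy version in base R. Everything in it (toy_summarise, its arguments, the per-group environment) is an assumption for the sake of the example; dplyr's real internals work differently.

```r
# Toy verb (hypothetical; dplyr's real internals differ). It captures the
# expression unevaluated and evaluates it once per group, supplying its own n().
toy_summarise <- function(data, group_col, expr) {
  expr <- substitute(expr)                   # keep `n()` as an unevaluated call
  caller <- parent.frame()                   # where other names would be looked up
  pieces <- split(data, data[[group_col]])   # one data frame per group
  vapply(pieces, function(piece) {
    env <- new.env(parent = caller)
    env$n <- function() nrow(piece)          # n() exists only in this environment
    eval(expr, piece, env)                   # columns first, then env, then caller
  }, numeric(1))
}

counts <- toy_summarise(iris, "Species", n())
print(counts)
```

The call n() in the last line is never evaluated at top level; it is handed to the verb as an expression, and the verb decides what n means for each group.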
I think this is coming as a result of masking between plyr and dplyr. Anyhow this solves it:
dplyr::summarise(count = n())
I know I am 2 years late, but here’s my take.
The grouping in dplyr doesn’t actually do anything to the data. It just notes it’s grouped. This means functions like mean or n need to be aware of this, and must infer from their wider context that they should perform their calculations groupwise. They aren’t really R functions, which aren’t aware of this context. They are basically symbols that summarise() or mutate() choose to evaluate in a certain way (means or counts per group). I think Hadley chose to show an error if you call n() directly, as that’s slightly better than not having a function implemented at all.
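The idea that grouping is only a note attached to the data can be sketched in base R. The attribute name toy_group and both helper functions are hypothetical, chosen for illustration; this is not how dplyr stores its grouping.

```r
# Hypothetical illustration: "grouping" only tags the data with a column name.
toy_group_by <- function(data, group_col) {
  attr(data, "toy_group") <- group_col   # record the grouping; rows are untouched
  data
}

# A later step reads the tag and does the per-group work itself.
group_sizes <- function(data) {
  col <- attr(data, "toy_group")
  vapply(split(data, data[[col]]), nrow, integer(1))
}

grouped <- toy_group_by(iris, "Species")
sizes <- group_sizes(grouped)
print(sizes)
```

The grouped object has the same rows and columns as iris; the per-group behaviour comes entirely from the function that later chooses to read the tag.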