dplyr: lead() and lag() wrong when used with group_by()
It may happen that stats::lag
is used instead (e.g. when restoring environments with the session
package). This can easly slip through unnoticed as it it won't throw an error when being used as in the question. Double-check by simply typing lag
, use conflicted
package, or disambiguate the function call by calling dplyr::lag
instead.
The same could happen for plyr::mutate
, in case you might have loaded plyr
package in your session. So make sure you're also calling dplyr::mutate
.
It seems you have to pass additional argument to lag and lead functions. When I run your function without arrange, but with order_by added, everything seems to be ok.
df %>%
group_by(name) %>%
mutate(next.score = lead(score, order_by=name),
before.score = lag(score, order_by=name))
Output:
name score next.score before.score
1 Al 100 60 NA
2 Jen 80 100 NA
3 Al 60 80 100
4 Jen 100 60 80
5 Al 80 NA 60
6 Jen 60 NA 100
My sessionInfo():
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250 LC_MONETARY=Polish_Poland.1250
[4] LC_NUMERIC=C LC_TIME=Polish_Poland.1250
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.4.1
loaded via a namespace (and not attached):
[1] assertthat_0.1 DBI_0.3.1 lazyeval_0.1.10 magrittr_1.5 parallel_3.1.1 Rcpp_0.11.5
[7] tools_3.1.1
Using order_by is good when you have only one grouping variable. In case of multiple grouping variable, I could not find any solution except, writing and reading the table to get rid of grouping variables. It worked pretty well for me, but its efficiency depends on the size of table.