Translating dplyr to data.table
Might I recommend the rowid function? It does the grouping step "under the hood", and you might find it looks cleaner:
unique(DT, by='mpg')[order(am, mpg), row_num := LETTERS[rowid(am)]]
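To see what rowid is doing, here is a minimal sketch on a toy vector (g is made up for illustration and is not part of the question). It numbers each element by how many times its value has appeared so far, which is why no explicit by is needed:

```r
library(data.table)

# g is a hypothetical grouping vector, used only for illustration
g <- c("a", "b", "a", "a", "b")

# rowid() counts occurrences within each distinct value, in order of appearance
ids <- rowid(g)
print(ids)
#> [1] 1 1 2 3 2
```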
If you love chaining, you could also get everything inside []:
DT[ , .SD[1L], by = mpg
][order(am, mpg), row_num := LETTERS[rowid(am)]]
I'm experimenting with some tweaks to the translation so that dtplyr will automatically produce something more like what you want:
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)
dt <- lazy_dt(mtcars)
dt %>%
  distinct(mpg, .keep_all = TRUE) %>%
  group_by(am) %>%
  arrange(mpg, .by_group = TRUE) %>%
  mutate(row_num = LETTERS[row_number()]) %>%
  ungroup() %>%
  show_query()
#> unique(`_DT1`, by = "mpg")[order(am, mpg)][, `:=`(row_num = ..LETTERS[seq_len(.N)]),
#> keyby = .(am)]
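Or avoiding the grouping as @MichaelChirico suggests (note that row_number(am) ranks over the whole table, so unlike rowid(am) the numbering does not restart within each am group):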
dt %>%
  distinct(mpg, .keep_all = TRUE) %>%
  arrange(am, mpg) %>%
  mutate(row_num = LETTERS[row_number(am)]) %>%
  ungroup() %>%
  show_query()
#> unique(`_DT1`, by = "mpg")[order(am, mpg)][, `:=`(row_num = ..LETTERS[frank(am,
#> ties.method = "first", na.last = "keep")])]
(Using the .. in front of LETTERS is a data.table feature that makes it clear that you're referring to a variable outside of the data frame; it's probably not necessary here, but I think it's better to be safe than sorry.)
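As a rough illustration of the .. prefix, here is a small sketch of its most common use, selecting columns whose names are stored in a variable in the calling scope. The variable cols is hypothetical, and this shows only the column-selection form, not the exact expression dtplyr generates above:

```r
library(data.table)

DT <- as.data.table(mtcars)

# cols is a hypothetical variable living outside the data.table
cols <- c("mpg", "am")

# ..cols tells data.table to look up `cols` in the calling scope
# rather than treating it as a column name
sel <- DT[, ..cols]
names(sel)
```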
We can use seq_len(.N) within each am group:
unique(DT, by = "mpg")[order(am, mpg)][
  , row_num := LETTERS[seq_len(.N)], by = .(am)][]
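For what it's worth, here is a quick sketch for checking that the rowid approach and the seq_len(.N) approach assign the same labels. It assumes DT is simply as.data.table(mtcars), which the thread implies but never states; the names a, b and same are mine:

```r
library(data.table)

# Assumption: DT is mtcars converted to a data.table
DT <- as.data.table(mtcars)

# Variant 1: rowid() with the ordering supplied in i
a <- unique(DT, by = "mpg")[order(am, mpg), row_num := LETTERS[rowid(am)]]

# Variant 2: sort first, then number rows within each am group
b <- unique(DT, by = "mpg")[order(am, mpg)][
  , row_num := LETTERS[seq_len(.N)], by = .(am)]

# Variant 1 keeps the original row order, so sort it before comparing
same <- identical(a[order(am, mpg), row_num], b$row_num)
print(same)
```

One practical difference is that the first variant leaves the rows in their original order and only fills in row_num, while the second returns the table sorted by am and mpg.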