data.table equivalent of tidyr::complete()

I reckon that the philosophy of data.table entails fewer specially-named functions for tasks than you'll find in the tidyverse, so some extra coding is required, like:

res = setDT(df)[
  CJ(person = person, observation_id = observation_id, unique=TRUE), 
  on=.(person, observation_id)
]

After this, you still have to fill in the values for the missing levels yourself. In recent versions of data.table (1.12.4 or later, I believe), setnafill does this efficiently and by reference; as far as I know it only handles numeric columns:

setnafill(res, fill = 0, cols = 'value')
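For anyone who wants to run the two steps end to end, here is a minimal sketch. The sample df is made up for illustration (the question's data isn't reproduced here); it assumes person 1 simply has no row for observation_id 2.

```r
library(data.table)

# Hypothetical input: person 1 has no row for observation_id 2
df <- data.frame(
  person         = c(1, 2, 2),
  observation_id = c(1, 1, 2),
  value          = c(1, 1, 1)
)

# Join onto every person x observation_id combination;
# pairs that are absent from df come back with value = NA
res <- setDT(df)[
  CJ(person = person, observation_id = observation_id, unique = TRUE),
  on = .(person, observation_id)
]

# Replace the NAs in 'value' with 0, by reference
setnafill(res, fill = 0, cols = "value")
print(res)
```

Note that setDT(df) converts df in place, so df itself is a data.table afterwards.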

See @Jealie's answer regarding a feature that will sidestep this.


Certainly, it's crazy that the column names have to be entered three times here. But on the other hand, one can write a wrapper:

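completeDT <- function(DT, cols, defs = NULL){
  # all combinations of the unique values found in 'cols'
  mDT = do.call(CJ, c(DT[, ..cols], list(unique=TRUE)))
  # join onto the full grid; combinations absent from DT get NA
  res = DT[mDT, on=names(mDT)]
  # optionally replace NA with the default supplied for each column named in 'defs'
  if (length(defs)) 
    res[, names(defs) := Map(replace, .SD, lapply(.SD, is.na), defs), .SDcols=names(defs)]
  res[]
}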

completeDT(setDT(df), cols = c("person", "observation_id"), defs = c(value = 0))

   person observation_id value
1:      1              1     1
2:      1              2     0
3:      2              1     1
4:      2              2     1
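Since defs is just passed through Map, it should also accept a list, which lets you give defaults of different types for different columns. A sketch under that assumption; the note column and its "none" default are hypothetical, and the wrapper is repeated only so the block runs on its own:

```r
library(data.table)

# Wrapper repeated from above so that this block is self-contained
completeDT <- function(DT, cols, defs = NULL) {
  mDT <- do.call(CJ, c(DT[, ..cols], list(unique = TRUE)))
  res <- DT[mDT, on = names(mDT)]
  if (length(defs))
    res[, names(defs) := Map(replace, .SD, lapply(.SD, is.na), defs),
        .SDcols = names(defs)]
  res[]
}

# Hypothetical data with one numeric and one character column to fill
DT <- data.table(
  person         = c(1, 2, 2),
  observation_id = c(1, 1, 2),
  value          = c(1, 1, 1),
  note           = c("a", "b", "c")
)

# A list (rather than a vector) keeps each default's own type
out <- completeDT(DT, cols = c("person", "observation_id"),
                  defs = list(value = 0, note = "none"))
print(out)
```

A plain c(value = 0, note = "none") would coerce both defaults to character, which is why a list is used here.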

As a quick way of avoiding typing the names three times for the first step, here's @thelatemail's idea:

vars <- c("person","observation_id")
df[do.call(CJ, c(mget(vars), unique=TRUE)), on=vars]

# or with magrittr...
c("person","observation_id") %>% df[do.call(CJ, c(mget(.), unique=TRUE)), on=.]

Update: CJ now names its output columns after its arguments, so you no longer need to enter the names twice inside CJ. Thanks to @MichaelChirico & @MattDowle for the improvement.
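In practice that should mean the first step can be written with bare column names inside CJ. A small sketch, assuming a data.table version recent enough to auto-name CJ's arguments; the sample dt is made up for illustration:

```r
library(data.table)

# Hypothetical input: person 1 has no row for observation_id 2
dt <- data.table(
  person         = c(1, 2, 2),
  observation_id = c(1, 1, 2),
  value          = c(1, 1, 1)
)

# CJ() takes its column names from the argument expressions,
# so person = person is no longer needed
res <- dt[CJ(person, observation_id, unique = TRUE),
          on = .(person, observation_id)]
print(res)
```

The names still appear twice overall (once in CJ, once in on), so the vars approach above remains the shortest if you have many columns.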


There might be a better answer out there, but this works:

dt[CJ(person=unique(dt$person), 
      observation_id=unique(dt$observation_id)),
   on=c('person','observation_id')]

Which gives:

   person observation_id value
1:      1              1     1
2:      1              2    NA
3:      2              1     1
4:      2              2     1

Now, if you would like to fill with a value other than NA, I would suggest waiting for the corresponding feature to be finished, or contributing to it :)
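In the meantime, one way to get there by hand is to assign into the NA rows after the join. This is only a sketch of that workaround, not the feature mentioned above; the status column and the "missing" default are hypothetical:

```r
library(data.table)

# Hypothetical input with a character column instead of a numeric one
dt <- data.table(
  person         = c(1, 2, 2),
  observation_id = c(1, 1, 2),
  status         = c("ok", "ok", "ok")
)

# Same join as above: absent combinations come back as NA
res <- dt[CJ(person = unique(dt$person),
             observation_id = unique(dt$observation_id)),
          on = c("person", "observation_id")]

# Overwrite only the NA rows, by reference, with whatever default you like
res[is.na(status), status := "missing"]
print(res)
```

The caveat is that this cannot tell an NA created by the join apart from an NA that was already in the data.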