How to efficiently find last preceding row with nonzero value in R data.table
You might try using nafill
:
# create a dummy column that is only populated for nonzero z (and hence NA elsewhere)
DT[z != 0, y_copy := y]
# nafill on this column using LOCF strategy by group:
DT[ , ans := nafill(y_copy, type = 'locf'), by = x][]
# x y z y_copy ans
# 1: b 1 0 NA NA
# 2: b 2 1 2 2
# 3: b 3 0 NA 2
# 4: b 4 1 4 4
# 5: b 5 1 5 5
# 6: b 6 0 NA 5
# 7: a 1 0 NA NA
# 8: a 2 1 2 2
# 9: a 3 0 NA 2
# 10: a 4 0 NA 2
# 11: a 5 1 5 5
# 12: a 6 0 NA 5
# 13: c 1 0 NA NA
# 14: c 2 0 NA NA
# 15: c 3 0 NA NA
# 16: c 4 1 4 4
# 17: c 5 0 NA 4
# 18: c 6 0 NA 4
For now, nafill
is a development only feature (data.table
1.12.3+) but 1.12.4 should be on CRAN in the next week or two. For the moment, you can install this with install.packages('data.table', type = 'source', repos = 'http://Rdatatable.github.io/data.table')
If you don't want to create y_copy
, you could do this inline with is.na<-
:
DT[ , ans := nafill(`is.na<-`(y, z == 0), type = 'locf'), by = x]
This will be inefficient because z==0
is calculated repeatedly by group (instead of as a single vector); you could do this in the first step then:
DT[ , z_zero := z == 0]
But this means another dummy column (with less storage than y_copy
if y
is numeric
, character
, or complex
)
An option with non-equi join
library(data.table)
library(zoo)
DT[DT[z!=0, .(y1 = y, x)], output := y1, on = .(x, y <= y1),
mult = 'last'][, output := na.locf0(output), x]
DT
# x y z output
# 1: b 1 0 NA
# 2: b 2 1 2
# 3: b 3 0 2
# 4: b 4 1 4
# 5: b 5 1 5
# 6: b 6 0 5
# 7: a 1 0 NA
# 8: a 2 1 2
# 9: a 3 0 2
#10: a 4 0 2
#11: a 5 1 5
#12: a 6 0 5
#13: c 1 0 NA
#14: c 2 0 NA
#15: c 3 0 NA
#16: c 4 1 4
#17: c 5 0 4
#18: c 6 0 4
Another option using rolling join:
DT[, output:= DT[z==1][.SD, on=.(x, y), roll=Inf, x.y]]
output:
x y z output
1: b 1 0 NA
2: b 2 1 2
3: b 3 0 2
4: b 4 1 4
5: b 5 1 5
6: b 6 0 5
7: a 1 0 NA
8: a 2 1 2
9: a 3 0 2
10: a 4 0 2
11: a 5 1 5
12: a 6 0 5
13: c 1 0 NA
14: c 2 0 NA
15: c 3 0 NA
16: c 4 1 4
17: c 5 0 4
18: c 6 0 4