Find consecutive sequence of zeros in R
isMidPoint below will identify the middle 0, if there is one.
library(data.table)
myOriginalDf <- data.table(myOriginalDf, key="id")
myOriginalDf[, isMidPoint := FALSE]
myOriginalDf <- myOriginalDf[!is.na(value)][
  c(FALSE, !value[-(1:2)], FALSE) &      # successor is 0
  c(!value[-length(value)], FALSE) &     # element itself is 0
  c(FALSE, !value[-length(value)]),      # predecessor is 0
  isMidPoint := TRUE, by=id]
Explanation:
To find a series of three in a row, you simply need to compare each element, from the 2nd to the 2nd-to-last, with the neighbor before it and the neighbor after it.
Since your values are 0 / 1, they are effectively T / F, and this makes it extremely simple to evaluate (assuming there are no NAs).
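If v are your values (without NAs), then !v & !v[-1] will be TRUE wherever an element and its successor are both 0. Add in & !v[-(1:2)] and this will be TRUE wherever you have a series of three 0s. The code above pads each shifted vector with FALSE so that all three have the same length and the TRUE lands on the middle element of the series.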
Notice that this catches a series of 4+ 0s as well!
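As a rough illustration of the comparison described above, here is a minimal base R sketch on a made-up vector. The names v, isZero, prevZero, nextZero and isMid are only for this example and are not part of the original data.

```r
# Hypothetical 0/1 vector with no NAs
v <- c(1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0)
n <- length(v)

isZero   <- v == 0
prevZero <- c(FALSE, isZero[-n])   # element i holds isZero[i - 1]
nextZero <- c(isZero[-1], FALSE)   # element i holds isZero[i + 1]

# TRUE where an element and both of its neighbours are 0
isMid <- prevZero & isZero & nextZero
which(isMid)
# [1] 3 7 8
```

Positions 7 and 8 are both flagged because they sit inside a run of four 0s, which is the "4+" case mentioned above.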
Then all that remains is to (1) calculate the above while removing (and accounting for!) any NAs, and (2) separate by id value. Fortunately, data.table makes both of these a breeze.
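One caveat worth checking on your own data: as far as I understand, data.table evaluates the i expression on the whole table before by is applied, so a run that straddles two ids could in principle be flagged. A possible alternative, sketched below on made-up data, does the neighbor comparison inside j with shift(), which is evaluated per group. The table toy and the choice of fill = 1 (so that group edges never count as 0) are assumptions for this example.

```r
library(data.table)

# Hypothetical toy data; column names mirror the ones used above
toy <- data.table(
  id    = c("x", "x", "x", "x", "x", "y", "y", "y", "y"),
  value = c(0,   0,   NA,  0,   1,   0,   0,   0,   1)
)

# Drop NAs first so that 0s separated only by an NA count as neighbours
noNA <- toy[!is.na(value)]

# shift() runs inside j, so by = id keeps each comparison within one id
noNA[, isMidPoint := value == 0 &
         shift(value, 1L, type = "lag",  fill = 1) == 0 &
         shift(value, 1L, type = "lead", fill = 1) == 0,
     by = id]
```

Here the second row of each id ends up TRUE, and nothing is carried across the boundary between x and y.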
Results:
> myOriginalDf
    row value id isMidPoint
 1:   1     1  x      FALSE
 2:   2     1  x      FALSE
 3:   3     0  x      FALSE
 4:   4     0  x      FALSE
 5:   5     1  x      FALSE
 6:   6     0  x      FALSE
 7:   7     0  x       TRUE  <~~~~
 8:   9     0  x      FALSE
 9:  10     1  x      FALSE
10:  11     0  x      FALSE
11:  12     0  x       TRUE  <~~~~
12:  13     0  x       TRUE  <~~~~
13:  14     0  x       TRUE  <~~~~
14:  15     0  x      FALSE
15:  16     1  y      FALSE
16:  17     0  y      FALSE
17:  18     0  y       TRUE  <~~~~
18:  20     0  y      FALSE
19:  21     1  y      FALSE
20:  22     1  y      FALSE
21:  23     0  y      FALSE
22:  25     0  y       TRUE  <~~~~
23:  27     0  y       TRUE  <~~~~
24:  29     0  y      FALSE
    row value id isMidPoint
EDIT AS PER COMMENTS:
To find the position of the last midpoint that is TRUE, use:
max(which(myOriginalDf$isMidPoint))
To check whether the last possible sequence is 0-0-0, use:
# Will be TRUE if last possible sequence is 0-0-0
# Note, this accounts for NA's as well
myOriginalDf[!is.na(value), isMidPoint[length(isMidPoint) - 1]]
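If you need that last check once per id rather than for the table as a whole, something along these lines might work. This is only a sketch: the table res and the result column endsInRun are made up for the example, and it assumes every id has at least two non-NA rows.

```r
library(data.table)

# Hypothetical table that already carries an isMidPoint column
res <- data.table(
  id         = c("x", "x", "x", "x", "y", "y", "y", "y"),
  value      = c(1, 0, 0, 0, 0, 0, 0, 1),
  isMidPoint = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
)

# .N is the group size, so .N - 1 is the second-to-last row of each id
lastRun <- res[!is.na(value), .(endsInRun = isMidPoint[.N - 1L]), by = id]
```

In this toy data x ends in 0-0-0 and y does not, so endsInRun is TRUE for x and FALSE for y.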
Using data.table, as your question suggests you actually want to, this does what you want as far as I can see:
DT <- data.table(myOriginalDf)
# add the original order, so you can't lose it
DT[, orig := .I]
# rle by id, saving the run length as a new variable
DT[, rleLength := {rr <- rle(value); rep(rr$lengths, rr$lengths)}, by = 'id']
# key by value and length to subset
setkey(DT, value, rleLength)
# which rows are value = 0 and length > 2
DT[list(0, unique(rleLength[rleLength>2])),nomatch=0]
##    value rleLength id orig
## 1:     0         3  x    6
## 2:     0         3  x    7
## 3:     0         3  x    8
## 4:     0         4  y   10
## 5:     0         4  y   11
## 6:     0         4  y   12
## 7:     0         4  y   13
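To see what the rle / rep step is doing, here is a small base R sketch on a single made-up vector, with no grouping. The names v, rr and runLen are only for this example.

```r
# Hypothetical 0/1 vector
v <- c(1, 0, 0, 0, 1, 0, 0)

rr <- rle(v)
rr$lengths   # 1 3 1 2
rr$values    # 1 0 1 0

# Repeat each run length once per element of that run,
# so every position knows how long its own run is
runLen <- rep(rr$lengths, rr$lengths)   # 1 3 3 3 1 2 2

# Positions that are 0 and belong to a run longer than 2
which(v == 0 & runLen > 2)
# [1] 2 3 4
```

The keyed subset above is doing the same filter, just through a join on value and rleLength instead of a logical vector.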
Here is an apply statement based on your solution for a vector. It might do what you want.
z <- apply(mydf, 1, function(x) {
  runs <- rle(x[!is.na(x)])        # runs of equal values, ignoring NAs
  last <- length(runs$lengths)     # index of the final run
  runs$lengths[last] > 2 & runs$values[last] == 0  # final run is 3+ zeros
})
mydf[z,]
#   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
# x  1  1  0  0  1  0  0 NA NA   0