Find consecutive sequence of zeros in R

isMidPoint below will flag the middle 0 of any run of three consecutive 0s, if there is one.

library(data.table)
myOriginalDf <- data.table(myOriginalDf, key="id")

myOriginalDf[, isMidPoint := FALSE]
myOriginalDf <- myOriginalDf[!is.na(value)][
  c(FALSE, !value[-(1:2)], FALSE) &       # next value is 0
  c(!value[-length(value)], FALSE) &      # current value is 0
  c(FALSE, !value[-length(value)]),       # previous value is 0
  isMidPoint := TRUE, by = id]

Explanation:

To find a series of three in a row, you simply need to compare each element from the 2nd to the 2nd-to-last with its neighbor before it and after it.

Since your values are 0 / 1, they are effectively T / F, and this makes it extremely simple to evaluate (assuming there were no NAs).

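If v are your values (without NAs), then !v & !v[-1] will be TRUE wherever an element and its successor are both 0 (ignoring the length mismatch for the moment). Add & !v[-(1:2)] and the result is TRUE wherever a run of three 0s begins; padding the shifted vectors with FALSE, as in the code above, lines everything up so that the TRUE lands on the middle element instead. Notice that this also catches runs of four or more 0s!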
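As a minimal base R sketch of the neighbor comparison described above (the vector v and the helper names is_zero, prev_zero and next_zero are made up for illustration and are not part of the original code):

```r
# Hypothetical input: 0/1 values with NAs already removed
v <- c(1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0)
n <- length(v)

is_zero   <- !v                      # TRUE where the value is 0
prev_zero <- c(FALSE, is_zero[-n])   # was the element before this one a 0?
next_zero <- c(is_zero[-1], FALSE)   # is the element after this one a 0?

# TRUE only where an element and both of its neighbors are 0
is_mid <- is_zero & prev_zero & next_zero
which(is_mid)
```

The run of four 0s at positions 6 to 9 produces two midpoints, which is the "4+" behavior mentioned above.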

Then all that remains is to (1) calculate the above while removing (and accounting for!) any NAs, and (2) separate by id value. Fortunately, data.table makes both of these a breeze.

Results:

  > myOriginalDf

    row value id isMidPoint
 1:   1     1  x      FALSE
 2:   2     1  x      FALSE
 3:   3     0  x      FALSE
 4:   4     0  x      FALSE
 5:   5     1  x      FALSE
 6:   6     0  x      FALSE
 7:   7     0  x       TRUE  <~~~~
 8:   9     0  x      FALSE
 9:  10     1  x      FALSE
10:  11     0  x      FALSE
11:  12     0  x       TRUE  <~~~~
12:  13     0  x       TRUE  <~~~~
13:  14     0  x       TRUE  <~~~~
14:  15     0  x      FALSE
15:  16     1  y      FALSE
16:  17     0  y      FALSE
17:  18     0  y       TRUE  <~~~~
18:  20     0  y      FALSE
19:  21     1  y      FALSE
20:  22     1  y      FALSE
21:  23     0  y      FALSE
22:  25     0  y       TRUE  <~~~~
23:  27     0  y       TRUE  <~~~~
24:  29     0  y      FALSE
    row value id isMidPoint

EDIT AS PER COMMENTS:

If you want to find the row of the last midpoint (the last 0-0-0 sequence), use:

    max(which(myOriginalDf$isMidPoint))

If you want to know whether the last possible sequence is 0-0-0, use:

  # Will be TRUE if last possible sequence is 0-0-0
  #   Note, this accounts for NA's as well
  myOriginalDf[!is.na(value), isMidPoint[length(isMidPoint) - 1]]
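
The same end-of-series check can be sketched in base R on a plain vector, as a rough cross-check rather than a replacement for the data.table version. The vector v and the names clean and ends_in_three_zeros are hypothetical:

```r
# Hypothetical vector with NAs scattered near the end
v <- c(1, 0, 1, 0, NA, 0, 0, NA)

clean      <- v[!is.na(v)]           # drop NAs first
last_three <- utils::tail(clean, 3)  # last three observed values

# TRUE only if there are at least three observed values and the last three are all 0
ends_in_three_zeros <- length(last_three) == 3 && all(last_three == 0)
ends_in_three_zeros
```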

Using data.table, as your question suggests you actually want to, this does what you want as far as I can see:

DT <- data.table(myOriginalDf)

# add the original order, so you can't lose it
DT[, orig := .I]

# rle by id, saving the run length as a new variable

DT[, rleLength := {rr <- rle(value); rep(rr$lengths, rr$lengths)}, by = 'id']

# key by value and length to subset 

setkey(DT, value, rleLength)

# which rows are value = 0 and length > 2

DT[list(0, unique(rleLength[rleLength>2])),nomatch=0]

##    value rleLength id orig
## 1:     0         3  x    6
## 2:     0         3  x    7
## 3:     0         3  x    8
## 4:     0         4  y   10
## 5:     0         4  y   11
## 6:     0         4  y   12
## 7:     0         4  y   13
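
To see what the rle step is doing, here is a small base R sketch on a single made-up vector (value, run_len and long_zero_rows are illustrative names, and there is no grouping by id here):

```r
# Hypothetical data for one id
value <- c(1, 0, 0, 1, 0, 0, 0, 1)

rr <- rle(value)                          # run values and run lengths
run_len <- rep(rr$lengths, rr$lengths)    # each element gets the length of its run

# Positions that are 0 and sit inside a run longer than 2
long_zero_rows <- which(value == 0 & run_len > 2)
long_zero_rows
```

Expanding the run lengths back to one entry per element is what lets the later subset pick out whole runs rather than just their midpoints.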

Here is an apply statement based on your solution for a vector. It might do what you want.

z <- apply(mydf, 1, function(x) {
  runs <- rle(x[!is.na(x)])      # runs over the non-NA values of the row
  last <- length(runs$lengths)   # index of the final run
  runs$lengths[last] > 2 & runs$values[last] == 0
})

mydf[z,]

#   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
# x  1  1  0  0  1  0  0 NA NA   0
