R: how to identify the distance since the last occurrence

I would suggest creating a grouping column that starts a new group at every TRUE value of light:

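# create group column:
# number the TRUE rows 1, 2, 3, ...; the FALSE rows are left as NA, then set to 0
d[c(light), group := cumsum(light)]
d[is.na(group), group := 0L]
# the running total stays constant until the next TRUE, so each run gets its own id
d[, group := cumsum(group)]
d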
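To see what the three grouping steps produce, here is a base R sketch of the same logic on a plain logical vector. The vector light is hypothetical, chosen to match the 8-row result in this answer, and no data.table is needed. It also shows that cumsum(light) alone starts a new id at the same rows, only with consecutive numbers.

```r
# Hypothetical example vector, matching the 8-row result in this answer
light <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)

# Step 1: number the TRUE rows 1, 2, 3, ... and leave FALSE rows as NA
group <- rep(NA_integer_, length(light))
group[light] <- cumsum(light[light])

# Step 2: FALSE rows become 0
group[is.na(group)] <- 0L

# Step 3: the running total stays constant until the next TRUE
group <- cumsum(group)
print(group)   # values: 1 1 1 3 6 6 6 10

# cumsum(light) alone starts a new id at exactly the same rows
simple <- cumsum(light)
print(simple)  # values: 1 1 1 2 3 3 3 4
```

Because by= only cares about which rows share an id, either grouping yields the same distances.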

Then count the FALSE rows within each group, using cumsum on the negated light column:

d[, distance := cumsum(!light), by=group]

# remove the group column for cleanliness
d[, group := NULL]

Results:

d

         date light distance
1: 2013-06-01  TRUE        0
2: 2013-06-02 FALSE        1
3: 2013-06-03 FALSE        2
4: 2013-06-04  TRUE        0
5: 2013-06-05  TRUE        0
6: 2013-06-06 FALSE        1
7: 2013-06-07 FALSE        2
8: 2013-06-08  TRUE        0

Note: I added a few rows to the example data.
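For readers without data.table, the same group-then-cumsum idea can be sketched in base R with ave(). This is an analogue of the answer's approach, not its code, and light is again a hypothetical vector matching the results table.

```r
# Hypothetical example vector matching the results table above
light <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)

# A new group starts at every TRUE
group <- cumsum(light)

# Within each group, count the FALSE rows seen so far
distance <- ave(as.integer(!light), group, FUN = cumsum)
print(distance)  # values: 0 1 2 0 0 1 2 0
```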


This should do it:

d[, distance := 1:.N - 1, by = cumsum(light)]

or this:

d[, distance := .I - .I[1], by = cumsum(light)]
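Both one-liners measure how many rows each row sits after the first row of its cumsum(light) group. A base R sketch of the two ideas, where light is a hypothetical vector and ave() stands in for data.table's by=:

```r
light <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)
group <- cumsum(light)

# Like 1:.N - 1: position within the group, starting at 0
d1 <- ave(seq_along(light), group, FUN = seq_along) - 1L

# Like .I - .I[1]: row number minus the first row number of the group
rows <- seq_along(light)
d2 <- rows - ave(rows, group, FUN = function(i) i[1])

print(d1)  # values: 0 1 2 0 0 1 2 0
print(d2)  # values: 0 1 2 0 0 1 2 0
```

If the data begin with FALSE rows, those rows fall into group 0 and are counted from the first row rather than from a previous TRUE.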

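If you want to count the actual number of days rather than the row distance, you could use the following (this assumes date is either a Date column or text in month/day/year format; converting with as.Date keeps the result in whole days, whereas local-time POSIXct differences can become fractional across daylight-saving changes):

# days since the first date of each cumsum(light) group
d[, distance := as.numeric(as.Date(date, format = "%m/%d/%Y") -
                           as.Date(date[1], format = "%m/%d/%Y")),
    by = cumsum(light)]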
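Row distance and day distance only differ when some dates are missing. A base R sketch with gaps in the dates; the date and light values are made up for illustration, and match(group, group) is used here to find the first row of each group:

```r
# Made-up example: dates as "%m/%d/%Y" text, with gaps between some days
date  <- c("06/01/2013", "06/02/2013", "06/05/2013", "06/06/2013", "06/09/2013")
light <- c(TRUE, FALSE, FALSE, TRUE, FALSE)

days  <- as.Date(date, format = "%m/%d/%Y")
group <- cumsum(light)

# match() returns the position of the first row with the same group id
first <- match(group, group)

row_dist <- seq_along(light) - first        # rows since the last TRUE
day_dist <- as.numeric(days - days[first])  # days since the last TRUE

print(row_dist)  # values: 0 1 2 0 1
print(day_dist)  # values: 0 1 4 0 3
```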

An approach using run-length encoding (rle) and sequence (sequence(nvec) is equivalent to unlist(lapply(nvec, seq_len))):

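# number the rows within each run of equal values, then reset the TRUE rows to 0
# (0L keeps the column integer, matching the type that sequence() returns)
d[, distance := sequence(rle(light)$lengths)][(light), distance := 0L]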
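Step by step on a plain vector in base R; light is a hypothetical vector matching the earlier results table, and ifelse() here plays the role of the second [(light), ...] assignment:

```r
light <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)

runs <- rle(light)
print(runs$lengths)  # values: 1 2 2 2 1

# Number the rows within each run: 1 | 1 2 | 1 2 | 1 2 | 1
pos <- sequence(runs$lengths)

# TRUE rows reset to 0; FALSE rows keep their position in the run
distance <- ifelse(light, 0L, pos)
print(distance)      # values: 0 1 2 0 0 1 2 0
```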

Tags:

R

data.table