Identify and count spells (Distinctive events within each group)

Here's a helper function that can return what you are after

spell_index <- function(time, flag) {
  change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
  cumsum(change) * (flag==1)+0
}

And you can use it with your data like

library(dplyr)
df %>% 
  group_by(group) %>% 
  mutate(
    spell = spell_index(time, is.5)
  )

Basically the helper functions uses lag() to look for changes. We use cumsum() to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.

One option using rle

library(dplyr)
df %>% 
  group_by(group) %>% 
  mutate(
    spell = {
      r <- rle(is.5)
      r$values <- cumsum(r$values) * r$values
      inverse.rle(r) 
      }
  )
# A tibble: 14 x 4
# Groups:   group [2]
#   time                group  is.5 spell
#   <dttm>              <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A         0     0
# 2 2018-10-07 01:40:00 A         1     1
# 3 2018-10-07 01:41:00 A         1     1
# 4 2018-10-07 01:42:00 A         0     0
# 5 2018-10-07 01:43:00 A         1     2
# 6 2018-10-07 01:44:00 A         0     0
# 7 2018-10-07 01:45:00 A         0     0
# 8 2018-10-07 01:46:00 A         1     3
# 9 2018-05-20 14:00:00 B         0     0
#10 2018-05-20 14:01:00 B         0     0
#11 2018-05-20 14:02:00 B         1     1
#12 2018-05-20 14:03:00 B         1     1
#13 2018-05-20 14:04:00 B         0     0
#14 2018-05-20 14:05:00 B         1     2

You asked for a tidyverse solution but if speed is your concern, you might use data.table. The syntax is very similar

library(data.table)
setDT(df)[, spell := {
  r <- rle(is.5)
  r$values <- cumsum(r$values) * r$values
  inverse.rle(r) 
  }, by = group][] # the [] at the end prints the data.table

explanation

When we call

r <- rle(df$is.5)

the result we get is

r
#Run Length Encoding
#  lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
#  values : num [1:10] 0 1 0 1 0 1 0 1 0 1

We need to replace values with the cumulative sum where values == 1 while values should remain zero otherwise.

We can achieve this when we multiple cumsum(r$values) with r$values; where the latter is a vector of 0s and 1s.

r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5

Finally we call inverse.rle to get back a vector of the same length as is.5.

inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5

We do this for every group.

Here is one option with rleid from data.table. Convert the 'data.frame' to 'data.table' (setDT(df)), grouped by 'group', get the run-length-id (rleid) of 'is.5' and multiply with the values of 'is.5' so as to replace the ids corresponding to 0s in is.5 to 0, assign it to 'spell', then specify the i with a logical vector to select rows that have 'spell' values not zero, match those values of 'spell' with unique 'spell' and assign it to 'spell'

library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
       ][!!spell, spell := match(spell, unique(spell))][]
#                   time group is.5 spell
# 1: 2018-10-07 01:39:00     A    0     0
# 2: 2018-10-07 01:40:00     A    1     1
# 3: 2018-10-07 01:41:00     A    1     1
# 4: 2018-10-07 01:42:00     A    0     0
# 5: 2018-10-07 01:43:00     A    1     2
# 6: 2018-10-07 01:44:00     A    0     0
# 7: 2018-10-07 01:45:00     A    0     0
# 8: 2018-10-07 01:46:00     A    1     3
# 9: 2018-05-20 14:00:00     B    0     0
#10: 2018-05-20 14:01:00     B    0     0
#11: 2018-05-20 14:02:00     B    1     1
#12: 2018-05-20 14:03:00     B    1     1
#13: 2018-05-20 14:04:00     B    0     0
#14: 2018-05-20 14:05:00     B    1     2

Or after the first step, use .GRP

df[!!spell, spell := .GRP, spell]

Identify and count spells (Distinctive events within each group)

Tags:

R

Grouping

Dataframe

Time Series

Dplyr

Related

Recent Posts