Identify and count spells (Distinctive events within each group)
Here's a helper function that can return what you are after
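spell_index <- function(time, flag) {
  # A spell starts where flag becomes 1 exactly one step (here, one minute) after the previous row.
  # The first row has no predecessor, so a group that starts with flag == 1 yields NA.
  change <- time - dplyr::lag(time) == 1 & flag == 1 & dplyr::lag(flag) != 1
  # Count the starts so far, then zero out rows where flag is not 1
  cumsum(change) * (flag == 1) + 0
}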
You can use it with your data like this:
library(dplyr)
df %>%
  group_by(group) %>%
  mutate(
    spell = spell_index(time, is.5)
  )
Basically, the helper function uses lag() to look for changes. We use cumsum() to count the changes seen so far. Then we multiply by a boolean value to zero out the rows that should be zero.
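As a rough illustration of those three steps, the sketch below walks through group B from the example by hand. It assumes the times can be written as plain minute offsets, and it uses a small hypothetical `shift()` function as a base R stand-in for dplyr's `lag()`, so it runs without any packages.

```r
# Group B from the example, with time expressed as minute offsets
# (an assumption made here so the sketch needs no date-time handling).
time <- 0:5
flag <- c(0, 0, 1, 1, 0, 1)

# Hypothetical base R stand-in for dplyr::lag(): shift right by one, pad with NA.
shift <- function(x) c(NA, head(x, -1))

# Step 1: mark rows where a spell starts (flag switches from not-1 to 1).
change <- time - shift(time) == 1 & flag == 1 & shift(flag) != 1

# Step 2: running count of spell starts.
counted <- cumsum(change)

# Step 3: zero out rows that are not part of a spell.
spell <- counted * (flag == 1) + 0

print(change)   # FALSE FALSE  TRUE FALSE FALSE  TRUE
print(counted)  # 0 0 1 1 1 2
print(spell)    # 0 0 1 1 0 2
```

The first element of `change` comes out as FALSE rather than NA only because `flag` is 0 there, so `FALSE & NA` evaluates to FALSE.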
One option using rle
library(dplyr)
df %>%
  group_by(group) %>%
  mutate(
    spell = {
      r <- rle(is.5)
      r$values <- cumsum(r$values) * r$values
      inverse.rle(r)
    }
  )
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2
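For anyone who wants to try this without the original data, here is a sketch that rebuilds the input from the printed output and applies the same rle logic per group. Two things are assumptions on my part: the time zone (UTC, since none is stated) and the use of base R's `ave()` in place of `group_by()` and `mutate()`, chosen only so the example runs without packages.

```r
# Rebuild the example data from the printed output.
# UTC is an assumption; the original time zone is not stated.
df <- data.frame(
  time = c(
    seq(as.POSIXct("2018-10-07 01:39:00", tz = "UTC"), by = "min", length.out = 8),
    seq(as.POSIXct("2018-05-20 14:00:00", tz = "UTC"), by = "min", length.out = 6)
  ),
  group = rep(c("A", "B"), times = c(8, 6)),
  is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)
)

# ave() applies the function to is.5 separately within each group,
# which plays the role of group_by() + mutate() here.
df$spell <- ave(df$is.5, df$group, FUN = function(x) {
  r <- rle(x)
  r$values <- cumsum(r$values) * r$values
  inverse.rle(r)
})

print(df)
```

The `spell` column should match the one in the tibble above.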
You asked for a tidyverse solution, but if speed is your concern you might use data.table. The syntax is very similar:
library(data.table)
setDT(df)[, spell := {
  r <- rle(is.5)
  r$values <- cumsum(r$values) * r$values
  inverse.rle(r)
}, by = group][] # the [] at the end prints the data.table
Explanation
When we call
r <- rle(df$is.5)
the result we get is
r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1
We need to replace values with the cumulative sum where values == 1, while values should remain zero otherwise.
We can achieve this by multiplying cumsum(r$values) by r$values, where the latter is a vector of 0s and 1s.
r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5
Finally, we call inverse.rle to get back a vector of the same length as is.5.
inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5
We do this for every group.
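The three steps can also be wrapped in a small function, which makes it easy to check a few edge cases. This is only a sketch: `spell_from_flag()` is a name I made up, and the `as.numeric()` call is an addition so that logical input works too.

```r
# spell_from_flag() is a hypothetical name for the rle-based steps above.
spell_from_flag <- function(flag) {
  r <- rle(as.numeric(flag))               # runs of 0s and 1s
  r$values <- cumsum(r$values) * r$values  # number the 1-runs, keep 0-runs at 0
  inverse.rle(r)                           # expand back to the original length
}

starts_with_one <- spell_from_flag(c(1, 1, 0, 1))
all_zero        <- spell_from_flag(c(0, 0, 0))
from_logical    <- spell_from_flag(c(FALSE, TRUE, TRUE, FALSE, TRUE))

print(starts_with_one)  # 1 1 0 2
print(all_zero)         # 0 0 0
print(from_logical)     # 0 1 1 0 2
```

Because the numbering comes from the runs themselves, a vector that starts with 1 is handled without any special casing.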
Here is one option with rleid from data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)). Grouped by 'group', get the run-length id (rleid) of 'is.5' and multiply it by the values of 'is.5', so that the ids corresponding to 0s in 'is.5' become 0, and assign the result to 'spell'. Then specify i with a logical vector to select the rows where 'spell' is not zero, match those values of 'spell' against the unique 'spell' values, and assign the result back to 'spell'. The second step is also grouped by 'group', so the numbering restarts in each group even when groups begin with different 'is.5' values.
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), by = group
  ][!!spell, spell := match(spell, unique(spell)), by = group][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
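Or, after the first step, use .GRP, grouping by the interim 'spell' values. Because .GRP counts groups across the whole table, this matches the per-group numbering only when every group starts with the same 'is.5' value, as in the example data:
df[!!spell, spell := .GRP, spell]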
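To see what the interim values in the rleid approach look like, here is a sketch on group A alone. It does not use data.table: `run_id` is a base R stand-in that I am assuming behaves like `rleid()` on a single vector (a counter that increases whenever the value changes).

```r
# is.5 for group A in the example
x <- c(0, 1, 1, 0, 1, 0, 0, 1)

# Base R stand-in for data.table::rleid(): bump the id whenever the value changes.
run_id <- cumsum(c(TRUE, x[-1] != x[-length(x)]))

# Multiplying by x zeroes the ids of the 0-runs, leaving gaps in the numbering.
interim <- run_id * x

# Renumber the nonzero ids consecutively via match() against their unique values.
nonzero <- interim != 0
spell <- interim
spell[nonzero] <- match(interim[nonzero], unique(interim[nonzero]))

print(run_id)   # 1 2 2 3 4 5 5 6
print(interim)  # 0 2 2 0 4 0 0 6
print(spell)    # 0 1 1 0 2 0 0 3
```

The interim ids are 2, 4, 6 here because the vector starts with a 0-run; a vector starting with 1 would give 1, 3, 5 instead, which is why the renumbering has to happen within each group.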