How to flatten / merge overlapping time periods
Here's a possible solution. The basic idea is to compare the start date of the
next row (via lead) with the maximum end date seen so far (via the cummax
function), and to build an index that separates the data into groups of
overlapping periods:
data %>%
  arrange(ID, start) %>% # as suggested by @Jonno in case the data is unsorted
  group_by(ID) %>%
  mutate(indx = c(0, cumsum(as.numeric(lead(start)) >
                            cummax(as.numeric(end)))[-n()])) %>%
  group_by(ID, indx) %>%
  summarise(start = first(start), end = last(end))
# Source: local data frame [3 x 4]
# Groups: ID
#
#   ID indx      start        end
# 1  A    0 2013-01-01 2013-01-06
# 2  A    1 2013-01-07 2013-01-11
# 3  A    2 2013-01-12 2013-01-15
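To see how the index is built, here is a minimal base-R sketch of the same comparison for a single ID. The dates are made up for illustration (they are not the original question's data), and the shifted vector is a hand-rolled stand-in for lead:

```r
# Hypothetical periods for one ID, already sorted by start
start <- as.Date(c("2013-01-01", "2013-01-03", "2013-01-07", "2013-01-12"))
end   <- as.Date(c("2013-01-04", "2013-01-06", "2013-01-11", "2013-01-15"))

n <- length(start)
next_start  <- c(as.numeric(start)[-1], NA)  # stand-in for dplyr::lead(start)
running_end <- cummax(as.numeric(end))       # latest end date seen so far

# TRUE where the next period starts after everything seen so far has ended
gap_after <- next_start > running_end

# Drop the trailing NA comparison and prepend 0 so every row gets a group id
indx <- c(0, cumsum(gap_after[-n]))

print(data.frame(start, end, indx))
# indx is 0 0 1 2: rows 1 and 2 overlap and share a group
```

Because the comparison is a strict >, a period that starts on the same day a previous one ends is merged into that group, while a period starting one day later opens a new group.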
@David Arenburg's answer is great, but I ran into an issue where an earlier interval ended after a later interval, so using last in the summarise call resulted in the wrong end date. I'd suggest changing first(start) and last(end) to min(start) and max(end):
data %>%
  arrange(ID, start) %>% # sort first; the index assumes ordered start dates
  group_by(ID) %>%
  mutate(indx = c(0, cumsum(as.numeric(lead(start)) >
                            cummax(as.numeric(end)))[-n()])) %>%
  group_by(ID, indx) %>%
  summarise(start = min(start), end = max(end))
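The difference is easiest to see when one period fully contains a later one. The sketch below uses made-up data (not from the original question) and assumes dplyr 1.0.0 or later for the .groups argument:

```r
library(dplyr)

# Hypothetical data: the first period fully contains the second
data <- tibble(
  ID    = c("A", "A", "A"),
  start = as.Date(c("2013-01-01", "2013-01-02", "2013-01-20")),
  end   = as.Date(c("2013-01-10", "2013-01-05", "2013-01-25"))
)

grouped <- data %>%
  arrange(ID, start) %>%
  group_by(ID) %>%
  mutate(indx = c(0, cumsum(as.numeric(lead(start)) >
                            cummax(as.numeric(end)))[-n()])) %>%
  group_by(ID, indx)

# last(end) takes the end of the last row in each group
with_last <- grouped %>%
  summarise(start = first(start), end = last(end), .groups = "drop")

# max(end) takes the latest end date in each group
with_max <- grouped %>%
  summarise(start = min(start), end = max(end), .groups = "drop")

with_last$end[1]  # 2013-01-05, too early
with_max$end[1]   # 2013-01-10, the true end of the merged period
```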
Also, as @Jonno Bourne mentioned, sorting by start and any grouping variables is important before applying the method.
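As a rough illustration of why the sort matters, the sketch below wraps the pipeline in a helper and runs it with and without the arrange step. The name flatten_periods, its sort_first argument, and the sample data are all hypothetical, and .groups again assumes dplyr 1.0.0 or later:

```r
library(dplyr)

# Hypothetical helper around the pipeline above; sort_first toggles arrange()
flatten_periods <- function(data, sort_first = TRUE) {
  if (sort_first) data <- arrange(data, ID, start)
  data %>%
    group_by(ID) %>%
    mutate(indx = c(0, cumsum(as.numeric(lead(start)) >
                              cummax(as.numeric(end)))[-n()])) %>%
    group_by(ID, indx) %>%
    summarise(start = min(start), end = max(end), .groups = "drop")
}

# Made-up data with the latest period listed first
unsorted <- tibble(
  ID    = "A",
  start = as.Date(c("2013-01-07", "2013-01-01", "2013-01-03")),
  end   = as.Date(c("2013-01-11", "2013-01-04", "2013-01-06"))
)

nrow(flatten_periods(unsorted, sort_first = FALSE))  # 1: all rows collapse together
nrow(flatten_periods(unsorted, sort_first = TRUE))   # 2: 01-01 to 01-06, 01-07 to 01-11
```

Without the sort, the running maximum starts at the latest end date, so no later row can ever open a new group.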