Find groups of overlapping intervals with data.table
DT[ , g := cumsum(
cummax(shift(Interval.end, fill = Interval.end[1])) < Interval.start) + 1]
# Interval.id Interval.start Interval.end Wanted.column g
# 1: 1 2.0 4.5 1 1
# 2: 2 3.0 3.5 1 1
# 3: 3 4.0 4.8 1 1
# 4: 4 4.6 5.0 1 1
# 5: 5 4.7 4.9 1 1
# 6: 6 5.5 8.0 2 2
Credit to highly related answers: Collapse rows with overlapping ranges, How to flatten / merge overlapping time periods
You can first create a data.table with the unique/grouped intervals, and then use foverlaps()
to perform a join.
The main-interval data.table can be created using the intervals
-package. Use the interval_union()
-function to 'merge' intervals into non-overlapping inertvals.
#use the intervals-package to create the "main" unique intervals
library( intervals )
DT.int <- as.data.table(
intervals::interval_union(
intervals::Intervals( as.matrix( DT[, 2:3] ) ) ,
check_valid = TRUE ) )
#set names
setnames( DT.int, names(DT.int), c("start", "end" ) )
#set group_id-column
DT.int[, group_id := .I ][]
# start end group_id
# 1: 2.0 5 1
# 2: 5.5 8 2
#now perform foverlaps()
setkey( DT, Interval.start, Interval.end)
setkey( DT.int, start, end)
foverlaps( DT.int, DT )
# Interval.id Interval.start Interval.end Wanted.column start end group_id
# 1: 1 2.0 4.5 1 2.0 5 1
# 2: 2 3.0 3.5 1 2.0 5 1
# 3: 3 4.0 4.8 1 2.0 5 1
# 4: 4 4.6 5.0 1 2.0 5 1
# 5: 5 4.7 4.9 1 2.0 5 1
# 6: 6 5.5 8.0 2 5.5 8 2
As you can see, the column group_id
matches your Wanted.column