Select all overlapping ranges from starting range
A straightforward recursive solution splits the task in two parts:
- Find the earliest start date by following the chain of overlapping ranges
- Find the latest end date by following the chain of overlapping ranges
Details follow:
1. Follow the chain of overlaps leading to the earliest start date
- Find the starting range
- Find the overlapping range with the latest earlier start date than current
- Go to step 2
Example code:
DECLARE @STARTDATE DATETIME = '2018-08-24 12:00:00';
DECLARE @ENDDATE DATETIME = '2018-08-31 12:00:00';
WITH MinStart AS
(
-- Starting range
SELECT TOP (1)
D.ID,
D.STARTDATE
FROM dbo.DATA1 AS D
WHERE
D.STARTDATE >= @STARTDATE
AND D.ENDDATE <= @ENDDATE
ORDER BY
D.ID ASC
UNION ALL
SELECT
P1.ID,
P1.STARTDATE
FROM
(
-- Overlapping ranges with an earlier start date
-- Numbered starting with the highest start date
-- Tie-break dates using ID
SELECT
RN = ROW_NUMBER() OVER (
ORDER BY D.STARTDATE DESC, D.ID ASC),
D.ID,
D.STARTDATE
FROM MinStart AS R
JOIN dbo.DATA1 AS D
ON D.ENDDATE >= R.STARTDATE
AND D.STARTDATE < R.STARTDATE
) AS P1
WHERE
-- Highest start date only
P1.RN = 1
)
SELECT
MS.ID,
MS.STARTDATE
FROM MinStart AS MS
OPTION (MAXRECURSION 0);
2. Follow the chain of overlaps leading to the latest end date
- Find the starting range
- Find the overlapping range with the earliest later end date than current
- Go to step 2
Example code:
DECLARE @STARTDATE DATETIME = '2018-08-24 12:00:00';
DECLARE @ENDDATE DATETIME = '2018-08-31 12:00:00';
WITH MaxEnd AS
(
-- Starting range
SELECT TOP (1)
D.ID,
D.ENDDATE
FROM dbo.DATA1 AS D
WHERE
D.STARTDATE >= @STARTDATE
AND D.ENDDATE <= @ENDDATE
ORDER BY
D.ID ASC
UNION ALL
SELECT
P1.ID,
P1.ENDDATE
FROM
(
-- Overlapping ranges with a later end date
-- Numbered starting with the lowest end date
-- Tie-break dates using ID
SELECT
RN = ROW_NUMBER() OVER (
ORDER BY D.ENDDATE ASC, D.ID ASC),
D.ID,
D.ENDDATE
FROM MaxEnd AS R
JOIN dbo.DATA1 AS D
ON D.ENDDATE > R.ENDDATE
AND D.STARTDATE < R.ENDDATE
) AS P1
WHERE
-- Lowest end date only
P1.RN = 1
)
SELECT
ME.ID,
ME.ENDDATE
FROM MaxEnd AS ME
OPTION (MAXRECURSION 0);
With suitable indexing, both queries use an execution plan like:
The results given the sample data provided are:
╔════╦═════════════════════════╗ ║ ID ║ STARTDATE ║ ╠════╬═════════════════════════╣ ║ 3 ║ 2018-08-24 12:00:00.000 ║ ║ 6 ║ 2018-08-23 12:00:00.000 ║ ║ 2 ║ 2018-08-22 12:00:00.000 ║ ╚════╩═════════════════════════╝ ╔════╦═════════════════════════╗ ║ ID ║ ENDDATE ║ ╠════╬═════════════════════════╣ ║ 3 ║ 2018-08-29 12:00:00.000 ║ ║ 6 ║ 2018-08-30 12:00:00.000 ║ ║ 4 ║ 2018-09-02 12:00:00.000 ║ ╚════╩═════════════════════════╝
So IDs 2, 3, 4, and 6 were used. The final range is 2018-08-22 12:00:00.000
(lowest start date found) to 2018-09-02 12:00:00.000
(highest end date found).
Note: The code above includes a tie-break on dates (using the ID
column) due to duplicates in the sample data. This may, or may not, be required to solve the real problem.
The indexes used were:
CREATE TABLE dbo.DATA1
(
ID integer PRIMARY KEY,
STARTDATE datetime NOT NULL,
ENDDATE datetime NOT NULL,
);
CREATE UNIQUE INDEX i1 ON dbo.DATA1 (STARTDATE DESC, ID ASC) INCLUDE (ENDDATE);
CREATE UNIQUE INDEX i2 ON dbo.DATA1 (ENDDATE ASC, ID DESC) INCLUDE (STARTDATE);
Demo: db<>fiddle
Note that truly optimal indexing for these types of queries can be difficult. Even so, the approach above often performs well for many practical tasks in this area.
Until proper support for interval queries and predicates is added to SQL Server, higher performance solutions require significant extra work. See:
- Interval Queries in SQL Server Part 4 by Dejan Sarka
- Interval Queries in SQL Server by Itzik Ben-Gan