Merging date intervals in SQL Server
Here's a query that performs best of all submissions so far, with only two table accesses in the execution plan (instead of three or more). All queries are of course helped by indexes. Please note that the execution plan rates this query as more expensive, but the actual Reads & CPU are significantly better. Estimated costs in execution plans are not the same as actual performance.
WITH Grps AS (
SELECT
(Row_Number() OVER (ORDER BY P1.StartDate) - 1) / 2 Grp,
P1.StartDate,
P1.EndDate
FROM
Periods P1
CROSS JOIN (SELECT -1 UNION ALL SELECT 1) D (Dir)
LEFT JOIN Periods P2 ON
DateAdd(Day, D.Dir, P1.StartDate) = P2.EndDate
OR DateAdd(Day, D.Dir, P1.EndDate) = P2.StartDate
WHERE
(Dir = -1 AND P2.EndDate IS NULL)
OR (Dir = 1 AND P2.StartDate IS NULL)
)
SELECT
Min(StartDate) StartDate,
Max(EndDate) EndDate
FROM Grps
GROUP BY Grp;
One more thing I think worth mentioning is that querying your date period table would all around in most cases be simpler and better performing if you used exclusive end dates (aka "open" end dates) instead of closed ones:
StartDate | EndDate | EndDate
(Inclusive) | (Inclusive) | (Exclusive)
---------------------------------------
1982.03.02 | 1982.09.30 | 1982.10.01
1982.10.01 | 1985.01.17 | 1985.01.18
Using exclusive end dates is (in my opinion) best practice most of the time because it allows you to change the data type of the date column or to change the resolution of the date, without affecting any queries, code, or other logic. For example, if your dates needed to be to the nearest 12 hours instead of 24 hours, you'd have major work to get that accomplished, whereas if you used exclusive end dates not a single thing would have to change!
If you were using exclusive end dates, my query would look like this:
WITH Grps AS (
SELECT
(Row_Number() OVER (ORDER BY P1.StartDate) - 1) / 2 Grp,
P1.StartDate,
P1.EndDate
FROM
Periods P1
CROSS JOIN (SELECT 1 UNION ALL SELECT 2) X (Which)
LEFT JOIN Periods P2 ON
(X.Which = 1 AND P1.StartDate = P2.EndDate)
OR (X.Which = 2 AND P1.EndDate = P2.StartDate)
WHERE
P2.EndDate IS NULL
OR P2.StartDate IS NULL
)
SELECT
Min(StartDate) StartDate,
Max(EndDate) EndDate
FROM Grps
GROUP BY Grp;
Notice there's no DateAdd or DateDiff now, with hardcoded values of "1 Day" that would have to change if you for example switched to 12-hour periods.
Update
Here's an updated query that incorporates things I've learned in the last almost 5 years. This query now has no joins at all, and though it does have 3 sort operations in it which could be performance problems, I think this query will compete reasonably well, and in the absence of indexes will probably beat all others hands down.
WITH Groups AS (
SELECT Grp = Row_Number() OVER (ORDER BY StartDate) / 2, *
FROM
#Periods
(VALUES (0), (0)) X (Dup)
), Ranges AS (
SELECT StartDate = Max(StartDate), EndDate = Min(EndDate)
FROM Groups
GROUP BY Grp
HAVING Max(StartDate) <> DateAdd(day, 1, Min(EndDate))
), ReGroups AS (
SELECT
Grp = Row_Number() OVER (ORDER BY StartDate) / 2,
StartDate,
EndDate
FROM
Ranges
CROSS JOIN (VALUES (0), (0)) X (Dup)
)
SELECT
StartDate = Min(StartDate),
EndDate = Max(EndDate)
FROM ReGroups
GROUP BY Grp
HAVING Count(*) = 2
;
And here's yet another version using windowing functions (kind of what the previous query is simulating):
WITH LeadLag AS (
SELECT
PrevEndDate = Coalesce(Lag(EndDate) OVER (ORDER BY StartDate), '00010101'),
NextStartDate = Coalesce(Lead(StartDate) OVER (ORDER BY StartDate), '99991231'),
*
FROM #Periods
), Dates AS (
SELECT
X.*
FROM
LeadLag
CROSS APPLY (
SELECT
StartDate = CASE WHEN DateAdd(day, 1, PrevEndDate) <> StartDate THEN StartDate ELSE NULL END,
EndDate = CASE WHEN DateAdd(day, 1, EndDate) <> NextStartDate THEN EndDate ELSE NULL END
) X
WHERE
X.StartDate IS NOT NULL
OR X.EndDate IS NOT NULL
), Final AS (
SELECT
StartDate,
EndDate = Min(EndDate) OVER (ORDER BY EndDate ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM Dates
)
SELECT *
FROM Final
WHERE StartDate IS NOT NULL
;
It takes longer for me to set up the sample data than to write the query - it would be better if you posted questions that include CREATE TABLE
and INSERT/SELECT
statements. I don't know what your table is called, I've called mine Periods:
create table Periods (
StartDate date not null,
EndDate date not null
)
go
insert into Periods(StartDate,EndDate)
select '19820302','19820930' union all
select '19821001','19850117' union all
select '19850626','19850726' union all
select '19850730','19911231' union all
select '19920101','19951231' union all
select '19960101','20040531' union all
select '20040605','20060131' union all
select '20060201','20110520'
go
; with MergedPeriods as (
Select p1.StartDate, p1.EndDate
from
Periods p1
left join
Periods p2
on
p1.StartDate = DATEADD(day,1,p2.EndDate)
where
p2.StartDate is null
union all
select p1.StartDate,p2.EndDate
from
MergedPeriods p1
inner join
Periods p2
on
p1.EndDate = DATEADD(day,-1,p2.StartDate)
)
select StartDate,MAX(EndDate) as EndDate
from MergedPeriods group by StartDate
Result:
StartDate EndDate
1982-03-02 1985-01-17
1985-06-26 1985-07-26
1985-07-30 2004-05-31
2004-06-05 2011-05-20