Speed up an aggregate query on an 11 million row table
As written, you technically haven't asked a question here. I assume that you want to improve the performance of your query, but keep in mind that defining an acceptable response time is sometimes an important part of performance tuning. If a query runs once per day and takes a minute to finish, is it really worth 8 hours of your time to make it run in 1 second?
More important than performance is correctness. It doesn't matter much how long the query takes if it returns the wrong results, although of course taking a long time to return the wrong results is worse than taking a short time to return the wrong results. Depending on your time zone, the UTC conversion logic might not work out as you expect. If any of your data is affected by daylight saving time, then you can't use the current hour offset between local time and UTC to convert older data.
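If you're on SQL Server 2016 or later, AT TIME ZONE applies the offset that was in effect at a given point in time instead of today's offset. The sketch below compares the two approaches for the start-of-year boundary. It assumes that DateCreated is stored in UTC, that the server clock is set to your local time, and it uses 'Eastern Standard Time' only as a placeholder zone name; substitute your own from sys.time_zone_info.

```sql
-- Assumptions: SQL Server 2016+ (AT TIME ZONE), DateCreated stored in UTC,
-- server clock on local time. 'Eastern Standard Time' is a placeholder;
-- valid names are listed by: SELECT name FROM sys.time_zone_info;
DECLARE @LocalYearStart DATETIME = DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()), 0);

SELECT
    -- Uses the offset in effect right now (the approach in the question).
    DATEADD(HOUR, -DATEDIFF(HOUR, GETUTCDATE(), GETDATE()), @LocalYearStart)
        AS CurrentOffsetBoundaryUtc,
    -- Uses the offset that applied on January 1 itself, then drops the
    -- +00:00 offset so the result is a plain UTC DATETIME.
    CONVERT(DATETIME, @LocalYearStart AT TIME ZONE 'Eastern Standard Time'
                                      AT TIME ZONE 'UTC')
        AS DstAwareBoundaryUtc;
```

In a zone that observes daylight saving time, the two columns agree in winter and differ by one hour in summer, which is exactly the kind of error that the current-offset approach introduces for older data.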
Setting all of that aside, I'm going to try to show you a few ways to speed up the query in the question. You have a covering index, which is a good start, especially because it avoids reading the unrelated blob data. However, there are still ways to speed up the query. I'm deliberately ignoring the clues that you gave about data distribution because I want to make this a more general answer that could help others and could also be more helpful to you if your data changes in the future.
I mocked up 10 million rows with half of them having "OK" for the message and the other half having a long string. The dates are spread out over a few years. WARNING: this code takes up around 60 GB of space and ran in around 10 minutes on my machine.
CREATE TABLE [dbo].[NotificationResult]
(
[IdNotificationResult] [bigint] IDENTITY(1,1) NOT NULL,
[Message] [varchar](max) NULL,
[DateCreated] [datetime] NOT NULL,
[DateCreatedUTC] [date] NOT NULL,
[Filler] VARCHAR(1000) NOT NULL,
CONSTRAINT [PK_NotificationResult] PRIMARY KEY CLUSTERED
(
[IdNotificationResult]
)
);
INSERT INTO [dbo].[NotificationResult] WITH (TABLOCK) ([Message], [DateCreated], [DateCreatedUTC], [Filler])
SELECT CASE WHEN RN % 2 = 1 THEN 'OK' ELSE REPLICATE('Z', 3000) END
, DATEADD(SECOND, 11 * RN, '20140101')
, CAST(DATEADD(SECOND, 11 * RN, '20140101') AS DATE)
, REPLICATE('FILLER', 166)
FROM
(
SELECT TOP (10000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
CROSS JOIN master..spt_values t3
) t;
CREATE NONCLUSTERED INDEX [IX_NotificationResult_DateMessage]
ON [dbo].[NotificationResult] ( [DateCreated] ASC ) INCLUDE ( [Message]);
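If you build the mock table yourself, a quick sanity check of the load might look like the query below. It's a suggestion rather than part of the original test, and it scans the entire clustered index, so it won't be fast on this much data. Because consecutive rows are 11 seconds apart, 10 million rows cover about 110 million seconds, or roughly three and a half years starting from January 2014.

```sql
-- Hypothetical sanity check on the mocked data; scans the whole table.
SELECT
    COUNT_BIG(*)                                       AS TotalRows,
    SUM(CASE WHEN [Message] = 'OK' THEN 1 ELSE 0 END)  AS OkRows,
    MIN([DateCreated])                                 AS FirstDate,
    MAX([DateCreated])                                 AS LastDate
FROM [dbo].[NotificationResult];
```

With the script above, OkRows should be exactly half of TotalRows (the odd row numbers), and LastDate should fall in late June 2017.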
If I run the query in the question, I get the same query plan as you. It took 38 seconds. Here are some performance stats for the execution:
Table 'NotificationResult'. Scan count 5, logical reads 2509562, physical reads 0, read-ahead reads 2499799
SQL Server Execution Times: CPU time = 20626 ms, elapsed time = 37663 ms.
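Output like this is what SQL Server prints to the Messages tab when the statistics settings are enabled for the session. A minimal sketch for collecting comparable numbers on your own system follows. The buffer-flushing lines are an optional assumption about how you might want to test, require sysadmin, and should never be run on a production server.

```sql
-- Print I/O and CPU/elapsed time for each statement in this session.
SET STATISTICS IO, TIME ON;

-- Optional, test servers only: write dirty pages to disk, then empty the
-- buffer pool so that each run starts with a cold cache.
CHECKPOINT;
DBCC DROPCLEANBUFFERS;

-- Run the query being tuned here.

SET STATISTICS IO, TIME OFF;
```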
The first opportunity to improve performance is that you have an implied WHERE clause predicate in your CASE expressions. The query optimizer isn't smart enough to realize that rows from previous years won't contribute to the totals. We know that 24 hours will always be longer than the difference between local time and UTC, so adding a filter like this shouldn't change the results:
WHERE DateCreated > DATEADD(DAY, -1, dateadd(YEAR, datediff(YEAR, 0, getdate()), 0))
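To see what that boundary works out to, here it is evaluated against a fixed, made-up date instead of GETDATE(). DATEDIFF(YEAR, 0, ...) counts the year boundaries since day 0 (1900-01-01), and DATEADD adds them back to day 0, which truncates the value to the start of its year. Because the expression doesn't reference any column, SQL Server can treat it as a runtime constant and use it as a seek predicate on the DateCreated index.

```sql
-- @Now stands in for GETDATE(); the value is hypothetical.
DECLARE @Now DATETIME = '20170615 10:30';

SELECT
    DATEADD(YEAR, DATEDIFF(YEAR, 0, @Now), 0)                    AS StartOfYear,    -- 2017-01-01 00:00
    DATEADD(DAY, -1, DATEADD(YEAR, DATEDIFF(YEAR, 0, @Now), 0))  AS FilterBoundary; -- 2016-12-31 00:00
```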
Now, instead of reading and aggregating 10 million rows from the index, SQL Server only has to process 1.4 million rows. How much you save with this optimization depends on how your data is distributed by date. If all of your data is in the current year, then this change alone won't improve performance. For my data, the query now finishes in about 5 seconds, which is a big improvement:
Table 'NotificationResult'. Scan count 5, logical reads 352033, physical reads 1, read-ahead reads 350073
SQL Server Execution Times: CPU time = 3062 ms, elapsed time = 5354 ms.
We can do better than that. We're storing a VARCHAR(MAX) column in the index when all that we really need to know is whether the column value matches "OK". Without changing the table definition, we can create smaller indexes to seek or scan against by creating three filtered indexes:
CREATE NONCLUSTERED INDEX [IX_NotificationResult_Date_OK]
ON [dbo].[NotificationResult] ( [DateCreated] ASC )
WHERE [Message] = 'OK';
CREATE NONCLUSTERED INDEX [IX_NotificationResult_Date_NOT_OK]
ON [dbo].[NotificationResult] ( [DateCreated] ASC )
WHERE [Message] <> 'OK';
CREATE NONCLUSTERED INDEX [IX_NotificationResult_Date_NULL]
ON [dbo].[NotificationResult] ( [DateCreated] ASC )
WHERE [Message] IS NULL;
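The third index is needed because of how NULL comparisons work: NULL <> 'OK' evaluates to UNKNOWN rather than true, so a NULL message isn't covered by either of the first two filters. This small, self-contained query shows the three predicates side by side on made-up values:

```sql
-- Each sample value matches exactly one of the three filter predicates.
-- For the NULL row, both = 'OK' and <> 'OK' are UNKNOWN, so those CASE
-- expressions fall through to ELSE 0 and only IS NULL returns 1.
SELECT
    v.[Message],
    CASE WHEN v.[Message] = 'OK'  THEN 1 ELSE 0 END AS MatchesEqualsOK,
    CASE WHEN v.[Message] <> 'OK' THEN 1 ELSE 0 END AS MatchesNotEqualsOK,
    CASE WHEN v.[Message] IS NULL THEN 1 ELSE 0 END AS MatchesIsNull
FROM (VALUES ('OK'), ('Timeout'), (NULL)) AS v([Message]);
```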
The idea here is that these indexes have the data that we need but are much smaller on disk than the existing IX_NotificationResult_DateMessage index. Getting the query optimizer to use the filtered indexes required a query rewrite and index hints (I'm not sure why). Here's one way to rewrite the query:
SELECT
sum(case when FlagDTD = 1 then Success else 0 end) as SuccessDTD
, sum(case when FlagDTD = 1 then [Error] else 0 end) as ErrorDTD
-- NULLIF returns NULL instead of raising a divide-by-zero error when no rows fall in the period
, round(sum(case when FlagDTD = 1 then Success else 0 end) * 100.0 / NULLIF(sum(FlagDTD), 0), 2)
as RateDTD
, sum(case when FlagYTD = 1 then Success else 0 end) as SuccessYTD
, sum(case when FlagYTD = 1 then [Error] else 0 end) as ErrorYTD
, round(sum(case when FlagYTD = 1 then Success else 0 end) * 100.0 / NULLIF(sum(FlagYTD), 0), 2)
as RateYTD
FROM
(
SELECT
Success
, [Error]
, CASE WHEN DateCreated >
dateadd(HOUR, datediff(hh,GetUTCDate(), GetDate())*-1, DATEADD(yy,
DATEDIFF(yy,0,getdate()), 0)) then 1 else 0 end as FlagYTD
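-- Midnight today as DATETIME; unlike convert(varchar(10), getdate(), 101), this doesn't depend on the session's DATEFORMAT
, CASE WHEN DateCreated >
dateadd(HOUR, datediff(hh,GetUTCDate(), GetDate())*-1 ,
DATEADD(DAY, DATEDIFF(DAY, 0, getdate()), 0)) then 1 else 0 end as FlagDTD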
FROM
(
SELECT 1 Success, 0 Error, DateCreated
FROM
[dbo].[NotificationResult] WITH (INDEX (IX_NotificationResult_Date_OK))
WHERE DateCreated > DATEADD(DAY, -1, dateadd(YEAR, datediff(YEAR, 0, getdate()), 0))
AND [Message] = 'OK'
UNION ALL
SELECT 0 Success, 1 Error, DateCreated
FROM
[dbo].[NotificationResult] WITH (INDEX (IX_NotificationResult_Date_NOT_OK))
WHERE DateCreated > DATEADD(DAY, -1, dateadd(YEAR, datediff(YEAR, 0, getdate()), 0))
AND [Message] <> 'OK'
UNION ALL
SELECT 0 Success, 1 Error, DateCreated
FROM
[dbo].[NotificationResult] WITH (INDEX (IX_NotificationResult_Date_NULL))
WHERE DateCreated > DATEADD(DAY, -1, dateadd(YEAR, datediff(YEAR, 0, getdate()), 0))
AND [Message] IS NULL
) t
) Cnts;
Now the query finishes in less than a second:
Table 'NotificationResult'. Scan count 10, logical reads 3874, physical reads 0, read-ahead reads 0
SQL Server Execution Times: CPU time = 2499 ms, elapsed time = 890 ms.
(Query plan screenshot not included; the plan as originally captured was missing one of the three filtered indexes.)
It's true that we read more rows from the indexes than before, but in total the indexes are about 100 times smaller than the original index.
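To check the size difference on your own system, one option is to add up the pages used by each index in sys.dm_db_partition_stats. A minimal sketch, assuming the table and index names used in this answer:

```sql
-- Approximate on-disk size (used 8 KB pages, converted to MB) and row count
-- for every index on the table, largest first.
SELECT
    i.[name]                              AS IndexName,
    SUM(ps.used_page_count) * 8 / 1024    AS UsedMB,
    SUM(ps.row_count)                     AS [RowCount]
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
    ON  i.[object_id] = ps.[object_id]
    AND i.index_id    = ps.index_id
WHERE ps.[object_id] = OBJECT_ID(N'dbo.NotificationResult')
GROUP BY i.[name]
ORDER BY UsedMB DESC;
```

The three filtered indexes should show far fewer pages than IX_NotificationResult_DateMessage, which carries the VARCHAR(MAX) column.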
If that query still isn't fast enough, you can consider an indexed view. If you have a UTC date column in the table, then it's straightforward to create a view that is eligible to be indexed:
CREATE VIEW [NotificationResult_indexed]
WITH SCHEMABINDING
AS
SELECT
[DateCreatedUTC]
, COUNT_BIG(*) AS CNT_BIG
, SUM(CASE WHEN Message = 'OK' then 1 else 0 end) as Success
, SUM(CASE WHEN Message IS NULL OR Message <> 'OK' then 1 else 0 end) as [Error]
FROM dbo.[NotificationResult]
GROUP BY [DateCreatedUTC];
GO
-- End the batch above: CREATE VIEW can't share a batch with other statements.
CREATE UNIQUE CLUSTERED INDEX CLU_NotificationResult_indexed
ON [NotificationResult_indexed] ([DateCreatedUTC]);
GO
I believe that this query roughly captures your intent, although I probably got some details wrong:
SELECT
sum(case when FlagDTD = 1 then Success else 0 end) as SuccessDTD
, sum(case when FlagDTD = 1 then [Error] else 0 end) as ErrorDTD
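-- Each view row is one day, so divide by the notification count (Success + Error), not by the number of flagged days
, round(sum(case when FlagDTD = 1 then Success else 0 end) * 100.0
/ NULLIF(sum(case when FlagDTD = 1 then Success + [Error] else 0 end), 0), 2)
as RateDTD
, sum(case when FlagYTD = 1 then Success else 0 end) as SuccessYTD
, sum(case when FlagYTD = 1 then [Error] else 0 end) as ErrorYTD
, round(sum(case when FlagYTD = 1 then Success else 0 end) * 100.0
/ NULLIF(sum(case when FlagYTD = 1 then Success + [Error] else 0 end), 0), 2)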
as RateYTD
FROM
(
SELECT
Success
, [Error]
-- DateCreatedUTC is a DATE, so >= is needed to include January 1 and today's rows
, CASE WHEN [DateCreatedUTC] >= dateadd(YEAR, datediff(YEAR, 0, getdate()), 0)
then 1 else 0 end as FlagYTD
, CASE WHEN [DateCreatedUTC] >= CAST(GETDATE() AS DATE)
then 1 else 0 end as FlagDTD
FROM
[dbo].[NotificationResult_indexed]
WHERE [DateCreatedUTC] >= dateadd(YEAR, datediff(YEAR, 0, getdate()), 0)
) Cnts;
It finishes in 66 ms:
Table 'NotificationResult_indexed'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 66 ms.
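One caveat: automatic use of indexed views by the optimizer is an Enterprise (and Developer) edition feature. On other editions, a query that references the view is expanded into the view definition and runs against the base table unless the view is referenced with the NOEXPAND table hint. For example, reading this year's per-day totals with the hint:

```sql
-- WITH (NOEXPAND) tells the optimizer to read the view's clustered index
-- instead of expanding the view definition against dbo.NotificationResult.
SELECT [DateCreatedUTC], Success, [Error]
FROM [dbo].[NotificationResult_indexed] WITH (NOEXPAND)
WHERE [DateCreatedUTC] >= DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()), 0);
```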