Do you know an easy way to generate one record for each hour of the past 12 hours?
For SQL Server 2005+ you can generate those 12 records very easily with a loop or a recursive CTE. Here is an example of a recursive CTE:
DECLARE @Date DATETIME
SELECT @Date = '20120308 11:00:00'
;WITH Dates AS
(
SELECT DATEPART(HOUR,DATEADD(HOUR,-1,@Date)) [Hour],
DATEADD(HOUR,-1,@Date) [Date], 1 Num
UNION ALL
SELECT DATEPART(HOUR,DATEADD(HOUR,-1,[Date])),
DATEADD(HOUR,-1,[Date]), Num+1
FROM Dates
WHERE Num <= 11
)
SELECT [Hour], [Date]
FROM Dates
Then you just need to join it with your events table.
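For example, assuming a hypothetical events table dbo.tblEvents with an EventTime column (both names are illustrative, not from the question), the join might look something like this:

```sql
-- Sketch only: dbo.tblEvents and EventTime are assumed names.
-- Range-join each generated hour to the events that fall inside it;
-- LEFT JOIN keeps hours that had no events.
;WITH Dates AS
(
    SELECT DATEPART(HOUR,DATEADD(HOUR,-1,@Date)) [Hour],
           DATEADD(HOUR,-1,@Date) [Date], 1 Num
    UNION ALL
    SELECT DATEPART(HOUR,DATEADD(HOUR,-1,[Date])),
           DATEADD(HOUR,-1,[Date]), Num+1
    FROM Dates
    WHERE Num <= 11
)
SELECT d.[Hour], COUNT(e.EventTime) AS EventCount
FROM Dates d
LEFT JOIN dbo.tblEvents e
       ON e.EventTime >= d.[Date]
      AND e.EventTime <  DATEADD(HOUR,1,d.[Date])
GROUP BY d.[Hour], d.[Date]
ORDER BY d.[Date];
```

Joining on a half-open range (>= start, < start + 1 hour) rather than on DATEPART alone keeps the join sargable and avoids mixing up the same hour from different days.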
Tally tables can be used for things like this. They can be very efficient. Create the tally table below. I created the tally table with only 24 rows for your example, but you can create it with however many you want to suit other purposes.
SELECT TOP 24
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
--===== Add a Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
Assuming your table is called dbo.tblEvents, run the query below. I believe this is what you're looking for:
SELECT t.n - 1 AS [Hour], COUNT(e.EventTime) AS EventCount
FROM dbo.Tally t
LEFT JOIN dbo.tblEvents e ON t.n - 1 = DATEPART(hh, e.EventTime) -- hours run 0-23, N runs 1-24
GROUP BY t.n
ORDER BY t.n
Credit goes to the following links, which I believe are where I first came across this technique:
http://www.sqlservercentral.com/articles/T-SQL/62867/
http://www.sqlservercentral.com/articles/T-SQL/74118/
First, my apologies for the delay in my response since my last comments.
The subject came up in the comments that using a Recursive CTE (rCTE from here on) runs fast enough because of the low number of rows. While it may appear that way, nothing could be further from the truth.
BUILD TALLY TABLE AND TALLY FUNCTION
Before we start testing, we need to build a physical Tally Table with the appropriate Clustered Index and an Itzik Ben-Gan style Tally Function. We'll also do all of this in TempDB so that we don't accidentally drop anyone's goodies.
Here's the code to build the Tally Table and my current production version of Itzik's wonderful code.
--===== Do this in a nice, safe place that everyone has
USE tempdb
;
--===== Create/Recreate a Physical Tally Table
IF OBJECT_ID('dbo.Tally','U') IS NOT NULL
DROP TABLE dbo.Tally
;
-- Note that the ISNULL makes a NOT NULL column
SELECT TOP 1000001
N = ISNULL(ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1,0)
INTO dbo.Tally
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally PRIMARY KEY CLUSTERED (N)
;
--===== Create/Recreate a Tally Function
IF OBJECT_ID('dbo.fnTally','IF') IS NOT NULL
DROP FUNCTION dbo.fnTally
;
GO
CREATE FUNCTION [dbo].[fnTally]
/**********************************************************************************************************************
Purpose:
Return a column of BIGINTs from @ZeroOrOne up to and including @MaxN with a max value of 1 Trillion.
As a performance note, it takes about 00:02:10 (hh:mm:ss) to generate 1 Billion numbers to a throw-away variable.
Usage:
--===== Syntax example (Returns BIGINT)
SELECT t.N
FROM dbo.fnTally(@ZeroOrOne,@MaxN) t
;
Notes:
1. Based on Itzik Ben-Gan's cascading CTE (cCTE) method for creating a "readless" Tally Table source of BIGINTs.
Refer to the following URLs for how it works and introduction for how it replaces certain loops.
http://www.sqlservercentral.com/articles/T-SQL/62867/
http://sqlmag.com/sql-server/virtual-auxiliary-table-numbers
2. To start a sequence at 0, @ZeroOrOne must be 0 or NULL. Any other value that's convertible to the BIT data-type
will cause the sequence to start at 1.
3. If @ZeroOrOne = 1 and @MaxN = 0, no rows will be returned.
4. If @MaxN is negative or NULL, a "TOP" error will be returned.
5. @MaxN must be a positive number from >= the value of @ZeroOrOne up to and including 1 Billion. If a larger
number is used, the function will silently truncate after 1 Billion. If you actually need a sequence with
that many values, you should consider using a different tool. ;-)
6. There will be a substantial reduction in performance if "N" is sorted in descending order. If a descending
sort is required, use code similar to the following. Performance will decrease by about 27% but it's still
very fast especially compared with just doing a simple descending sort on "N", which is about 20 times slower.
If @ZeroOrOne is a 0, in this case, remove the "+1" from the code.
DECLARE @MaxN BIGINT;
SELECT @MaxN = 1000;
SELECT DescendingN = @MaxN-N+1
FROM dbo.fnTally(1,@MaxN);
7. There is no performance penalty for sorting "N" in ascending order because the output is explicitly sorted by
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
Revision History:
Rev 00 - Unknown - Jeff Moden
- Initial creation with error handling for @MaxN.
Rev 01 - 09 Feb 2013 - Jeff Moden
- Modified to start at 0 or 1.
Rev 02 - 16 May 2013 - Jeff Moden
- Removed error handling for @MaxN because of exceptional cases.
Rev 03 - 22 Apr 2015 - Jeff Moden
- Modify to handle 1 Trillion rows for experimental purposes.
**********************************************************************************************************************/
(@ZeroOrOne BIT, @MaxN BIGINT)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN WITH
E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1) --10E1 or 10 rows
, E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d) --10E4 or 10 Thousand rows
,E12(N) AS (SELECT 1 FROM E4 a, E4 b, E4 c) --10E12 or 1 Trillion rows
SELECT N = 0 WHERE ISNULL(@ZeroOrOne,0)= 0 --Conditionally start at 0.
UNION ALL
SELECT TOP(@MaxN) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E12 -- Values from 1 to @MaxN
;
GO
By the way... notice that we built a million-and-one-row Tally Table and added a Clustered Index to it in about a second or so. Try THAT with an rCTE and see how long it takes! ;-)
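For comparison, here's a sketch of what the rCTE equivalent of that million-row build would look like. Note that a recursive CTE needs OPTION (MAXRECURSION 0) just to get past the default limit of 100 recursion levels, and it runs dramatically slower because every row requires another pass through the recursive member:

```sql
-- For comparison only: building a million-row numbers table with an rCTE.
-- The default MAXRECURSION limit is 100, so MAXRECURSION 0 (unlimited) is required.
WITH Nums AS
(
    SELECT N = 1
    UNION ALL
    SELECT N + 1
    FROM Nums
    WHERE N < 1000000
)
SELECT N
INTO dbo.TallyrCTE
FROM Nums
OPTION (MAXRECURSION 0);
```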
BUILD SOME TEST DATA
We also need some test data. Yes, I agree that all of the functions that we're going to test, including the rCTE, run in a millisecond or less for only 12 rows, but that's the trap that a lot of people fall into. We'll talk more about that trap later but, for now, let's simulate calling each function 40,000 times, which is about how many times certain functions in my shop get called in an 8 hour day. Just imagine how many times such functions might be called in a large online retail business.
So, here's the code to build 40,000 rows with random dates, each having a Row Number just for tracking purposes. I didn't take the time to make the times whole hours because it doesn't matter here.
--===== Do this in a nice, safe place that everyone has
USE tempdb
;
--===== Create/Recreate a Test Date table
IF OBJECT_ID('dbo.TestDate','U') IS NOT NULL
DROP TABLE dbo.TestDate
;
DECLARE @StartDate DATETIME
,@EndDate DATETIME
,@Rows INT
;
SELECT @StartDate = '2010' --Inclusive
,@EndDate = '2020' --Exclusive
,@Rows = 40000 --Enough to simulate an 8 hour day where I work
;
SELECT RowNum = IDENTITY(INT,1,1)
,SomeDateTime = RAND(CHECKSUM(NEWID()))*DATEDIFF(dd,@StartDate,@EndDate)+@StartDate
INTO dbo.TestDate
FROM dbo.fnTally(1,@Rows)
;
BUILD SOME FUNCTIONS TO DO THE 12 ROW HOUR THING
Up next, I converted the rCTE code to a function and created 3 other functions. They've all been created as high performance iTVFs (Inline Table Valued Functions). You can always tell because iTVFs never have a BEGIN in them like Scalar or mTVFs (Multi-statement Table Valued Functions) do.
Here's the code to build those 4 functions... I named them after the method they use and not what they do just to make it easier to identify them.
--===== CREATE THE iTVFs
--===== Do this in a nice, safe place that everyone has
USE tempdb
;
-----------------------------------------------------------------------------------------
IF OBJECT_ID('dbo.OriginalrCTE','IF') IS NOT NULL
DROP FUNCTION dbo.OriginalrCTE
;
GO
CREATE FUNCTION dbo.OriginalrCTE
(@Date DATETIME)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH Dates AS
(
SELECT DATEPART(HOUR,DATEADD(HOUR,-1,@Date)) [Hour],
DATEADD(HOUR,-1,@Date) [Date], 1 Num
UNION ALL
SELECT DATEPART(HOUR,DATEADD(HOUR,-1,[Date])),
DATEADD(HOUR,-1,[Date]), Num+1
FROM Dates
WHERE Num <= 11
)
SELECT [Hour], [Date]
FROM Dates
GO
-----------------------------------------------------------------------------------------
IF OBJECT_ID('dbo.MicroTally','IF') IS NOT NULL
DROP FUNCTION dbo.MicroTally
;
GO
CREATE FUNCTION dbo.MicroTally
(@Date DATETIME)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
SELECT [Hour] = DATEPART(HOUR,DATEADD(HOUR,t.N,@Date))
,[DATE] = DATEADD(HOUR,t.N,@Date)
FROM (VALUES (-1),(-2),(-3),(-4),(-5),(-6),(-7),(-8),(-9),(-10),(-11),(-12))t(N)
;
GO
-----------------------------------------------------------------------------------------
IF OBJECT_ID('dbo.PhysicalTally','IF') IS NOT NULL
DROP FUNCTION dbo.PhysicalTally
;
GO
CREATE FUNCTION dbo.PhysicalTally
(@Date DATETIME)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
SELECT [Hour] = DATEPART(HOUR,DATEADD(HOUR,-t.N,@Date))
,[DATE] = DATEADD(HOUR,-t.N,@Date)
FROM dbo.Tally t
WHERE N BETWEEN 1 AND 12
;
GO
-----------------------------------------------------------------------------------------
IF OBJECT_ID('dbo.TallyFunction','IF') IS NOT NULL
DROP FUNCTION dbo.TallyFunction
;
GO
CREATE FUNCTION dbo.TallyFunction
(@Date DATETIME)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
SELECT [Hour] = DATEPART(HOUR,DATEADD(HOUR,-t.N,@Date))
,[DATE] = DATEADD(HOUR,-t.N,@Date)
FROM dbo.fnTally(1,12) t
;
GO
BUILD THE TEST HARNESS TO TEST THE FUNCTIONS
Last but not least, we need a test harness. I do a baseline check and then test each function in an identical manner.
Here's the code for the test harness...
PRINT '--========== Baseline Select =================================';
DECLARE @Hour INT, @Date DATETIME
;
SET STATISTICS TIME,IO ON;
SELECT @Hour = RowNum
,@Date = SomeDateTime
FROM dbo.TestDate
CROSS APPLY dbo.fnTally(1,12);
SET STATISTICS TIME,IO OFF;
GO
PRINT '--========== Original Recursive CTE ==========================';
DECLARE @Hour INT, @Date DATETIME
;
SET STATISTICS TIME,IO ON;
SELECT @Hour = fn.[Hour]
,@Date = fn.[Date]
FROM dbo.TestDate td
CROSS APPLY dbo.OriginalrCTE(td.SomeDateTime) fn;
SET STATISTICS TIME,IO OFF;
GO
PRINT '--========== Dedicated Micro-Tally Table =====================';
DECLARE @Hour INT, @Date DATETIME
;
SET STATISTICS TIME,IO ON;
SELECT @Hour = fn.[Hour]
,@Date = fn.[Date]
FROM dbo.TestDate td
CROSS APPLY dbo.MicroTally(td.SomeDateTime) fn;
SET STATISTICS TIME,IO OFF;
GO
PRINT'--========== Physical Tally Table =============================';
DECLARE @Hour INT, @Date DATETIME
;
SET STATISTICS TIME,IO ON;
SELECT @Hour = fn.[Hour]
,@Date = fn.[Date]
FROM dbo.TestDate td
CROSS APPLY dbo.PhysicalTally(td.SomeDateTime) fn;
SET STATISTICS TIME,IO OFF;
GO
PRINT'--========== Tally Function ===================================';
DECLARE @Hour INT, @Date DATETIME
;
SET STATISTICS TIME,IO ON;
SELECT @Hour = fn.[Hour]
,@Date = fn.[Date]
FROM dbo.TestDate td
CROSS APPLY dbo.TallyFunction(td.SomeDateTime) fn;
SET STATISTICS TIME,IO OFF;
GO
One thing to notice in the test harness above is that I shunt all output into "throwaway" variables. That's to try to keep performance measurements as pure as possible without any output to disk or screen skewing results.
A WORD OF CAUTION ON SET STATISTICS
Also, a word of caution for would-be testers... You MUST NOT use SET STATISTICS when testing either Scalar or mTVF functions. It can only be safely used on iTVF functions like the ones in this test. SET STATISTICS has been proven to make SCALAR functions run hundreds of times slower than they actually do without it. Yeah, I'm trying to tilt another windmill but that would be a whole 'nuther article-length post and I don't have the time for that. I have an article on SQLServerCentral.com talking all about that but there's no sense in posting the link here because someone will get all bent out of shape about it.
THE TEST RESULTS
So, here are the test results when I run the test harness on my little i5 laptop with 6GB of RAM.
--========== Baseline Select =================================
Table 'Worktable'. Scan count 1, logical reads 82309, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestDate'. Scan count 1, logical reads 105, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 203 ms, elapsed time = 206 ms.
--========== Original Recursive CTE ==========================
Table 'Worktable'. Scan count 40001, logical reads 2960000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestDate'. Scan count 1, logical reads 105, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 4258 ms, elapsed time = 4415 ms.
--========== Dedicated Micro-Tally Table =====================
Table 'Worktable'. Scan count 1, logical reads 81989, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestDate'. Scan count 1, logical reads 105, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 234 ms, elapsed time = 235 ms.
--========== Physical Tally Table =============================
Table 'Worktable'. Scan count 1, logical reads 81989, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestDate'. Scan count 1, logical reads 105, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Tally'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 250 ms, elapsed time = 252 ms.
--========== Tally Function ===================================
Table 'Worktable'. Scan count 1, logical reads 81989, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestDate'. Scan count 1, logical reads 105, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 250 ms, elapsed time = 253 ms.
The "BASELINE SELECT", which only selects data (each row created 12 times to simulate the same volume of return), came in right about 1/5th of a second. Everything else came in at about a quarter of a second. Well, everything except that bloody rCTE function. It took 4 and 1/4 seconds or 16 times longer (1,600% slower).
And look at the logical reads (memory IO)... The rCTE consumed a whopping 2,960,000 (almost 3 MILLION reads) whereas the other functions only consumed about 82,100. That means the rCTE consumed more than 34.3 times more memory IO than any of the other functions.
CLOSING THOUGHTS
Let's summarize. The rCTE method for doing this "small" 12 row thing used 16 TIMES (1,600%) more CPU (and duration) and 34.3 TIMES (3,430%) more memory IO than any of the other functions.
Heh... I know what you're thinking. "Big Deal! It's just one function."
Yeah, agreed, but how many other functions do you have? How many other places outside of functions do you have? And do you have any of those that work with more than just 12 rows each run? And, is there any chance that someone in a lurch for a method might copy that rCTE code for something much bigger?
Ok, time to be blunt. It makes absolutely no sense for people to justify performance challenged code just because of supposed limited row counts or usage. Except for when you purchase an MPP box for perhaps millions of dollars (not to mention the expense of rewriting code to get it to work on such a machine), you can't buy a machine that runs your code 16 times faster (SSD's won't do it either... all this stuff was in high speed memory when we tested it). Performance is in the code. Good performance is in good code.
Can you imagine if all of your code ran "just" 16 times faster?
Never justify bad or performance challenged code on low rowcounts or even low usage. If you do, you might have to borrow one of the windmills I was accused of tilting at to keep your CPUs and disks cool enough. ;-)
A WORD ON THE WORD "TALLY"
Yeah... I agree. Semantically speaking, the Tally Table contains numbers, not "tallies". In my original article on the subject (it wasn't the original article on the technique but it was my first on it), I called it "Tally" not because of what it contains, but because of what it does... it's used to "count" instead of looping, and to "Tally" something is to "Count" something. ;-) Call it what you will... Numbers Table, Tally Table, Sequence Table, whatever. I don't care. For me, "Tally" is more meaningful and, being a good lazy DBA, contains only 5 letters (2 are identical) instead of 7, and it's easier to say for most folks. It's also singular, which follows my naming convention for tables. ;-) It's also what the article that contained a page from a book from the 60's called it. I'll always refer to it as a "Tally Table" and you'll still know what I or someone else means. I also avoid Hungarian Notation like the plague but called the function "fnTally" so that I could say "Well, if you used the eff-en Tally Function I showed you, you wouldn't have a performance problem" without it actually being an HR violation. ;-)
What I'm more concerned about is people learning to use it properly instead of resorting to things like performance challenged rCTEs and other forms of Hidden RBAR.