Should I avoid adding a clustered key to a table with high throughput
The main reason I would want a clustered index in this scenario is this line:
The process for deleting process.Audit records will execute every hour and delete an hours worth of audits from x days ago (typically around 7 days)
When you delete rows from a HEAP, data pages may not be deallocated unless the delete gets a table lock, or you provide a WITH (TABLOCK)
hint to the delete query. You can probably imagine what that does to concurrency, though. Not good.
Note that the TABLOCK
hint will not have this behavior if you're using RCSI or Snapshot Isolation.
Here's a quick example. Load up a small table:
USE tempdb;
SET NOCOUNT ON;
CREATE TABLE dbo.heap
(
id INT PRIMARY KEY NONCLUSTERED,
junk VARCHAR(1000)
);
INSERT dbo.heap (
id, junk )
SELECT TOP 1000 x.n, REPLICATE('A', x.n % 1000)
FROM (
SELECT ROW_NUMBER() OVER ( ORDER BY @@ROWCOUNT ) AS n
FROM sys.messages AS m ) AS x;
Run a sanity check query to figure out how many pages are assigned to the heap, and to the nonclustered PK:
SELECT OBJECT_NAME(i.object_id) AS table_name,
i.name AS index_name,
MAX(a.used_pages) AS leaf_me_alone
FROM sys.indexes AS i
JOIN sys.partitions AS p
ON p.object_id = i.object_id
AND p.index_id = i.index_id
JOIN sys.allocation_units AS a
ON a.container_id = p.partition_id
WHERE OBJECT_NAME(i.object_id) = 'heap'
GROUP BY i.object_id, i.index_id, i.name
ORDER BY OBJECT_NAME(i.object_id), i.index_id;
Results in this:
table_name index_name leaf_me_alone
heap NULL 74
heap PK__heap__ 7
So, 74 pages in the heap, 7 pages in the NC PK.
Do some singleton deletes to clear out the table:
DECLARE @i INT = 1;
WHILE @i < 1000
BEGIN
DELETE h
FROM dbo.heap AS h
WHERE h.id = @i;
SET @i += 1;
PRINT @i;
END;
If you re-run the sanity check query, you'll get the same result.
Worse, if you query the table now, SQL will read ALL OF THOSE BLANK PAGES!
SET STATISTICS TIME, IO ON
SELECT *
FROM dbo.heap AS h;
Table 'heap'. Scan count 1, logical reads 67
So now not only is our table artificially large, but SQL now has a bunch of blank pages on disk and in memory and in backups and in DBCC CHECKDB and... well, you get the point.
We're looking at around 1.5m audit records going through this process every hour
Heh heh heh! No fun.
Other options for getting pages deallocated from the heap are:
TRUNCATE TABLE dbo.heap
Which doesn't work for you, because you need to batch delete data.
ALTER TABLE dbo.heap REBUILD;
Which would be painful for you at that table size, because it will rebuild all nonclustered indexes on the table at the same time.
Will the table re-use pages? Sometimes maybe sorta kinda.
DECLARE @id_max INT = (SELECT MAX(id) FROM dbo.heap AS h);
INSERT dbo.heap (
id, junk )
SELECT TOP 5000 x.n + @id_max, REPLICATE('A', x.n % 1000)
FROM (
SELECT ROW_NUMBER() OVER ( ORDER BY @@ROWCOUNT ) AS n
FROM sys.messages AS m ) AS x;
Sanity check:
table_name index_name leaf_me_alone
heap NULL 400
heap PK__heap__ 20
SELECT * query:
Table 'heap'. Scan count 1, logical reads 392
Hope this helps!