Index Uniqueness Overhead
I'm frequently involved in code reviews for the dev team, and I need to be able to give general guidelines for them to follow.
The environment I'm currently involved in has 250 servers with 2500 databases. I've worked on systems with 30,000 databases. Guidelines for indexing should revolve around the naming convention, etc, not be "rules" for what columns to include in an index - every individual index should be engineered to be the correct index for that specific business rule or code touching the table.
Does enforcing uniqueness add any cost on the back-end of an INSERT, compared to the cost of maintaining a non-unique index? Secondly, what is wrong with appending the primary key of a table to the end of an index to ensure uniqueness?
Adding the primary key column to the end of a non-unique index to make it unique looks to me to be an anti-pattern. If business rules dictate the data should be unique, then add a unique constraint to the column, which will automatically create a unique index. If you're indexing a column purely for performance, why would you add an extra column to the index?

Even if your supposition that enforcing uniqueness adds no extra overhead were correct (which it isn't in certain cases), what are you solving by needlessly complicating the index?
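If the business rule really is uniqueness, declaring it directly keeps that intent in the schema. A minimal sketch (the table and column names here are hypothetical, not from the question):

```sql
-- Hypothetical table: enforce uniqueness declaratively rather than
-- by appending the primary key to the end of an index.
CREATE TABLE dbo.Customer
(
    Email nvarchar(256) NOT NULL
    , CustomerID int IDENTITY(1,1) NOT NULL
        CONSTRAINT Customer_pk PRIMARY KEY CLUSTERED
);

-- The unique constraint creates a unique nonclustered index behind the scenes.
ALTER TABLE dbo.Customer
    ADD CONSTRAINT Customer_Email_uq UNIQUE (Email);
```

With the constraint in place, the optimizer also knows the column is unique, which it cannot infer from a non-unique index padded with the primary key.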
In the specific instance of adding the primary key to the end of your index key so that you can include the UNIQUE modifier in the index definition, it actually makes zero difference to the physical index structure on disk. This is due to the nature of B-tree index keys: they always need to be unique.
As David Browne mentioned in a comment:
Since every nonclustered index is stored as a unique index, there is no extra cost in inserting into a unique index. In fact, the only extra cost would be in failing to declare a candidate key as a unique index, which would cause the clustered index keys to be appended to the index keys.
Take the following minimally complete and verifiable example:
USE tempdb;

DROP TABLE IF EXISTS dbo.IndexTest;

CREATE TABLE dbo.IndexTest
(
    id int NOT NULL
        CONSTRAINT IndexTest_pk
        PRIMARY KEY
        CLUSTERED
        IDENTITY(1,1)
    , rowDate datetime NOT NULL
);
I'll add two indexes that are identical except for the addition of the primary key at the tail end of the second index's key definition:
CREATE INDEX IndexTest_rowDate_ix01
    ON dbo.IndexTest (rowDate);

CREATE UNIQUE INDEX IndexTest_rowDate_ix02
    ON dbo.IndexTest (rowDate, id);
Next, we'll add several rows to the table:
INSERT INTO dbo.IndexTest (rowDate)
VALUES (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 1, GETDATE()))
, (DATEADD(SECOND, 2, GETDATE()));
As you can see above, three rows contain the same value for the rowDate
column, and two rows contain unique values.
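To confirm that distribution before inspecting the pages, a quick aggregate (an addition to the original example) shows the duplicate group:

```sql
-- Count rows per rowDate value; we expect one value with 3 rows
-- and two values with 1 row each.
SELECT rowDate
     , COUNT(*) AS numRows
FROM dbo.IndexTest
GROUP BY rowDate
ORDER BY rowDate;
```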
Next, we'll look at the physical page structures for each index, using the undocumented DBCC PAGE command:
DECLARE @dbid int = DB_ID();
DECLARE @fileid int;
DECLARE @pageid int;

SELECT @fileid = ddpa.allocated_page_file_id
    , @pageid = ddpa.allocated_page_page_id
FROM sys.indexes i
CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
WHERE i.name = N'IndexTest_rowDate_ix01'
    AND ddpa.is_allocated = 1
    AND ddpa.is_iam_page = 0;

PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';
DBCC TRACEON(3604);
DBCC PAGE (@dbid, @fileid, @pageid, 1);
DBCC TRACEOFF(3604);
PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';

SELECT @fileid = ddpa.allocated_page_file_id
    , @pageid = ddpa.allocated_page_page_id
FROM sys.indexes i
CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
WHERE i.name = N'IndexTest_rowDate_ix02'
    AND ddpa.is_allocated = 1
    AND ddpa.is_iam_page = 0;

PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';
DBCC TRACEON(3604);
DBCC PAGE (@dbid, @fileid, @pageid, 1);
DBCC TRACEOFF(3604);
PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';
I've compared the output using Beyond Compare, and except for obvious differences around the allocation page IDs, etc., the two index structures are identical.
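If you'd rather not use an undocumented command, a documented way to make a similar comparison (an addition to the original example) is sys.dm_db_index_physical_stats: because the non-unique index silently carries the clustered key, both indexes report the same record sizes at the leaf level.

```sql
-- Compare record sizes of the two indexes. The non-unique index
-- physically stores the clustered key too, so the sizes match.
SELECT i.name AS indexName
     , ips.index_level
     , ips.min_record_size_in_bytes
     , ips.max_record_size_in_bytes
     , ips.avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.IndexTest'), NULL, NULL, 'DETAILED') ips
INNER JOIN sys.indexes i
    ON i.object_id = ips.object_id
    AND i.index_id = ips.index_id
WHERE i.name LIKE N'IndexTest_rowDate_ix%';
```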
You might take the above to mean that including the primary key in every index, and defining it as unique, is A Good Thing™, since that's what happens under the covers anyway. I wouldn't make that assumption, and would suggest only defining an index as unique if the natural data in the index is in fact already unique.
There are several excellent resources in the Interwebz about this topic, including:
- Where Clustered Indexes Dare
- SQL Server and Binary Search
- Performance Benefits of Unique Indexes
FYI, the mere presence of an identity column does not guarantee uniqueness. You need to define the column as a primary key or with a unique constraint to ensure the values stored in that column are in fact unique. The SET IDENTITY_INSERT schema.table ON; statement will allow you to insert non-unique values into a column defined as identity.
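A short sketch of that point (the table here is hypothetical; note it deliberately has no key constraint on the identity column):

```sql
-- Hypothetical table: IDENTITY, but no PRIMARKEY KEY or UNIQUE constraint.
CREATE TABLE dbo.IdentityDemo
(
    id int IDENTITY(1,1) NOT NULL
    , payload varchar(10) NOT NULL
);

INSERT INTO dbo.IdentityDemo (payload) VALUES ('first');   -- gets id = 1

-- With IDENTITY_INSERT on, an explicit duplicate value is accepted,
-- because nothing declares the column unique.
SET IDENTITY_INSERT dbo.IdentityDemo ON;
INSERT INTO dbo.IdentityDemo (id, payload) VALUES (1, 'dupe');
SET IDENTITY_INSERT dbo.IdentityDemo OFF;

SELECT id, payload FROM dbo.IdentityDemo;   -- two rows, both with id = 1
```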
Just an add-on to Max's excellent answer.
When it comes to creating a non-unique clustered index, SQL Server creates something called a uniquifier in the background anyway.

This uniquifier could cause potential problems down the road if your platform has a lot of CRUD operations, since the uniquifier is only 4 bytes big (a basic 32-bit integer). If your system performs enough inserts and updates, it's possible to use up all the available unique numbers, at which point you will receive an error and won't be able to insert any more rows into the affected table (because there will no longer be any unique values left to assign to newly inserted rows).
When this happens, you will receive this error:
The maximum system-generated unique value for a duplicate group
was exceeded for index with partition ID (someID).
Dropping and re-creating the index may resolve this;
otherwise, use another clustering key.
Error 666 (the above error) occurs when the uniquifier for a single set of non-unique keys consumes more than 2,147,483,647 rows.
So, you'll need to have either ~2 billion rows for a single key value, or you'll need to have modified a single key value ~2 billion times to see this error. As such, it's not extremely likely you'll run into this limitation.
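If you want to see where a uniquifier is even in play, a sketch (an addition to this answer) that lists the non-unique clustered indexes in the current database:

```sql
-- Non-unique clustered indexes are the only indexes where SQL Server
-- adds the hidden 4-byte uniquifier to duplicate key values.
SELECT OBJECT_SCHEMA_NAME(i.object_id) AS schemaName
     , OBJECT_NAME(i.object_id) AS tableName
     , i.name AS indexName
FROM sys.indexes i
WHERE i.type_desc = N'CLUSTERED'
    AND i.is_unique = 0
    AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1;
```

Any table in that list could, in principle, accumulate duplicates toward the 2,147,483,647 limit for a single key value.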