How to choose the clustered index in SQL Server?
According to The Queen Of Indexing - Kimberly Tripp - what she looks for in a clustered index is primarily:
- Unique
- Narrow
- Static
And if you can also guarantee:
- Ever-increasing pattern
then you're pretty close to having your ideal clustering key!
Check out her entire blog post here, and another really interesting one about clustering key impacts on table operations here: The Clustered Index Debate Continues.
Anything like an INT (esp. an INT IDENTITY) or possibly an INT and a DATETIME are ideal candiates. For other reasons, GUID's aren't good candidates at all - so you might have a GUID as your PK, but don't cluster your table on it - it'll be fragmented beyond recognition and performance will suffer.
A best candidate for a CLUSTERED
index is the key you use to refer to your records most often.
Usually, this is a PRIMARY KEY
, since it's what used in searches and/or FOREIGN KEY
relationships.
In your case, Orders.ID
will most probably participate in the searches and references, so it is the best candidate for being a clustering expression.
If you create the CLUSTERED
index on Orders.CustomerID
, the following things will happen:
CustomerID
is not unique. To ensure uniqueness, a special hidden32-bit
column known asuniquifier
will be added to each record.Records in the table will be stored according to this pair of columns
(CustomerID, uniquifier)
.A secondary index on
Order.ID
will be created, with(CustomerID, uniquifier)
as the record pointers.Queries like this:
SELECT * FROM Orders WHERE ID = 1234567
will have to do an external operation, a
Clustered Seek
, since not all columns are stored in the index onID
. To retrieve all columns, the record should first be located in the clustered table.
This additional operation requires IndexDepth
as many page reads as a simple Clustered Seek
, the IndexDepth
beign O(log(n))
of total number of the records in your table.
If you're concerned about clustering it's usually to help improve data retrieval. In you example, you're probably going to want all records for a given customer at once. Clustering on customerID will keep those rows on the same physical page rather than scattered throughout multiple pages in your file.
ROT: Cluster on what you want to show a collection of. Line items in a purchase order is the classic example.