How to choose the clustered index in SQL Server?

According to The Queen Of Indexing - Kimberly Tripp - what she looks for in a clustered index is primarily:

  • Unique
  • Narrow
  • Static

And if you can also guarantee:

  • Ever-increasing pattern

then you're pretty close to having your ideal clustering key!

Check out her entire blog post here, and another really interesting one about clustering key impacts on table operations here: The Clustered Index Debate Continues.

Anything like an INT (esp. an INT IDENTITY) or possibly an INT and a DATETIME are ideal candiates. For other reasons, GUID's aren't good candidates at all - so you might have a GUID as your PK, but don't cluster your table on it - it'll be fragmented beyond recognition and performance will suffer.


A best candidate for a CLUSTERED index is the key you use to refer to your records most often.

Usually, this is a PRIMARY KEY, since it's what used in searches and/or FOREIGN KEY relationships.

In your case, Orders.ID will most probably participate in the searches and references, so it is the best candidate for being a clustering expression.

If you create the CLUSTERED index on Orders.CustomerID, the following things will happen:

  1. CustomerID is not unique. To ensure uniqueness, a special hidden 32-bit column known as uniquifier will be added to each record.

  2. Records in the table will be stored according to this pair of columns (CustomerID, uniquifier).

  3. A secondary index on Order.ID will be created, with (CustomerID, uniquifier) as the record pointers.

  4. Queries like this:

    SELECT  *
    FROM    Orders
    WHERE   ID = 1234567
    

    will have to do an external operation, a Clustered Seek, since not all columns are stored in the index on ID. To retrieve all columns, the record should first be located in the clustered table.

This additional operation requires IndexDepth as many page reads as a simple Clustered Seek, the IndexDepth beign O(log(n)) of total number of the records in your table.


If you're concerned about clustering it's usually to help improve data retrieval. In you example, you're probably going to want all records for a given customer at once. Clustering on customerID will keep those rows on the same physical page rather than scattered throughout multiple pages in your file.

ROT: Cluster on what you want to show a collection of. Line items in a purchase order is the classic example.