Clustered Index Selection - PK or FK?
Community wiki answer:
I think a composite clustered index key with CustomerID as the first column will be best since that's in the WHERE
clause of nearly all queries.
There may be more splits compared to an incremental key (or more likely suboptimal page density for a time if you manage and maintain fill factor to avoid 'bad' splits). However, the overall performance improvement for customer queries is substantial, because the key lookup is avoided.
OrderID or OrderDate may be best for the second column depending on your most critical queries.
For example, if customers see a chronological list of recent orders after logging in to a web site, OrderDate should be next, to optimize ORDER BY OrderDate DESC
.
If you choose OrderID as the clustered index, with a non-clustered index on CustomerID, you'll still get splits and fragmentation, just in the non-clustered index.
If this table is heavily write intensive (e.g. many more INSERT
statements are occurring rather than SELECT
statements against it), I'm going to disagree with the wiki answer.
Choosing CustomerID as the first column of a composite clustered key is going to generate a lot of mid-page splits. You hopefully have lots of existing customers and also get many new customers all the time. Because customers are (hopefully) placing multiple orders as your business continues to grow, this approach will exhibit a fair amount of mid-page splits that are going to kill performance not only on writes, but also reads as your indexes will be both heavily fragmented and likely contain higher amounts of white space (which means wasted storage and memory).
If you feel CustomerID should be a leading column of a composite clustered index, you can reduce the impact of the mid-page splits by adjusting FILLFACTOR
on all indexes for this table. This will decrease the amount of mid-page splits by increasing the size of the table/index. If you want to go this route, I'd suggest testing with a value of 80 and reduce if analysis reveals mid-page splits are still killing performance.
My suggestion is to use OrderId. OrderID should naturally be sequential and generate more of the end-page splits which are good and expected with table growth. Additionally this approach will play better with Table Partitioning if you choose to use the OrderDate column as a partition key. Regarding queries that constantly use the CustomerID field, create a nonclustered index to handle those queries. This index would need to be defined with the proper FILLFACTOR
as it will suffer from mid-page splits that I mention above, though these won't be as bad overall in contrast to if the splits were occurring against the clustered index.
The activity on our DB is about 85% reads and 15% writes.
CustomerID
+ OrderID
(and specifying a fillfactor to allow for growth without splits) is probably better if that assessment holds true. Just make sure that assessment is accurate. Test test test.