The column order of composite clustered primary key on a partitioned table?
Which of the following primary keys (the column order is different) is preferred?
Like all indexing decisions, much depends on how the table will be queried.
All partitioned indexes (for SQL Server 2008 and later) have the partition ID (not partitioning key value) as a hidden leading key column in each partitioned index, so the effective competing definitions are:
PartitionID, K1, K2 vs. PartitionID, K2, K1
This affects the utility of each index for different types of queries, as one would expect. The main extra consideration is that inequality seeks on the first real key (K1 or K2) are still supported, regardless of any inequality seek and/or partition elimination operations on the PartitionID column.
For example, the (K1, K2) index specification can seek to a range of partitions, and a range of K1 values simultaneously:
SELECT T1.*
FROM dbo.T1 AS T1
WHERE 1 = 1
AND T1.K1 >= CONVERT(date, '20080711', 112)
AND T1.K1 <= CONVERT(date, '20100711', 112);
The same query on a table with (K2, K1) as the clustered index key could seek find the range of partitions, but it would have to fully scan each qualifying partition to locate rows that match the K1 predicates exactly. To be clear, the test of K1 values would be applied as a residual predicate, not as a seeking operation.
This will show as a Clustered Index Scan in showplan, with a partition-eliminating seek, and residual predicate on K1 values:
A subtle point when using the date data type as the partitioning key is that you must be careful to use explicit date types in your queries if you expect partition elimination to occur reliably. Using other types, such as datetime, is easily done (by accident) but will often prevent elimination, where it would logically be expected.
For example, this query will touch all partitions:
DECLARE @dt datetime = '20080711';
SELECT *
FROM dbo.T1 AS T1
WHERE T1.K1 = @dt;
Whereas this query will touch only a single partition:
DECLARE @dt date = '20080711';
SELECT *
FROM dbo.T1 AS T1
WHERE T1.K1 = @dt;
Both queries look superficially identical in graphical showplan (a clustered index seek). You need to check the operator properties in detail to check if static or dynamic partition elimination is being applied.
For the example join query given in the question: Both indexing strategies include the K2 column, but neither can generally provide rows in K2 order without a sort. As a result, either index is equally good for a hash or nested loops join, but neither can provide the input order required for a merge join on K2.
This might seem counter-intuitive for the (K2, K1) index, but remember the leading PartitionID key. Each partition has rows in (K2, K1) order. Unless exactly one partition is specified in the query, a sort will be necessary to return rows in K2 order. The (K1, K2) index could only return rows in K2 order for a single partition and a single given value of K1.
The proposed clustered primary key (K1, K2) has the potential advantage of minimizing base table page splits if appended data is actually sorted by the clustering key during the insert operation. For the (K1, K2) index, this would mean rows sorted by (PartitionID, K1, K2). For (K2, K1), it would be (PartitionID, K2, K1).
Related reading: Query Processing Enhancements on Partitioned Tables and Indexes
Since you are making your primary key clustered, if you look in this Technet article on the aligning indexes with partitions it mentions under the Clustered indexes section that if you do not include the partitioning column in the clustered index it will be done for you. Both of your options do include the partitioning column so that isn't an issue but should probably be kept in mind.
All of that being said everything nothing I've read says it makes any difference as long as the clustered index contains the partitioning column (making it aligned). Personally I would probably put the partitioned column first. In my mind that would seem to allow SQL to decide which partition to look in before looking at any other columns in the index.
Based on the information given, there is no real need to have a separate PK and CIX in this case. This of course assumes that the combination of K1, K2 is unique.