Clustered index - multi-part vs single-part index and effects of inserts/deletes
Yes, inserting into the middle of an existing table (or its page) could be expensive when you have a less than optimal clustered index. Worst case would be a page split : half the rows on the page would have to be moved elsewhere, and indices (including non-clustered indices on that table) need to be updated.
You can alleviate that problem by using the right clustered index - one that ideally is:
- narrow (only a single field, as small as possible)
- static (never changes)
- unique (so that SQL Server doesn't need to add 4-byte uniqueifiers to your rows)
- ever-increasing (like an INT IDENTITY)
You want a narrow key (ideally a single INT) since each and every entry in each and every non-clustered index will also contain the clustering key(s) - you don't want to put lots of columns in your clustering key, nor do you want to put things like VARCHAR(200) there!
With an ever increasing clustered index, you will never see the case of a page split. The only fragmentation you could encounter is from deletes ("swiss cheese" problem).
Check out Kimberly Tripp's excellet blog posts on indexing - most notably:
- GUIDs as PRIMARY KEYs and/or the clustering key
- The Clustered Index Debate Continues... - this one actually shows that a good clustered index will speed up all operations - including inserts, delete etc., compared to a heap with no clustered index!
- Ever-increasing clustering key - the Clustered Index Debate..........again!
Assume there is a table (Junk) and there are two queries that are done on the table, the first query searches by Name and the second query searches by Name and Something. As I'm working on the database I discovered that the table has been created with two indexes, one to support each query, like so:
That's definitely not necessary - if you have one index on (Name, Something)
, that index can also and just as well be used if you search and restrict on just WHERE Name = abc
- having a separate index with just the Name
column is totally not needed and only wastes space (and costs time to be kept up to date).
So basically, you only need a single index on (Name, Something)
, and I would agree with you - if you have no other indices on this table, then you should be able to make this the clustered key. Since that key won't be ever-increasing and could possibly change, too (right?), this might not be such a great idea.
The other option would be to introduce a surrogate ID INT IDENTITY
and cluster on that - with two benefits:
- it's all a good clustered key should be, including ever-increasing -> you'll never have any issues with page splits and performance for INSERT operations
- you still get all the benefits of having a clustering key (see Kim Tripps' blog posts - clustered tables are almost always preferable to heaps)