How to determine if an Index is required or necessary
I use Jason Strate's index analysis scripts (Old location). They tell you how much your existing indexes are used as well as how much missing indexes would have been used. I typically don't add indexes unless they make up more than 5 or 10% of the queries on a table.
Most importantly, though, it's about making sure the application responds fast enough for the users.
Update: Jason Strate's index analysis blog articles for newer scripts (New location)
Double Update: These days, I use sp_BlitzIndex® when performing index analysis.
There are a few concepts and terms that are important to understand when dealing with indexes. Seeks, scans, and lookups are some of the ways that indexes will be utilized through select statements. Selectivity of key columns is integral to determining how effective an index can be.
A seek happens when the SQL Server Query Optimizer determines that the best way to find the data you have requested is by scanning a range within an index. Seeks typically happen when a query is "covered" by an index, which means the seek predicates are in the index key and the displayed columns are either in the key or included. A scan happens when the SQL Server Query Optimizer determines that the best way to find the data is to scan the entire index and then filter the results. A lookup typically occurs when an index does not include all requested columns, either in the index key or in the included columns. The query optimizer will then use either the clustered key (against a clustered index) or the RID (against a heap) to "lookup" the other requested columns.
Typically, seek operations are more efficient than scans, due to physically querying a smaller data set. There are situations where this is not the case, such as a very small initial data set, but that goes beyond the scope of your question.
Now, you asked how to determine how effective an index is, and there are a few things to keep in mind. A clustered index's key columns are called a clustering key. This is how records are made unique in the context of a clustered index. All nonclustered indexes will include the clustered key by default, in order to perform lookups when necessary. All indexes will be inserted to, updated to, or deleted from for every respective DML statement. That having been said, it is best to balance performance gains in select statements against performance hits in insert, delete, and update statements.
In order to determine how effective an index is, you must determine the selectivity of your index keys. Selectivity can be defined as a percentage of distinct records to total records. If I have a [person] table with 100 total records and the [first_name] column contains 90 distinct values, we can say that the [first_name] column is 90% selective. The higher the selectivity, the more efficient the index key. Keeping selectivity in mind, it is best to put your most selective columns first in your index key. Using my previous [person] example, what if we had a [last_name] column that was 95% selective? We would want to create an index with [last_name], [first_name] as the index key.
I know this was a bit long-winded answer, but there really are a lot of things that go into determining how effective an index will be, and a lot things you must weigh any performance gains against.
I have recently discovered a fantastic free script from the people at BrentOzar Unltd http://www.brentozar.com/blitzindex/
This does some good analysis of which indexes exist, how often they are used and how often the query engine is looking for an index that doesn't exist.
It's guidance is generally good. Sometimes it gets a bit over-suggestive of ideas. I have generally done the following so far:
- Removed indexes that have NEVER been read (or maybe less than 50 times a month).
- Added the most obvious indexes on foreign keys and fields I know we use a lot.
I haven't added all the recommended indexes, and have gone back a week later to find that they are no longer recommended since the query engine is using some of the other new indexes instead!
Generally you should avoid indexes on:
- Very small tables (less than 50 to 200 records): often the query engine is faster if it scans the table rather than load up the index, read, process it etc.
- Avoid indexes on columns with Low Cardinality (http://en.wikipedia.org/wiki/Cardinality_(SQL_statements)) on the first mentioned column. E.g. Indexing a gender field (M/F) is of very little use, it's just as practical to scan the table and find the ~50% that match. If it's listed after something more specific in the index (e.g. [date of birth, gender]) that's better - you might want all Males born in a given time span.
Clustered Indexes are good - normally these are based on your primary key. They help the database engine put the data on the disk in good order. Very essential to understand this for the largest tables as a good clustered index often reduces the space the table occupies.
I have reduced some tables from 900MB to 400MB, just because they were unstructed heaps beforehand. http://msdn.microsoft.com/en-us/library/aa933131(v=sql.80).aspx
Reorganize/Rebuild
You should be looking to check for fragmented indexes. A bit of fragmentation is okay, don't get obsessive! http://technet.microsoft.com/en-us/library/ms189858.aspx Know the difference between reorganize and rebuild!
Review Regularly
Queries change, data volumes change, new features are added, old ones removed. You should be looking at them once a month (or more often if you have high volumes) and looking for where you can help the database out!
How many
In a recent video Brent recommends (typically) no more 5 indexes on a table with lots of writing (e.g. orders table), and no more than 10 if it is read a lot more than written (i.e. logging table for analytics) http://www.youtube.com/watch?v=gOsflkQkHjg
Overall
It depends!
Your mileage varies according to database. Cover the obvious (employee surname, order date etc) on your (now/future) larger tables. Monitor, review, and adjust as necessary. It should be part of your routine checklist when managing your database(s) :)
Hope this helps!