Should I index a bit field in SQL Server?
I just came across this question by way of another. Assuming that your statement that only a handful of the records assume the value of 1 (and that those are the ones you're interested in), then a filtered index could be a good choice. Something like:
create index [IX_foobar] on dbo.Foobar (FooID) where yourBitColumn = 1
This will create a substantially smaller index that the optimizer is smart enough to use when that is a predicate in your query.
Consider what an index is in SQL - and index is really a chunk of memory pointing at other chunks of memory (i.e. pointers to rows). The index is broken into pages so that portions of the index can be loaded and unloaded from memory depending on usage.
When you ask for a set of rows, SQL uses the index to find the rows more quickly than table scanning (looking at every row).
SQL has clustered and non-clustered indexes. My understanding of clustered indexes is that they group similar index values into the same page. This way when you ask for all the rows matching an index value, SQL can return those rows from a clustered page of memory. This is why trying to cluster index a GUID column is a bad idea - you don't try to cluster random values.
When you index an integer column, SQL's index contains a set of rows for each index value. If you have a range of 1 to 10, then you would have 10 index pointers. Depending on how many rows there are this can be paged differently. If your query looks for the index matching "1" and then where Name contains "Fred" (assuming the Name column is not indexed), SQL gets the set of rows matching "1" very quickly, then table scans to find the rest.
So what SQL is really doing is trying to reduce the working set (number of rows) it has to iterate over.
When you index a bit field (or some narrow range), you only reduce the working set by the number of rows matching that value. If you have a small number of rows matching it would reduce your working set a lot. For a large number of rows with 50/50 distribution, it might buy you very little performance gain vs. keeping the index up to date.
The reason everyone says to test is because SQL contains a very clever and complex optimizer that may ignore an index if it decides table scanning is faster, or may use a sort, or may organize memory pages however it darn well likes.