Reducing Key Lookups
Background
In the worst case, a query containing a lookup has to go to physical storage for rows that require column data not covered by the nonclustered index. In the very worst of worst cases, each lookup will require a separate I/O, and execution will have to wait for that single row's worth of data to come back before proceeding. This scenario usually has severe performance implications if the lookup has to process a significant number of rows.
That is why lookups get such a bad press. On the other hand, consider that the ability to do lookups was introduced in SQL Server 2000. In SQL Server 7.0 the query processor could only use a nonclustered index if it contained all the information necessary to satisfy the query; in all other cases, it had to access data via a clustered index (if present, or a heap scan otherwise). If lookups were always so very bad, SQL Server would surely never have introduced them.
In SQL Server 2000+ then, where we have a nonclustered index that provides useful ordering and/or (most of) the columns required by a query, and where the number of lookups is likely to be relatively small, using the nonclustered index and performing a limited number of lookups on the base table is likely to be the cheapest available access method (though a fully-covering nonclustered index might be cheaper still, of course).
In many cases, it is just not practical to create as many nonclustered indexes as would be needed to avoid scanning the base table for all common queries. One reason might be that INSERT/UPDATE/DELETE/MERGE
performance is more important that querying speed (remember that data modification operations also have to maintain all affected nonclustered indexes). Another reason might be space; each nonclustered index represents a copy of a subset of the columns of the base table (or expressions thereon) just sorted differently. More copies of the data means more storage space, and more things competing for space in SQL Server's in-memory data cache.
Other times, we can create just a few extra indexes (perhaps filtered in SQL Server 2008+) with just enough INCLUDE
columns to satisfy the vast majority of performance-critical queries, without compromising data modification performance too much, and without using too much extra disk space. Balancing the competing considerations is what makes index tuning more art than science.
Cost
You ask what the 99% cost for the lookup operator really means in the query plan. The costing component of the query optimizer produces an estimated cost for that operation that is 99% of the total estimated for the query. The number itself (0.29) does not mean very much at all; for all practical purposes, you should consider it as a unit-less number used internally by the optimizer when comparing alternative strategies for that specific query.
The estimated cost takes no account of your hardware, configuration, application needs, or very much of anything else. The cost model used by the optimizer includes a significant number of heuristics and simplifying assumptions that happen to produce reasonable plans most of the time, for most queries, on most hardware. That's not to say that there is no correlation between high-cost operators in plans and performance; rather the link is often much weaker than commonly expected. By all means check the reasons for high-estimated-cost plan operators first, but don't treat the information as anything other than a very possibly flawed estimate.
Impact
I also want to mention a couple of factors that can ameliorate the impact of lookups. First, I mentioned right at the start that the worst case involves row-by-row physical I/O. This will obviously be avoided if the data pages (clustered index or heap) needed to satisfy the lookups are already in memory (data cache). Where this is the case, the execution time difference between a plan with a lookup versus a covering index may well be immeasurable. Even where physical I/O is required, if the number of reads is small, you still may not care. (How likely data pages for a table are to be in the data cache depends on many factors, and will be specific to your hardware and circumstances).
Where more that a little physical I/O is needed, the impact of the lookups may still be reduced by optimizations present in the query plan. If SQL Server expects the number of lookups to be significant, it may choose to explicitly sort the rows entering the nested loops join driving the lookup in order of the non-clustered keys. This reordering promotes sequential reading of the nonclustered index, which may be very much faster than random I/O on your hardware.
With or without an explicit sort, the nested loops join driving the lookup may have the WithOrderedPrefetch
or WithUnorderedPrefetch
attributes present. In either case, the query execution engine 'looks ahead' in the index key stream driving the lookups and issues read-ahead reads. The idea is to issue asynchronous read requests to the I/O system for data pages that will be needed soon, so that by the time the lookup needs a data page, it is already present in memory.
Under ideal conditions (low fragmentation, good query plan, high-performance I/O system) the read-ahead mechanism may well be fast enough to prevent even large parallel query plans from ever waiting on I/O to complete. This is especially true in Enterprise Edition, which can issue very large single I/O requests (up to 2MB per request if memory serves). On the other hand, under less than ideal (more normal!) conditions, your query may suffer horribly as it waits on long I/O queues, or fails to drive the I/O system hard enough. The worst case performance of key lookups can be very poor indeed.
Summary
In summary, you will generally want to avoid lookups where it makes sense to do so. For small queries (that are going to remain small) you may decide that the overhead of extra indexes (space & maintenance) is not justified, given due weight to the wider needs of the system and its users.
Ultimately this is all part of the art & science that is database development and administration.
Imagine the phone company has a list of phone numbers, including who the customer is, where they live, what their billing number is, and so on. The Primary Key could be the phone number.
They give you the White Pages. That's like a non-clustered index, which ordered by Name, including columns like the address.
If you want to find all the Farleys in the book, and are interested in their addresses, then the white pages is all you need. You can quickly Seek to the Farleys (finding the Fs, and so on), and then you have all the information you need.
But if you want their billing numbers, then you need to do a lookup. You can quickly find all the phone numbers of the Farleys, but then you need to take each of them (hundreds) and do another Seek (lookup) in the main (clustered) index, the one that's ordered by the phone number. Each of those is roughly the same cost as the seek to find the Farleys, making your query run orders of magnitude worse.
And there's a threshold. At some point, the database will realise that it's quicker just to go through every page of the clustered index, checking every record to see if it's of interest.
Seriously - get rid of Lookups. Your queries might be fast now, but probably won't scale.