Why use the INCLUDE clause when creating an index?

You would use the INCLUDE to add one or more columns to the leaf level of a non-clustered index, if by doing so, you can "cover" your queries.

Imagine you need to query for an employee's ID, department ID, and lastname.

SELECT EmployeeID, DepartmentID, LastName
FROM Employee
WHERE DepartmentID = 5

If you happen to have a non-clustered index on (EmployeeID, DepartmentID), once you find the employees for a given department, you now have to do "bookmark lookup" to get the actual full employee record, just to get the lastname column. That can get pretty expensive in terms of performance, if you find a lot of employees.

If you had included that lastname in your index:

CREATE NONCLUSTERED INDEX NC_EmpDep 
  ON Employee(EmployeeID, DepartmentID)
  INCLUDE (Lastname)

then all the information you need is available in the leaf level of the non-clustered index. Just by seeking in the non-clustered index and finding your employees for a given department, you have all the necessary information, and the bookmark lookup for each employee found in the index is no longer necessary --> you save a lot of time.

Obviously, you cannot include every column in every non-clustered index - but if you do have queries which are missing just one or two columns to be "covered" (and that get used a lot), it can be very helpful to INCLUDE those into a suitable non-clustered index.


If the column is not in the WHERE/JOIN/GROUP BY/ORDER BY, but only in the column list in the SELECT clause is where you use INCLUDE.

The INCLUDE clause adds the data at the lowest/leaf level, rather than in the index tree. This makes the index smaller because it's not part of the tree

INCLUDE columns are not key columns in the index, so they are not ordered. This means it isn't really useful for predicates, sorting etc as I mentioned above. However, it may be useful if you have a residual lookup in a few rows from the key column(s)

Another MSDN article with a worked example