Getting a scan though I expect a seek
I don't think the scan is caused by a search for an empty string (and while you could add a filtered index for that case, it will only help very specific variations of the query). You are more likely a victim of parameter sniffing and a single plan not optimized for all of the various combinations of parameters (and parameter values) that you will be providing to this query.
I call this the "kitchen sink" procedure, because you are expecting one query to provide all the things, including the kitchen sink.
I have videos about my solution to this here and here as well as a blog post about it, but essentially, the best experience I have for such queries is to:
- Build the statement dynamically - this will allow you to leave out clauses mentioning columns for which no parameters were supplied, and ensures that you will have a plan that is optimized precisely for the actual parameters that were passed with values.
- Use
OPTION (RECOMPILE)
- this prevents specific parameter values from forcing the wrong type of plan, especially helpful when you have data skew, bad statistics, or when the first execution of a statement uses an atypical value that will lead to a different plan than later and more frequent executions. - Use the server option
optimize for ad hoc workloads
- this prevents query variations that are only used once from polluting your plan cache.
Enable optimize for ad hoc workloads:
EXEC sys.sp_configure 'show advanced options', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO
EXEC sys.sp_configure 'optimize for ad hoc workloads', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO
EXEC sys.sp_configure 'show advanced options', 0;
GO
RECONFIGURE WITH OVERRIDE;
Change your procedure:
ALTER PROCEDURE dbo.Whatever
@Status INT = NULL,
@IsUserGotAnActiveDirectoryUser BIT = NULL
AS
BEGIN
SET NOCOUNT ON;
DECLARE @sql NVARCHAR(MAX) = N'SELECT [IdNumber], [Code], [Status],
[Sex], [FirstName], [LastName], [Profession],
[BirthDate], [HireDate], [ActiveDirectoryUser]
FROM dbo.Employee -- please, ALWAYS schema prefix
WHERE 1 = 1';
IF @Status IS NOT NULL
SET @sql += N' AND ([Status]=@Status)'
IF @IsUserGotAnActiveDirectoryUser = 1
SET @sql += N' AND ActiveDirectoryUser <> ''''';
IF @IsUserGotAnActiveDirectoryUser = 0
SET @sql += N' AND ActiveDirectoryUser = ''''';
SET @sql += N' OPTION (RECOMPILE);';
EXEC sys.sp_executesql @sql, N'@Status INT, @Status;
END
GO
Once you have a workload based on that set of queries that you can monitor, you can analyze the executions and see which ones would most benefit from additional or different indexes - you can do this from a variety of angles, from simple "which combination of parameters are provided most often?" to "which individual queries have the longest runtimes?" We can't answer those questions based just on your code, we can only suggest that any index will only be helpful for a subset of all of the possible parameter combinations you're attempting to support. For example, if @Status
is NULL, then no seek against that non-clustered index is possible. So for those cases where users don't care about status, you're going to get a scan, unless you have an index that caters to the other clauses (but such an index won't be useful either, given your current query logic - either empty string or not empty string is not exactly selective).
In this case, depending on the set of possible Status
values and how distributed those values are, the OPTION (RECOMPILE)
might not be necessary. But if you do have some values that will yield 100 rows and some values that will yield hundreds of thousands, you might want it there (even at the CPU cost, which should be marginal given the complexity of this query), so that you can get seeks in as many cases as possible. If the range of values is finite enough, you could even do something tricky with the dynamic SQL, where you say "I have this very selective value for @Status
, so when that specific value is passed, make this slight alteration to the query text so that this is considered a different query and optimized for that param value."
Disclaimer: Some of the stuff in this answer may make a DBA flinch. I'm approaching it from a pure performance standpoint - how to get Index Seeks when you always get Index Scans.
With that out of the way, here goes.
Your query is what's known as a "kitchen sink query" - a single query meant to cater for a range of possible search conditions. If the user sets @status
to a value, you want to filter on that status. If @status
is NULL
, return all statuses, and so on.
This introduces problems with indexing, but they're not related to sargability, because all your search conditions are "equal to" criteria.
This is sargable:
WHERE [status]=@status
This is not sargable because SQL Server needs to evaluate ISNULL([status], 0)
for every row instead of looking up a single value in the index:
WHERE ISNULL([status], 0)=@status
I've recreated the kitchen sink-problem in a simpler form:
CREATE TABLE #work (
A int NOT NULL,
B int NOT NULL
);
CREATE UNIQUE INDEX #work_ix1 ON #work (A, B);
INSERT INTO #work (A, B)
VALUES (1, 1), (2, 1),
(3, 1), (4, 1),
(5, 2), (6, 2),
(7, 2), (8, 3),
(9, 3), (10, 3);
If you try the following, you'll get an Index Scan, even though A is the first column of the index:
DECLARE @a int=4, @b int=NULL;
SELECT *
FROM #work
WHERE (@a IS NULL OR @a=A) AND
(@b IS NULL OR @b=B);
This, however, produces an Index Seek:
DECLARE @a int=4, @b int=NULL;
SELECT *
FROM #work
WHERE @a=A AND
@b IS NULL;
As long as you're using a manageable amount of parameters (two in your case), you could probably just UNION
a bunch of seek queries - basically all the permutations of search criteria. If you have three criteria, this'll look messy, with four it'll be completely unmanageable. You've been warned.
DECLARE @a int=4, @b int=NULL;
SELECT *
FROM #work
WHERE @a=A AND
@b IS NULL
UNION ALL
SELECT *
FROM #work
WHERE @a=A AND
@b=B
UNION ALL
SELECT *
FROM #work
WHERE @a IS NULL AND
@b=B
UNION ALL
SELECT *
FROM #work
WHERE @a IS NULL AND
@b IS NULL;
For the third one of those four to use an Index Seek, you're going to need a second index on (B, A)
, though. Here's how your query might look with these changes (including my refactoring of the query to make it more readable).
DECLARE @Status int = NULL,
@IsUserGotAnActiveDirectoryUser bit = NULL;
SELECT [IdNumber], [Code], [Status], [Sex], [FirstName], [LastName],
[Profession], [BirthDate], [HireDate], [ActiveDirectoryUser]
FROM Employee
WHERE [Status]=@Status AND
@IsUserGotAnActiveDirectoryUser IS NULL
UNION ALL
SELECT [IdNumber], [Code], [Status], [Sex], [FirstName], [LastName],
[Profession], [BirthDate], [HireDate], [ActiveDirectoryUser]
FROM Employee
WHERE [Status]=@Status AND
@IsUserGotAnActiveDirectoryUser=1 AND ActiveDirectoryUser<>''
UNION ALL
SELECT [IdNumber], [Code], [Status], [Sex], [FirstName], [LastName],
[Profession], [BirthDate], [HireDate], [ActiveDirectoryUser]
FROM Employee
WHERE [Status]=@Status AND
@IsUserGotAnActiveDirectoryUser=0 AND (ActiveDirectoryUser IS NULL OR ActiveDirectoryUser='')
UNION ALL
SELECT [IdNumber], [Code], [Status], [Sex], [FirstName], [LastName],
[Profession], [BirthDate], [HireDate], [ActiveDirectoryUser]
FROM Employee
WHERE @Status IS NULL AND
@IsUserGotAnActiveDirectoryUser IS NULL
UNION ALL
SELECT [IdNumber], [Code], [Status], [Sex], [FirstName], [LastName],
[Profession], [BirthDate], [HireDate], [ActiveDirectoryUser]
FROM Employee
WHERE @Status IS NULL AND
@IsUserGotAnActiveDirectoryUser=1 AND ActiveDirectoryUser<>''
UNION ALL
SELECT [IdNumber], [Code], [Status], [Sex], [FirstName], [LastName],
[Profession], [BirthDate], [HireDate], [ActiveDirectoryUser]
FROM Employee
WHERE @Status IS NULL AND
@IsUserGotAnActiveDirectoryUser=0 AND (ActiveDirectoryUser IS NULL OR ActiveDirectoryUser='');
... plus you'll need an additional index on Employee
with the two index columns reversed.
For completeness, I should mention that x=@x
implicitly means that x
cannot be NULL
because NULL
is never equal to NULL
. That simplifies the query a bit.
And, yes, Aaron Bertrand's dynamic SQL answer is a better choice in most cases (i.e. whenever you can live with the recompiles).
Your basic question seems to be "Why" and I think you might find the answer about minute 55 or so of this Great presentation by Adam Machanic at TechEd a few years ago.
I mention the 5 minutes at minute 55 but the whole presentation is worth the time. If you look at the query plan for your query I am sure you will find it has Residual Predicates for the search. Basically SQL can't "see" all of the parts of the index because some of them are hidden by the inequalities and other conditions. The result is an index scan for a super set based on the Predicate. That result is spooled and then re-scanned using the residual predicate.
Check the properties of the Scan Operator (F4) and see if you have both "Seek Predicate" and "Predicate" in the property list.
As others have indicated, the query is difficult to index as is. I have been working on many similar ones recently and each has required a different solution. :(