Why are [Seemingly] suitable indexes not used on a LEFT JOIN with OR
Rather than focusing on how to improve a query like this, which is what the other answers are doing, I'm going to try to answer the question being asked: why doesn't the optimizer produce a plan like the one you've described (that scans the Users table, and then seeks into the two indexes on the Comments table).
Here's your original query again (note I'm using MAXDOP 2
just to simulate what I saw in your execution plans):
SELECT *
FROM Users u
LEFT JOIN Comments c
ON u.Id = c.UserId OR
u.Id = c.PostId
WHERE u.DisplayName = 'alex'
OPTION (MAXDOP 2);
And the plan:
- Scan of
dbo.Users
with residual predicate to get just the "alex" users - For each of those users, scan the
dbo.Comments
table and filter matches in the join operator - Estimated cost: 293.161 optimizer units
One attempt to get the plan you want would be to try and force a seek on the dbo.Comments
table:
SELECT *
FROM Users u
LEFT JOIN Comments c WITH (FORCESEEK)
ON u.Id = c.UserId OR
u.Id = c.PostId
WHERE u.DisplayName = 'alex'
OPTION (MAXDOP 2);
The plans looks like this:
- scan of the
dbo.Users
table (with a residual predicate to only get users named "alex"), - seek into each of the two indexes to get the requested Id values (which are unioned together)
- followed up by a key lookup to get the rest of the columns (since we selected *)
- Estimated cost: 5.98731 optimizer units
So the answer is that the optimizer is definitely capable of producing such a plan. And it doesn't seem to be a cost-based decision (the seek plan looks much cheaper).
My best guess is that this is just some kind of limitation in the optimizer's exploration process - it doesn't seem to favor converting a left join with an or clause into an apply. This is really unfortunate in this particular case, as performance is dismal in the scan plan (the query takes 45 seconds on my machine) vs the apply plan (less than 1 second).
Side note: You can override the heuristic that disfavors index union plans with undocumented trace flag 8726. See https://dba.stackexchange.com/a/23779 for additional details on that front!
As Rob Farley helpfully pointed out, using APPLY
directly (potentially with a UNION
as well) is a better approach to get the plan you're looking for - both of those produce the "better" version of this plan (the FORCESEEK
version). I would say that "OR
in a JOIN
" is kind of a known anti-pattern, and should be avoided since it doesn't seem like the optimizer has great support for that type of query directly.
When you have a join, the Query Optimizer will consider how best to satisfy the predicates involved with the various join techniques. It doesn’t try to re-evaluate the query as if had been written with APPLY, which is what you kinda want here, where it would see the right hand side of the join as like a sub-query.
You can try this youself, by doing something like:
SELECT *
FROM Users u
OUTER APPLY (
SELECT *
FROM Comments c
WHERE u.Id = c.UserId
OR u.Id = c.PostId
) c
WHERE u.DisplayName = 'alex'
...but given than OR is annoyingly often not turned into a UNION, I’d prefer:
SELECT *
FROM Users u
OUTER APPLY (
SELECT *
FROM Comments c
WHERE u.Id = c.UserId
UNION
SELECT *
FROM Comments c
WHERE u.Id = c.PostId
) c
WHERE u.DisplayName = 'alex'