JOIN or Correlated subquery with exists clause, which one is better
Generally, the EXISTS clause because you may need DISTINCT for a JOIN for it to give the expected output. For example, if you have multiple Department
rows for a ContactInformation
row.
In your example above, the SELECT *
:
- means different output too so they are not actually equivalent
- less chance of a index being used because you are pulling all columns out
Saying that, even with a limited column list, they will give the same plan: until you need DISTINCT... which is why I say "EXISTS"
You need to measure and compare - there's no golden rule which one will be better - it depends on too many variables and things in your system.
In SQL Server Management Studio, you could put both queries in a window, choose Include actual execution plan
from the Query menu, and then run them together.
You should get a comparison of both their execution plans and a percentage of how much of the time was spent on one or the other query. Most likely, both will be close to 50% in this case. If not - then you know which of the two queries performs better.
You can learn more about SQL Server execution plans (and even download a free e-book) from Simple-Talk - highly recommended.
I assume that either you meant to add the DISTINCT
keyword to the SELECT
clause in your second query (or, less likely, a Department has only one Contact).
First, always start with 'logical' considerations. The EXISTS
construct is arguably more intuitive so, all things 'physical' being equal, I'd go with that.
Second, there will be one day when you will need to ports this code, not necessarily to a different SQL product but, say, the same product but with a different optimizer. A decent optimizer should recognise that both are equivalent and come up with the same ideal plan. Consider that, in theory, the EXISTS
construct has slightly more potential to short circuit.
Third, test it using a reasonably large data set. If performance isn't acceptable, start looking at the 'physical' considerations (but I suggest you always keep your 'logically-pure' code in comments for the forthcoming day when the perfect optimizer arrives :)