SQL Joins Vs SQL Subqueries (Performance)?
Well, I believe it's an "Old but Gold" question. The answer is: "It depends!". The performances are such a delicate subject that it would be too much silly to say: "Never use subqueries, always join". In the following links, you'll find some basic best practices that I have found to be very helpful:
- Optimizing Subqueries
- Optimizing Subqueries with Semijoin Transformations
- Rewriting Subqueries as Joins
I have a table with 50000 elements, the result i was looking for was 739 elements.
My query at first was this:
SELECT p.id,
p.fixedId,
p.azienda_id,
p.categoria_id,
p.linea,
p.tipo,
p.nome
FROM prodotto p
WHERE p.azienda_id = 2699 AND p.anno = (
SELECT MAX(p2.anno)
FROM prodotto p2
WHERE p2.fixedId = p.fixedId
)
and it took 7.9s to execute.
My query at last is this:
SELECT p.id,
p.fixedId,
p.azienda_id,
p.categoria_id,
p.linea,
p.tipo,
p.nome
FROM prodotto p
WHERE p.azienda_id = 2699 AND (p.fixedId, p.anno) IN
(
SELECT p2.fixedId, MAX(p2.anno)
FROM prodotto p2
WHERE p.azienda_id = p2.azienda_id
GROUP BY p2.fixedId
)
and it took 0.0256s
Good SQL, good.
I would EXPECT the first query to be quicker, mainly because you have an equivalence and an explicit JOIN. In my experience IN
is a very slow operator, since SQL normally evaluates it as a series of WHERE
clauses separated by "OR" (WHERE x=Y OR x=Z OR...
).
As with ALL THINGS SQL though, your mileage may vary. The speed will depend a lot on indexes (do you have indexes on both ID columns? That will help a lot...) among other things.
The only REAL way to tell with 100% certainty which is faster is to turn on performance tracking (IO Statistics is especially useful) and run them both. Make sure to clear your cache between runs!