In MySQL queries, why use join instead of where?
Any query involving more than one table requires some form of association to link the results from table "A" to table "B". The traditional (ANSI-89) means of doing this is to:
- List the tables involved in a comma separated list in the FROM clause
Write the association between the tables in the WHERE clause
SELECT * FROM TABLE_A a, TABLE_B b WHERE a.id = b.id
Here's the query re-written using ANSI-92 JOIN syntax:
SELECT *
FROM TABLE_A a
JOIN TABLE_B b ON b.id = a.id
From a Performance Perspective:
Where supported (Oracle 9i+, PostgreSQL 7.2+, MySQL 3.23+, SQL Server 2000+), there is no performance benefit to using either syntax over the other. The optimizer sees them as the same query. But more complex queries can benefit from using ANSI-92 syntax:
- Ability to control JOIN order - the order which tables are scanned
- Ability to apply filter criteria on a table prior to joining
From a Maintenance Perspective:
There are numerous reasons to use ANSI-92 JOIN syntax over ANSI-89:
- More readable, as the JOIN criteria is separate from the WHERE clause
- Less likely to miss JOIN criteria
- Consistent syntax support for JOIN types other than INNER, making queries easy to use on other databases
- WHERE clause only serves as filtration of the cartesian product of the tables joined
From a Design Perspective:
ANSI-92 JOIN syntax is pattern, not anti-pattern:
- The purpose of the query is more obvious; the columns used by the application is clear
- It follows the modularity rule about using strict typing whenever possible. Explicit is almost universally better.
Conclusion
Short of familiarity and/or comfort, I don't see any benefit to continuing to use the ANSI-89 WHERE clause instead of the ANSI-92 JOIN syntax. Some might complain that ANSI-92 syntax is more verbose, but that's what makes it explicit. The more explicit, the easier it is to understand and maintain.
These are the problems with using the where syntax (other wise known as the implicit join):
First, it is all too easy to get accidental cross joins because the join conditions are not right next to the table names. If you have 6 tables being joined together, it is easy to miss one in the where clause. You will see this fixed all too often by using the distinct keyword. This is ahuge performance hit for the database. You can't get an accidental cross join using the explicit join syntax as it will fail the syntax check.
Right and left joins are problematic (In SQl server you are not guaranteed to get the correct results) in the old syntax in some databases. Further they are deprecated in SQL Server I know.
If you intend to use a cross join, that is not clear from the old syntax. It is clear using the current ANSII standard.
It is much harder for the maintainer to see exactly which fields are part of the join or even which tables join together in what order using the implicit syntax. This means it might take more time to revise the queries. I have known very few people who, once they took the time to feel comfortable with the explicit join syntax, ever went back to the old way.
I've also noticed that some people who use these implicit joins don't actually understand how joins work and thus are getting incorrect results in their queries.
Honestly, would you use any other kind of code that was replaced with a better method 18 years ago?