SQL JOIN - WHERE clause vs. ON clause
They are not the same thing.
Consider these queries:
SELECT *
FROM Orders
LEFT JOIN OrderLines ON OrderLines.OrderID=Orders.ID
WHERE Orders.ID = 12345
and
SELECT *
FROM Orders
LEFT JOIN OrderLines ON OrderLines.OrderID=Orders.ID
AND Orders.ID = 12345
The first will return an order and its lines, if any, for order number 12345
. The second will return all orders, but only order 12345
will have any lines associated with it.
With an INNER JOIN
, the clauses are effectively equivalent. However, just because they are functionally the same, in that they produce the same results, does not mean the two kinds of clauses have the same semantic meaning.
Does not matter for inner joins
Matters for outer joins
a.
WHERE
clause: After joining. Records will be filtered after join has taken place.b.
ON
clause - Before joining. Records (from right table) will be filtered before joining. This may end up as null in the result (since OUTER join).
Example: Consider the below tables:
1. documents:
| id | name |
--------|-------------|
| 1 | Document1 |
| 2 | Document2 |
| 3 | Document3 |
| 4 | Document4 |
| 5 | Document5 |
2. downloads:
| id | document_id | username |
|------|---------------|----------|
| 1 | 1 | sandeep |
| 2 | 1 | simi |
| 3 | 2 | sandeep |
| 4 | 2 | reya |
| 5 | 3 | simi |
a) Inside WHERE
clause:
SELECT documents.name, downloads.id
FROM documents
LEFT OUTER JOIN downloads
ON documents.id = downloads.document_id
WHERE username = 'sandeep'
For above query the intermediate join table will look like this.
| id(from documents) | name | id (from downloads) | document_id | username |
|--------------------|--------------|---------------------|-------------|----------|
| 1 | Document1 | 1 | 1 | sandeep |
| 1 | Document1 | 2 | 1 | simi |
| 2 | Document2 | 3 | 2 | sandeep |
| 2 | Document2 | 4 | 2 | reya |
| 3 | Document3 | 5 | 3 | simi |
| 4 | Document4 | NULL | NULL | NULL |
| 5 | Document5 | NULL | NULL | NULL |
After applying the `WHERE` clause and selecting the listed attributes, the result will be:
| name | id |
|--------------|----|
| Document1 | 1 |
| Document2 | 3 |
b) Inside JOIN
clause
SELECT documents.name, downloads.id
FROM documents
LEFT OUTER JOIN downloads
ON documents.id = downloads.document_id
AND username = 'sandeep'
For above query the intermediate join table will look like this.
| id(from documents) | name | id (from downloads) | document_id | username |
|--------------------|--------------|---------------------|-------------|----------|
| 1 | Document1 | 1 | 1 | sandeep |
| 2 | Document2 | 3 | 2 | sandeep |
| 3 | Document3 | NULL | NULL | NULL |
| 4 | Document4 | NULL | NULL | NULL |
| 5 | Document5 | NULL | NULL | NULL |
Notice how the rows in `documents` that did not match both the conditions are populated with `NULL` values.
After Selecting the listed attributes, the result will be:
| name | id |
|------------|------|
| Document1 | 1 |
| Document2 | 3 |
| Document3 | NULL |
| Document4 | NULL |
| Document5 | NULL |
On INNER JOIN
s they are interchangeable, and the optimizer will rearrange them at will.
On OUTER JOIN
s, they are not necessarily interchangeable, depending on which side of the join they depend on.
I put them in either place depending on the readability.