Best practice between using LEFT JOIN or NOT EXISTS
The biggest difference is not in the join vs not exists, it is (as written), the SELECT *
.
On the first example, you get all columns from both A
and B
, whereas in the second example, you get only columns from A
.
In SQL Server, the second variant is slightly faster in a very simple contrived example:
Create two sample tables:
CREATE TABLE dbo.A
(
A_ID INT NOT NULL
PRIMARY KEY CLUSTERED
IDENTITY(1,1)
);
CREATE TABLE dbo.B
(
B_ID INT NOT NULL
PRIMARY KEY CLUSTERED
IDENTITY(1,1)
);
GO
Insert 10,000 rows into each table:
INSERT INTO dbo.A DEFAULT VALUES;
GO 10000
INSERT INTO dbo.B DEFAULT VALUES;
GO 10000
Remove every 5th row from the second table:
DELETE
FROM dbo.B
WHERE B_ID % 5 = 1;
SELECT COUNT(*) -- shows 10,000
FROM dbo.A;
SELECT COUNT(*) -- shows 8,000
FROM dbo.B;
Perform the two test SELECT
statement variants:
SELECT *
FROM dbo.A
LEFT JOIN dbo.B ON A.A_ID = B.B_ID
WHERE B.B_ID IS NULL;
SELECT *
FROM dbo.A
WHERE NOT EXISTS (SELECT 1
FROM dbo.B
WHERE b.B_ID = a.A_ID);
Execution plans:
The second variant does not need to perform the filter operation since it can use the left anti-semi join operator.
Logically they are identical, but NOT EXISTS
is closer to the AntiSemiJoin that you're asking for, and is generally preferred. It also highlights better that you can't access the columns in B, because it's only used as a filter
(as opposed to having them available with NULL values).
Many years ago (SQL Server 6.0 ish), LEFT JOIN
was quicker, but that hasn't been the case for a very long time. These days, NOT EXISTS
is marginally faster.
The biggest impact in Access is that the JOIN
method has to complete the join before filtering it, constructing the joined set in memory. Using NOT EXISTS
it checks for the row but doesn't allocate space for the columns. Plus, it stops looking once it finds a row. Performance varies a bit more in Access, but a general rule of thumb is that NOT EXISTS
tends to be a little faster. I'd be less inclined to say it's "best practice", as there are more factors involved.
An exception I've noticed to the NOT EXISTS
being superior (however marginally) to LEFT JOIN ... WHERE IS NULL
is when using Linked Servers.
From examining the execution plans, it appears that NOT EXISTS
operator gets executed in a nested loop fashion. Whereby it is executed on a per row basis (which I suppose makes sense).
Example execution plan demonstrating this behaviour: