How can I perform a SQL 'NOT IN' query faster?
You can use a left outer join, or a not exists
clause.
Left outer join:
select E.EmailAddress
from EMAIL E left outer join BLACKLIST B on (E.EmailAddress = B.EmailAddress)
where B.EmailAddress is null;
Not Exists:
select E.EmailAddress
from EMAIL E where not exists
(select EmailAddress from BLACKLIST B where B.EmailAddress = E.EmailAddress)
Both are quite generic SQL solutions (don't depend on a specific DB engine). I would say that the latter is a little bit more performant (not by much though). But definitely more performant than the not in
one.
As commenters stated, you can also try creating an index on BLACKLIST(EmailAddress)
, that should help speed up the execution of your query.
This was just a really long-winded way of telling you that, for the pattern of finding all rows in table A where some condition does not exist in table B, NOT EXISTS is typically going to be your best choice. But, as always, you need to test these patterns in your own environment, using your schema, data and hardware, and mixed in with your own workloads.
More Details https://sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join
NOT IN differs from NOT EXISTS if the blacklist allow null value as EmailAddress. If there is a single null value the result of the query will always return zero rows because NOT IN (null) is unknown / false for every value. The query plans therefore differs slighyly but I don't think there would be any serious performance impact.
A suggestion is to create a new table called VALIDEMAIL, add a trigger to BLACKLIST that removes addresses from VALIDEMAIL when rows are inserted and add to VALIDEMAIL when removed from BLACKLIST. Then replace EMAIL with a view that is a union of both VALIDEMAIL and BLACKLIST.