EXISTS (SELECT 1 ...) vs EXISTS (SELECT * ...) One or the other?
No, there is no difference in efficiency between (NOT) EXISTS (SELECT 1 ...)
and (NOT) EXISTS (SELECT * ...)
in all major DBMS. I've often seen (NOT) EXISTS (SELECT NULL ...)
being used as well.
In some you can even write (NOT) EXISTS (SELECT 1/0 ...)
and the result is the same - without any (division by zero) error, which proves that the expression there is not even evaluated.
About the LEFT JOIN / IS NULL
antijoin method, a correction: this is equivalent to NOT EXISTS (SELECT ...)
.
In this case, NOT EXISTS
vs LEFT JOIN / IS NULL
, you may get different execution plans. In MySQL for example and mostly in older versions (before 5.7) the plans would be fairly similar but not identical. The optimizers of other DBMS (SQL Server, Oracle, Postgres, DB2) are - as far as I know - more or less capable of rewriting these 2 methods and considering the same plans for both. Still, there is no such guarantee and when doing optimization, it is good to check the plans from different equivalent rewrites as there could be cases that each optimizer doesn't rewrite (eg. complex queries, with many joins and/or derived tables / subqueries inside the subquery, where conditions from multiple tables, composite columns used in the joining conditions) or the optimizer choices and plans are affected differently by the available indexes, settings, etc.
Also note that USING
cannot be used in all DBMS (SQL Server for example). The more common JOIN ... ON
works everywhere.
And the columns needs to be prefixed with the table name/alias in the SELECT
to avoid errors/ambiguities when we have joins.
I also usually prefer to put the joined column in the IS NULL
check (although the PK or any non-nullable column would be OK, it might be useful for efficiency when the plan for LEFT JOIN
uses a non-clustered index):
SELECT a_table.a, a_table.b, a_table.c
FROM a_table
LEFT JOIN another_table
ON another_table.b = a_table.b
WHERE another_table.b IS NULL ;
There is also a third method for antijoins, using NOT IN
but this has different semantics (and results!) if the column of the inside table is nullable. It can be used though by excluding the rows with NULL
, making the query equivalent to the previous 2 versions:
SELECT a, b, c
FROM a_table
WHERE a_table.b NOT IN
(SELECT another_table.b
FROM another_table
WHERE another_table.b IS NOT NULL
) ;
This also usually yields similar plans in most DBMS.
There is one category of cases where SELECT 1
and SELECT *
are not interchangeable – more specifically, one will always be accepted in those cases while the other mostly will not.
I am talking about cases where you need to check for existence of rows of a grouped set. If table T
has columns C1
and C2
and you are checking for existence of row groups that match a specific condition, you can use SELECT 1
like this:
EXISTS
(
SELECT
1
FROM
T
GROUP BY
C1
HAVING
AGG(C2) = SomeValue
)
but you cannot use SELECT *
in the same way.
That is merely a syntactic aspect. Where both options are accepted syntactically, you will most likely have no difference in terms of performance or the results returned, as has been explained in the other answer.
Additional notes following comments
It appears not many database products actually support this distinction. Products like SQL Server, Oracle, MySQL and SQLite will happily accept SELECT *
in the above query without any errors, which probably means they treat an EXISTS SELECT
in a special way.
PostgreSQL is one RDBMS where SELECT *
may fail, but may still work in some cases. In particular, if you are grouping by the PK, SELECT *
will work fine, otherwise it will fail with the message:
ERROR: column "T.C2" must appear in the GROUP BY clause or be used in an aggregate function
The "proof" that they are identical (in MySQL) is to do
EXPLAIN EXTENDED
SELECT EXISTS ( SELECT * ... ) AS x;
SHOW WARNINGS;
then repeat with SELECT 1
. In both cases, the 'extended' output shows that it was transformed into SELECT 1
.
Similarly, COUNT(*)
is turned into COUNT(0)
.
Another thing to note: Optimization improvements have been made in recent versions. It may be worth comparing EXISTS
vs anti-joins. Your version may do a better job with one versus the other.