Subquery gives no error for a non-existing column with the same name as in the outer query
I run this query:
select * from t1 where c1 in (select c1 from t2);
The above query should give an error as
c1
is not present in t2. Instead, it returns all the rows fromt1
.
No, the query should not give an error. It's a common mistake (thinking that the c1
in (select c1 from t2)
refers to t2
. It doesn't due to scope resolution, i.e. how column names are resolved (how it is found which table they are referring to). The query:
select * from t1 where c1 in (select c1 from t2);
Can resolve to three different options:
when
t2
has a column namedc1
, it runs as:select * from t1 where c1 in (select t2.c1 from t2);
when
t2
does not have a column namedc1
, butt1
has, it runs as:
(this is your case!)select * from t1 where c1 in (select t1.c1 from t2);
and when neither
t2
nort1
have a column namedc1
, it will throw an error:select * from t1 where c1 in (select c1 from t2);
-- Error is thrown ("Unknown column c1" or something like that)
Another version of the above query with
delete
which can be much more disastrous:delete from t1 where c1 in (select c1 from t2);
The above query deletes all the rows from t1 when it is just supposed to give an error.
For the same reasons, no. The query resolves to and runs as:
delete from t1 where c1 in (select t1.c1 from t2);
so it will delete all rows from t1
as long as t2
table is not empty.
How to avoid these problems?
Always prefix column references with their table names. By doing this, you will always have the result you want or get an error if the column doesn't appear in the table you are prefixing it with.
Your queries should be:
select t1.* from t1 where t1.c1 in (select t2.c1 from t2);
delete from t1 where t1.c1 in (select t2.c1 from t2);
Both of them will throw an error if there is no column c1
in table t2
.
It is actualy not an error. You are really referencing the column from outer table.
Because of this "feature" you are able to write correlated subqueries and reference the fields from outer query in the inner query.
SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 where c1=c2)
Usually this is used in inner query WHERE clause but there is no reason why it wouldn't be used in the SELECT part. For example you might be wanting to concatenate or add field from outer query with field from inner to get the result.
SELECT * FROM t1 WHERE total IN (SELECT c1+c2 FROM t2 WHERE c2>c1)
The above is confusing so the best way to prevent errors as you describe is to prefix fields with table names to all the fields.
SELECT t1.c1 FROM t1 WHERE t1.c1 IN (SELECT t2.c2 FROM t2)
Now if you mistakenly put t2.c1 in subquery you will get an error.
The other queries will also be more understendable:
SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 where t1.c1=t2.c2)
SELECT * FROM t1 WHERE t1.total IN (SELECT t1.c1+t2.c2 FROM t2 WHERE t2.c2>t1.c1)
It is good to build this habit in writing the queries early because in more complex databases there are always fields with the same or similar names (primary keys are usually always "id" etc..) that could lead to serious errors. It is not only problem of wrong deletes but I've seen important reports giving wrong figures for years.