Optimizing a compare query
Not sure how such a query should be optimized for Oracle, when no indexes are available but here's another rewriting. It will not calculate the number of mismatching tuples but will stop searching when it finds one:
SELECT 1
FROM dual
WHERE EXISTS
(
Select
X, Y, Z
From
A
MINUS
Select
X, Y, Z
From
B
) ;
SQL-Fiddle shows a similar performance for all the provided answers, as the two tables have to be fully scanned in the worst case.
Rolando's and Narendra's queries show different execution time (sometimes better, sometimes worse), I think because of the HASH JOIN
used.
There may be different results with bigger tables and with fewer or more matches between the two tables. (the test were done with only a few K rows).
As suggested by @Phil, you can try and get Oracle to use a COUNT STOPKEY
operation as follows:
testbed:
create table a as
select level x, level y, level z from dual connect by level<=1000000;
--
create table b as
select level+10 x, level+10 y, level+10 z from dual connect by level<=1000000;
count stopkey
query:
select *
from( select *
from a
where not exists (select * from b where x=a.x and y=b.y and z=b.z))
where rownum<=1;
/*
RESULT
------------
Not the same
*/
plan:
Plan hash value: 322064455
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | | 15244 (1)| 00:02:48 |
|* 1 | COUNT STOPKEY | | | | | | |
|* 2 | HASH JOIN ANTI | | 776K| 38M| 18M| 15244 (1)| 00:02:48 |
| 3 | TABLE ACCESS FULL| A | 776K| 9855K| | 5756 (1)| 00:01:04 |
|* 4 | TABLE ACCESS FULL| B | 1018K| 37M| | 5770 (1)| 00:01:04 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=1)
2 - access("X"="A"."X")
4 - filter("Y" IS NOT NULL AND "Z" IS NOT NULL)
Note
-----
- dynamic sampling used for this statement (level=2)
If you are very likely to hit a mis-match early, you may want to encourage the CBO to use NESTED LOOPS ANTI
(perhaps you could test this with the NL_AJ
hint). In the worst case this will run very slowly on big tables especially if there is no index to speed up the lookup on b
, but the best case will be very quick.
Your requirement and logic behind the requirement makes sense, in theory. However, how quickly this can be achieved depends upon the data volume in tables A and B as well as availability of any useful indexes. Worst case scenario is if both the tables A and B have large volume of data and there are no useful indexes available on the tables. In that case (and provided table statistics are close to actual data), oracle will not be able to "find first unmatched record" any quicker than getting a count (provided you change your query to the LEFT JOIN approach mentioned above). Best case scenario is either both the tables A and B or table A has small volume of data and/or there are indexes on both tables on the combination of X, Y and Z columns. In this case, following query may perform better than doing a COUNT
SELECT COUNT(*) FROM DUAL
WHERE NOT EXISTS
(
SELECT NULL
FROM A LEFT JOIN B
ON A.X=B.X AND A.Y=B.Y AND A.Z=B.Z
WHERE B.X IS NULL
) ;