Unusual column comparisons and query performance
I don't like ISNULL
with sentinel values, it requires picking values that can't ever legitimately appear in the data now or forever after and personally I find expressions containing these more difficult to reason about.
For your test rig I tried four different ways of expressing the query and got the stated results.
SELECT COUNT(1)
FROM dbo.l
CROSS JOIN dbo.r
WHERE r.hash <> l.hash
OR ( r.hash IS NULL
AND l.hash IS NOT NULL )
OR ( l.hash IS NULL
AND r.hash IS NOT NULL )
SQL Server Execution Times: CPU time = 30968 ms, elapsed time = 8230 ms.
SELECT COUNT(1)
FROM dbo.l
CROSS JOIN dbo.r
WHERE ISNULL(r.hash, 0) <> ISNULL(l.hash, 0)
SQL Server Execution Times: CPU time = 31594 ms, elapsed time = 9230 ms.
SELECT COUNT(1)
FROM dbo.l
CROSS JOIN dbo.r
WHERE EXISTS(SELECT r.hash
EXCEPT
SELECT l.HASH)
SQL Server Execution Times: CPU time = 46531 ms, elapsed time = 13191 ms.
SELECT COUNT(1)
FROM dbo.l
CROSS JOIN dbo.r
WHERE NOT EXISTS(SELECT r.hash
INTERSECT
SELECT l.HASH)
SQL Server Execution Times: CPU time = 23812 ms, elapsed time = 6760 ms.
So on that basis the last one would be a clear winner - along with a code comment with a link to Undocumented Query Plans: Equality Comparisons for anyone unfamiliar with the pattern.
But you should test whether this pattern is reproducible with your actual MERGE
query too.
I would agree with your findings. The EXCEPT
pattern that your consultants wrote handles NULL values, and is also a really nice way to manage a very large number of comparison columns.
But if the column cannot be NULL and performance suffers, I suppose there's no point in keeping it.
Your test could be inaccurate in the following regards:
- The original queries a view, while your test queries a heap.
- The original query may have a considerably different data volume, whereas you're trying 100 000 rows. Plans are generated differently, depending on the estimated number of rows they will result in.
- Is the original column really varbinary(8000)? That sounds awfully wide for a hash value, and column widths affect memory grants, and ultimately performance.
This pattern may be a general solution for column comparisons, generated by a BIML package or similar ETL tool.