Unusual column comparisons and query performance

I don't like ISNULL with sentinel values. It requires picking values that can never legitimately appear in the data, now or at any point in the future, and personally I find expressions containing them more difficult to reason about.
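To make the sentinel risk concrete, here is a minimal sketch. The variables and the 0x00 sentinel are made up for illustration and are not taken from your test rig. It compares a NULL against a real value that happens to match the sentinel:

```sql
-- Hypothetical values: 0x00 stands in for whatever sentinel was chosen.
DECLARE @l varbinary(8000) = NULL;
DECLARE @r varbinary(8000) = 0x00;

SELECT -- The sentinel turns the NULL into 0x00, so the two sides look equal.
       CASE WHEN ISNULL(@r, 0x00) <> ISNULL(@l, 0x00)
            THEN 'different' ELSE 'same' END AS sentinel_compare,
       -- INTERSECT treats NULL as a distinct value of its own, no sentinel needed.
       CASE WHEN NOT EXISTS (SELECT @r INTERSECT SELECT @l)
            THEN 'different' ELSE 'same' END AS intersect_compare;
```

The first column should come back as 'same' and the second as 'different', which is the kind of silent mismatch a sentinel can introduce if the value ever turns up in real data.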

For your test rig I tried four different ways of expressing the query. The timings I got are shown below each one.

SELECT COUNT(1)
FROM   dbo.l
       CROSS JOIN dbo.r
WHERE  r.hash <> l.hash
        OR ( r.hash IS NULL
             AND l.hash IS NOT NULL )
        OR ( l.hash IS NULL
             AND r.hash IS NOT NULL )

SQL Server Execution Times: CPU time = 30968 ms, elapsed time = 8230 ms.

SELECT COUNT(1)
FROM   dbo.l
       CROSS JOIN dbo.r
WHERE  ISNULL(r.hash, 0) <> ISNULL(l.hash, 0)

SQL Server Execution Times: CPU time = 31594 ms, elapsed time = 9230 ms.

SELECT COUNT(1)
FROM   dbo.l
       CROSS JOIN dbo.r
WHERE  EXISTS(SELECT r.hash
              EXCEPT
              SELECT l.hash)

SQL Server Execution Times: CPU time = 46531 ms, elapsed time = 13191 ms.

SELECT COUNT(1)
FROM   dbo.l
       CROSS JOIN dbo.r
WHERE  NOT EXISTS(SELECT r.hash
                  INTERSECT
                  SELECT l.hash)

SQL Server Execution Times: CPU time = 23812 ms, elapsed time = 6760 ms.

So on that basis the last one would be the clear winner, along with a code comment linking to "Undocumented Query Plans: Equality Comparisons" for anyone unfamiliar with the pattern.
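For anyone meeting the pattern for the first time, the following sketch shows how the NOT EXISTS ... INTERSECT predicate behaves for each NULL / non-NULL combination. The inline VALUES rows and the column names l_hash and r_hash are made up for illustration:

```sql
-- NULL-safe "is different" test; see
-- "Undocumented Query Plans: Equality Comparisons" for background.
SELECT v.l_hash,
       v.r_hash,
       CASE WHEN NOT EXISTS (SELECT v.r_hash
                             INTERSECT
                             SELECT v.l_hash)
            THEN 1 ELSE 0 END AS is_different
FROM   (VALUES (0x01, 0x01),                          -- equal values
               (0x01, 0x02),                          -- different values
               (0x01, CAST(NULL AS varbinary(8000))), -- right side NULL
               (CAST(NULL AS varbinary(8000)), 0x01), -- left side NULL
               (CAST(NULL AS varbinary(8000)),
                CAST(NULL AS varbinary(8000)))        -- both NULL
       ) AS v (l_hash, r_hash);
```

The rows should be flagged 0, 1, 1, 1, 0 in that order: two NULLs count as the same, and a NULL against a value counts as different, which matches the long-hand OR version of the query.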

But you should test whether this pattern is reproducible with your actual MERGE query too.
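I haven't seen your actual MERGE statement, so the following is only a sketch of where the predicate might go. The table variables, the id key and the row_hash column are all hypothetical stand-ins:

```sql
-- Hypothetical target and source; substitute your real tables or view.
DECLARE @target TABLE (id int PRIMARY KEY, row_hash varbinary(32) NULL);
DECLARE @source TABLE (id int PRIMARY KEY, row_hash varbinary(32) NULL);

INSERT @target (id, row_hash) VALUES (1, 0x01), (2, NULL), (3, 0x03);
INSERT @source (id, row_hash) VALUES (1, 0x01), (2, 0x02), (3, NULL);

MERGE @target AS t
USING @source AS s
   ON s.id = t.id
-- Only touch rows whose hash really differs, treating NULLs as comparable.
WHEN MATCHED AND NOT EXISTS (SELECT s.row_hash
                             INTERSECT
                             SELECT t.row_hash)
    THEN UPDATE SET t.row_hash = s.row_hash
OUTPUT $action, inserted.id;
```

With this sample data only ids 2 and 3 should be reported as updated. Timing this form against the EXCEPT form on your real data volume would show whether the difference carries over.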

I would agree with your findings. The EXCEPT pattern that your consultants wrote handles NULL values, and is also a really nice way to manage a very large number of comparison columns.

But if the column cannot be NULL and performance suffers, I suppose there's no point in keeping it.
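As a rough illustration of why the EXCEPT pattern scales well to many columns, here is a sketch with made-up table variables and columns (a, b, c). A single predicate covers every column, NULLs included:

```sql
-- Hypothetical tables; the column names are for illustration only.
DECLARE @l TABLE (id int PRIMARY KEY, a int NULL, b varchar(10) NULL, c date NULL);
DECLARE @r TABLE (id int PRIMARY KEY, a int NULL, b varchar(10) NULL, c date NULL);

INSERT @l (id, a, b, c) VALUES (1, 10, 'x', NULL), (2, NULL, 'y', '2016-01-01');
INSERT @r (id, a, b, c) VALUES (1, 10, 'x', NULL), (2, 20,   'y', '2016-01-01');

SELECT l.id
FROM   @l AS l
       JOIN @r AS r
         ON r.id = l.id
-- One row is returned by EXCEPT if any of the listed columns differ.
WHERE  EXISTS (SELECT l.a, l.b, l.c
               EXCEPT
               SELECT r.a, r.b, r.c);
```

Only id 2 should be returned here. Writing the same test with <> and IS NULL checks would need three OR branches per column.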

Your test could be inaccurate in the following regards:

The original queries a view, while your test queries a heap.
The original query may run against a considerably different data volume, whereas you're testing with 100 000 rows. The optimizer chooses plans based on the estimated number of rows, so a plan that is fine at one volume may not be the one you get at another.
Is the original column really varbinary(8000)? That sounds awfully wide for a hash value, and column widths affect memory grants, and ultimately performance.

This pattern may be a general solution for column comparisons, generated by a BIML package or similar ETL tool.
