Why does a DELETE query run in one format much longer than in another?

This part of the plan is the problem.

enter image description here

Issue

The correct behaviour if the subquery brings back any NULL is to return 0 rows from the NOT IN.

Even if ID is not nullable (and therefore MIN(ID) cannot possibly be NULL when used as a vector aggregate) the datatype of MIN(ID) is regarded as nullable (it can still return NULL when used as a scalar aggregate against an empty table for example).

So you have this extra row count spool added to the plan whose job it is to ensure (in conjunction with an anti semi join) that no rows are emitted if a NULL is returned by the subquery.

Unfortunately even though the rows eliminated by the anti semi join on this spool are likely to be 0 and all 257,246 rows will flow onto the next operator the cardinality estimation reduces the estimated number of rows going past that step to 1.

As a result it has a scan of the table on the inside of the nested loops with estimated 1 execution whereas in reality it will scan and aggregate the whole table 257,246 times.

The one-row estimate coming out of the Anti Semi join is a known bug that was fixed under trace flag 4199 quite a while ago now. See the related Q & A Workaround for Anti-Semi Join bug for some more background and links.

Solution

The bug only manifests on SQL Server 2017 for you because you have compatibility level 120 selected.

You should find you get a much better estimate for the Anti Semi Join with trace flag 4199 active, an OPTION (QUERYTRACEON 4199) hint, a OPTION (USE HINT ('ENABLE_QUERY_OPTIMIZER_HOTFIXES')) hint (directly or via plan guide) or for the database:

Click to copy

ALTER DATABASE SCOPED CONFIGURATION
SET QUERY_OPTIMIZER_HOTFIXES = ON;

The use hint QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140 is another option as of SQL Server 2017 CU10.

Which option you choose depends on how widely you want the optimizer hotfixes to apply. Compatibility levels are intended to be rather short-term things, so you should be planning to move to a more current setting, where this particular optimizer fix is on by default.

Repro

The following script reproduces the problem and a fix:

Click to copy

ALTER DATABASE CURRENT 
SET COMPATIBILITY_LEVEL = 120;
GO
ALTER DATABASE SCOPED CONFIGURATION
SET QUERY_OPTIMIZER_HOTFIXES = OFF;
GO
DROP TABLE IF EXISTS dbo.tbl;
GO
CREATE TABLE dbo.tbl
(
    Id integer PRIMARY KEY, 
    IdProject integer NOT NULL, 
    IdRepresentative integer NOT NULL, 
    TimeStart datetime NOT NULL,

    INDEX i NONCLUSTERED
    (
        TimeStart, 
        IdRepresentative, 
        IdProject
    )
);
GO
UPDATE STATISTICS dbo.tbl 
WITH 
    ROWCOUNT = 257246, 
    PAGECOUNT = 25725;

Click to copy

DELETE FROM [tbl]
WHERE [Id] NOT IN
(
    SELECT MIN([Id])
    FROM [tbl]
    GROUP BY [IdProject], [IdRepresentative], [TimeStart]
) 
OPTION 
(
    MAXDOP 1
);

Bug

Click to copy

DELETE FROM [tbl]
WHERE [Id] NOT IN
(
    SELECT MIN([Id])
    FROM [tbl]
    GROUP BY [IdProject], [IdRepresentative], [TimeStart]
) 
OPTION 
(
    MAXDOP 1,
    USE HINT ('ENABLE_QUERY_OPTIMIZER_HOTFIXES')
);

with fix

Alternative Syntax

Ideally you should also rewrite the query to not use the problematic NOT IN. A possible alternative, that is likely to be more efficient even with the above fix, would be

Click to copy

DELETE T
FROM   (SELECT ROW_NUMBER() OVER (PARTITION BY IdProject, IdRepresentative, TimeStart ORDER BY Id) AS RN
        FROM   tbl) T
WHERE  RN > 1

Why does a DELETE query run in one format much longer than in another?

Tags:

Performance

Sql Server

Query Performance

Sql Server 2017

Execution Plan

Cardinality Estimates

Related

Recent Posts