Performance tuning on a query

Let's start by considering join order. You have three table references in the query. Which join order might give you the best performance? The query optimizer thinks that the join from DsJobStat to DsAvg will eliminate almost all of the rows (cardinality estimates fall from 212195000 rows to 1 row). The actual plan shows that the estimate is pretty close to reality (11 rows survive the join). However, the join is implemented as a right anti semi merge join, so all 212 million rows from the DsJobStat table are scanned just to produce 11 rows. That could certainly be contributing to the long query execution time, but I can't think of a physical or logical join operator that would have performed better. I'm sure that the DJS_Dashboard_2 index is used by other queries, but for this query all of the extra key and included columns just mean more IO, which slows you down. So you potentially have a table access problem with the index scan on the DsJobStat table.
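If this query matters enough to justify its own index, a narrower one could cut that IO. This is only a sketch with a hypothetical index name and column list; I only know that JobName and JobStatus are needed by the part of the query that matters here, so adjust it to the columns the real query actually references:

-- Hypothetical narrower index: key on JobName, carrying only the
-- columns this query needs, instead of the wide DJS_Dashboard_2 index.
CREATE NONCLUSTERED INDEX IX_DsJobStat_JobName
ON dbo.DsJobStat (JobName)
INCLUDE (JobStatus);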

I'm going to assume that the join to AJF isn't very selective. It currently isn't relevant to the performance issues that you're seeing in the query, so I'm going to ignore it for the rest of this answer. That could change if the data in the table changes.

The other problem that's apparent from the plan is the row count spool operator. This is a very lightweight operator, but it executes over 200 million times. The operator is there because the query is written with NOT IN: if there is even a single NULL row in DsAvg then all rows must be eliminated, and the spool is the implementation of that check. That probably isn't the logic that you want, so you'd be better off rewriting that part to use NOT EXISTS. The actual benefit of the rewrite will depend on your system and data.
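To see the NULL behavior in isolation, here's a minimal repro (the table variables are throwaway, not part of the original query):

-- NOT IN returns zero rows as soon as the subquery produces a NULL,
-- because JobName NOT IN (N'A', NULL) evaluates to UNKNOWN, not TRUE.
DECLARE @T TABLE (JobName nvarchar(255) NOT NULL);
DECLARE @Avg TABLE (JobName nvarchar(255) NULL);

INSERT INTO @T VALUES (N'A'), (N'B');
INSERT INTO @Avg VALUES (N'A'), (NULL);

-- Returns nothing: the NULL row in @Avg eliminates everything.
SELECT JobName FROM @T
WHERE JobName NOT IN (SELECT JobName FROM @Avg);

-- Returns 'B': NOT EXISTS simply never matches the NULL row.
SELECT t.JobName FROM @T AS t
WHERE NOT EXISTS (SELECT 1 FROM @Avg AS a WHERE a.JobName = t.JobName);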

I mocked up some data based on the query plan to test a few query rewrites. My table definitions are significantly different from yours because it would have been too much effort to mock up data for every single column. Even with the abbreviated data structures I was able to reproduce the performance issue that you're experiencing.

CREATE TABLE [dbo].[DsAvg](
    [JobName] [nvarchar](255) NULL
);

CREATE CLUSTERED INDEX CI_DsAvg ON [DsAvg] (JobName);

-- 200000 distinct JobName values: 1 through 200000
INSERT INTO [DsAvg] WITH (TABLOCK)
SELECT TOP (200000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
OPTION (MAXDOP 1);

CREATE TABLE [dbo].[DsJobStat](
    [JobName] [nvarchar](255) NOT NULL,
    [JobStatus] [nvarchar](255) NULL
);

CREATE CLUSTERED INDEX CI_JobStat ON DsJobStat (JobName);

-- 1000 rows for each of the 200000 JobName values (200 million rows)
INSERT INTO [DsJobStat] WITH (TABLOCK)
SELECT [JobName], 'ACTIVE'
FROM [DsAvg] ds
CROSS JOIN (
    SELECT TOP (1000) 1
    FROM master..spt_values t1
) c (t);

-- 1000 rows for one JobName ('200001') that is not in DsAvg
INSERT INTO [DsJobStat] WITH (TABLOCK)
SELECT TOP (1000) '200001', 'ACTIVE'
FROM master..spt_values t1;

The query plan shows around 200000 unique JobName values in the DsAvg table, and the actual number of rows that survive the join tells us that almost all of the JobName values in DsJobStat are also present in DsAvg. The mocked-up DsJobStat table therefore has 200001 unique values for the JobName column and 1000 rows per value.
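If you want to double-check that the mocked-up data matches that shape, a quick sanity check against the tables above:

-- Expect 200001 distinct values and roughly 1000 rows per value:
-- 200000 values shared with DsAvg plus the lone '200001'.
SELECT COUNT(DISTINCT JobName) AS DistinctJobNames,
       COUNT_BIG(*) AS TotalRows
FROM dbo.DsJobStat;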

I believe that this query represents the performance issue:

SELECT DsJobStat.JobName AS JobName, DsJobStat.JobStatus AS JobStatus
FROM DsJobStat
WHERE DsJobStat.JobName NOT IN (SELECT [DsAvg].JobName FROM [DsAvg]);

All of the other stuff in your query plan (GROUP BY, HAVING, ancient-style join, etc.) happens after the result set has been reduced to 11 rows. Right now it doesn't matter from a query performance point of view, but there could be other concerns there that changes to the data in your tables would reveal.

I'm testing in SQL Server 2017, but I get the same basic plan shape as you:

[actual plan before the rewrite]

On my machine, that query takes 62219 ms of CPU time and 65576 ms of elapsed time to execute. If I rewrite the query to use NOT EXISTS:

SELECT DsJobStat.JobName AS JobName, DsJobStat.JobStatus AS JobStatus
FROM DsJobStat
WHERE NOT EXISTS (SELECT 1 FROM [DsAvg] WHERE DsJobStat.JobName = [DsAvg].JobName);

[actual plan after the NOT EXISTS rewrite, without the spool]

The spool is no longer executed 212 million times, and the query probably now has the behavior the vendor intended. The query executes in 34516 ms of CPU time and 41132 ms of elapsed time, with the majority of the time spent scanning 212 million rows from the index.
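For what it's worth, the CPU and elapsed times quoted in this answer are the kind of output you get from SET STATISTICS TIME; to reproduce the measurements, wrap the query like this:

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT DsJobStat.JobName AS JobName, DsJobStat.JobStatus AS JobStatus
FROM DsJobStat
WHERE NOT EXISTS (SELECT 1 FROM [DsAvg] WHERE DsJobStat.JobName = [DsAvg].JobName);

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;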

That index scan is very unfortunate for this query. On average we have 1000 rows per unique value of JobName, and after reading the first row for a value we already know whether the remaining 999 rows will be needed. We almost never need those rows, but we still have to scan them. Since the surviving rows aren't very dense in the table and almost all rows will be eliminated by the join, we can imagine a more efficient IO pattern on the index: what if SQL Server read the first row per unique value of JobName, checked whether that value was in DsAvg, and simply skipped ahead to the next value of JobName if it was? Instead of scanning 212 million rows, a seek plan requiring around 200k executions could do the job.

This can mostly be accomplished with recursion, using a technique pioneered by Paul White that's described here. The recursive CTE below implements the IO pattern described above:

WITH RecursiveCTE
AS
(
    -- Anchor
    SELECT TOP (1)
        [JobName]
    FROM dbo.DsJobStat AS T
    ORDER BY
        T.[JobName]

    UNION ALL

    -- Recursive
    SELECT R.[JobName]
    FROM
    (
        -- Number the rows
        SELECT 
            T.[JobName],
            rn = ROW_NUMBER() OVER (
                ORDER BY T.[JobName])
        FROM dbo.DsJobStat AS T
        JOIN RecursiveCTE AS R
            ON R.[JobName] < T.[JobName]
    ) AS R
    WHERE
        -- Only the row that sorts lowest
        R.rn = 1
)
SELECT js.*
FROM RecursiveCTE
INNER JOIN dbo.DsJobStat js ON RecursiveCTE.[JobName] = js.[JobName]
WHERE NOT EXISTS (SELECT 1 FROM [DsAvg] WHERE RecursiveCTE.JobName = [DsAvg].JobName)
OPTION (MAXRECURSION 0);

That query is a lot to look at, so I recommend carefully examining the actual plan. First we do 200002 index seeks against the index on DsJobStat to get all of the unique JobName values. Then we join to DsAvg and eliminate all rows but one. For the remaining row, we join back to DsJobStat and get all of the required columns.

The IO pattern totally changes. Before, we got this:

Table 'DsJobStat'. Scan count 1, logical reads 1091651, physical reads 13836, read-ahead reads 181966

With the recursive query we get this:

Table 'DsJobStat'. Scan count 200003, logical reads 1398000, physical reads 1, read-ahead reads 7345

On my machine, the new query executes in just 6891 ms of CPU time and 7107 ms of elapsed time. Note that needing recursion in this way suggests that something is missing from the data model (or maybe it was just unstated in the posted question). If there is a relatively small table that contains all possible JobNames, it will be much better to use that table than to recurse over the big one. It boils down to this: if you have a result set containing all of the JobNames that you need, you can use index seeks to fetch the remaining columns. You can't do that with a result set of the JobNames that you DON'T need.
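For illustration, suppose a hypothetical dbo.JobNames table exists with one row per possible JobName. Then the whole recursive CTE collapses into a simple join-back pattern:

-- Hypothetical: dbo.JobNames holds one row per possible JobName.
-- Find the few names missing from DsAvg, then seek into the big table.
SELECT js.JobName, js.JobStatus
FROM dbo.JobNames AS jn
INNER JOIN dbo.DsJobStat AS js
    ON js.JobName = jn.JobName
WHERE NOT EXISTS (SELECT 1 FROM dbo.DsAvg AS a WHERE a.JobName = jn.JobName);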