Constant Scan spooling
Why optimizer introduces spool here, what is the reason to do it? There is nothing complex beyond the spool.
The thing beyond the spool is not a simple table reference, which could simply be duplicated when the left join / anti semi join alternative is generated.
It may look a little like a table (Constant Scan) but to the optimizer* it is a UNION ALL
of the separate rows in the VALUES
clause.
The additional complexity is enough for the optimizer to choose to spool and replay the source rows, and not replace the spool with a simple "table get" later on. For example, the initial transformation from full join looks like this:
Notice the extra spools introduced by the general transform. Spools above a simple table get are cleaned up later by the rule SpoolGetToGet
.
If the optimizer had a corresponding SpoolConstGetToConstGet
rule, it could work as you wish, in principle.
How to get rid of it in this case, what are the possible ways?
Use a real table (temporary or variable), or write the transformation from full join manually, for example:
WITH
p([Id], [Code]) AS
(
SELECT @id1, 'A'
UNION ALL
SELECT @id2, 'B'
),
FullJoin AS
(
SELECT
p.Code,
d.[Status]
FROM p
LEFT JOIN #data d
ON d.[Id] = p.[Id]
UNION ALL
SELECT
NULL,
D.[Status]
FROM #data AS D
WHERE NOT EXISTS
(
SELECT *
FROM p
WHERE p.Id = D.Id
)
)
SELECT
COALESCE(FullJoin.Code, 'X') AS Code,
COALESCE(FullJoin.Status, 0) AS [Status]
FROM FullJoin;
Plan for manual rewrite:
This has an estimated cost of 0.0067201 units, compared with 0.0203412 units for the original.
* It can be observed as a LogOp_UnionAll
in the Converted Tree (TF 8605). In the Input Tree (TF 8606) it is a LogOp_ConstTableGet
. The Converted Tree shows the tree of optimizer expression elements after parsing, normalization, algebrization, binding, and some other preparatory work. The Input Tree shows the elements after conversion to Negation Normal Form (NNF convert), runtime constant collapsing, and a few other bits and bobs. NNF convert includes logic to collapse logical unions and common table gets, among other things.
The table spool is simply creating a table out of the two sets of tuples present in the VALUES
clause.
You can eliminate the spool by inserting those values into a temp table first, like so:
DROP TABLE IF EXISTS #data;
CREATE TABLE #data ([Id] int, [Status] int);
INSERT INTO #data
VALUES (100, 1), (101, 2), (102, 3), (103, 2);
DROP TABLE IF EXISTS #p;
CREATE TABLE #p
(
Id int NOT NULL
, Code char(1) NOT NULL
);
DECLARE @id1 int = 101, @id2 int = 105;
INSERT INTO #p (Id, Code)
VALUES
(@id1, 'A'),
(@id2, 'B');
SELECT
COALESCE(p.[Code], 'X') AS [Code],
COALESCE(d.[Status], 0) AS [Status]
FROM #p p
FULL JOIN #data d ON d.[Id] = p.[Id];
Looking at the execution plan for your query, we see the output list contains two columns that use the Union
prefix; this is a hint that the spool is creating a table from a union'd source:
The FULL OUTER JOIN
requires SQL Server to access the values in p
twice, once for each "side" of the join. Creating a spool allows the resulting inner loops join to access the spooled data.
Interestingly, if you replace the FULL OUTER JOIN
with a LEFT JOIN
and a RIGHT JOIN
, and UNION
the results together, SQL Server does not use a spool.
SELECT
COALESCE(p.[Code], 'X') AS [Code],
COALESCE(d.[Status], 0) AS [Status]
FROM (VALUES
(101, 'A'),
(105, 'B')
) p([Id], [Code])
LEFT JOIN #data d ON d.[Id] = p.[Id]
UNION
SELECT
COALESCE(p.[Code], 'X') AS [Code],
COALESCE(d.[Status], 0) AS [Status]
FROM (VALUES
(101, 'A'),
(105, 'B')
) p([Id], [Code])
RIGHT JOIN #data d ON d.[Id] = p.[Id];
Note, I'm not suggesting using the UNION
query above; for larger sets of input, it may not be more efficient than the simple FULL OUTER JOIN
you already have.