Using EXCEPT in a recursive common table expression

The BOL description of recursive CTEs describes the semantics of recursive execution as being as follows:

  1. Split the CTE expression into anchor and recursive members.
  2. Run the anchor member(s) creating the first invocation or base result set (T0).
  3. Run the recursive member(s) with Ti as an input and Ti+1 as an output.
  4. Repeat step 3 until an empty set is returned.
  5. Return the result set. This is a UNION ALL of T0 to Tn.

Note the above is a logical description. The physical order of operations can be somewhat different as illustrated here

Applying this to your CTE I would expect an infinite loop with the following pattern

+-----------+---------+---+---+---+
| Invocation| Results             |
+-----------+---------+---+---+---+
|         1 |       1 | 2 | 3 |   |
|         2 |       4 | 5 |   |   |
|         3 |       1 | 2 | 3 |   |
|         4 |       4 | 5 |   |   |
|         5 |       1 | 2 | 3 |   |
+-----------+---------+---+---+---+ 

Because

select a
from cte
where a in (1,2,3)

is the Anchor expression. This clearly returns 1,2,3 as T0

Thereafter the recursive expression runs

select a
from cte
except
select a
from r

With 1,2,3 as input that will yield an output of 4,5 as T1 then plugging that back in for the next round of recursion will return 1,2,3 and so on indefinitely.

This is not what actually happens however. These are the results of the first 5 invocations

+-----------+---------+---+---+---+
| Invocation| Results             |
+-----------+---------+---+---+---+
|         1 |       1 | 2 | 3 |   |
|         2 |       1 | 2 | 4 | 5 |
|         3 |       1 | 2 | 3 | 4 |
|         4 |       1 | 2 | 3 | 5 |
|         5 |       1 | 2 | 3 | 4 |
+-----------+---------+---+---+---+

From using OPTION (MAXRECURSION 1) and adjusting upwards in increments of 1 it can be seen that it enters a cycle where each successive level will continually toggle between outputting 1,2,3,4 and 1,2,3,5.

As discussed by @Quassnoi in this blog post. The pattern of observed results is as if each invocation is doing (1),(2),(3),(4),(5) EXCEPT (X) where X is the last row from the previous invocation.

Edit: After reading SQL Kiwi's excellent answer it is clear both why this occurs and that this is not the whole story in that there is still loads of stuff left on the stack that never can get processed.

Anchor Emits 1,2,3 to client Stack Contents 3,2,1

3 popped off stack, Stack Contents 2,1

The LASJ returns 1,2,4,5, Stack Contents 5,4,2,1,2,1

5 popped off stack, Stack Contents 4,2,1,2,1

The LASJ returns 1,2,3,4 Stack Contents 4,3,2,1,5,4,2,1,2,1

4 popped off stack, Stack Contents 3,2,1,5,4,2,1,2,1

The LASJ returns 1,2,3,5 Stack Contents 5,3,2,1,3,2,1,5,4,2,1,2,1

5 popped off stack, Stack Contents 3,2,1,3,2,1,5,4,2,1,2,1

The LASJ returns 1,2,3,4 Stack Contents 4,3,2,1,3,2,1,3,2,1,5,4,2,1,2,1

If you try and replace the recursive member with the logically equivalent (in the absence of duplicates / NULLs) expression

select a
from (
    select a
    from cte
    where a not in 
    (select a
    from r)
) x

This is not allowed and raises the error "Recursive references are not allowed in subqueries." so perhaps it is an oversight that EXCEPT is even allowed in this case.

Addition: Microsoft have now responded to my Connect Feedback as below

Jack's guess is correct: this should have been a syntax error; recursive references should indeed not be allowed in EXCEPT clauses. We plan to address this bug in an upcoming service release. In the meantime, I would suggest to avoid recursive references in EXCEPT clauses.

In restricting recursion over EXCEPT we follow the ANSI SQL standard, which has included this restriction ever since recursion was introduced (in 1999 I believe). There is no widespread agreement over what the semantics should be for recursion over EXCEPT (also called "unstratified negation") in declarative languages such as SQL. In addition, it is notoriously hard (if not impossible) to implement such semantics efficiently (for reasonably sized databases) in an RDBMS system.

And looks like the eventual implementation was made in 2014 for databases with compatibility level of 120 or higher.

Recursive references in an EXCEPT clause generates an error in compliance with the ANSI SQL standard.


See Martin Smith's answer for information about the current status of EXCEPT in a recursive CTE.

To explain what you were seeing, and why:

I'm using a table variable here, to make the distinction between the anchor values and recursive item clearer (it does not change the semantic).

DECLARE @V TABLE (a INTEGER NOT NULL)
INSERT  @V (a) VALUES (1),(2)
;
WITH rCTE AS 
(
    -- Anchor
    SELECT
        v.a
    FROM @V AS v

    UNION ALL

    -- Recursive
    SELECT
        x.a
    FROM
    (
        SELECT
            v2.a
        FROM @V AS v2

        EXCEPT

        SELECT
            r.a
        FROM rCTE AS r
    ) AS x
)
SELECT
    r2.a
FROM rCTE AS r2
OPTION (MAXRECURSION 0)

The query plan is:

Recursive CTE Plan

Execution starts at the root of the plan (the SELECT) and control passes down the tree to the Index Spool, Concatenation, and then to the top-level Table Scan.

The first row from the scan passes up the tree and is (a) stored in the Stack Spool, and (b) returned to the client. Which row is first is not defined, but let us assume it is the row with the value {1}, for the sake of argument. The first row to appear is therefore {1}.

Control again passes down to the Table Scan (the Concatenation operator consumes all rows from its outermost input before opening the next one). The scan emits the second row (value {2}), and this again passes up the tree to be stored on the stack and output to the client. The client has now received the sequence {1}, {2}.

Adopting a convention where the top of the LIFO stack is on the left, the stack now contains {2, 1}. As control again passes to the Table Scan, it reports no more rows, and control passes back to the Concatenation operator, which opens it's second input (it needs a row to pass up to the stack spool), and control passes to the Inner Join for the first time.

The Inner join calls the Table Spool on its outer input, which reads the top row from the stack {2} and deletes it from the worktable. The stack now contains {1}.

Having received a row on its outer input, the Inner Join passes control down its inner input to the Left Anti-Semi Join (LASJ). This requests a row from its outer input, passing control to the Sort. Sort is a blocking iterator, so it reads all rows from the table variable and sorts them ascending (as it happens).

The first row emitted by the Sort is therefore the value {1}. The inner side of the LASJ returns the current value of the recursive member (the value just popped off the stack), which is {2}. The values at the LASJ are {1} and {2} so {1} is emitted, since the values do not match.

This row {1} flows up the query plan tree to the Index (Stack) Spool where it is added to the stack, which now contains {1, 1}, and emitted to the client. The client has now received the sequence {1}, {2}, {1}.

Control now passes back to the Concatenation, back down the inner side (it returned a row last time, might do again), down through the Inner Join, to the LASJ. It reads its inner input again, obtaining the value {2} from the Sort.

The recursive member is still {2}, so this time the LASJ finds {2} and {2}, resulting in no row being emitted. Finding no more rows on its inner input (the Sort is now out of rows), control passes back up to the Inner Join.

The Inner Join reads its outer input, which results in the value {1} being popped off the stack {1, 1}, leaving the stack with just {1}. The process now repeats, with the value {2} from a new invocation of the Table Scan and Sort passing the LASJ test and being added to the stack, and passes to the client, which has now received {1}, {2}, {1}, {2}...and round we go.

My favourite explanation of the Stack spool used in recursive CTE plans is Craig Freedman's.