To find infinite recursive loop in CTE
With Postgres it's quite easy to prevent this by collecting all visited nodes in an array.
Setup:
create table hierarchy (id integer, parent_id integer);
insert into hierarchy
values
(1, null), -- root element
(2, 1), -- first child
(3, 1), -- second child
(4, 3),
(5, 4),
(3, 5); -- endless loop
Recursive query:
with recursive tree as (
select id,
parent_id,
array[id] as all_parents
from hierarchy
where parent_id is null
union all
select c.id,
c.parent_id,
p.all_parents||c.id
from hierarchy c
join tree p
on c.parent_id = p.id
and c.id <> ALL (p.all_parents) -- this is the trick to exclude the endless loops
)
select *
from tree;
To do this for multiple trees at the same time, you need to carry over the ID of the root node to the children:
with recursive tree as (
select id,
parent_id,
array[id] as all_parents,
id as root_id
from hierarchy
where parent_id is null
union all
select c.id,
c.parent_id,
p.all_parents||c.id,
p.root_id
from hierarchy c
join tree p
on c.parent_id = p.id
and c.id <> ALL (p.all_parents) -- this is the trick to exclude the endless loops
and c.root_id = p.root_id
)
select *
from tree;
Update for Postgres 14
Postgres 14 introduced the (standard compliant) CYCLE
option to detect cycles:
with recursive tree as (
select id,
parent_id
from hierarchy
where parent_id is null
union all
select c.id,
c.parent_id
from hierarchy c
join tree p
on c.parent_id = p.id
)
cycle id -- track cycles for this column
set is_cycle -- adds a boolean column is_cycle
using path -- adds a column that contains all parents for the id
select *
from tree
where not is_cycle
You haven't specified the dialect or your column names, so it is difficult to make the perfect example...
-- Some random data
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
DROP TABLE #MyTable
CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
INSERT INTO #MyTable (ID, ParentID, Description) VALUES
(1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
(2, 1, 'Child'), -- Try changing the second value (1) to 2
(3, 2, 'SubChild')
-- End random data
;WITH RecursiveCTE (StartingID, Level, Parents, Loop, ID, ParentID, Description) AS
(
SELECT ID, 1, '|' + CAST(ID AS VARCHAR(MAX)) + '|', 0, * FROM #MyTable
UNION ALL
SELECT R.StartingID, R.Level + 1,
R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
MT.*
FROM #MyTable MT
INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
)
SELECT StartingID, Level, Parents, MAX(Loop) OVER (PARTITION BY StartingID) Loop, ID, ParentID, Description
FROM RecursiveCTE
ORDER BY StartingID, Level
Something like this will show if/where there are loops in the recursive cte. Look at the column Loop
. With the data as is, there is no loops. In the comments there are examples on how to change the values to cause a loop.
In the end the recursive cte creates a VARCHAR(MAX)
of ids in the form |id1|id2|id3|
(called Parents
) and then checks if the current ID
is already in that "list". If yes, it sets the Loop
column to 1. This column is checked in the recursive join (the ABD R.Loop = 0
).
The ending query uses a MAX() OVER (PARTITION BY ...)
to set to 1 the Loop
column for a whole "block" of chains.
A little more complex, that generates a "better" report:
-- Some random data
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
DROP TABLE #MyTable
CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
INSERT INTO #MyTable (ID, ParentID, Description) VALUES
(1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
(2, 1, 'Child'), -- Try changing the second value (1) to 2
(3, 3, 'SubChild')
-- End random data
-- The "terminal" childrens (that are elements that don't have childrens
-- connected to them)
;WITH WithoutChildren AS
(
SELECT MT1.* FROM #MyTable MT1
WHERE NOT EXISTS (SELECT 1 FROM #MyTable MT2 WHERE MT1.ID != MT2.ID AND MT1.ID = MT2.ParentID)
)
, RecursiveCTE (StartingID, Level, Parents, Descriptions, Loop, ParentID) AS
(
SELECT ID, -- StartingID
1, -- Level
'|' + CAST(ID AS VARCHAR(MAX)) + '|',
'|' + CAST(Description AS VARCHAR(MAX)) + '|',
0, -- Loop
ParentID
FROM WithoutChildren
UNION ALL
SELECT R.StartingID, -- StartingID
R.Level + 1, -- Level
R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
R.Descriptions + CAST(MT.Description AS VARCHAR(MAX)) + '|',
CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
MT.ParentID
FROM #MyTable MT
INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
)
SELECT * FROM RecursiveCTE
WHERE ParentID IS NULL OR Loop = 1
This query should return all the "last child" rows, with the full parent chain. The column Loop
is 0
if there is no loop, 1
if there is a loop.