Detect duplicate items in recursive CTE
The word dep
in the second query (after union
) is ambiguous. In fact it is interpreted as the column of rdeps
, not as an alias of objectdependencies.
with recursive rdeps as (
select dep
from objectdependencies dep
where dep.dependson = 4 -- starting point
union all
select dep -- this means r.dep
from objectdependencies dep
join rdeps r
on (r.dep).id = dep.dependson
) select (dep).id from rdeps;
This is why the query creates an endless loop. You can correct this by changing the alias:
with recursive rdeps as (
select dep
from objectdependencies dep
where dep.dependson = 4 -- starting point
union all
select objectdep
from objectdependencies objectdep
join rdeps r
on (r.dep).id = objectdep.dependson
) select (dep).id from rdeps;
id
----
1
2
3
1
2
1
(6 rows)
Or better, just by using columns, like the good Lord intended:
with recursive rdeps as (
select id, dependson
from objectdependencies
where dependson = 4
union all
select d.id, d.dependson
from objectdependencies d
join rdeps r
on r.id = d.dependson
)
select *
from rdeps;
The first query in the question is all you can do in plain sql as there is no communication between different (paralel) branches generated by a recursive query. In a functional approach you can use a temporary table as a store common to all branches. The function may look like this:
create or replace function rec_function(int)
returns void language plpgsql as $$
declare
i int;
begin
for i in
select id
from objectdependencies
where dependson = $1
loop
if not exists(
select from temp_table
where id = i)
then
insert into temp_table values(i);
perform rec_function(i);
end if;
end loop;
end $$;
Usage:
create temp table temp_table(id int);
select rec_function(4);
select *
from temp_table;
I know is't a rather old question, but I'm a little amazed that no one suggested removing all
from the union
to eliminate the duplicates early. It is a rather simple way to prevent duplicates in the result of a recursive CTE, but it does have its caveats — such a result must include only real fields, i.e. no calculated on the go depth
, path
or whatever.
Using the sample data from the question with this query (reformatted a little from the original):
with recursive
rdeps as (
select dep.id, dep.id as dependson
from objectdependencies as dep
where dep.dependson = 4 -- starting point
union
select self.id, dep.dependson
from rdeps as self
join objectdependencies as dep on dep.dependson = self.id
)
select dependson from rdeps;
I get precisely 1
, 2
, and 3
.
Moreover, this solution prevents endless loop in case of a cycle in dependencies. However, it does not detect it, as it doesn't and can't show that there is a cycle, it just prevents an endless loop.