How to most efficiently find out if a record has child records?
Don't forget that Postgres has a boolean datatype. The following is the most succinct way to express the query:
select
parent_id,
name,
exists (select from child where parent_id = p.parent_id) as has_children
from parent p;
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=86748ba18ba8c0f31f1b77a74230f67b
Methods
Aggregate Method
The popular way we'll call it the aggregate method. Note bool_or(child_id IS NOT NULL)
also works but was not any faster.
SELECT parent_id, count(*)>1 AS has_children
FROM parent
LEFT OUTER JOIN children
USING (parent_id)
GROUP BY parent_id;
LEFT JOIN LATERAL
with limit
But you may also try this, with LEFT JOIN LATERAL()
like this..
SELECT parent_id, has_children
FROM parent AS p
LEFT JOIN LATERAL (
SELECT true
FROM children AS c
WHERE c.parent_id = p.parent_id
FETCH FIRST ROW ONLY
) AS t(has_children)
ON (true);
EXISTS
Just FYI, you can use CROSS JOIN LATERAL
with EXISTS
too (which is I believe how it's planned). We'll call it the EXISTS method.
SELECT parent_id, has_children
FROM parent AS p
CROSS JOIN LATERAL (
SELECT EXISTS(
SELECT
FROM children AS c
WHERE c.parent_id = p.parent_id
)
) AS t(has_children);
Which is the same as,
SELECT parent_id, EXISTS(
SELECT
FROM children AS c
WHERE c.parent_id = p.parent_id
) AS has_children
FROM parent AS p;
Benchmarks
Sample dataset
1000000 children, 2500 parents. Our sims get it done.
CREATE TABLE parent (
parent_id int PRIMARY KEY
);
INSERT INTO parent
SELECT x
FROM generate_series(1,1e4,4) AS gs(x);
CREATE TABLE children (
child_id int PRIMARY KEY,
parent_id int REFERENCES parent
);
INSERT INTO children
SELECT x, 1 + (x::int%1e4)::int/4*4
FROM generate_series(1,1e6) AS gs(x);
VACUUM FULL ANALYZE children;
VACUUM FULL ANALYZE parent;
Results (pt1)
- Aggregate method: 450ms,
LEFT JOIN LATERAL ( FETCH FIRST ROW ONLY )
: 850ms- EXISTS method: 850ms
Results (adding an index and running again)
Now let's add an index
CREATE INDEX ON children (parent_id);
ANALYZE children;
Now the timing profile is totally different,
- Aggregate method: 450ms,
LEFT JOIN LATERAL ( FETCH FIRST ROW ONLY )
: 30ms- EXISTS method: 30ms
This is how I would do in SQL server (I don't have postgresql- I'm guessing it would be similiar)
SELECT p.parent_id,
CASE WHEN EXISTS (SELECT 1 FROM Child c WHERE c.ParentId=p.ParentId)
THEN 'Yes'
ELSE 'No'
END as has_child,
FROM Parent p
--WHERE EXISTS (SELECT 1 FROM Child c WHERE c.ParentId=p.ParentId)