PostgreSQL query very slow when subquery added
The core of the problem becomes obvious here:
Seq Scan on publication (cost=0.01..349652.84 rows=744661 width=8) (actual time=2735.888..2841.393 rows=1 loops=1)
Postgres estimates to return 744661 rows while, in fact, it turns out to be a single row. If Postgres does not know better what to expect from the query it cannot plan better. We would need to see the actual query hidden behind (SELECT 9762715)
- and probably also know table definition, constraints, cardinalities and data distribution. Obviously, Postgres is not able to predict how few rows will be returned by it. There may be ways to rewrite the query, depending on what it is.
If you know that the subquery can never return more than n
rows, you can just tell Postgres by using:
SELECT mtid
FROM publication
WHERE mtid IN (SELECT ... LIMIT n) -- OR last_modifier=21321
LIMIT 5000;
If n is small enough, Postgres will switch to (bitmap) index scans. However, that only works for the simple case. Stops working when adding an OR
condition: the query planner can't currently cope with that.
I rarely use IN (SELECT ...)
to begin with. Typically there is a better way to implement the same, often with an EXISTS
semi-join. Sometimes with a (LEFT
) JOIN
(LATERAL
) ...
The obvious workaround would be to use UNION
, but you ruled that out. I can't say more without knowing the actual subquery and other relevant details.
My colleague has found a way to change the query so that it needs a simple rewrite and does what it needs to do, i.e. doing the subselect in one step, and then doing the further operations on the result:
SELECT mtid FROM publication
WHERE
mtid = ANY( (SELECT ARRAY(SELECT 9762715))::bigint[] )
OR last_modifier=21321
LIMIT 5000;
The explain analyze now is:
Limit (cost=92.58..9442.38 rows=2478 width=8) (actual time=0.071..0.074 rows=1 loops=1)
InitPlan 2 (returns $1)
-> Result (cost=0.01..0.02 rows=1 width=0) (actual time=0.010..0.011 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=1 loops=1)
-> Bitmap Heap Scan on publication (cost=92.56..9442.36 rows=2478 width=8) (actual time=0.069..0.070 rows=1 loops=1)
Recheck Cond: ((mtid = ANY (($1)::bigint[])) OR (last_modifier = 21321))
Heap Blocks: exact=1
-> BitmapOr (cost=92.56..92.56 rows=2478 width=0) (actual time=0.060..0.060 rows=0 loops=1)
-> Bitmap Index Scan on publication_pkey (cost=0.00..44.38 rows=10 width=0) (actual time=0.046..0.046 rows=1 loops=1)
Index Cond: (mtid = ANY (($1)::bigint[]))
-> Bitmap Index Scan on publication_last_modifier_btree (cost=0.00..46.94 rows=2468 width=0) (actual time=0.011..0.011 rows=0 loops=1)
Index Cond: (last_modifier = 21321)
Planning time: 0.704 ms
Execution time: 0.153 ms
It seems we can create a simple parser that finds and rewrites all subselects this way, and add it to a hibernate hook to manipulate the native query.