GIN index ignored when query has many (more than 18) OR statements
The problem is that PostgreSQL assumes that more rows will be returned when you specify more OR
conditions, so at some point it will think that an index scan won't be faster any more.
You could create a function that extracts the interesting parts of the JSON:
CREATE FUNCTION get_array(jsonb) RETURNS text[]
LANGUAGE sql IMMUTABLE AS
$$SELECT array_agg(x->>'href') FROM jsonb_array_elements($1) AS v(x)$$;
Then you can index
CREATE INDEX ON vsko_mailer_api_prod.emailjobs USING gin (get_array("externalReferences"));
and query like this:
... WHERE get_array("externalReferences")
&& ARRAY['/responsibilities/pending/06ad562d-379a-4827-a9d3-1de179e29852',
'/responsibilities/pending/0e754454-db34-4ec2-8192-898f34928e80',
...
]
The “overlaps” operator should do what you want.
As you add more OR conditions, it thinks it will return more rows. Since there is an ORDER BY and a LIMIT, at some point it thinks that walking a different index, which provides the same ordering as the ORDER BY and stopping early once it hits the LIMIT, will be faster than getting all rows which meet the WHERE clause, and then sorting them into order and applying the LIMIT.
The easiest way to fix this is to skunk the index being used to ORDER BY ("emailjobs_ordered_created_and_keys"). If "key" is numeric, then you can do:
order by "$$meta.created" asc,"key"+0 asc
Or if it textual, then
order by "$$meta.created" asc,"key"||'' asc
This will not change the actual order, but will prevent PostgreSQL from recognizing that it can use the index to supply that order.
You can also use CTE as an optimization fence which forces the full select to be run before ordering and applying the limit:
explain analyze
with t as (
select * from "emailjobs"
where "emailjobs"."$$meta.deleted" = false
and ("externalReferences" @> '[{"href":"/responsibilities/pending/06ad562d-379a-4827-a9d3-1de179e29852"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/0e754454-db34-4ec2-8192-898f34928e80"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/0f93b24c-ee8f-4872-896c-9a75fcb904e0"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/10f38c74-6b6f-4298-819f-a2a2e61f23ce"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/127468cf-0bb4-41fe-8d5d-2886a4f0d4a4"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/297aefb9-e4aa-4b9e-9dcd-8687df1b200a"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/2e5c4b39-4375-49be-9bee-7b8106502c59"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/2f08f86c-696b-4f71-bbe3-51cf2bc74387"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/2f443528-b173-4522-bdb2-7c942112d781"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/2f8e87f6-4573-478d-925d-fca286cc6cee"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/318417fa-9c41-4377-b815-02f40147ce65"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/3b4cd61c-6adc-4b56-8519-5bd2612fb214"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/3e5ed89e-bfa4-47f9-a056-9817f8e91a4a"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/404a90d6-a6e8-4068-8d0c-904685705129"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/524a5dde-7245-4796-9cd5-1f1eee9b4f6a"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/5b4b6606-e3ec-49a4-8882-0faeb7d51f60"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/62e267f7-6720-4874-8c24-270465d7f171"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/64212e18-ba9e-4dc6-b35b-9bd66ec4d49a"}]'::jsonb
OR "externalReferences" @> '[{"href":"/responsibilities/pending/64212e18-ba9e-4dc6-b35b-9bd66ec4d49a"}]'::jsonb
)
)
select * from t
order by "$$meta.created" asc,"key" asc
limit '500'
Starting in v12, this is no longer an optimization fence unless you write it as with t as materialized
Why is PostgreSQL getting it wrong in the first place? JSON objects have no statistics gathered on them, so PostgreSQL has to use generic estimates about how many rows will match the @> operators. You can see it is over-estimating that by about a factor of 40. This is a double-whammy, as it thinks it will have 40 times more rows to sort if it follow the sorting route, and thinks it will have 40 times less of the already-ordered index to scan (before the LIMIT kicks in) if it takes that route. One way to potentially fix that is by mapping the JSONB into regular PostgreSQL arrays, as those do have statistics gathered on them. You can either store them in the table that way, or you can create a functional index as outlined by Laurenz. However, in my hands this did even worse for row estimates than the JSONB did: the most_common_elems column ended up being NULL, for reasons already explored.
So I would go with the quick and dirty hack first described, at least until some future version of PostgreSQL offers better solutions.