Best way to use PostgreSQL full text search ranking
If I use the version with repeated to_tsvector(...) will it call that twice, or is it smart enough to cache the results somehow?
The best way to notice these things is to do a simple explain, although those can be hard to read.
Long story short, yes, PostgreSQL is smart enough to reuse computed results.
Is there a way to do it without repeating the to_ts... function calls?
What I usually do is add a tsv
column which is the text search vector. If you make this auto update using triggers it immediately gives you the vector easily accessible but it also allows you to selectively update the search index by making the trigger selective.
Is there a way to use score in the WHERE clause at all?
Yes, but not with that name. Alternatively you could create a sub-query, but I would personally just repeat it.
If there is, would it be better to filter by score > 0 or use the @@ thing?
The simplest version I can think of is this:
SELECT *
FROM pictures
WHERE 'small dog' @@ text_search_vector
The text_search_vector
could obviously be replaced with something like to_tsvector('english', pictures.title)
The use of the @@
operator will utilize the full text GIN index, while the test for score > 0
would not.
I created a table as in the Question, but added a column named title_tsv
:
CREATE TABLE test_pictures (
id BIGSERIAL,
title text,
title_tsv tsvector
);
CREATE INDEX ix_pictures_title_tsv ON test_pictures
USING gin(title_tsv);
I populated the table with some test data:
INSERT INTO test_pictures(title, title_tsv)
SELECT T.data, to_tsvector(T.data)
FROM some_table T;
Then I ran the previously accepted answer with explain analyze
:
EXPLAIN ANALYZE
SELECT score, id, title
FROM (
SELECT ts_rank_cd(P.title_tsv, to_tsquery('address & shipping')) AS score
,P.id
,P.title
FROM test_pictures as P
) S
WHERE score > 0
ORDER BY score DESC;
And got the following. Note the execution time of 5,015 ms
QUERY PLAN |
----------------------------------------------------------------------------------------------------------------------------------------------|
Gather Merge (cost=274895.48..323298.03 rows=414850 width=60) (actual time=5010.844..5011.330 rows=1477 loops=1) |
Workers Planned: 2 |
Workers Launched: 2 |
-> Sort (cost=273895.46..274414.02 rows=207425 width=60) (actual time=4994.539..4994.555 rows=492 loops=3) |
Sort Key: (ts_rank_cd(p.title_tsv, to_tsquery('address & shipping'::text))) DESC |
Sort Method: quicksort Memory: 131kB |
-> Parallel Seq Scan on test_pictures p (cost=0.00..247776.02 rows=207425 width=60) (actual time=17.672..4993.997 rows=492 loops=3) |
Filter: (ts_rank_cd(title_tsv, to_tsquery('address & shipping'::text)) > '0'::double precision) |
Rows Removed by Filter: 497296 |
Planning time: 0.159 ms |
Execution time: 5015.664 ms |
Now compare that with the @@
operator:
EXPLAIN ANALYZE
SELECT ts_rank_cd(to_tsvector(P.title), to_tsquery('address & shipping')) AS score
,P.id
,P.title
FROM test_pictures as P
WHERE P.title_tsv @@ to_tsquery('address & shipping')
ORDER BY score DESC;
And the results coming in with an execution time of about 29 ms:
QUERY PLAN |
-------------------------------------------------------------------------------------------------------------------------------------------------|
Gather Merge (cost=13884.42..14288.35 rows=3462 width=60) (actual time=26.472..26.942 rows=1477 loops=1) |
Workers Planned: 2 |
Workers Launched: 2 |
-> Sort (cost=12884.40..12888.73 rows=1731 width=60) (actual time=17.507..17.524 rows=492 loops=3) |
Sort Key: (ts_rank_cd(to_tsvector(title), to_tsquery('address & shipping'::text))) DESC |
Sort Method: quicksort Memory: 171kB |
-> Parallel Bitmap Heap Scan on test_pictures p (cost=72.45..12791.29 rows=1731 width=60) (actual time=1.781..17.268 rows=492 loops=3) |
Recheck Cond: (title_tsv @@ to_tsquery('address & shipping'::text)) |
Heap Blocks: exact=625 |
-> Bitmap Index Scan on ix_pictures_title_tsv (cost=0.00..71.41 rows=4155 width=0) (actual time=3.765..3.765 rows=1477 loops=1) |
Index Cond: (title_tsv @@ to_tsquery('address & shipping'::text)) |
Planning time: 0.214 ms |
Execution time: 28.995 ms |
As you can see in the execution plan, the index ix_pictures_title_tsv
was used in the second query, but not in the first one, making the query with the @@
operator a whopping 172 times faster!
select *
from (
SELECT
pictures.id,
ts_rank_cd(to_tsvector('english', pictures.title),
to_tsquery('small dog')) AS score
FROM pictures
) s
WHERE score > 0
ORDER BY score DESC