PostgreSQL full text search on many columns

I suggest this expression for query and index:

SELECT * FROM tbl
WHERE  to_tsvector('simple', f_concat_ws(' ', country, city, street, house_nr, postcode))
    @@ plainto_tsquery('simple', '22 Kärntner Wien');

Note the custom function f_concat_ws() above. That's because concat_ws() is only STABLE not IMMUTABLE. You need to create it first:

CREATE OR REPLACE FUNCTION f_concat_ws(text, VARIADIC text[])
  RETURNS text LANGUAGE sql IMMUTABLE AS 'SELECT array_to_string($2, $1)';

It can be used as drop-in replacement for concat_ws(), except that it only accepts actual text data as input (which allows us to make it IMMUTABLE without cheating, effectively). Detailed explanation (read it!):

Combine two columns and add into one new column

About VARIADIC:

Passing multiple values in single parameter

For many columns, this is shorter and faster. You could do without it but then the syntax gets rather verbose (see joanolo's answer).

The matching index to go with this:

CREATE INDEX tbl_adr_fts_idx ON tbl USING GIN (
       to_tsvector('simple', f_concat_ws(' ', country, city, street, house_nr, postcode)));

You are dealing with international address data, so do not use the english text search configuration. Stemming makes little sense for names and most of your example data is not even English to begin with. Use the simple configuration instead. You need the form with two parameters - see below.

Concatenate the strings and call the more expensive function to_tsvector() once. Use concat_ws() to deal with possible NULL values elegantly. Cheaper overall, and also shorter.

String replace using concatenated strings from various columns

Like I commented, Full Text Search has limited support for fuzzy matching, but there is the often overlooked feature of prefix matching:

Get partial match from GIN indexed TSVECTOR column

So, if you are not sure whether it's 'Kärntner' or 'Kärnten', and whether it's 'Straße', 'strasse' or 'Strabe' (like in your buggy example data) but you know that the second word follows the first, you could:

... @@ to_tsquery('simple', '22 & Kärnt:* <-> Stra:* & Wien')

<-> is the phrase search operator and requires Postgres 9.6.

How to store short stories for access to individual sentences?

And if you want to ignore diacritical signs as well ('ä' <> 'a'), add unaccent() to the mix. You can use it as separate function or you can add it as dictionary to your text search configuration. You need to install the extension first ...

Does PostgreSQL support “accent insensitive” collations?

Overview over pattern matching option in typical Postgres installations:

Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL

Joanolo already provided some basic information about FTS and the link to the manual for more.

Adressing your comment

I am trying add this index it but gives me an error:
ERROR: functions in index expression must be marked IMMUTABLE

There are two variants of the function to_tsvector() - see "function overloading". The 1st takes only text, the 2nd takes regconfig and text. See for yourself:

SELECT proname, provolatile, proargtypes[0]::regtype, proargtypes[1]::regtype
FROM   pg_proc
WHERE  proname = 'to_tsvector';

Only the second is IMMUTABLE and can be used in an index expression directly. 'simple' in the above example is a text search configuration (regconfig).

More importantly, my oversight: concat_ws() (which I had in my first version) is only STABLE, not IMMUTABLE. I added necessary steps above.

How do IMMUTABLE, STABLE and VOLATILE keywords effect behaviour of function?
Combine two columns and add into one new column

Let's imagine this is your table and some data:

CREATE TABLE t
(
    country text,
    city text,
    street text,
    house_number text,
    post_code text
) ;

INSERT INTO t
VALUES
   ('Österreich', 'Vienna', 'HauptStrasse', '123', '12345'),   
   ('France', 'Paris', 'Rue du Midi', '12A', '01234'),   
   ('España', 'Barcelona', 'Passeig de Gràcia', '32', '08001'),   
   ('United Kingdom', 'London', 'Oxford Street', '20', 'W1D 1AS'),
   ('Nederland', 'Amsterdam', 'Leidsekruisstraat', '6-8', '1017 RH') ;

[NOTE: check it at http://rextester.com/DOJN8533]

The way to perform a full text search to several columns using PostgreSQL (assuming 'english' is the name of your FTS configuration), is by using a query like:

SELECT
    *
FROM
    t
WHERE
    (
        to_tsvector('english', coalesce(country, ''))      || 
        to_tsvector('english', coalesce(city, ''))         || 
        to_tsvector('english', coalesce(street, ''))       || 
        to_tsvector('english', coalesce(house_number, '')) ||
        to_tsvector('english', coalesce(post_code, '')) 
    ) @@ plainto_tsquery('english', 'Amsterdam') ;

The where clasuse means:

 (this tsvector = document) @@ /* matches */  (this tsquery = query)

A tsvector is a special data type used by PostgreSQL to store transformed data (for instance, all lowercased; with commas taken out, with words identified and listed, etc.) about a text. A tsquery is a way to ask for characteristics of a document (for instance containing this _and_ that).

The || operator combines tsvectors (let's say it "adds them together").

If you want to speed up things, you should have one functional index, defined like:

CREATE INDEX ts_idx 
    ON t USING gist ( 
    (
        to_tsvector('english', coalesce(country, '')) || 
        to_tsvector('english', coalesce(city, '')) || 
        to_tsvector('english', coalesce(street, '')) || 
        to_tsvector('english', coalesce(house_number, '')) ||
        to_tsvector('english', coalesce(post_code, ''))
    ) 
) ;

You need to carefully check the documentation about Full Text Search. It is a bit intimidating, because there are lots of possibilities, but it's worth spending the time.

To sort out results when there are many, you should use he ts_rank function to ORDER BY and then limit.

PostgreSQL full text search on many columns

Adressing your comment

Tags:

Postgresql

Full Text Search

Pattern Matching

Related

Recent Posts