Difference between LIKE and ~ in Postgres

There is nothing wrong with LIKE and, IMO, no reason to favour ~ over it. Rather the opposite. LIKE is SQL-standard. So is SIMILAR TO, but it isn't widely supported. PostgreSQL's ~ operator (or posix regular expression matching operator) isn't SQL standard.

For that reason, I prefer to use LIKE where it's expressive enough and I only use ~ when I need the power of full regular expressions. If I ever need to port databases it's one less thing that'll hurt. I've tended to use SIMILAR TO when LIKE isn't powerful enough, but after Erwin's comments I think I'll stop doing that and use ~ when LIKE doesn't do the job.

Also, PostgreSQL can use a b-tree index for prefix searches (eg LIKE 'TEST%') with LIKE or SIMILAR TO if the database is in the C locale or the index has text_pattern_ops. Contrary to what I wrote earlier, Pg can also use such an index for a left-anchored posix regex, it just needs an explicit '^TEST.*' so the regex can only match from the beginning. My post earlier incorrectly stated that ~ couldn't use an index for a prefix search. With that difference eliminated it's really down to whether you want to stick to standard compliant features where possible or not.

See this demo SQLFiddle; note the different execution plans. Note the difference between ~ '1234.*' and ~ '^1234.*'.

Given sample data:

create table test (
   blah text
);
insert into test (blah)  select x::text from generate_series(1,10000) x;
create index test_blah_txtpat_idx ON test(blah text_pattern_ops);

note that ~ uses a seqscan even when it's substantially more expensive (artifically so due to enable_seqscan) because it has no alternative, while LIKE uses the index. However, a corrected ~ with a left anchor uses the index too:

regress=# SET enable_seqscan = 'f';
SET
regress=# explain select 1 from test where blah ~ '12.*';
                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Seq Scan on test  (cost=10000000000.00..10000000118.69 rows=2122 width=0)
   Filter: (blah ~ '12.*'::text)
(2 rows)
regress=# explain select 1 from test where blah like '12%';
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=4.55..46.76 rows=29 width=0)
   Filter: (blah ~~ '12%'::text)
   ->  Bitmap Index Scan on test_blah_txtpat_idx  (cost=0.00..4.54 rows=29 width=0)
         Index Cond: ((blah ~>=~ '12'::text) AND (blah ~<~ '13'::text))
(4 rows)
regress=# explain select 1 from test where blah ~ '^12.*';
                                     QUERY PLAN                                      
-------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=5.28..51.53 rows=101 width=0)
   Filter: (blah ~ '^12.*'::text)
   ->  Bitmap Index Scan on test_blah_txtpat_idx  (cost=0.00..5.25 rows=100 width=0)
         Index Cond: ((blah ~>=~ '12'::text) AND (blah ~<~ '13'::text))
(4 rows)

Overview

LIKE, SIMILAR TO and ~ are the basic pattern matching operators in PostgreSQL.

If you can, use LIKE (~~), it's fastest and simplest.
If you can't, use a regular expression (~), it's more powerful.
Never user SIMILAR TO. It's pointless. See below.

Installing the additional module pg_trgm adds advanced index options and the similarity operator %.
And there is also text search with its own infrastructure and the @@ operator (among others).

Index support is available for each of these operators - to a varying degree. It regularly trumps the performance of other options. But there is a lot of leeway in the details, even with indexes.

Index support

Without pg_trgm, there is only index support for left anchored search patterns. If your database cluster runs with a non-C locale (typical case), you need an index with a special operator class for that, like text_pattern_ops or varchar_pattern_ops. Basic left-anchored regular expressions are supported by this, too. Example:

CREATE TABLE tbl(string text);

INSERT INTO  tbl(string)
SELECT x::text FROM generate_series(1, 10000) x;

CREATE INDEX tbl_string_text_pattern_idx ON tbl(string text_pattern_ops);

SELECT * FROM tbl WHERE string ~ '^1234';  -- left anchored pattern

db<>fiddle here

With pg_trgm installed, GIN or GiST indexes are possible with the operator classes gist_trgm_ops or gin_trgm_ops. These indexes support any LIKE expression, not just left anchored. And, quoting the manual:

Beginning in PostgreSQL 9.3, these index types also support index searches for regular-expression matches.

Details:

  • PostgreSQL LIKE query performance variations

SIMILAR TO is a very odd construct. PostgreSQL only implements it because it was defined in early versions of the SQL standard. Internally, every SIMILAR TO expression is rewritten with a regular expression. Therefore, for any given SIMILAR TO expression, there is at least one regular expression doing the same job faster. I never use SIMILAR TO.

Further reading:

  • Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL

~ is the regular expression operator, and has the capabilities implied by that. You can specify a full range of regular expression wildcards and quantifiers; see the documentation for details. It is certainly more powerful than LIKE, and should be used when that power is needed, but they serve different purposes.