Difference between LIKE and ~ in Postgres
There is nothing wrong with LIKE
and, IMO, no reason to favour ~
over it. Rather the opposite. LIKE
is SQL-standard. So is SIMILAR TO
, but it isn't widely supported. PostgreSQL's ~ operator
(or posix regular expression matching operator) isn't SQL standard.
For that reason, I prefer to use LIKE
where it's expressive enough and I only use ~
when I need the power of full regular expressions. If I ever need to port databases it's one less thing that'll hurt. I've tended to use SIMILAR TO
when LIKE
isn't powerful enough, but after Erwin's comments I think I'll stop doing that and use ~
when LIKE
doesn't do the job.
Also, PostgreSQL can use a b-tree index for prefix searches (eg LIKE 'TEST%'
) with LIKE
or SIMILAR TO
if the database is in the C
locale or the index has text_pattern_ops
. Contrary to what I wrote earlier, Pg can also use such an index for a left-anchored posix regex, it just needs an explicit '^TEST.*' so the regex can only match from the beginning. My post earlier incorrectly stated that ~
couldn't use an index for a prefix search. With that difference eliminated it's really down to whether you want to stick to standard compliant features where possible or not.
See this demo SQLFiddle; note the different execution plans. Note the difference between ~ '1234.*'
and ~ '^1234.*'
.
Given sample data:
create table test (
blah text
);
insert into test (blah) select x::text from generate_series(1,10000) x;
create index test_blah_txtpat_idx ON test(blah text_pattern_ops);
note that ~
uses a seqscan even when it's substantially more expensive (artifically so due to enable_seqscan
) because it has no alternative, while LIKE
uses the index. However, a corrected ~
with a left anchor uses the index too:
regress=# SET enable_seqscan = 'f';
SET
regress=# explain select 1 from test where blah ~ '12.*';
QUERY PLAN
---------------------------------------------------------------------------
Seq Scan on test (cost=10000000000.00..10000000118.69 rows=2122 width=0)
Filter: (blah ~ '12.*'::text)
(2 rows)
regress=# explain select 1 from test where blah like '12%';
QUERY PLAN
------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=4.55..46.76 rows=29 width=0)
Filter: (blah ~~ '12%'::text)
-> Bitmap Index Scan on test_blah_txtpat_idx (cost=0.00..4.54 rows=29 width=0)
Index Cond: ((blah ~>=~ '12'::text) AND (blah ~<~ '13'::text))
(4 rows)
regress=# explain select 1 from test where blah ~ '^12.*';
QUERY PLAN
-------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=5.28..51.53 rows=101 width=0)
Filter: (blah ~ '^12.*'::text)
-> Bitmap Index Scan on test_blah_txtpat_idx (cost=0.00..5.25 rows=100 width=0)
Index Cond: ((blah ~>=~ '12'::text) AND (blah ~<~ '13'::text))
(4 rows)
Overview
LIKE
, SIMILAR TO
and ~
are the basic pattern matching operators in PostgreSQL.
If you can, use LIKE
(~~
), it's fastest and simplest.
If you can't, use a regular expression (~
), it's more powerful.
Never user . It's pointless. See below.SIMILAR TO
Installing the additional module pg_trgm adds advanced index options and the similarity operator %
.
And there is also text search with its own infrastructure and the @@
operator (among others).
Index support is available for each of these operators - to a varying degree. It regularly trumps the performance of other options. But there is a lot of leeway in the details, even with indexes.
Index support
Without pg_trgm, there is only index support for left anchored search patterns. If your database cluster runs with a non-C locale (typical case), you need an index with a special operator class for that, like text_pattern_ops
or varchar_pattern_ops
. Basic left-anchored regular expressions are supported by this, too. Example:
CREATE TABLE tbl(string text);
INSERT INTO tbl(string)
SELECT x::text FROM generate_series(1, 10000) x;
CREATE INDEX tbl_string_text_pattern_idx ON tbl(string text_pattern_ops);
SELECT * FROM tbl WHERE string ~ '^1234'; -- left anchored pattern
db<>fiddle here
With pg_trgm installed, GIN or GiST indexes are possible with the operator classes gist_trgm_ops
or gin_trgm_ops
. These indexes support any LIKE
expression, not just left anchored. And, quoting the manual:
Beginning in PostgreSQL 9.3, these index types also support index searches for regular-expression matches.
Details:
- PostgreSQL LIKE query performance variations
SIMILAR TO
is a very odd construct. PostgreSQL only implements it because it was defined in early versions of the SQL standard. Internally, every SIMILAR TO
expression is rewritten with a regular expression. Therefore, for any given SIMILAR TO
expression, there is at least one regular expression doing the same job faster. I never use SIMILAR TO
.
Further reading:
- Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
~
is the regular expression operator, and has the capabilities implied by that. You can specify a full range of regular expression wildcards and quantifiers; see the documentation for details. It is certainly more powerful than LIKE
, and should be used when that power is needed, but they serve different purposes.