How can I generate all trailing substrings following a delimeter?

I think this is my favorite.

create table t (id int,str varchar(100));
insert into t (id,str) values (1,'a.b.c.d.e'),(2,'xxx.yyy.zzz');

ROWS

Click to copy

select      id
           ,array_to_string((string_to_array(str,'.'))[i:],'.')

from        t,unnest(string_to_array(str,'.')) with ordinality u(token,i)
;

Click to copy

+----+-----------------+
| id | array_to_string |
+----+-----------------+
|  1 | a.b.c.d.e       |
|  1 | b.c.d.e         |
|  1 | c.d.e           |
|  1 | d.e             |
|  1 | e               |
|  2 | xxx.yyy.zzz     |
|  2 | yyy.zzz         |
|  2 | zzz             |
+----+-----------------+

ARRAYS

Click to copy

select      id
           ,array_agg(array_to_string((string_to_array(str,'.'))[i:],'.'))

from        t,unnest(string_to_array(str,'.')) with ordinality u(token,i)

group by    id
;

Click to copy

+----+-------------------------------------------+
| id |                 array_agg                 |
+----+-------------------------------------------+
|  1 | {"a.b.c.d.e","b.c.d.e","c.d.e","d.e","e"} |
|  2 | {"xxx.yyy.zzz","yyy.zzz","zzz"}           |
+----+-------------------------------------------+

Click to copy

create table t (id int,str varchar(100));
insert into t (id,str) values (1,'a.b.c.d.e'),(2,'xxx.yyy.zzz');

ROWS

Click to copy

select  id
       ,regexp_replace(str,'^([^\.]+\.?){' || gs.i || '}','') as suffix

from    t,generate_series(0,cardinality(string_to_array(str,'.'))-1) gs(i)
;

Click to copy

select  id
       ,substring(str from '(([^.]*?\.?){' || gs.i+1 || '})$') as suffix

from    t,generate_series(0,cardinality(string_to_array(str,'.'))-1) gs(i)
;

Click to copy

+----+-------------+
| id | suffix      |
+----+-------------+
| 1  | a.b.c.d.e   |
+----+-------------+
| 1  | b.c.d.e     |
+----+-------------+
| 1  | c.d.e       |
+----+-------------+
| 1  | d.e         |
+----+-------------+
| 1  | e           |
+----+-------------+
| 2  | xxx.yyy.zzz |
+----+-------------+
| 2  | yyy.zzz     |
+----+-------------+
| 2  | zzz         |
+----+-------------+

ARRAYS

Click to copy

select      id
           ,array_agg(regexp_replace(str,'^([^\.]+\.?){' || gs.i || '}','')) as suffixes

from        t,generate_series(0,cardinality(string_to_array(str,'.'))-1) gs(i)

group by    id
;

Click to copy

select      id
           ,array_agg(substring(str from '(([^.]*?\.?){' || gs.i+1 || '})$')) as suffixes

from        t,generate_series(0,cardinality(string_to_array(str,'.'))-1) gs(i)

group by    id
;

Click to copy

+----+-------------------------------------------+
| id |                 suffixes                  |
+----+-------------------------------------------+
|  1 | {"a.b.c.d.e","b.c.d.e","c.d.e","d.e","e"} |
|  2 | {"xxx.yyy.zzz","yyy.zzz","zzz"}           |
+----+-------------------------------------------+

I don't think you need a separate column here; this is an XY-problem. You're just trying to do a suffix search. There are two main ways to optimize that.

Turn the suffix query into a prefix query

You basically do this by reversing everything.

First create an index on the reverse of your column:

Click to copy

CREATE INDEX ON yourtable (reverse(yourcolumn) text_pattern_ops);

Then query using the same:

Click to copy

SELECT * FROM yourtable WHERE reverse(yourcolumn) LIKE reverse('%t.com');

You can throw in an UPPER call if you want to make it case insensitive:

Click to copy

CREATE INDEX ON yourtable (reverse(UPPER(yourcolumn)) text_pattern_ops);
SELECT * FROM yourtable WHERE reverse(UPPER(yourcolumn)) LIKE reverse(UPPER('%t.com'));

Trigram Indexes

The other option is trigram indexes. You should definitely use this if you need infix queries (LIKE 'something%something' or LIKE '%something%' type queries).

First enable the trigram index extension:

Click to copy

CREATE EXTENSION pg_trgm;

(This should come with PostgreSQL out of the box without any extra installation.)

Then create a trigram index on your column:

Click to copy

CREATE INDEX ON yourtable USING GIST(yourcolumn gist_trgm_ops);

Then just select:

Click to copy

SELECT * FROM yourtable WHERE yourcolumn LIKE '%t.com';

Again, you can throw in an UPPER to make it case insensitive if you like:

Click to copy

CREATE INDEX ON yourtable USING GIST(UPPER(yourcolumn) gist_trgm_ops);
SELECT * FROM yourtable WHERE UPPER(yourcolumn) LIKE UPPER('%t.com');

Your question as written

Trigram indexes actually work using a somewhat more general form of what you're asking for under the hood. It breaks the string down into pieces (trigrams) and builds an index based on those. The index can then be used to search for matches much more quickly than a sequential scan, but for infix as well as suffix and prefix queries. Always try to avoid reinventing what someone else has developed when you can.

Credits

The two solutions are pretty much verbatim from Choosing a PostgreSQL text search method. I highly recommend giving it a read for a detailed analysis of the available text search options in PotsgreSQL.

How can I generate all trailing substrings following a delimeter?

ROWS

ARRAYS

ROWS

ARRAYS

Turn the suffix query into a prefix query

Trigram Indexes

Your question as written

Credits

Tags:

Postgresql

Pattern Matching

Array

Related

Recent Posts