How to search a specific value in all tables (PostgreSQL)?

How about dumping the contents of the database, then using grep?

$ pg_dump --data-only --inserts -U postgres your-db-name > a.tmp
$ grep United a.tmp
INSERT INTO countries VALUES ('US', 'United States');
INSERT INTO countries VALUES ('GB', 'United Kingdom');

The same utility, pg_dump, can include column names in the output. Just change --inserts to --column-inserts. That way you can search for specific column names, too. But if I were looking for column names, I'd probably dump the schema instead of the data.

$ pg_dump --data-only --column-inserts -U postgres your-db-name > a.tmp
$ grep country_code a.tmp
INSERT INTO countries (iso_country_code, iso_country_name) VALUES ('US', 'United  States');
INSERT INTO countries (iso_country_code, iso_country_name) VALUES ('GB', 'United Kingdom');

Here's a pl/pgsql function that locates records where any column contains a specific value. It takes as arguments the value to search in text format, an array of table names to search into (defaults to all tables) and an array of schema names (defaults all schema names).

It returns a table structure with schema, name of table, name of column and pseudo-column ctid (non-durable physical location of the row in the table, see System Columns)

CREATE OR REPLACE FUNCTION search_columns(
    needle text,
    haystack_tables name[] default '{}',
    haystack_schema name[] default '{}'
)
RETURNS table(schemaname text, tablename text, columnname text, rowctid text)
AS $$
begin
  FOR schemaname,tablename,columnname IN
      SELECT c.table_schema,c.table_name,c.column_name
      FROM information_schema.columns c
        JOIN information_schema.tables t ON
          (t.table_name=c.table_name AND t.table_schema=c.table_schema)
        JOIN information_schema.table_privileges p ON
          (t.table_name=p.table_name AND t.table_schema=p.table_schema
              AND p.privilege_type='SELECT')
        JOIN information_schema.schemata s ON
          (s.schema_name=t.table_schema)
      WHERE (c.table_name=ANY(haystack_tables) OR haystack_tables='{}')
        AND (c.table_schema=ANY(haystack_schema) OR haystack_schema='{}')
        AND t.table_type='BASE TABLE'
  LOOP
    FOR rowctid IN
      EXECUTE format('SELECT ctid FROM %I.%I WHERE cast(%I as text)=%L',
       schemaname,
       tablename,
       columnname,
       needle
      )
    LOOP
      -- uncomment next line to get some progress report
      -- RAISE NOTICE 'hit in %.%', schemaname, tablename;
      RETURN NEXT;
    END LOOP;
 END LOOP;
END;
$$ language plpgsql;

See also the version on github based on the same principle but adding some speed and reporting improvements.

Examples of use in a test database:

  • Search in all tables within public schema:
select * from search_columns('foobar');
 schemaname | tablename | columnname | rowctid 
------------+-----------+------------+---------
 public     | s3        | usename    | (0,11)
 public     | s2        | relname    | (7,29)
 public     | w         | body       | (0,2)
(3 rows)
  • Search in a specific table:
 select * from search_columns('foobar','{w}');
 schemaname | tablename | columnname | rowctid 
------------+-----------+------------+---------
 public     | w         | body       | (0,2)
(1 row)
  • Search in a subset of tables obtained from a select:
select * from search_columns('foobar', array(select table_name::name from information_schema.tables where table_name like 's%'), array['public']);
 schemaname | tablename | columnname | rowctid 
------------+-----------+------------+---------
 public     | s2        | relname    | (7,29)
 public     | s3        | usename    | (0,11)
(2 rows)
  • Get a result row with the corresponding base table and and ctid:
select * from public.w where ctid='(0,2)';
 title |  body  |         tsv         
-------+--------+---------------------
 toto  | foobar | 'foobar':2 'toto':1

Variants

  • To test against a regular expression instead of strict equality, like grep, this part of the query:

    SELECT ctid FROM %I.%I WHERE cast(%I as text)=%L

    may be changed to:

    SELECT ctid FROM %I.%I WHERE cast(%I as text) ~ %L

  • For case insensitive comparisons, you could write:

    SELECT ctid FROM %I.%I WHERE lower(cast(%I as text)) = lower(%L)


to search every column of every table for a particular value

This does not define how to match exactly.
Nor does it define what to return exactly.

Assuming:

  • Find any row with any column containing the given value in its text representation - as opposed to equaling the given value.
  • Return the table name (regclass) and the tuple ID (ctid), because that's simplest.

Here is a dead simple, fast and slightly dirty way:

CREATE OR REPLACE FUNCTION search_whole_db(_like_pattern text)
  RETURNS TABLE(_tbl regclass, _ctid tid) AS
$func$
BEGIN
   FOR _tbl IN
      SELECT c.oid::regclass
      FROM   pg_class c
      JOIN   pg_namespace n ON n.oid = relnamespace
      WHERE  c.relkind = 'r'                           -- only tables
      AND    n.nspname !~ '^(pg_|information_schema)'  -- exclude system schemas
      ORDER BY n.nspname, c.relname
   LOOP
      RETURN QUERY EXECUTE format(
         'SELECT $1, ctid FROM %s t WHERE t::text ~~ %L'
       , _tbl, '%' || _like_pattern || '%')
      USING _tbl;
   END LOOP;
END
$func$  LANGUAGE plpgsql;

Call:

SELECT * FROM search_whole_db('mypattern');

Provide the search pattern without enclosing %.

Why slightly dirty?

If separators and decorators for the row in text representation can be part of the search pattern, there can be false positives:

  • column separator: , by default
  • whole row is enclosed in parentheses:()
  • some values are enclosed in double quotes "
  • \ may be added as escape char

And the text representation of some columns may depend on local settings - but that ambiguity is inherent to the question, not to my solution.

Each qualifying row is returned once only, even when it matches multiple times (as opposed to other answers here).

This searches the whole DB except for system catalogs. Will typically take a long time to finish. You might want to restrict to certain schemas / tables (or even columns) like demonstrated in other answers. Or add notices and a progress indicator, also demonstrated in another answer.

The regclass object identifier type is represented as table name, schema-qualified where necessary to disambiguate according to the current search_path:

  • Find the referenced table name using table, field and schema name

What is the ctid?

  • How do I decompose ctid into page and row numbers?

You might want to escape characters with special meaning in the search pattern. See:

  • Escape function for regular expression or LIKE patterns