quick random row selection in Postgres
PostgreSQL 9.5 introduced a new approach for much faster sample selection: TABLESAMPLE
The syntax is
SELECT * FROM my_table TABLESAMPLE BERNOULLI(percentage);
SELECT * FROM my_table TABLESAMPLE SYSTEM(percentage);
This is not the optimal solution if you want only one row selected, because you need to know the COUNT of the table to calculate the exact percentage.
To avoid a slow COUNT and use fast TABLESAMPLE for tables from 1 row to billions of rows, you can do:
SELECT * FROM my_table TABLESAMPLE SYSTEM(0.000001) LIMIT 1;
-- if you got no result:
SELECT * FROM my_table TABLESAMPLE SYSTEM(0.00001) LIMIT 1;
-- if you got no result:
SELECT * FROM my_table TABLESAMPLE SYSTEM(0.0001) LIMIT 1;
-- if you got no result:
SELECT * FROM my_table TABLESAMPLE SYSTEM(0.001) LIMIT 1;
...
This might not look so elegant, but probably is faster than any of the other answers.
To decide whether you want to use BERNULLI oder SYSTEM, read about the difference at http://blog.2ndquadrant.com/tablesample-in-postgresql-9-5-2/
You might want to experiment with OFFSET
, as in
SELECT myid FROM mytable OFFSET floor(random() * N) LIMIT 1;
The N
is the number of rows in mytable
. You may need to first do a SELECT COUNT(*)
to figure out the value of N
.
Update (by Antony Hatchkins)
You must use floor
here:
SELECT myid FROM mytable OFFSET floor(random() * N) LIMIT 1;
Consider a table of 2 rows; random()*N
generates 0 <= x < 2
and for example SELECT myid FROM mytable OFFSET 1.7 LIMIT 1;
returns 0 rows because of implicit rounding to nearest int.