Selecting top 10 from indexed field of a big table takes too long

In your sample code above, the index is explicitly created as NULLS LAST and the query is implicitly running NULLS FIRST (which is the default for ORDER BY .. DESC) so PostgreSQL would need to re-sort the data if it used the index. As a result the index would actually make the query many times slower than even the (already-slow) table scan.

Click to copy

rds-9.6.5 root@db1=> create table performance (id integer, installs integer, hour timestamp without time zone);
CREATE TABLE
Time: 28.100 ms

rds-9.6.5 root@db1=> with generator as (select generate_series(1,166530) i)
[more] - > insert into performance (
[more] ( >   select
[more] ( >     i id,
[more] ( >     (random()*1000)::integer installs,
[more] ( >     (now() - make_interval(secs => i))::timestamp installs
[more] ( >   from generator
[more] ( > );
INSERT 0 166530
Time: 244.872 ms

rds-9.6.5 root@db1=> create index hour_idx
[more] - > on performance
[more] - > using btree
[more] - > (hour desc nulls last);
CREATE INDEX
Time: 67.089 ms

rds-9.6.5 root@db1=> vacuum analyze performance;
VACUUM
Time: 43.552 ms

We can add a WHERE clause on the hour column so that using the index becomes a good idea - but notice how we still need to re-sort the data from the index.

Click to copy

rds-9.6.5 root@db1=> explain select hour from performance where hour>now() order by hour desc limit 10;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Limit  (cost=4.45..4.46 rows=1 width=8)
   ->  Sort  (cost=4.45..4.46 rows=1 width=8)
         Sort Key: hour DESC
         ->  Index Only Scan using hour_idx on performance  (cost=0.42..4.44 rows=1 width=8)
               Index Cond: (hour > now())
(5 rows)

Time: 0.789 ms

If we add an explicit NULLS LAST to your query then it will use the index as expected.

Click to copy

rds-9.6.5 root@db1=> explain select hour from performance order by hour desc NULLS LAST limit 10;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Limit  (cost=0.42..0.68 rows=10 width=8)
   ->  Index Only Scan using hour_idx on performance  (cost=0.42..4334.37 rows=166530 width=8)
(2 rows)

Time: 0.526 ms

Alternatively, if we drop the (non-default) NULLS LAST from your index then the query will use it as expected without modification.

Click to copy

rds-9.6.5 root@db1=> drop index hour_idx;
DROP INDEX
Time: 4.124 ms

rds-9.6.5 root@db1=> create index hour_idx
[more] - > on performance
[more] - > using btree
[more] - > (hour desc);
CREATE INDEX
Time: 69.220 ms

rds-9.6.5 root@db1=> explain select hour from performance order by hour desc limit 10;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Limit  (cost=0.42..0.68 rows=10 width=8)
   ->  Index Only Scan using hour_idx on performance  (cost=0.42..4334.37 rows=166530 width=8)
(2 rows)

Time: 0.725 ms

Note that you can also drop the DESC from your index; PostgreSQL can scan indexes both forwards and backwards and on single-column indexes it's generally unnecessary to reverse them. You only need to be careful about having the right combination of order and nulls first/last.

Click to copy

rds-9.6.5 root@db1=> drop index hour_idx;
DROP INDEX
Time: 3.837 ms

rds-9.6.5 root@db1=> create index hour_idx
[more] - > on performance
[more] - > using btree
[more] - > (hour);
CREATE INDEX
Time: 94.815 ms

rds-9.6.5 root@db1=> explain select hour from performance order by hour desc limit 10;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..0.68 rows=10 width=8)
   ->  Index Only Scan Backward using hour_idx on performance  (cost=0.42..4334.37 rows=166530 width=8)
(2 rows)

Time: 0.740 ms

If most of your queries intend to select non-NULL values from hour then you should consider building a partial index on those values, i.e. something like:

Click to copy

CREATE INDEX hour_not_null_idx ON performance (hour)
 WHERE hour IS NOT NULL;

which, as long as you either query for a particular value of hour, as Jeremy demonstrated in his answer, or add hour IS NOT NULL to your WHERE clause, will give you the same results, and possibly save you a little space as well:

Click to copy

# explain select hour from performance where hour > now() order by hour desc limit 10;
 Limit  (cost=0.42..5.30 rows=10 width=8)
   ->  Index Only Scan Backward using hour_not_null_idx on performance  (cost=0.42..8.72 rows=17 width=8)
         Index Cond: (hour > now())

If there are no NULL values in the column, you should declare it NOT NULL (I'm going to assume you know how to do this with ALTER TABLE ;o)), then create the index (without NULLS LAST, since it's no longer important anyway). Then you get the same benefit:

Click to copy

william=# create index hour_idx on performance using btree ( hour );
CREATE INDEX
william=# explain select hour from performance order by hour desc limit 10;
                                           QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..0.73 rows=10 width=8)
   ->  Index Only Scan Backward using hour_idx on performance  (cost=0.42..5238.37 rows=166530 width=8)
(2 rows)

Selecting top 10 from indexed field of a big table takes too long

Tags:

Performance

Postgresql

Explain

Index

Related

Recent Posts