Keep PostgreSQL from sometimes choosing a bad query plan
If the query planner makes bad decisions it's mostly one of two things:
1. The statistics are inaccurate.
Do you run ANALYZE
enough? Also popular in it's combined form VACUUM ANALYZE
. If autovacuum is on (which is the default in modern-day Postgres), ANALYZE
is run automatically. But consider:
- Are regular VACUUM ANALYZE still recommended under 9.1?
(Top two answers still apply for Postgres 12.)
If your table is big and data distribution is irregular, raising the default_statistics_target
may help. Or rather, just set the statistics target for relevant columns (those in WHERE
or JOIN
clauses of your queries, basically):
ALTER TABLE ... ALTER COLUMN ... SET STATISTICS 400; -- calibrate number
The target can be set in the range 0 to 10000;
Run ANALYZE
again after that (on relevant tables).
2. The cost settings for planner estimates are off.
Read the chapter Planner Cost Constants in the manual.
Look at the chapters default_statistics_target and random_page_cost on this generally helpful PostgreSQL Wiki page.
There are many other possible reasons, but these are the most common ones by far.
I'm skeptical that this has anything to do with bad statistics unless you consider the combination of database statistics and your custom data type.
My guess is that PostgreSQL is picking a nested loop join because it looks at the predicates (treenode.location).x >= 8000 AND (treenode.location).x <= (8000 + 4736)
and does something funky in the arithmetic of your comparison. A nested loop is typically going to be used when you have a small amount of data in the inner side of the join.
But, once you switch the constant to 10736 you get a different plan. It's always possible that the plan is of sufficiently complexity that the Genetic Query Optimization (GEQO) is kicking in and you're seeing the side effects of non-deterministic plan building. There are enough discrepancies in the order of evaluation in the queries to make me think that's what's going on.
One option would be to examine using a parameterized/prepared statement for this instead of using ad hoc code. Since you're working in a 3-dimensional space, you might also want to considering using PostGIS. While it might be overkill, it may also be able to provide you with the performance that you need to get these queries running properly.
While forcing planner behavior isn't the best choice, sometimes we do end up making better decisions than the software.