Should I manually VACUUM my PostgreSQL database if autovacuum is turned on?
I agree with ETL that there is no short answer. Size is not the only thing that matters - we run quite large PostgreSQL OLTP Databases (with some tables > 100.000.000 rows) under heavy load and currently we rely on autovacuum only.
Yet, two things seem important to me:
There seems to be a consensus, that autovacuum should never be switched off, unless you have a very well defined workload on your database and you know exactly what you are doing. But, naturally, you could do additional
VACUUM
and/orANALYZE
runs.Before considering additional
VACUUM
runs, I would check how autovacuum keeps up. You can check whether any tables are beyond the autovacuum threshold by queryingpg_stat_user_tables
andpg_class
. I posted such a query on another thread, that might be of interest: Aggressive Autovacuum on PostgreSQL.Unfortunately, it is not as easy (i.e. not possible at the moment) to do a similar check for autoanalyze thresholds. However, autoanalyze kicks in long before autovacuum by default and it is much cheaper. So, basically if your database can keep up with autovacuum, it will probably be fine with autoanalyze too. The last autoanalyze dates can also be queried from
pg_stat_user_tables
.
Some parts of the (most excellent) PostgreSQL documentation, that I found helpful:
- Routine Vacuuming
- The Statistics Collector
Autovacuum should pretty well cover it, unless you mis-configured something. Other answers cover that already.
There is one clearly defined case for manual VACUUM
(and more importantly: manual ANALYZE
) though: temporary tables, they are not considered by the autovacuum demon. I quote the manual on CREATE TABLE
here:
The autovacuum daemon cannot access and therefore cannot vacuum or analyze temporary tables. For this reason, appropriate vacuum and analyze operations should be performed via session SQL commands. For example, if a temporary table is going to be used in complex queries, it is wise to run
ANALYZE
on the temporary table after it is populated.
There is no short answer to that as it depends on a lot of factor. Is the system slow? Is the auto-vacuum actually touching this table? etc.
Here are some good links on this subject:
- PostgreSQL performance considerations
- Performance Optimization (PostgreSQL wiki)
- Introduction to VACUUM, ANALYZE, EXPLAIN, and COUNT (same source)
- Interpreting pg_stat Views, Understanding pg_stat Views
To make a clear decision requires an understanding of the database itself and more details on what's going on.