Are regular VACUUM ANALYZE still recommended under 9.1?
VACUUM is only needed on updated or deleted rows in non-temporary tables. You are clearly doing lots of INSERTs, but it's not obvious from the description that you're also doing lots of UPDATEs or DELETEs.
These operations can be tracked with the `pg_stat_all_tables` view, specifically the `n_tup_upd` and `n_tup_del` columns. Even more to the point, there is a `n_dead_tup` column that tells, per table, how many rows need to be vacuumed (see Monitoring statistics in the docs for the functions and views related to statistics gathering).
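For example, a quick look at write activity and dead rows per table might look like this (a minimal sketch using the columns above):

```sql
-- Per-table write activity and dead-row counts, worst offenders first
SELECT relname, n_tup_upd, n_tup_del, n_dead_tup
FROM   pg_stat_all_tables
ORDER  BY n_dead_tup DESC
LIMIT  20;
```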
A possible strategy in your case would be to suppress the scheduled VACUUM, keep an eye on this view, and check for which tables `n_dead_tup` is going up significantly. Then apply the aggressive VACUUM to those tables only. This will be a win if there are large tables whose rows never get deleted or updated, so that the aggressive VACUUM is really necessary only on smaller tables.
But keep running ANALYZE, so that the optimizer always has fresh statistics.
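A sketch of that targeted approach; `hot_table` is a placeholder for a table flagged by the view above:

```sql
-- Vacuum only the table(s) where n_dead_tup keeps climbing
VACUUM VERBOSE hot_table;

-- ... but keep planner statistics fresh for everything
ANALYZE;
```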
I see nothing in your question that `autovacuum` would not take care of. It largely depends on the pattern of your write activity. You mention 3 million new rows per week, but `INSERT` (or `COPY`) typically doesn't create table and index bloat. (`autovacuum` only has to take care of column statistics, the visibility map and some minor jobs.) `UPDATE` and `DELETE` are the dominant cause of table and index bloat, especially when targeting random rows. I don't see any of that in your question.
`autovacuum` has come a long way and does a great job in Postgres 9.1 or later. I would have a look at the `autovacuum` settings. If vacuuming tends to interfere with your workload, also have a look at "Cost-based Vacuum Delay". Manual vacuuming should be the rare exception.
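For illustration, vacuum throttling can be tuned both per session and per table; the values and the table name below are assumptions, not recommendations:

```sql
-- Make a manual VACUUM in this session pause between work chunks
SET vacuum_cost_delay = 10;  -- milliseconds

-- Per-table override for autovacuum ("busy_table" is hypothetical)
ALTER TABLE busy_table SET (autovacuum_vacuum_cost_delay = 10);
```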
If you have lots of random `UPDATE`s, you may want to set the `FILLFACTOR` to something lower than 100, to allow HOT updates right away and reduce the need for `VACUUM` (see the sketch after the link below). More on HOT updates:
- Redundant data in update statements
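A minimal sketch of such a `FILLFACTOR` change; the table name and the value 90 are only examples:

```sql
-- Leave 10% of each page free so an UPDATE can place the new row
-- version on the same page (enabling HOT updates)
ALTER TABLE busy_table SET (fillfactor = 90);

-- The new fillfactor only affects newly written pages; a rewrite
-- (VACUUM FULL or CLUSTER) applies it to existing data
VACUUM FULL busy_table;
```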
Note also that temporary tables need manual `VACUUM` & `ANALYZE`. I quote the manual on `CREATE TABLE`:

> The autovacuum daemon cannot access and therefore cannot vacuum or analyze temporary tables. For this reason, appropriate vacuum and analyze operations should be performed via session SQL commands. For example, if a temporary table is going to be used in complex queries, it is wise to run `ANALYZE` on the temporary table after it is populated.
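A sketch of that pattern; the source table and filter are made up for illustration:

```sql
-- autovacuum cannot see temporary tables, so analyze manually
CREATE TEMP TABLE tmp_recent AS
SELECT * FROM orders WHERE created_at > now() - interval '7 days';

ANALYZE tmp_recent;  -- give the planner statistics before the complex queries
```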
While I agree that using the auto features is better than running VACUUM database-wide, in most cases per-table tuning is necessary.
I don't quite agree with the design choice of Postgres to tie vacuum and analyze together; I have seen several instances where databases that do a lot of inserts/updates but few deletes never get analyzed and start to perform badly.
The solution is to go into the tables that are used heavily and subject to large queries, and set the per-table autoanalyze settings down so that those tables get analyzed once a day or every other day.
You can get to the per-table settings in the GUI on the auto-vacuum tab, where you will see analyze settings that you can set independently of vacuum.
The settings end up in the `reloptions` column of `pg_class` and can be seen with this query:

```sql
SELECT c.relname, c.reloptions
FROM   pg_class c
WHERE  c.reloptions IS NOT NULL;
```
A sample value there for an aggressive analyze might be:

```
{autovacuum_enabled=true,autovacuum_analyze_threshold=10,autovacuum_analyze_scale_factor=.01}
```
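Outside the GUI, the same options can be set with plain SQL; a sketch with a hypothetical table name:

```sql
-- Analyze after 10 changed rows plus 1% of the table has changed
ALTER TABLE busy_table SET (
    autovacuum_analyze_threshold    = 10,
    autovacuum_analyze_scale_factor = 0.01
);
```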
To see when your tables were last auto-analyzed, query:

```sql
SELECT relname,
       n_dead_tup,
       n_tup_ins,
       n_tup_upd,
       n_tup_del,
       last_autoanalyze,
       autoanalyze_count
FROM   pg_stat_user_tables
WHERE  last_autoanalyze IS NOT NULL
ORDER  BY last_autoanalyze DESC;
```