Configuring PostgreSQL for read performance
Data alignment and storage size
Actually, the overhead per index tuple is 8 bytes for the tuple header plus 4 bytes for the item identifier.
Related:
- Use GIN to index bit strings
- Calculating and saving space in PostgreSQL
We have three columns for the primary key:
PRIMARY KEY ("Timestamp" , "TimestampIndex" , "KeyTag")
"Timestamp" timestamp (8 bytes)
"TimestampIndex" smallint (2 bytes)
"KeyTag" integer (4 bytes)
Results in:
 4 bytes  for item identifier in the page header (not counting towards multiple of 8 bytes)
 8 bytes  for the index tuple header
 8 bytes  "Timestamp"
 2 bytes  "TimestampIndex"
 2 bytes  padding for data alignment
 4 bytes  "KeyTag"
 0 bytes  padding to the nearest multiple of 8 bytes
-----
28 bytes per index tuple; plus some bytes of overhead.
About measuring object size in this related answer:
- Measure the size of a PostgreSQL table row
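To check the arithmetic against reality, you can ask PostgreSQL for the actual on-disk size of the index. A minimal sketch, assuming the primary key index carries the constraint name "PK_AnalogTransition" used further down; the per-entry figure includes page-level overhead and free space, so expect somewhat more than the 28 bytes computed above:

-- total on-disk size of the primary key index, human readable
SELECT pg_size_pretty(pg_relation_size('"PK_AnalogTransition"'));

-- rough bytes per index entry, based on the planner's row estimate
SELECT pg_relation_size(oid::regclass) / nullif(reltuples::bigint, 0) AS bytes_per_entry
FROM   pg_class
WHERE  relname = 'PK_AnalogTransition';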
Order of columns in a multicolumn index
Read these two questions and answers to understand:
- Is a composite index also good for queries on the first field?
- Working of indexes in PostgreSQL
The way you have your index (primary key), you can retrieve rows without a sorting step, which is appealing, especially with LIMIT. But retrieving the rows seems extremely expensive.
Generally, in a multi-column index, "equality" columns should go first and "range" columns last:
- Multicolumn index and performance
Therefore, try an additional index with reversed column order:
CREATE INDEX analogtransition_mult_idx1
ON "AnalogTransition" ("KeyTag", "TimestampIndex", "Timestamp");
It depends on data distribution, but with millions or even billions of rows this might be substantially faster.
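For illustration, a query shaped like the following (hypothetical literal values, not taken from your workload) plays to that column order - equality on the leading columns, a range on the trailing one - so the index can find the rows and deliver the sort order for a small LIMIT without a separate sort:

-- hypothetical query shape matching the reversed index
SELECT *
FROM   "AnalogTransition"
WHERE  "KeyTag"         = 42                          -- equality on leading column
AND    "TimestampIndex" = 0                           -- equality on second column
AND    "Timestamp"     >= '2013-05-16 00:00:00+00'    -- range on trailing column
AND    "Timestamp"      < '2013-05-17 00:00:00+00'
ORDER  BY "Timestamp"
LIMIT  500;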
Tuple size is 8 bytes bigger, due to data alignment & padding. If you are using this as a plain index, you might try to drop the third column "Timestamp". That may be a bit faster, or not (since the column can help with sorting).
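If you want to test that slimmer variant, a sketch (hypothetical index name):

-- hypothetical two-column variant without "Timestamp"
CREATE INDEX analogtransition_mult_idx2
ON "AnalogTransition" ("KeyTag", "TimestampIndex");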
You might want to keep both indexes. Depending on a number of factors, your original index may be preferable - in particular with a small LIMIT.
autovacuum and table statistics
Your table statistics need to be up to date. I am sure you have autovacuum running.
Since your table seems to be huge and statistics important for the right query plan, I would substantially increase the statistics target for relevant columns:
ALTER TABLE "AnalogTransition" ALTER "Timestamp" SET STATISTICS 1000;
... or even higher with billions of rows. Maximum is 10000, default is 100.
Do that for all columns involved in WHERE or ORDER BY clauses. Then run ANALYZE.
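For example (a sketch - the exact column list depends on your queries, and 1000 is just the value used above):

ALTER TABLE "AnalogTransition" ALTER "KeyTag"         SET STATISTICS 1000;
ALTER TABLE "AnalogTransition" ALTER "TimestampIndex" SET STATISTICS 1000;
ANALYZE "AnalogTransition";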
Table layout
While you are at it: if you apply what you have learned about data alignment and padding, this optimized table layout should save some disk space and help performance a little (ignoring pk & fk):
CREATE TABLE "AnalogTransition"(
    "Timestamp"        timestamp with time zone NOT NULL,
    "KeyTag"           integer NOT NULL,
    "TimestampIndex"   smallint NOT NULL,
    "TimestampQuality" smallint,
    "UpdateTimestamp"  timestamp without time zone,  -- (UTC)
    "QualityFlags"     smallint,
    "Quality"          boolean,
    "Value"            numeric
);
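To verify the effect, you can compare the average stored row size before and after the change. A sketch using pg_column_size() on whole rows; the LIMIT keeps it cheap on a huge table, at the price of measuring only a sample:

-- approximate per-row storage size of a sample of rows
SELECT avg(pg_column_size(t.*)) AS avg_row_bytes
FROM   (SELECT * FROM "AnalogTransition" LIMIT 10000) t;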
CLUSTER / pg_repack / pg_squeeze
To optimize read performance for queries that use a certain index (be it your original one or my suggested alternative), you can rewrite the table in the physical order of the index. CLUSTER does that, but it's rather invasive and requires an exclusive lock for the duration of the operation.
pg_repack is a more sophisticated alternative that can do the same without an exclusive lock on the table.
pg_squeeze is a later, similar tool (I have not used it yet).
This can help substantially with huge tables, since much fewer blocks of the table have to be read.
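A minimal sketch with CLUSTER, assuming you cluster on the primary key (use whichever index your important queries rely on); remember the exclusive lock, and run ANALYZE afterwards since the physical order changed:

-- rewrites the table in index order; exclusive lock for the duration
CLUSTER "AnalogTransition" USING "PK_AnalogTransition";
ANALYZE "AnalogTransition";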
RAM
Generally, 2 GB of physical RAM is just not enough to deal with billions of rows quickly. More RAM might go a long way - accompanied by adapted settings: obviously a bigger effective_cache_size to begin with.
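For example, a sketch assuming PostgreSQL 9.4 or later (for ALTER SYSTEM) and an illustrative 8 GB box; on older versions edit postgresql.conf instead. A common rule of thumb is to set effective_cache_size to roughly half to three quarters of physical RAM:

-- illustrative value only; size it to your actual RAM
ALTER SYSTEM SET effective_cache_size = '6GB';
SELECT pg_reload_conf();  -- no restart needed for this setting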
So, from the plans I see one thing: your index is either bloated (along with the underlying table) or simply isn't a good fit for this sort of query (I tried to address this in my latest comment above).
One row of the index contains 14 bytes of data (plus some for the header). Now, calculating from the numbers given in the plan: you got 500,000 rows from 190,147 pages - that means, on average, fewer than 3 useful rows per page, that is, around 37 useful bytes per 8 kB page. This is a very bad ratio, isn't it? Since the first column of the index is the Timestamp field and it is used in the query as a range, the planner can - and does - choose the index to find matching rows. But there is no TimestampIndex mentioned in the WHERE conditions, so filtering on KeyTag isn't very effective, as those values presumably appear randomly across the index pages.
So, one possibility is changing the index definition to
CONSTRAINT "PK_AnalogTransition" PRIMARY KEY ("Timestamp", "KeyTag", "TimestampIndex")
(or, given the load of your system, create this index as a new one:
CREATE INDEX CONCURRENTLY "idx_AnalogTransition"
ON "AnalogTransition" ("Timestamp", "KeyTag", "TimestampIndex");
- this will take a while for sure, but you can still work in the meantime.)
The other possibility is that a big proportion of the index pages is occupied by dead rows, which could be removed by vacuuming. You created the table with the setting autovacuum_enabled=true - but have you ever started autovacuuming? Or run VACUUM manually?
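To see whether dead rows are the problem, a sketch using the pgstattuple extension (ships with the contrib modules; assuming the primary key index name) followed by a manual vacuum:

CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- low avg_leaf_density and many deleted pages point to index bloat
SELECT avg_leaf_density, leaf_fragmentation, deleted_pages
FROM   pgstatindex('"PK_AnalogTransition"');

-- reclaim dead rows and refresh statistics in one pass
VACUUM (VERBOSE, ANALYZE) "AnalogTransition";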