How do I know what indexes to create for a table?
Short rules of thumb. (Some of these are created automatically, but can possibly be manually dropped later, depending on your dbms. Don't assume you will always work on PostgreSQL.)
- Index every primary key.
- Index every foreign key.
- Index every column used in a JOIN clause.
- Index every column used in a WHERE clause.
- Study your documentation to learn the "esoteric" indexing options your dbms supports.
Every primary key means that multi-column primary keys should have a single index covering all columns. PostgreSQL will create this index automatically if you declare a multi-column primary key.
There are many cases in which a single multi-column index gives you better performance than several single-column indexes. Monitor slow queries and do testing to figure out which is which.
Assume that any change to indexing will improve some database activities and degrade others. I find it helpful to have a set of SQL statements that I can profile before and after making changes to indexes. This set includes SELECT, INSERT, UPDATE, and DELETE statements.
There's no substitute for studying the docs for your particular dbms.
- CREATE INDEX
- Indexes (Note especially the sections on indexing expressions, on partial indexes, and on examining index usage)
In addition to what @Catcall already provided, and to add a small corrective:
I also covered some basics in this closely related answer on SO recently.
Answers so far seem to indicate you need to create indexes on primary keys, but that's not the case in PostgreSQL (partial exceptions apply). I quote the manual here:
PostgreSQL automatically creates a unique index when a unique constraint or primary key is defined for a table. The index covers the columns that make up the primary key or unique constraint (a multicolumn index, if appropriate), and is the mechanism that enforces the constraint.
Bold emphasis mine.
You may want to create additional indexes for the second or later columns of a multi-column index, but the first one is generally covered just fine by a multicolumn index - except when additional columns make the index much larger. We discussed that in great detail under this related question:
Is a composite index also good for queries on the first field?
Multicolumn indexes, partial indexes and indexes on expressions are particularly powerful tools in PostgreSQL. Since PostgreSQL 9.2 there are also index-only scans, the equivalent of "covering indexes" in other RDBMS. This isn't another type of index, but a new capability of the RDBMS with existing index types.
Every index carries specific costs, so there is no way around some basic knowledge to really optimize indexing. Just creating more indexes can do more harm than good. In particular, indexes can prevent HOT updates from improving performance.
Generally, write operations (DELETE
, UPDATE
) become more expensive (but may also benefit!), while read operations (SELECT
) generally just benefit. Too many indexes can exhaust the cache memory so that even read operations can suffer.
Finally, this Postgres Wiki page on index maintenance features tools to find duplicate or unused indexes (among other things).