Do nullable columns occupy additional space in PostgreSQL?
I believe each would use a single bit in the bitmap for the row. See here: http://www.postgresql.org/docs/9.0/static/storage-page-layout.html#HEAPTUPLEHEADERDATA-TABLE
Basically, NULL
values occupy 1 bit in the NULL bitmap. But it's not that simple.
The null bitmap (per row) is only allocated if at least one column in that row holds a NULL
value. This can lead to a seemingly paradoxic effect in tables with 9 or more columns: assigning the first NULL
value to a column can take up more space on disk than writing a value to it. Conversely, removing the last NULL value from the row also removes the NULL bitmap.
Physically, the initial null bitmap occupies 1 byte between the HeapTupleHeader
(23 bytes) and actual column data or the row OID
(if you should still be using that) - which always start at a multiple of MAXALIGN
(typically 8 bytes). This leaves 1 byte of padding that is utilized by the initial null bitmap.
In effect, NULL storage is absolutely free for tables of 8 columns or less (including dropped, but not yet purged columns).
After that, another MAXALIGN
bytes (typically 8) are allocated for the next MAXALIGN * 8
columns (typically 64). Etc.
More details in the manual and under these related questions:
- How much disk-space is needed to store a NULL value using postgresql DB?
- Does not using NULL in PostgreSQL still use a NULL bitmap in the header?
- How many records can I store in 5 MB of PostgreSQL on Heroku?
Once you understand alignment padding of data types, you can further optimize storage:
- Calculating and saving space in PostgreSQL
But the cases are rare where you can save substantial amounts of space. Normally it's not worth the effort.
@Daniel already covers effects on index size.
Note that dropped columns (though now invisible) are kept in the system catalogs until the table is recreated. Those zombis can force the allocation of an (enlarged) NULL bitmap. See:
- Dropping column in Postgres on a large dataset
Whether NULL
values get to the index or not depend at least on the type of the index.
Basically, this would be YES for btree
and gist
index types, NO for hash
, and it seems YES or NO for gin
index types depending on PostgreSQL version.
There used to be a boolean column amindexnulls
in the pg_catalog.pg_am
table that carried that information, but it's gone in 9.1. Probably because indexes have got even more sophisticated among PG improvements.
In the specific case of your data, the best way to know would be to measure the size difference of indexes, using the pg_relation_size('index_name')
function, between contents entirely NULL and entirely NOT NULL, with your exact PG version, exact datatype, exact index type and definition. And know that possibly, a future change in any of these parameters may change the outcome.
But in any case, if you "just" want to avoid indexing NULLs, it's always possible to create a partial index:
CREATE INDEX partial_idx(col) ON table WHERE (col is not null)
This is going to take less space, but whether this is going to help or not with the performance of queries depend on these queries.