Shall I use enum when are too many "categories" with PostgreSQL?
With an average of 2.4 characters (more relevant: avg bytes - but that's the same for all ASCII characters) I would not bother to use enums. Those occupy 4 bytes on disk plus, possibly, alignment padding. (text
does not require alignment padding.) You are not even saving storage and get more overhead for it.
With most values below 7 characters (= 8 bytes on disk), an index on a text
category column will also be only slightly bigger than one on an enum
. (Space for data is (typically) allocated in multiples of 8 bytes.)
For a fixed number of 208 categories, a "char"
encoding (not to be confused with char
!) might be an option to save storage. See:
- How to store one-byte integer in PostgreSQL?
But, again, not worth the trouble for such small strings. Just use text
. Maybe enforce correctness with a FK constraint to a category
table like:
CREATE TABLE category (category text PRIMARY KEY);
Also a good place to store additional information per category. And you can easily modify the set of categories. Make the FK constraint ON UPDATE CASCADE
and you can change category names in one central place. Make it ON DELETE SET NULL
, and you can easily remove a category. Etc.
Related:
- What is the overhead for varchar(n)?
- Calculating and saving space in PostgreSQL
I fully support Erwin's answer, but I wanted to add a warning against enums.
Enums are a good choice if you have a fixed number of possible values that can never change (at least there must be a guarantee that no values would have to be removed).
In all other cases, you should not use enums: It is impossible to remove an enum value once you have added it.
For example, when choosing a data type for a column that contains a US state, I would not choose an enum — unlikely as it is, it could be that a state secedes, or that two states unite.
Based on how you describe the data, I would not recommend enums in your case.