What is cardinality in Databases?
A source of confusion may be the use of the word in two different contexts - data modelling and database query optimization.
In data modelling terms, cardinality is how one table relates to another.
- 1-1 (one row in table A relates to one row in tableB)
- 1-Many (one row in table A relates to many rows in tableB)
- Many-Many (Many rows in table A relate to many rows in tableB)
There are also optional participation conditions to the above (where a row in one table doesn't have to relate to the other table at all).
See Wikipedia on Cardinality (data modelling).
When talking about database query optimization, cardinality refers to the data in a column of a table, specifically how many unique values are in it. This statistic helps with planning queries and optimizing the execution plans.
See Wikipedia on Cardinality (SQL statements).
It depends a bit on context. Cardinality means the number of something but it gets used in a variety of contexts.
- When you're building a data model, cardinality often refers to the number of rows in table A that relate to table B. That is, are there 1 row in B for every row in A (1:1), are there N rows in B for every row in A (1:N), are there M rows in B for every N rows in A (N:M), etc.
- When you are looking at things like whether it would be more efficient to use a b*-tree index or a bitmap index or how selective a predicate is, cardinality refers to the number of distinct values in a particular column. If you have a
PERSON
table, for example,GENDER
is likely to be a very low cardinality column (there are probably only two values inGENDER
) whilePERSON_ID
is likely to be a very high cardinality column (every row will have a different value). - When you are looking at query plans, cardinality refers to the number of rows that are expected to be returned from a particular operation.
There are probably other situations where people talk about cardinality using a different context and mean something else.