What is the definition of cardinality in SQL

They are speaking the same thing and it has to do with tuples (relational algebra) or rows (layman's term).

When it says high-cardinality are possible values of particular attribute (or field) that are unique and therefore the number of rows or tuples are higher:

Example:

 StudentID   Lastname Firstname  Gender
 101         Smith    John       M
 102         Jones    James      M
 103         Mayo     Ann        F
 104         Jones    George     M
 105         Smith    Suse       F

As far as as StudentID the cardinality is high because it is unique. In this it has five (5) tuples/rows.

On the other hand Lastname has normal cardinality, in particular there are only three (3) unique tuples/rows. Thus it has normal cardinality.

And finally Gender has only two possible unique tuples thus Low Cardinality.

You probably confuse Cardinality here with Degree of a relation which has something to do of the number of attributes/fields in a relation (or table).

On the other hand the textbook for Database when speaking of Cardinality normally has to do with an entity in relation to another entity, that is, the number of possible relations occurences for an entity participating in a given relationship type. Thus for example for a binary relationship cardinality could be either one-to-one, one-to-many or many-to-many.


There are two concepts

  1. Index cardinality
  2. Cardinality

This I believe is referring to index cardinality, which is VERY different https://www.ibm.com/developerworks/data/library/techarticle/dm-1309cardinal/

  • Index cardinality is considered the number of unique values in the index
  • The term is used to discuss creating indexes, table scans, index access vs table access, how it affects inserts, updates, deletes, storage space

Here is another example, https://en.wikipedia.org/wiki/Cardinality_(SQL_statements)

  • In SQL (Structured Query Language), the term cardinality refers to the uniqueness of data values contained in a particular column (attribute) of a database table.
  • The lower the cardinality, the more duplicated elements in a column. Thus, a column with the lowest possible cardinality would have the same value for every row. SQL databases use cardinality to help determine the optimal query plan for a given query.

Just the word cardinality, I believe focuses on relationships between tables In particular, it is not a term used to discuss a single table or uniqueness of data

IBM documentation (if you search for the word unique, it is not mentioned) https://www.ibm.com/support/knowledgecenter/en/SSEP7J_10.2.2/com.ibm.swg.ba.cognos.ug_cog_rlp.10.2.2.doc/c_cog_rlp_rel_cardinality.html When you interpret cardinality, you must consider the notation that displays at both ends of the relationship. Possible end labels are shown in the following list:

  • 0..1 (zero or one match)
  • 1..1 (exactly one match)
  • 0..n (zero or more matches)
  • 1..n (one or more matches)

In mathematics, the cardinality of a set is a measure of the "number of elements of the set". (no mention of unique btw) https://en.wikipedia.org/wiki/Cardinality

In database design, the cardinality or fundamental principle of one data aspect with respect to another is a critical feature. The relationship of one to the other must be precise and exact between each other in order to explain how each aspect links together. In the relational model, tables can be related as any of "one-to-many", "many-to-many" "one-to-zero-or-one", etc.. This is said to be the cardinality of a given table in relation to another. https://en.wikipedia.org/wiki/Cardinality_(data_modeling)


Both definitions are trying to say that cardinality is the "number of rows". The difference is whether the comparison is "in the table" or "in a particular column".

The version in your database text book focuses on relational algebra and the structure of tables ("relations" in that lingo).

The Wikipedia entry is more practical. It encompasses the textbook definition, assuming the table has a primary key (the cardinality of the primary key is the same as the table). However, it can be applied to, say, a flag column as well. If the flag only takes on two values (0 versus 1), then we can say that the cardinality of the column is 2.

This is important for optimizing queries. Cardinality is one component of choosing the best methods for joining, aggregating, and selecting data. In practice, most databases use more information than the cardinality, so-called "statistics" about columns and their values for optimization.