What's the point of a candidate key?
It means that if PhoneNumber was indeed a candidate key you could delete the ID column and use PhoneNumber instead. In other words, it is a candidate for being a unique key.
Wikipedia has a more formal definition that you many want to look at.
A key is called a candidate key, because while it could be used as a PK, it is not necessarily the PK.
There can be more than one candidate key for a given row, e.g., EmployeeID
and SSN
.
Often, rather than using a candidate key as the PK, a surrogate key is created instead. This is because decisions around what candidate key to use can be found to be erroneous later, which can cause a huge headache (literally).
Another reason is that a surrogate key can be created using an efficient data type for indexing purposes, which the candidate keys may not have (e.g., a UserImage
).
A third reason is that many ORMs work only with a single-column PK, so candidate keys composed of more than one column (composite keys) are ruled out in that case.
Something that many developers do not realize is that selecting a surrogate key over a natural key may be a compromise in terms of data integrity. You may be losing some constraints on your data by selecting a surrogate key, and often a trigger is required to simulate the constraint if a surrogate key is chosen.
Candidate key is a concept that appears when you are designing a database system.
Suppose your system will have a table named User, defined as below:
User (fullName, socialSecurityNumber, creditCardNumber, age).
Well, you have to choose which subset of these columns will be your primary key. The goal when designing the database is, of course, to keep this set minimal. You wouldn't use the pair (SSN, creditCardNumber) if the SSN alone already guarantees uniqueness.
Now, suppose that fullName, SSN and creditCardNumber are all fields that you know, somehow,
that are unique for all users. You could use any of those as your PK, so they are
all candidate keys (whereas age, on the other hand, is not).
Which will you choose?
That will depend on factors such as the datatype of the field (it's preferable to
set an index on an integer column rather than on an varchar column, for example).