Should every table have a single-field surrogate/artificial primary key?
I'm going to say no, not always, but most of the time yes..
These are some circumstances in which you don't need a surrogate or artificial key:
- Pure intersection tables. If there is no risk of the intersection being the target of a foreign key and if there is little or no risk of the intersection attracting independent attributes (i.e. something other than FK's to the two parent tables) then you can get away with using the combination of FKs as the PK with fair confidence.
- Lookup tables with static business keys. If you have a lookup
table with a unique business key which is fixed externally to your
business and which has zero chance of ever changing for any
practical purpose, then using the business key directly can make
things simpler. An example might be a list of state or province
codes or a list of ANSI standard numbers, etc. - Tables containing data consolidated from multiple, independent sources. If your system has many sources of data that must be shoehorned together into a single table, say at head office, then sometimes you need a compound key that includes the source system key value and a code indicating what the source system was.
There are also some situations where the old-faithful monotonically increasing integer surrogate key is not ideal. You can have keys that are alphanumeric surrogates. These could include:
- Situations where you need to merge data from multiple, independent sources. To avoid key collisions you might use GUIDs instead of IDENTITY keys.
- Situations where you are forced to use non-numeric key representations. Let's say you've got a license plate database. Your key could be the alphanumeric value instead of a pure number.
- Situations where some external requirement forces you to apply compression to your key value. Instead of using 10 digits for an int32 you can use six base 36 digits.
Why most of the time yes? The most fundamental answer to that question is that it is pure hell if you ever need to modify a primary key value on any table. Since almost anything a user can see or touch is conceivably subject to an update at some point, using a visible key value is inviting pure hell. Using a surrogate key will keep you from falling into this trap.
Having said that, remember that there is room for YAGNI in applying this concept. You don't need to go forcing code tables with IDENTITY keys into every nook and cranny of your schema, just in case someone decides that the symbol for male gender in your employee table needs to change from M to X or something silly.
"it depends"
Yes: Surrogate IDENTITY/AUTONUMBER fields are good when the natural key is wide and non-numeric. Note: this assumes the conflation of "PK" and clustered index that occurs by default in SQL Server and Sybase etc
No: many/many tables when the 2 parent keys suffice. Or when the natural key is short and fixed length eg currency code
Of course, a braindead ORM (read: (n)Hibernate) may trump these rules...
Edit: reading question again
A many/many table with 2 surrogate parent keys will have a multiple column PK.
However, it doesn't need another surrogate column.
If a table does have a surrogate (IDENTITY etc) key then it doesn't need to be multiple column.
You can have a super key that includes the surrogate but this would be to enforce other rules (eg subtypes)
No.
I'd say there are certainly cases when single-field keys are inferior to compound keys, at least for the purpose of foreign keys. That is not to say you should not have a single-field surrogate key as well if you prefer, but I personally prefer the key that is most often used as the target of a foreign key to be called the primary key
I'll attempt to illustrate my point in the the following examples, in which:
brand
is car marque, eg Ford, Toyota etcdealer
is a physical dealership, tied to a brand (eg a Ford dealership only selling Fords)model
is the type of car eg Ford Focus, Ford Fiesta etcstock
is the current forecourt car count for each dealership
If we create a single-field surrogate key for dealer
and model
as follows:
create table brand( brand_id integer primary key );
create table dealer( dealer_id integer primary key,
brand_id integer references brand )
create table model( model_id integer primary key,
brand_id integer references brand )
create table stock( model_id integer references model,
dealer_id integer references dealer,
quantity integer,
primary key(model_id, dealer_id) )
then it is possible to insert a row into stock
that links a Ford dealer
to a "Toyota" model. Adding brand_id references brand
to stock
only makes the problem worse. On the other hand if we keep the foreign key as part of the primary key as follows:
create table brand( brand_id integer primary key );
create table dealer( brand_id integer references brand,
dealer_id integer,
primary key(brand_id, dealer_id) )
create table model( brand_id integer references brand,
model_id integer,
primary key(brand_id, model_id) )
create table stock( brand_id integer,
model_id integer,
dealer_id integer,
quantity integer,
primary key(brand_id, model_id, dealer_id),
foreign key(brand_id, model_id) references model,
foreign key(brand_id, dealer_id) references dealer )
now the rule that "Ford" dealers can only stock "Ford" cars is enforced naturally by the model.
Note that in the 'composite keys' example, dealer_id
may or may not be unique, according to preference. It does not need to be unique (ie an alternate key), but very little is lost by making it so (perhaps a little storage space) and it can be very handy so that is the way I usually set it up, eg:
create table dealer( brand_id integer references brand,
dealer_id serial unique,
primary key(brand_id, dealer_id) )