Don't know how to transform variable entity into relational table
The proper structure for this scenario is a SubClass / Inheritance model, and is nearly identical to the concept I proposed in this answer: Heterogeneous ordered list of value.
The model proposed in this question is actually quite close in that the Animal
entity contains the type (i.e. race
) and the properties that are common across all types. However, there are two minor changes that are needed:
Remove the Cat_ID and Dog_ID fields from their respective entities:
The key concept here is that everything is an
Animal
, regardless ofrace
:Cat
,Dog
,Elephant
, and so on. Given that starting point, any particularrace
ofAnimal
doesn't truly need a separate identifier since:- the
Animal_ID
is unique - the
Cat
,Dog
, and any additionalrace
entities added in the future do not, by themselves, fully represent any particularAnimal
; they only have meaning when used in combination with the information contained in the parent entity,Animal
.
Hence, the
Animal_ID
property in theCat
,Dog
, etc entities is both the PK and the FK back to theAnimal
entity.- the
Differentiate between types of
breed
:Just because two properties share the same name does not necessarily mean that those properties are the same, even if the name being the same implies such a relationship. In this case, what you really have is actually
CatBreed
andDogBreed
as seperate "types"
Initial Notes
- The SQL is specific to Microsoft SQL Server (i.e. is T-SQL). Meaning, be careful about datatypes as they are not the same across all RDBMS's. For example, I am using
VARCHAR
but if you need to store anything outside of the standard ASCII set, you should really useNVARCHAR
. - The ID fields of the "type" tables (
Race
,CatBreed
, andDogBreed
) are not auto-incrementing (i.e. IDENTITY in terms of T-SQL) because they are application constants (i.e. they are part of the application) that are static lookup values in the database and are represented asenum
s in C# (or other languages). If values are added, they are added in controlled situations. I reserve the use of auto-increment fields for user data that comes in via the application. - The naming convention I use is to name each subclass table starting with the main class name followed by the subclass name. This helps organize the tables as well as indicates clearly (without looking at the FKs) the relationship of the subclass table to the main entity table.
- Please see "Final Edit" section at the end for a note regarding Views.
"Breed" as "Race"-Specific Approach
This first set of tables are the lookup / types tables:
CREATE TABLE Race
(
RaceID INT NOT NULL PRIMARY KEY
RaceName VARCHAR(50) NOT NULL
);
CREATE TABLE CatBreed
(
CatBreedID INT NOT NULL PRIMARY KEY,
BreedName VARCHAR(50),
CatBreedAttribute1 INT,
CatBreedAttribute2 VARCHAR(10)
-- other "CatBreed"-specific properties as needed
);
CREATE TABLE DogBreed
(
DogBreedID INT NOT NULL PRIMARY KEY,
BreedName VARCHAR(50),
DogBreedAttribute1 TINYINT
-- other "DogBreed"-specific properties as needed
);
This second listing is the main "Animal" entity:
CREATE TABLE Animal
(
AnimalID INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
RaceID INT NOT NULL, -- FK to Race
Name VARCHAR(50)
-- other "Animal" properties that are shared across "Race" types
);
ALTER TABLE Animal
ADD CONSTRAINT [FK_Animal_Race]
FOREIGN KEY (RaceID)
REFERENCES Race (RaceID);
This third set of tables are the complimentary sub-class entities that complete the definition of each Race
of Animal
:
CREATE TABLE AnimalCat
(
AnimalID INT NOT NULL PRIMARY KEY, -- FK to Animal
CatBreedID INT NOT NULL, -- FK to CatBreed
HairColor VARCHAR(50) NOT NULL
-- other "Cat"-specific properties as needed
);
ALTER TABLE AnimalCat
ADD CONSTRAINT [FK_AnimalCat_CatBreed]
FOREIGN KEY (CatBreedID)
REFERENCES CatBreed (CatBreedID);
ALTER TABLE AnimalCat
ADD CONSTRAINT [FK_AnimalCat_Animal]
FOREIGN KEY (AnimalID)
REFERENCES Animal (AnimalID);
CREATE TABLE AnimalDog
(
AnimalID INT NOT NULL PRIMARY KEY, -- FK to Animal
DogBreedID INT NOT NULL, -- FK to DogBreed
HairColor VARCHAR(50) NOT NULL
-- other "Dog"-specific properties as needed
);
ALTER TABLE AnimalDog
ADD CONSTRAINT [FK_AnimalDog_DogBreed]
FOREIGN KEY (DogBreedID)
REFERENCES DogBreed (DogBreedID);
ALTER TABLE AnimalDog
ADD CONSTRAINT [FK_AnimalDog_Animal]
FOREIGN KEY (AnimalID)
REFERENCES Animal (AnimalID);
The model using a shared breed
type is shown after the "Additional Notes" section.
Additional Notes
- The concept of
breed
seems to be a focal point for confusion. It was suggested by jcolebrand (in a comment on the question) thatbreed
is a property shared across the differentrace
s, and the other two answers have it integrated as such in their models. This is a mistake, however, because the values forbreed
are not shared across the different values ofrace
. Yes, I am aware that the two other proposed models attempt to solve this issue by makingrace
a parent ofbreed
. While that technically solves the relationship issue, it doesn't help solve the overall modeling question of what to do about non-common properties, nor how to handle arace
that does not have abreed
. But, in the case that such a property were guaranteed to exist across allAnimal
s, I will include an option for that as well (below). - The models proposed by vijayp and DavidN (which seem to be identical) do not work because:
- They either
- do not allow for non-common properties to be stored (at least not for individual instances of any
Animal
), or - require that all properties for all
race
s be stored in theAnimal
entity which is a very flat (and nearly non-relational) way of representing this data. Yes, people do this all of the time, but it means having many NULL fields per row for the properties that are not meant for that particularrace
AND knowing which fields per row are associated with the particularrace
of that record.
- do not allow for non-common properties to be stored (at least not for individual instances of any
- They do not allow for adding a
race
ofAnimal
in the future that does not havebreed
as a property. And even if ALLAnimal
s have abreed
, that wouldn't change the structure due to what has been previously noted aboutbreed
: thatbreed
is dependent on therace
(i.e.breed
forCat
is not the same thing asbreed
forDog
).
- They either
"Breed" as Common- / Shared- Property Approach
Please note:
The SQL below can be run in the same database as the model presented above:
- The
Race
table is the same - The
Breed
table is new - The three
Animal
tables have been appended with a2
- The
- Even with
Breed
being a now common property, it does not seem right not to haveRace
noted in the main/parent entity (even if it is technically relationally correct). So, bothRaceID
andBreedID
are represented inAnimal2
. In order to prevent a mismatch between theRaceID
noted inAnimal2
and aBreedID
that is for a differentRaceID
, I have added a FK on bothRaceID, BreedID
that references a UNIQUE CONSTRAINT of those fields in theBreed
table. I usually despise pointing a FK to a UNIQUE CONSTRAINT, but here is one of the few valid reasons to do so. A UNIQUE CONSTRAINT is logically an "Alternate Key", which makes it valid for this use. Please also note that theBreed
table still has a PK on justBreedID
.- The reason for not going with just a PK on the combined fields and no UNIQUE CONSTRAINT is that it would allow for the same
BreedID
to be repeated across different values ofRaceID
. - The reason for not switching which the PK and UNIQUE CONSTRAINT around is that this might not be the only usage of
BreedID
, so it should still be possible to reference a specific value ofBreed
without having theRaceID
available.
- The reason for not going with just a PK on the combined fields and no UNIQUE CONSTRAINT is that it would allow for the same
- While the following model does work, it has two potential flaws regarding the concept of a shared
Breed
(and are why I prefer theRace
-specificBreed
tables).- There is an implicit assumption that ALL values of
Breed
have the same properties. There is no easy way in this model to have disparate properties betweenDog
"breeds" andElephant
"breeds". However, there still is a way to do this, which is noted in the "Final Edit" section. - There is no way to share a
Breed
across more than one race. I am not sure if that is desirable to do (or maybe not in the concept of animals but possibly in other situations that would be using this type of model), but it is not possible here.
- There is an implicit assumption that ALL values of
CREATE TABLE Race
(
RaceID INT NOT NULL PRIMARY KEY,
RaceName VARCHAR(50) NOT NULL
);
CREATE TABLE Breed
(
BreedID INT NOT NULL PRIMARY KEY,
RaceID INT NOT NULL, -- FK to Race
BreedName VARCHAR(50)
);
ALTER TABLE Breed
ADD CONSTRAINT [UQ_Breed]
UNIQUE (RaceID, BreedID);
ALTER TABLE Breed
ADD CONSTRAINT [FK_Breed_Race]
FOREIGN KEY (RaceID)
REFERENCES Race (RaceID);
CREATE TABLE Animal2
(
AnimalID INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
RaceID INT NOT NULL, -- FK to Race, FK to Breed
BreedID INT NOT NULL, -- FK to Breed
Name VARCHAR(50)
-- other properties common to all "Animal" types
);
ALTER TABLE Animal2
ADD CONSTRAINT [FK_Animal2_Race]
FOREIGN KEY (RaceID)
REFERENCES Race (RaceID);
-- This FK points to the UNIQUE CONSTRAINT on Breed, _not_ to the PK!
ALTER TABLE Animal2
ADD CONSTRAINT [FK_Animal2_Breed]
FOREIGN KEY (RaceID, BreedID)
REFERENCES Breed (RaceID, BreedID);
CREATE TABLE AnimalCat2
(
AnimalID INT NOT NULL PRIMARY KEY, -- FK to Animal
HairColor VARCHAR(50) NOT NULL
);
ALTER TABLE AnimalCat2
ADD CONSTRAINT [FK_AnimalCat2_Animal2]
FOREIGN KEY (AnimalID)
REFERENCES Animal2 (AnimalID);
CREATE TABLE AnimalDog2
(
AnimalID INT NOT NULL PRIMARY KEY,
HairColor VARCHAR(50) NOT NULL
);
ALTER TABLE AnimalDog2
ADD CONSTRAINT [FK_AnimalDog2_Animal2]
FOREIGN KEY (AnimalID)
REFERENCES Animal2 (AnimalID);
Final Edit (hopefully ;-)
- Regarding the possibility (and then difficulty) of handling disparate properties between types of
Breed
, it is possible to employ the same subclass / inheritance concept but withBreed
as the main entity. In this setup theBreed
table would have the properties common to all types ofBreed
(just like theAnimal
table) andRaceID
would represent the type ofBreed
(same as it does in theAnimal
table). Then you would have subclass tables such asBreedCat
,BreedDog
, and so on. For smaller projects this might be considered "over-engineering", but it is being mentioned as an option for situations that would benefit from it. For both approaches, it sometimes helps to create Views as short-cuts to the full entities. For example, consider:
CREATE VIEW Cats AS SELECT an.AnimalID, an.RaceID, an.Name, -- other "Animal" properties that are shared across "Race" types cat.CatBreedID, cat.HairColor -- other "Cat"-specific properties as needed FROM Animal an INNER JOIN AnimalCat cat ON cat.AnimalID = an.AnimalID -- maybe add in JOIN(s) and field(s) for "Race" and/or "Breed"
- While not part of the logical entities, it is fairly common to have audit fields in the tables to at least get a sense of when the records are being inserted and updated. So in practical terms:
- A
CreatedDate
field would be added to theAnimal
table. This field is not needed in any of the subclass tables (e.g.AnimalCat
) as the rows being inserted for both tables should be done at the same time within a transaction. - A
LastModifiedDate
field would be added to theAnimal
table and all subclass tables. This field gets updated only if that particular table is updated: if an update occurs inAnimalCat
but not inAnimal
for a particularAnimalID
, then only theLastModifiedDate
field inAnimalCat
would be set.
- A
First off, you are doing well to distinguish between ER modeling and relational modeling. Many newbies don't.
Here are some buzzwords you can use to look up helpful articles on the web.
Your case is a classic case of class/subclass or, if you like, type/subtype.
The phrase that's used in ER modeling is "generalization/specialization". And many of the articles show this under something called EER (Enhanced Entity-Relationship) modeling. This wasn't in Peter Chen's original presentation of ER modeling. It was added later. For a pretty good summary of gen/spec in pdf form, click here
Next, when converting a class/subclass case to relational modeling you design tables. There is more than one approach. The two main approaches are called single table inheritance and class table inheritance. Each has advantages and drawbacks. The best presentation of these two designs comes from Martin Fowler. You can see his outline here and here.
The big advantage of single table inheritance is simplicity. It's all stored in one table. The big drawback is a lot of NULLS. This can waste space and time and result in confusing logic.
Class table inheritance requires joins, but they are simple and fast. Especially if you use a technique called shared primary key, in which the PK in the subclass tables is a copy of the PK in the superclass table. You can create views for each subclass that join superclass data with subclass data.
Finally, there is a tag in this area that collects questions like yours together.
Here it is: subtypes
I see on possible design as
Table Race
RaceId- PK- Int
RaceName - Varchar(50)
Table Breed
BreedId - PK- Int
RaceId - FK - Int
BreedName - varchar(50)
Table Animal
AnimalId - PK- Int
BreedId - FK - Int
Other Columns....
These PKs above would be auto-incrementing column. Other columns in Animal
table could be named accordingly.