Still Confused About Identifying vs. Non-Identifying Relationships
"as I don't want to learn something wrong".
Well, if you really mean that, then you can stop worrying about ER lingo and terminology. It is imprecise, confused, confusing, not at all generally agreed-upon, and for the most part irrelevant.
ER is a bunch of rectangles and straight lines drawn on a piece of paper. ER is deliberately intended to be a means for informal modeling. As such, it is a valuable first step in database design, but it is also just that: a first step.
Never shall an ER diagram get anywhere near the precision, accuracy and completeness of a database design formally written out in D.
The technical definition of an identifying relationship is that a child's foreign key is part of its primary key.
CREATE TABLE AuthoredBook (
  author_id INT NOT NULL,
  book_id INT NOT NULL,
  PRIMARY KEY (author_id, book_id),
  FOREIGN KEY (author_id) REFERENCES Authors(author_id),
  FOREIGN KEY (book_id) REFERENCES Books(book_id)
);
See? book_id is a foreign key, but it's also one of the columns in the primary key. So this table has an identifying relationship with the referenced table Books. Likewise it has an identifying relationship with Authors.
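Here is a minimal sketch of what that buys you in practice, run against an in-memory SQLite database through Python's sqlite3 module. The Authors and Books tables are reduced to a bare key column, which is my assumption since the question doesn't show them, and the helper try_insert is just for illustration. SQLite only enforces foreign keys once the foreign_keys pragma is turned on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Authors (author_id INTEGER PRIMARY KEY);  -- assumed minimal parent table
CREATE TABLE Books (book_id INTEGER PRIMARY KEY);      -- assumed minimal parent table
CREATE TABLE AuthoredBook (
  author_id INT NOT NULL,
  book_id INT NOT NULL,
  PRIMARY KEY (author_id, book_id),
  FOREIGN KEY (author_id) REFERENCES Authors(author_id),
  FOREIGN KEY (book_id) REFERENCES Books(book_id)
);
INSERT INTO Authors VALUES (1);
INSERT INTO Books VALUES (10);
INSERT INTO Books VALUES (11);
INSERT INTO AuthoredBook VALUES (1, 10);
""")


def try_insert(author_id, book_id):
    """Hypothetical helper: report whether the row was accepted or rejected."""
    try:
        conn.execute("INSERT INTO AuthoredBook VALUES (?, ?)", (author_id, book_id))
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


second_book = try_insert(1, 11)  # same author, different book: a new key value
duplicate = try_insert(1, 10)    # same (author_id, book_id) pair again
orphan = try_insert(1, 99)       # book 99 does not exist in Books

print(second_book, duplicate, orphan)
```

The duplicate pair is refused by the primary key and the orphan row by the foreign key, so each row of AuthoredBook is identified entirely by the parents it points to.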
A comment on a YouTube video has an identifying relationship with the respective video. The video_id should be part of the primary key of the Comments table.
CREATE TABLE Comments (
  video_id INT NOT NULL,
  user_id INT NOT NULL,
  comment_dt DATETIME NOT NULL,
  PRIMARY KEY (video_id, user_id, comment_dt),
  FOREIGN KEY (video_id) REFERENCES Videos(video_id),
  FOREIGN KEY (user_id) REFERENCES Users(user_id)
);
It may be hard to understand this because it's such common practice these days to use only a serial surrogate key instead of a compound primary key:
CREATE TABLE Comments (
  comment_id SERIAL PRIMARY KEY,
  video_id INT NOT NULL,
  user_id INT NOT NULL,
  comment_dt DATETIME NOT NULL,
  FOREIGN KEY (video_id) REFERENCES Videos(video_id),
  FOREIGN KEY (user_id) REFERENCES Users(user_id)
);
This can obscure cases where the tables have an identifying relationship.
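To make the difference concrete, here is a rough sketch comparing the two designs side by side in an in-memory SQLite database via Python's sqlite3 module. The table names CommentsCompound and CommentsSurrogate and the helper count_accepted are made up for this illustration, Videos and Users are reduced to a bare key column, and since SQLite has no SERIAL type I substitute INTEGER PRIMARY KEY AUTOINCREMENT for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Videos (video_id INTEGER PRIMARY KEY);  -- assumed minimal parent table
CREATE TABLE Users (user_id INTEGER PRIMARY KEY);    -- assumed minimal parent table
CREATE TABLE CommentsCompound (
  video_id INT NOT NULL,
  user_id INT NOT NULL,
  comment_dt DATETIME NOT NULL,
  PRIMARY KEY (video_id, user_id, comment_dt),
  FOREIGN KEY (video_id) REFERENCES Videos(video_id),
  FOREIGN KEY (user_id) REFERENCES Users(user_id)
);
CREATE TABLE CommentsSurrogate (
  comment_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- stands in for SERIAL
  video_id INT NOT NULL,
  user_id INT NOT NULL,
  comment_dt DATETIME NOT NULL,
  FOREIGN KEY (video_id) REFERENCES Videos(video_id),
  FOREIGN KEY (user_id) REFERENCES Users(user_id)
);
INSERT INTO Videos VALUES (1);
INSERT INTO Users VALUES (7);
""")


def count_accepted(table, row, attempts=2):
    """Hypothetical helper: insert the same row several times, count successes."""
    accepted = 0
    for _ in range(attempts):
        try:
            conn.execute(
                f"INSERT INTO {table} (video_id, user_id, comment_dt) VALUES (?, ?, ?)",
                row,
            )
            accepted += 1
        except sqlite3.IntegrityError:
            pass
    return accepted


row = (1, 7, "2024-01-01 12:00:00")
compound_accepted = count_accepted("CommentsCompound", row)
surrogate_accepted = count_accepted("CommentsSurrogate", row)

# Both designs still refuse a comment on a video that does not exist.
orphan_accepted = count_accepted("CommentsSurrogate", (99, 7, "2024-01-01 12:00:00"), 1)

print(compound_accepted, surrogate_accepted, orphan_accepted)
```

The compound key accepts the row once, while the surrogate-key table accepts it twice because every insert gets a fresh comment_id. If you want the old uniqueness rule back in the surrogate design, you have to declare it yourself, for example with a UNIQUE constraint on the three columns.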
I would not consider SSN to represent an identifying relationship. Some people exist but do not have an SSN. Other people may file to get a new SSN. So the SSN is really just an attribute, not part of the person's primary key.
Re comment from @Niels:
So if we use a surrogate key instead of a compound primary key, there is no notable difference between use identifying or non-identifying relationship ?
I suppose so. I hesitate to say yes, because we haven't changed the logical relationship between the tables by using a surrogate key. That is, you still can't make a Comment without referencing an existing Video. But that just means video_id must be NOT NULL. And the logical aspect is, to me, really the point about identifying relationships.
But there's a physical aspect of identifying relationships as well. And that's the fact that the foreign key column is part of the primary key. The primary key is not necessarily a composite key. It could be a single column which is both the primary key of Comments and the foreign key to the Videos table, but that would mean you can store only one comment per video.
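A quick sketch of that single-column case, again using an in-memory SQLite database through Python's sqlite3 module. The body column and the bare-bones Videos table are assumptions added only so there is something to insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Videos (video_id INTEGER PRIMARY KEY);  -- assumed minimal parent table
CREATE TABLE Comments (
  video_id INT NOT NULL PRIMARY KEY,  -- primary key and foreign key at once
  body TEXT NOT NULL,                 -- hypothetical payload column
  FOREIGN KEY (video_id) REFERENCES Videos(video_id)
);
INSERT INTO Videos VALUES (1);
""")

conn.execute("INSERT INTO Comments VALUES (1, 'first')")

try:
    conn.execute("INSERT INTO Comments VALUES (1, 'second')")
    second_accepted = True
except sqlite3.IntegrityError:
    # The primary key on video_id allows at most one row per video.
    second_accepted = False

comment_count = conn.execute("SELECT COUNT(*) FROM Comments").fetchone()[0]
print(second_accepted, comment_count)
```

The second comment on the same video is refused, which is why a real Comments table needs extra key columns (or a surrogate key) alongside video_id.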
Identifying relationships seem to be important only for the sake of entity-relationship diagramming, and this comes up in GUI data modeling tools.