Best data modelling approach to handle redundant foreign keys in a database about Surveys, Questions and Responses
As per my understanding of your specifications, your business environment involves a conceptual-level ternary relationship. In this regard, you need to define:
- the relationship (or association) type between the entity types Person and Survey;
- the relationship type between Survey and Question;
- the relationship type that establishes the connection between the two aforesaid relationship types and, as a consequence, between Person, Survey and Question, i.e., Response (a shorter name that simplifies interpretation, from my point of view).
So, I consider that you are on the right track with your Approach 1, although it requires some small (yet important) refinements in order to make it more accurate. I will detail such refinements and other relevant considerations in the following sections.
Business rules
Let us expand the applicable business rules a bit and reformulate them in the following way:
- A Person registers in zero-one-or-many Surveys
- A Survey gets the registration of zero-one-or-many Persons
- A Survey is integrated by one-to-many Questions
- A Question integrates zero-one-or-many Surveys
- A Question receives zero-one-or-many Responses
- A Response is provided by exactly-one Person in the context of exactly-one Survey
Expository IDEF1X diagram
Then, I have created the IDEF1Xa diagram that is presented in Figure 1, which synthesizes the business rules formulated above:
a Integration Definition for Information Modeling (IDEF1X) is a highly recommendable modeling technique that was established as a standard in December 1993 by the United States National Institute of Standards and Technology (NIST). It is solidly based on theoretical work authored by the sole founder of the relational model, i.e., Dr. E. F. Codd and also on the entity-relationship view developed by Dr. P. P. Chen.
The PersonSurvey relationship
As I see it, the PersonSurvey relationship is required to provide a means of authorization so that a Person can take part in a given Survey. In this way, once a certain Person has been registered in a specific Survey, he or she is authorized to provide Responses to the Questions that integrate the respective Survey.
The SurveyQuestion relationship
I assume that the property (or attribute) called suvery_question.question_number in your diagram is used to represent the Order of presentation of a given Question instance with respect to a particular Survey. As you can see, I have denoted such property as SurveyQuestion.PresentationOrder, and I think that you should prevent that (i) two or more Question.QuestionNumber values share (ii) the same PresentationOrder value in (iii) the same SurveyQuestion occurrence.
To portray that need, I have included a composite ALTERNATE KEY (AK) in the box representing this entity type, which is comprised of the combination of properties (SurveyNumber, QuestionNumber, PresentationOrder). As you are well aware, a composite AK can be declared in a logical DDL design with the aid of a multi-column UNIQUE constraint (as I exemplified in the SurveyQuestion
table that is part of the expository DDL layout expounded a few sections below).
The Response entity type
Yes, with the Response entity type I am depicting a relationship between two other relationships; it may seem awkward at first glance but there is nothing wrong with this approach, as long as it (a) represents the features of the business context of interest accurately and (b) is represented properly in a logical-level layout.
Yes, you are totally correct, it would be an error to portray that part of the scenario at the logical level of abstraction by means of two Response.SurveyNumber
(or, say, Response.SurveyId
) values referenced from two different columns in the same Response
row.
Derived logical SQL-DDL layout
-- You should determine which are the most fitting
-- data types and sizes for all your table columns
-- depending on your business context characteristics.
-- As one would expect, you are free to make use of
-- your preferred (or required) naming conventions.
CREATE TABLE Person (
PersonId INT NOT NULL,
FirstName CHAR(30) NOT NULL,
LastName CHAR(30) NOT NULL,
GenderCode CHAR(3) NOT NULL,
BirthDate DATE NOT NULL,
CreatedDateTime DATETIME NOT NULL,
--
CONSTRAINT Person_PK PRIMARY KEY (PersonId),
CONSTRAINT Person_AK UNIQUE (
FirstName,
LastName,
GenderCode,
BirthDate
)
);
CREATE TABLE Survey (
SurveyNumber INT NOT NULL,
Description CHAR(255) NOT NULL,
CreatedDateTime DATETIME NOT NULL,
--
CONSTRAINT Survey_PK PRIMARY KEY (SurveyNumber),
CONSTRAINT Survey_AK UNIQUE (Description)
);
CREATE TABLE PersonSurvey (
PersonId INT NOT NULL,
SurveyNumber INT NOT NULL,
RegisteredDateTime DATETIME NOT NULL,
--
CONSTRAINT PersonSurvey_PK PRIMARY KEY (PersonId, SurveyNumber),
CONSTRAINT PersonSurveyToPerson_FK FOREIGN KEY (PersonId)
REFERENCES Person (PersonId),
CONSTRAINT PersonSurveyToSurvey_FK FOREIGN KEY (SurveyNumber)
REFERENCES Survey (SurveyNumber)
);
CREATE TABLE Question (
QuestionNumber INT NOT NULL,
Wording CHAR(255) NOT NULL,
CreatedDateTime DATETIME NOT NULL,
--
CONSTRAINT Question_PK PRIMARY KEY (QuestionNumber),
CONSTRAINT Question_AK UNIQUE (Wording)
);
CREATE TABLE SurveyQuestion (
SurveyNumber INT NOT NULL,
QuestionNumber INT NOT NULL,
PresentationOrder TINYINT NOT NULL,
IsMandatory BIT NOT NULL,
IntegratedDateTime DATETIME NOT NULL,
--
CONSTRAINT SurveyQuestion_PK PRIMARY KEY (SurveyNumber, QuestionNumber),
CONSTRAINT SurveyQuestion_AK UNIQUE (
QuestionNumber,
SurveyNumber,
PresentationOrder
),
CONSTRAINT SurveyQuestionToSurvey_FK FOREIGN KEY (SurveyNumber)
REFERENCES Survey (SurveyNumber),
CONSTRAINT SurveyQuestionToQuestion_FK FOREIGN KEY (QuestionNumber)
REFERENCES Question (QuestionNumber)
);
CREATE TABLE Response (
SurveyNumber INT NOT NULL,
QuestionNumber INT NOT NULL,
PersonId INT NOT NULL,
Content TEXT NOT NULL,
ProvidedDateTime DATETIME NOT NULL,
--
CONSTRAINT Response_PK PRIMARY KEY (SurveyNumber, QuestionNumber, PersonId),
CONSTRAINT ResponseToPersonSurvey_FK FOREIGN KEY (PersonId, SurveyNumber)
REFERENCES PersonSurvey (PersonId, SurveyNumber),
CONSTRAINT ResponseToSurveyQuestion_FK FOREIGN KEY (SurveyNumber, QuestionNumber)
REFERENCES SurveyQuestion (SurveyNumber, QuestionNumber)
);
Two composite FOREIGN KEYs in the Response
table
This is, probably, the most important point to discuss: the references made from a given Response
row to
SurveyQuestion.SurveyNumber
, andSurveyPerson.SurveyNumber
must have matching values. As far as I am concerned, the best option to enforce this condition in a declarative way is by making use of two composite FOREIGN KEYs (FKs).
As shown in the DDL design, the first FK is making a reference to the PersonSurvey
table PRIMARY KEY (PK), i.e., (PersonId, SurveyNumber)
, and is conformed by the columns Response.PersonId
and Response.SurveyNumber
.
The second FK is pointing to the SurveyQuestion
table PK, i.e., (SurveyNumber, QuestionNumber)
, and is, accordingly, made up of the columns Response.SurveyNumber
and Response.QuestionNumber
.
In this way, the Response.SurveyNumber
column is quite instrumental since it is used as part of a FK reference in two different constraints.
With this method, one ensures database management system-guaranteed referential integrity from
- (a)
Response
to thePersonSurvey
; - (b)
Response
to theSurveyQuestion
; and - (c) each of the tables representing an associative entity type to the tables standing for independent entity types, namely
Person
,Survey
andQuestion
.
Derived data to avoid update anomalies
I have noticed in your diagram two elements that I esteem are worth mentioning. These elements are related to two PersonSurvey
columns that can (should) be derived.
In that regard, you can derive the PersonSurvey.IsStarted
datum by querying if a given Person
occurrence has provided one or more Responses
to Questions
that integrate an exact Survey
via the SurveyQuestion
table.
And you can also obtain the PersonSurvey.IsCompleted
data point by determining if a given Person
instance has supplied a Response
to all the Questions
that cointain a value of 'TRUE' in the IsMandatory
column in a specific SurveyQuestion
row.
By way of the derivation of these values, you are preventing some update anomalies that would have eventually arisen in case you had kept such values in the SurveyQuestion
column.
Important consideration
As @Dave rightly points out in his comment, if you face a future requirement demanding the management of different kinds of responses that imply managing dates, numeric values, multiple choice, and another possible aspects, you will have to extend this database layout.