Unique constraint across two columns
This isn't possible with the proposed table structure declaratively. You would need triggers to enforce this.
A unique index on both columns, together with a pair of check constraints with scalar UDFs, gets quite close however.
CREATE TABLE UserProfile
(
Id INT PRIMARY KEY,
PrimaryEmail VARCHAR(100),
SecondaryEmail VARCHAR(100)
)
CREATE UNIQUE INDEX IX1
ON UserProfile(PrimaryEmail)
CREATE UNIQUE INDEX IX2
ON UserProfile(SecondaryEmail)
go
CREATE FUNCTION dbo.EmailInUseAsPrimary (@Email VARCHAR(100))
RETURNS BIT
AS
BEGIN
RETURN
(SELECT COUNT(*)
FROM UserProfile WITH (READCOMMITTEDLOCK)
WHERE PrimaryEmail = @Email)
END;
go
CREATE FUNCTION dbo.EmailInUseAsSecondary (@Email VARCHAR(100))
RETURNS BIT
AS
BEGIN
RETURN
(SELECT COUNT(*)
FROM UserProfile WITH (READCOMMITTEDLOCK)
WHERE SecondaryEmail = @Email)
END;
GO
ALTER TABLE UserProfile
ADD CHECK ( dbo.EmailInUseAsPrimary(SecondaryEmail) = 0),
CHECK ( dbo.EmailInUseAsSecondary(PrimaryEmail) = 0)
The reason for READCOMMITTEDLOCK
is to avoid problems with snapshot isolation.
One problem with the approach above is that because the constraints are evaluated RBAR it can fail some transactions that ought to succeed.
For the example data
INSERT INTO UserProfile
VALUES (1, '[email protected]', '[email protected]'),
(2, '[email protected]', '[email protected]')
This statement fails
UPDATE UserProfile
SET PrimaryEmail = CASE Id WHEN 1 THEN '[email protected]' WHEN 2 THEN '[email protected]' END,
SecondaryEmail = CASE Id WHEN 1 THEN '[email protected]' WHEN 2 THEN '[email protected]' END
even though at the end of the transaction the constraints would have been met. But maybe it is sufficiently unlikely that you will be performing this kind of update (swapping email addresses between both type and person) that this can be ignored.
Depending on what you are modelling I may see a problem for this restriction anyway: for some of our clients the secondary contact address for some people is a shared inbox for the whole team - so we contact the shared box if there is no response (or an out-of-office reply) to messages sent to the personal one.
The only way to enforce the desired restriction declaratively, without using programmed logic such as the trigger method described by Martin, is to treat email addresses as separate entities instead of just properties of a person as you described you had already considered:
Person EmailAddress
-------- ----------------
pID (PK) <-- pID (FK) (PK)
Name Type (PK)
Blah Address (unique)
The primary key over person ID and address type enforces (as a primary key implies a unique index) each person having at most one address of each type (primary, secondary) and the unique index on EmailAddress.Address enforces no two people having the same address no matter what address type.
query complexity
This can be addressed using views. You are right that this might introduce "query speed" differences but I'm pretty sure they would not be significant.
query speed
If this is a significant problem then you could still have primary and secondary address columns in the Person table, and maintain them in using triggers on the EmailAddress table.
You then need to chose what happens if a dev/user tries to update the addresses in the Person table directly: you can either use a trigger to update EmailAddress accordingly, or a trigger to raise an error if the user tries to set values for those columns directly - though in both cases you are trading off insert and update performance for the select speed/convenience. Of course you are now using programmed logic to control things so the complexity is similar to Martin's suggestion.
Another option is indexed views, though I think they are only available on Enterprise Edition which may rule out their use depending on your project's size.
This may all be moot, because as I said above: I doubt this structure would make a significant performance difference. I expect it would require a massive data-set for the difference to be reliably measurable, never mind noticeable by end users.
and the probability of a user using multiple email accounts actively decreases in order of magnitude after one
As long as you don't care about that one user who has many and will play merry hell if he can't record them all! Though having said that most systems only accept the possibility of one relevant address anyway.
This type of validation/constraint is extremely expensive given the fact that you have to compare against the entire table over and over again, also you cannot accomplish this through unique constraints, Martin Smith's answer above gets you close (I commented about something closer to that but very slow of course).
What you can do is to create an EAV (Entity Attribute Value) Architecture:
dbo.UserProfile(id)
dbo.UserProfile_Email(identity_col, id as FK to UserProfile, emailType, email )
where emailType is 1 or 2 (integers that specify if primary or secondary) and the email,
then you can safely put a unique constraint on the email, and each new user gets two inserts on User_Profile_Email (all wrapped in a single transaction in case one fails).