How can I avoid NULLs in my database, while also representing missing data?
Good on you, for eliminating Nulls. I have never allowed Nulls in any of my databases.
Of course, if nulls are prohibited, then missing information will have to be handled by some other means. Unfortunately, those other means are much too complex to be discussed in detail here.
Actually it is not so hard at all. There are three alternatives.
Here's a paper on How To Handle Missing Information Without Using NULL by H Darwen, that may help to get your head around the problem.
1.1. Sixth Normal Form is the answer. But you do not have to normalise your entire database to 6NF. For each column that is optional, you need a child table off the main table, with just the PK, which is also the FK, because it is a 1::0-1 relation. Other than the PK, the only column is the optional column.
Look at this Data Model;
AssetSerial
on page 4 is a classic case: not allAssets
haveSerialNumbers
; but when they do, I want them to store them; more important I want to ensure that they are Unique.(For the OO people out there, incidentally, that is a three level class diagram in Relational notation, a "Concrete Table Inheritance", no big deal, we've had it fro 30 years.)
1.2. For each such table, use a View to provide the 5NF form of the table. Sure, use Null (or any value that is appropriate for the column) to identify the absence of the column for any row. But do not update via the view.
1.3 Do not use straight joins to grab the 6NF column. Do not use outer joins, either (and have the server fill in a Null for the missing rows). Use a subquery to populate the column, and specify the value that you want returned for a missing value (except if you have Oracle, because its Subquery processing is even worse than its set processing). Eg. and just an eg. you can convert a numeric column to string, and use "Missing" for the missing rows.
When you do not want to go that far (6NF), you have two more options.
You can use Null substitutes. I use CHAR(0) for character colomns and 0 for numeric. But I do not allow that for FKs. Obviously you need a value that is outside the normal range of data. This does not allow Three Valued Logic.
In addition to (2), for each Nullable column, you need a boolean Indicator. For the example of the
Sex
column, the Indicator would be something likeSexIsMissing
orSexLess
(sorry). This allows very tight Three Valued Logic. Many people in that 5% like it because the db remains at 5NF (and less tables); the columns with missing info are loaded with values that are never used; they are only used if the Indicator is false. If you have an enterprise db, you can wrap that in a Function, and always use the UDF, not the raw column.
Of course, in all cases, you can never get away from writing code that is required to handle the missing info. Whether it is ISNULL()
, or a subquery for the 6NF column, or an Indicator to check before using the value, or an UDF.
If Null has a specific meaning ... then it is not a Null! By definition, Null is the Unknown Value.
Quite simply by storing only the known information - in other words the Closed World Assumption. Aim to be in at least Boyce Codd / Fifth Normal Form and you won't go far wrong.
So how do you design without NULLS? That was the original question.
It's actually quite easy. You design such that whenever you have to leave some data missing, you can do so by leaving a whole row missing. If a row isn't there, it isn't a row full of NULLs. It just plain isn't there.
So, in the case of "DateOfDeath", we have a table with two columns, namely, PersonId and DateOfDeath. PersonId references Id in the Persons table. If there is no DateOfDeath to be stored, we don't store the row. End of discussion.
If you do an OUTER JOIN between this and the Persons table, you'll get a NULL for the DateOfDeath wherever there was no row. And if you use this in a where clause, you'll get the usual perplexing behavior concerning 3-value logic. If you do an INNER JOIN, the rows for which there is no DateOfDeath will simply disappear from the join.
A design that permits every column to be NOT NULL enforced has been called sixth normal form.
Having said all that, I often allow NULLs in non critical columns. And I don't have a succinct way of telling you how I determine that a column is critical.