Why shouldn't we allow NULLs?

I think the question is poorly phrased, as the wording implies that you've already decided NULLs are bad. Perhaps you meant "Should we allow NULLs?"

Anyway, here is my take on it: I think NULLs are a good thing. When you start preventing NULLs just because "NULLs are bad" or "NULLs are hard", you start making up data. For example, what if you don't know my birth date? What are you going to put in the column until you know? If you're anything like a lot of anti-NULL folks, you're going to enter 1900-01-01. Now I'm going to be placed in the geriatric ward and probably get a call from my local news station congratulating me on my long life, asking me my secrets to living such a long life, etc.

If a row can be entered where it is possible that you don't know the value of a column, I think NULL makes a lot more sense than picking some arbitrary token value to represent the fact that it is unknown - a value which others will have to already know, reverse engineer, or ask around to figure out what it means.

There is a balance, though - not every column in your data model should be nullable. There are often optional fields on a form, or pieces of information that otherwise don't get collected at the time the row is created. But that doesn't mean you can defer populating all of the data. :-)

Also the ability to use NULL can be limited by crucial requirements in real life. In the medical field, for example, it can be a life-or-death matter to know why a value is unknown. Is the heart rate NULL because there wasn't a pulse, or because we haven't measured it yet? In such a case, can we put NULL in the heart rate column, and have notes or a different column with a NULL-because reason?

Don't be afraid of NULLs, but be willing to learn or dictate when and where they should be used, and when and where they shouldn't.

Established reasons are:

NULL is not a value, and therefore has no intrinsic data type. Nulls need special handling all over the place when code that otherwise relies on actual types might also receive the un-typed NULL.
NULL breaks two-value (familiar True or False) logic, and requires a three-value logic. This is far more complex to even implement correctly, and is certainly poorly understood by most DBAs and just about all non-DBAs. As a consequence, it positively invites many subtle bugs in the application.
The semantic meaning of any specific NULL is left to the application, unlike actual values.

Semantics like “not applicable” and “unknown” and “sentinel” are common, and there are others too. They are frequently used simultaneously within the same database, even within the same relation; and are of course inexplicit and indistinguishable and incompatible meanings.
They aren't necessary to relational databases, as argued in “How To Handle Missing Information Without Nulls”. Further normalisation is an obvious first step to try ridding a table of NULLs.

This doesn't mean NULL should never be allowed. It does argue that there are many good reasons to disallow NULL wherever feasible.

Significantly, it argues for trying very hard – through better schema design, and better database engines, and even better database languages – to make it feasible to avoid NULL more often.

Fabian Pascal responds to a number of arguments, in “Nulls Nullified”.

I disagree, nulls are an essential element of database design. The alternative, as you alluded too, would be a proliferation of known values to represent the missing or unknown. The problem lies with null being so widely misunderstood and as a result being used inappropriately.

IIRC, Codd suggested the current implementation of null (meaning not present/missing) could be improved by having two null markers rather than one, "not present but applicable" and "not present and not applicable". Can't envisage how relational designs would be improved by this personally.

Why shouldn't we allow NULLs?

Tags:

Database Design

Null

Related

Recent Posts