Why is dereferencing a null pointer undefined behaviour?

This answer from @Johannes Schaub - litb puts forward an interesting rationale, which seems pretty convincing.


The formal problem with merely dereferencing a null pointer is that determining the identity of the resulting lvalue expression is not possible: each such expression that results from dereferencing a pointer must unambiguously refer to an object or a function when that expression is evaluated. If you dereference a null pointer, you don't have an object or function that this lvalue identifies. This is the argument the Standard uses to forbid null references.
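A minimal sketch of the problem (both statements below are undefined behaviour; the reference initialization is exactly the "null reference" the Standard rules out):

```cpp
int main() {
    int* p = nullptr;
    int& r = *p;  // UB: the lvalue *p identifies no object or function,
                  // so r would be a "null reference"
    int  x = *p;  // UB: evaluating *p requires an object to read from
    (void)r; (void)x;
}
```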

Another problem that adds to the confusion is that the semantics of the typeid operator make part of this misery well defined: if typeid is given an lvalue that resulted from dereferencing a null pointer, a bad_typeid exception is thrown. This is a limited area where there exists an exception (no pun intended) to the above problem of finding an identity. Other cases exist where a similar exception to undefined behavior is made (although much less subtly, and with a reference to the affected sections).
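For a polymorphic type, the carve-out looks like this (a sketch; the class name is made up, but the bad_typeid behaviour is the one mandated by the Standard):

```cpp
#include <iostream>
#include <typeinfo>

struct Base { virtual ~Base() = default; };  // polymorphic type

int main() {
    Base* p = nullptr;
    try {
        // Normally UB, but typeid applied to a dereferenced null pointer
        // to a polymorphic type is defined to throw std::bad_typeid.
        std::cout << typeid(*p).name() << '\n';
    } catch (const std::bad_typeid& e) {
        std::cout << "caught bad_typeid: " << e.what() << '\n';
    }
}
```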

The committee discussed solving this problem globally by defining a kind of lvalue that does not have an object or function identity: the so-called empty lvalue. That concept, however, still had problems, and they decided not to adopt it.


Note:
Marking this as community wiki, since the answer & the credit should go to the original poster. I am just pasting the relevant parts of the original answer here.


The only way to give defined behaviour would be to add a runtime check to every pointer dereference and every pointer arithmetic operation. In some situations this overhead would be unacceptable, and it would make C++ unsuitable for the high-performance applications it's often used for.
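As a rough sketch of that cost, every dereference the compiler could not prove safe would have to expand to something like the following (a conceptual illustration, not actual compiler output; the trap action is arbitrary):

```cpp
#include <cstdlib>

// Hypothetical: what `return *p;` would have to become if the language
// guaranteed a defined trap on null dereference.
int checked_load(int* p) {
    if (p == nullptr)
        std::abort();  // some defined action in place of UB
    return *p;         // the branch is paid on every dereference
}
```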

C++ allows you to create your own smart pointer types (or use ones supplied by libraries), which can include such a check in cases where safety is more important than performance.
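A minimal sketch of such a type (hypothetical, not a standard library class):

```cpp
#include <stdexcept>

// checked_ptr trades one branch per dereference for a guaranteed
// exception instead of undefined behaviour.
template <typename T>
class checked_ptr {
    T* ptr_;
public:
    explicit checked_ptr(T* p = nullptr) : ptr_(p) {}

    T& operator*() const {
        if (!ptr_) throw std::runtime_error("null dereference");
        return *ptr_;
    }
    T* operator->() const {
        if (!ptr_) throw std::runtime_error("null dereference");
        return ptr_;
    }
};
```

With this, dereferencing a default-constructed `checked_ptr<int>` throws instead of invoking undefined behaviour, while code that can't afford the branch simply keeps using a raw pointer.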

Dereferencing a null pointer is also undefined in C, according to clause 6.5.3.2/4 of the C99 standard.


The primary reason is that, by the time the original C standard was written, a number of implementations allowed it but gave conflicting results.

On the PDP-11, it happened that address 0 always contained the value 0, so dereferencing a null pointer also gave the value 0. Quite a few people who used these machines felt that, since these were the machines C was originally written on and used to program, this should be considered canonical behavior for C on all machines (even though it originally happened quite accidentally).

On some other machines (Interdata comes to mind, though my memory could easily be wrong), address 0 was put to normal use, so it could contain other values. There was also some hardware on which address 0 was memory-mapped to hardware, so reading/writing it did special things -- not at all equivalent to reading or writing normal memory.

The camps wouldn't agree on what should happen, so they made it undefined behavior.

Edit: I suppose I should add that by the time they wrote the C++ standard, its being undefined behavior was already well established in C, and (apparently) nobody thought there was a good reason to create a conflict on this point, so they kept it the same.


Defining consistent behavior for dereferencing a NULL pointer would require the compiler to check for NULL pointers before each dereference on most CPU architectures. This is an unacceptable burden for a language that is designed for speed.

It also fixes only a small part of a larger problem: there are many ways to end up with an invalid pointer other than a null pointer.
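For example, every pointer below is invalid to dereference, and none of them is null (a sketch; each commented dereference is undefined behaviour that a null check would not catch):

```cpp
#include <cstdlib>

int main() {
    int* freed = static_cast<int*>(std::malloc(sizeof(int)));
    std::free(freed);         // freed now dangles into released memory

    int* dead;
    {
        int local = 5;
        dead = &local;
    }                         // local's lifetime ends; dead now dangles

    int arr[4] = {};
    int* one_past = arr + 4;  // valid to form, undefined to dereference

    // *freed, *dead, and *one_past are all UB, and none is null.
    (void)freed; (void)dead; (void)one_past;
}
```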