Function not called in code gets called at runtime
The program contains undefined behavior: dereferencing a null pointer (i.e. calling foo() in main without assigning a valid address to it beforehand) is UB, so the standard imposes no requirements on what happens.
Executing format_disk at runtime is a perfectly valid outcome once undefined behavior has been hit; it's just as valid as crashing (which is what happens when compiled with GCC). Okay, but why is Clang doing that? If you compile with optimizations off, the program no longer prints "formatting hard disk drive!" and just crashes:
$ clang++ -std=c++17 -O0 a.cpp && ./a.out
Segmentation fault (core dumped)
The generated code for this version is as follows:
main: # @main
push rbp
mov rbp, rsp
call qword ptr [foo]
xor eax, eax
pop rbp
ret
It tries to call the function that foo points to. Since foo is initialized with nullptr (and, as a variable with static storage duration, it would be zero-initialized even without an explicit initializer), its value is zero. At this point undefined behavior has been hit, so anything at all can happen and the program is rendered useless. In practice, calling such an invalid address usually results in a segmentation fault, hence the message we get when executing the program.
Now let's examine the same program but compiling it with optimizations on:
$ clang++ -std=c++17 -O3 a.cpp && ./a.out
formatting hard disk drive!
The generated code for this version is as follows:
never_called(): # @never_called()
ret
main: # @main
push rax
mov edi, .L.str
call puts
xor eax, eax
pop rcx
ret
.L.str:
.asciz "formatting hard disk drive!"
Interestingly, the optimizer somehow modified the program so that main calls std::puts directly. But why did Clang do that? And why is never_called compiled to a single ret instruction?
Let's get back to the standard (N4660, specifically) for a moment. What does it say about undefined behavior?
3.27 undefined behavior [defns.undefined]
behavior for which this document imposes no requirements
[Note: Undefined behavior may be expected when this document omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed. Evaluation of a constant expression never exhibits behavior explicitly specified as undefined ([expr.const]). — end note]
Emphasis mine.
A program that exhibits undefined behavior becomes useless: everything it has done so far and everything it will do afterwards has no meaning if it contains erroneous data or constructs. With that in mind, remember that compilers may completely ignore the case where undefined behavior is hit, and that this is actually used as a source of facts when optimizing a program. For instance, a construct like x + 1 > x (where x is a signed integer) can be optimized to the constant true, even if the value of x is unknown at compile time. The reasoning is that the compiler wants to optimize for valid cases, and the only way for that construct to be valid is when it doesn't trigger arithmetic overflow (i.e. if x != std::numeric_limits<decltype(x)>::max()). This is a newly learned fact in the optimizer. Based on it, the construct is proven to always evaluate to true.
Note: this same optimization can't occur for unsigned integers, because overflowing one is not UB. That is, the compiler needs to keep the expression as it is, since it evaluates differently when wraparound occurs (unsigned arithmetic is modulo 2^N, where N is the number of bits). Optimizing it away for unsigned integers would not conform to the standard (thanks aschepler).
This is useful, as it allows tons of optimizations to kick in. So far, so good, but what happens if x holds its maximum value at runtime? Well, that is undefined behavior, so it's nonsense to try to reason about it: anything may happen and the standard imposes no requirements.
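As a minimal sketch of the difference between the two cases (the function names below are hypothetical and not part of the original program), the signed comparison is one the optimizer is allowed to fold to true, while the unsigned one has to be evaluated because wraparound is well defined:

```cpp
#include <limits>

// Signed: x + 1 overflows only when x is the maximum int, which is UB,
// so an optimizer may assume that never happens and fold this to `true`.
// Never call this with std::numeric_limits<int>::max().
bool signed_plus_one_is_greater(int x)
{
    return x + 1 > x;
}

// Unsigned: arithmetic wraps modulo 2^N, so max + 1 == 0 is well defined
// and the comparison must really be evaluated.
bool unsigned_plus_one_is_greater(unsigned x)
{
    return x + 1 > x;
}
```

Calling the unsigned version with std::numeric_limits<unsigned>::max() returns false, since the sum wraps around to 0.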
Now we have enough information to better examine your faulty program. We already know that calling through a null pointer is undefined behavior, and that's what's causing the funny behavior at runtime. So let's try to understand why Clang (or technically LLVM) optimized the program the way it did.
#include <cstdio>

static void (*foo)() = nullptr;

static void format_disk()
{
    std::puts("formatting hard disk drive!");
}

void never_called()
{
    foo = format_disk;
}

int main()
{
    foo();
}
Remember that it's possible to call never_called before main starts executing. For example, when declaring a top-level (namespace-scope) variable, you can call it while initializing that variable:
void never_called();
int x = (never_called(), 42);
If you write this snippet in your program, the program no longer exhibits undefined behavior, and the message "formatting hard disk drive!" is displayed, with optimizations either on or off.
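To make that concrete, here is a sketch of the program with the initializer placed in the same translation unit. The helper call_foo is a hypothetical stand-in for the call in main, added only so the effect can be checked; everything else follows the original program. Because foo has a constant initializer, it is set to nullptr before any dynamic initialization runs, so the assignment made by never_called is not overwritten.

```cpp
#include <cstdio>

static void (*foo)() = nullptr;   // constant-initialized before x below

static void format_disk()
{
    std::puts("formatting hard disk drive!");
}

void never_called()
{
    foo = format_disk;
}

// Initializing x calls never_called(), so foo is set before it is used.
int x = (never_called(), 42);

// Hypothetical helper standing in for the call in main.
bool call_foo()
{
    if (foo == nullptr)
        return false;   // calling through foo here would be UB
    foo();
    return true;
}
```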
So what's the only way this program is valid? There's this never_called function that assigns the address of format_disk to foo, so we might find something here. Note that foo is marked as static, which means it has internal linkage and can't be accessed from outside this translation unit. In contrast, the function never_called has external linkage and may be called from outside. If another translation unit contains a snippet like the one above, then this program becomes valid.
Cool, but no one is calling never_called from outside. Even so, the optimizer sees that the only way for this program to be valid is if never_called is called before main executes; otherwise it's just undefined behavior. That's a newly learned fact, so the compiler assumes never_called is in fact called. Based on that new knowledge, other optimizations that kick in may take advantage of it.
For instance, when constant folding is applied, it sees that the construct foo() is only valid if foo has been properly initialized. The only way for that to happen is if never_called is called from outside this translation unit, so foo == format_disk.
Dead code elimination and interprocedural optimization might find out that if foo == format_disk, then the code inside never_called is unneeded, so the function's body is transformed into a single ret instruction.
Inline expansion sees that foo == format_disk, so the call to foo can be replaced with its body. In the end, we end up with something like this:
never_called():
ret
main:
mov edi, .L.str
call puts
xor eax, eax
ret
.L.str:
.asciz "formatting hard disk drive!"
This is roughly equivalent to Clang's output with optimizations on. Of course, what Clang really did may well be different, but optimizations are nonetheless capable of reaching the same conclusion.
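Translated back into C++, the optimized assembly corresponds roughly to the following. This is only an illustration of the end result, not something Clang produces; optimized_main is a hypothetical name used so the sketch does not define a second main.

```cpp
#include <cstdio>

// The store to foo is gone: foo is already assumed to be format_disk.
void never_called()
{
}

// The call through foo was replaced by the inlined body of format_disk.
int optimized_main()
{
    std::puts("formatting hard disk drive!");
    return 0;
}
```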
Examining GCC's output with optimizations on, it seems it didn't bother investigating:
.LC0:
.string "formatting hard disk drive!"
format_disk():
mov edi, OFFSET FLAT:.LC0
jmp puts
never_called():
mov QWORD PTR foo[rip], OFFSET FLAT:format_disk()
ret
main:
sub rsp, 8
call [QWORD PTR foo[rip]]
xor eax, eax
add rsp, 8
ret
Executing that program results in a crash (segmentation fault), but if you call never_called in another translation unit before main gets executed, then this program doesn't exhibit undefined behavior anymore.
All of this can change drastically as more and more optimizations are engineered, so do not rely on the assumption that your compiler will take care of code containing undefined behavior; it might just as well screw you over (and format your hard drive for real!).
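One way to keep this particular program out of undefined behavior is to test the pointer before calling through it. The sketch below assumes a hypothetical helper, call_if_set, that is not part of the original program:

```cpp
#include <cstdio>

static void (*foo)() = nullptr;

static void format_disk()
{
    std::puts("formatting hard disk drive!");
}

void never_called()
{
    foo = format_disk;
}

// Hypothetical helper: only call through foo when it holds a valid address.
bool call_if_set()
{
    if (foo == nullptr) {
        std::fputs("foo was never assigned\n", stderr);
        return false;
    }
    foo();
    return true;
}
```

With the check in place, the compiler can no longer assume that foo was assigned, because the null case is now handled by well-defined code.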
I recommend you read What every C programmer should know about Undefined Behavior and A Guide to Undefined Behavior in C and C++. Both article series are very informative and might help you understand the state of the art.