What do compilers do with compile-time branching?
Note that although the optimizer may well be able to prune statically-known tests and unreachable branches from the generated code, the compiler still needs to be able to compile each branch.
That is:
int foo() {
#if 0
return std::cout << "this isn't going to work\n";
#else
return 1;
#endif
}
will work fine, because the preprocessor strips out the dead branch before the compiler sees it, but:
int foo() {
if (std::is_integral<double>::value) {
return std::cout << "this isn't going to work\n";
} else {
return 1;
}
}
won't. Even though the optimizer can discard the first branch, it will still fail to compile. This is where using enable_if and SFINAE helps, because you can select the valid (compilable) code, and the invalid (un-compilable) code's Failure to compile Is Not An Error.
The compiler may be smart enough to see that it can replace the if statement body with two different function implementations, and just choose the right one. But as of 2014 I doubt there is any compiler that is smart enough to do that. I may be wrong though. On second thought, std::is_integral is simple enough that I think it will be optimized away.
Your idea of overloading on the result of std::is_integral is one possible solution.
Another and IMHO cleaner solution is to use std::enable_if (together with std::is_integral).
TL;DR
There are several ways to get different run-time behavior dependent on a template parameter. Performance should not be your primary concern here, but flexibility and maintainability should. In all cases, the various thin wrappers and constant conditional expressions will be optimized away on any decent compiler for release builds. Below is a small summary with the various tradeoffs (inspired by this answer by @AndyProwl).
Run-time if
Your first solution is the simple run-time if:
template<class T>
T numeric_procedure(const T& x)
{
if (std::is_integral<T>::value) {
// valid code for integral types
} else {
// valid code for non-integral types,
// must ALSO compile for integral types
}
}
It is simple and effective: any decent compiler will optimize away the dead branch.
There are several disadvantages:
- on some platforms (MSVC), a constant conditional expression yields a spurious compiler warning which you then need to ignore or silence.
- But worse, on all conforming platforms, both branches of the if/else statement need to actually compile for all types T, even if one of the branches is known not to be taken. If T contains different member types depending on its nature, then you will get a compiler error as soon as you try to access them.
Tag dispatching
Your second approach is known as tag-dispatching:
template<class T>
T numeric_procedure_impl(const T& x, std::false_type)
{
// valid code for non-integral types,
// CAN contain code that is invalid for integral types
}
template<class T>
T numeric_procedure_impl(const T& x, std::true_type)
{
// valid code for integral types
}
template<class T>
T numeric_procedure(const T& x)
{
return numeric_procedure_impl(x, std::is_integral<T>());
}
It works fine, without run-time overhead: the temporary std::is_integral<T>() and the call to the one-line helper function will both be optimized away on any decent platform.
The main (minor IMO) disadvantage is that you have some boilerplate with 3 instead of 1 function.
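To make the skeleton above concrete, here is a minimal sketch of tag dispatching. The function name remainder_by_two and its helper are hypothetical, chosen only because operator% does not compile for floating-point types, so a plain run-time if would be rejected here.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Integral overload: operator% is only valid for integer types, so this
// body must never be instantiated for double.
template<class T>
T remainder_by_two_impl(const T& x, std::true_type)
{
    return x % 2;
}

// Non-integral overload: uses std::fmod instead of operator%.
template<class T>
T remainder_by_two_impl(const T& x, std::false_type)
{
    return std::fmod(x, T(2));
}

// Front end: the tag object picks the helper during overload resolution.
template<class T>
T remainder_by_two(const T& x)
{
    return remainder_by_two_impl(x, std::is_integral<T>());
}

// Usage sketch: both calls go through the same front-end function.
inline void remainder_by_two_demo()
{
    assert(remainder_by_two(7) == 1);      // integral overload
    assert(remainder_by_two(7.5) == 1.5);  // non-integral overload
}
```

Only the helper that matches the tag is instantiated for a given T, which is why the integral body may contain code that is ill-formed for double.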
SFINAE
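Closely related to tag-dispatching is SFINAE (Substitution failure is not an error):
// The enable_if is used as the type of a defaulted non-type template parameter,
// so the two overloads are distinct templates. Writing it as a defaulted
// "class = ..." parameter instead would declare the same template twice.
template<class T, typename std::enable_if<!std::is_integral<T>::value, int>::type = 0>
T numeric_procedure(const T& x)
{
    // valid code for non-integral types,
    // CAN contain code that is invalid for integral types
}
template<class T, typename std::enable_if<std::is_integral<T>::value, int>::type = 0>
T numeric_procedure(const T& x)
{
    // valid code for integral types
}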
This has the same effect as tag-dispatching but works slightly differently. Instead of using argument-deduction to select the proper helper overload, it directly manipulates the overload set for your main function.
The disadvantage is that it can be a fragile and tricky way if you don't know exactly what the entire overload set is (e.g. with template heavy code, ADL could pull in more overloads from associated namespaces you didn't think of). And compared to tag-dispatching, selection based on anything other than a binary decision is a lot more involved.
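As a rough illustration of the SFINAE variant, the sketch below uses a hypothetical function remainder_by_two_sfinae. It puts the enable_if in a defaulted non-type template parameter (the int = 0 form), which keeps the two overloads distinct templates; this is one common way to write it, not the only one.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Participates in overload resolution only when T is integral.
template<class T,
         typename std::enable_if<std::is_integral<T>::value, int>::type = 0>
T remainder_by_two_sfinae(const T& x)
{
    return x % 2;  // ill-formed for double, but never instantiated for it
}

// Participates in overload resolution only when T is NOT integral.
template<class T,
         typename std::enable_if<!std::is_integral<T>::value, int>::type = 0>
T remainder_by_two_sfinae(const T& x)
{
    return std::fmod(x, T(2));
}

// Usage sketch: exactly one overload survives substitution per call.
inline void remainder_by_two_sfinae_demo()
{
    assert(remainder_by_two_sfinae(9) == 1);
    assert(remainder_by_two_sfinae(4.5) == 0.5);
}
```

For each call, substitution fails for one of the two templates, so it is silently removed from the overload set and the remaining one is chosen without ambiguity.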
Partial specialization
Another approach is to use a class template helper with a function application operator and partially specialize it
template<class T, bool>
struct numeric_functor;
template<class T>
struct numeric_functor<T, false>
{
T operator()(T const& x) const
{
// valid code for non-integral types,
// CAN contain code that is invalid for integral types
}
};
template<class T>
struct numeric_functor<T, true>
{
T operator()(T const& x) const
{
// valid code for integral types
}
};
template<class T>
T numeric_procedure(T const& x)
{
return numeric_functor<T, std::is_integral<T>::value>()(x);
}
This is probably the most flexible approach if you want to have fine-grained control and minimal code duplication (e.g. if you also want to specialize on size and/or alignment, but say only for floating point types). The pattern matching given by partial template specialization is ideally suited for such advanced problems. As with tag-dispatching, the helper functors are optimized away by any decent compiler.
The main disadvantage is the slightly larger boiler-plate if you only want to specialize on a single binary condition.
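The point about specializing on size only for floating-point types can be sketched as follows. The names numeric_label and label_of are hypothetical, and the returned strings are placeholders for real per-category code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Primary template: any non-floating-point type, size ignored.
template<class T,
         bool IsFloating = std::is_floating_point<T>::value,
         std::size_t Size = sizeof(T)>
struct numeric_label
{
    const char* operator()(const T&) const { return "not floating point"; }
};

// All floating-point types, whatever their size.
template<class T, std::size_t Size>
struct numeric_label<T, true, Size>
{
    const char* operator()(const T&) const { return "floating point"; }
};

// More specialized: floating-point types exactly as large as float.
template<class T>
struct numeric_label<T, true, sizeof(float)>
{
    const char* operator()(const T&) const { return "float-sized floating point"; }
};

template<class T>
const char* label_of(const T& x)
{
    return numeric_label<T>()(x);
}

// Usage sketch: the most specialized matching pattern wins.
inline void label_of_demo()
{
    assert(std::strcmp(label_of(1), "not floating point") == 0);
    assert(std::strcmp(label_of(1.0f), "float-sized floating point") == 0);
}
```

Adding a further case (for example a different size or an alignment parameter) means adding one more specialization, without touching the existing ones.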
If constexpr (C++1z proposal)
This is a reboot of failed earlier proposals for static if (which is used in the D programming language); it was adopted into C++17:
template<class T>
T numeric_procedure(const T& x)
{
if constexpr (std::is_integral<T>::value) {
// valid code for integral types
} else {
// valid code for non-integral types,
// CAN contain code that is invalid for integral types
}
}
As with your run-time if, everything is in one place, but the main advantage here is that the branch known not to be taken is discarded at compile time and, inside a template, is never instantiated for that T. You keep all code local and do not have to use little helper functions as in tag dispatching or partial template specialization.
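A minimal sketch of the same idea with real bodies, assuming a compiler in C++17 mode or later. The function name remainder_by_two_cx is hypothetical; the integral branch uses operator%, which would not compile for double in an ordinary if.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

template<class T>
T remainder_by_two_cx(const T& x)
{
    if constexpr (std::is_integral<T>::value) {
        return x % 2;               // discarded, not instantiated, for double
    } else {
        return std::fmod(x, T(2));  // discarded for integral T
    }
}

// Usage sketch: one function template, no helpers.
inline void remainder_by_two_cx_demo()
{
    assert(remainder_by_two_cx(5) == 1);
    assert(remainder_by_two_cx(5.5) == 1.5);
}
```

Replacing if constexpr with a plain if in this sketch would make the double instantiation fail to compile, since both branches would then have to be valid for every T.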
Concepts-Lite (C++1z proposal)
Concepts-Lite was written as a Technical Specification that was expected to become part of the next major C++ release (C++1z); concepts were eventually standardized in C++20 rather than C++17.
template<Non_integral T>
T numeric_procedure(const T& x)
{
// valid code for non-integral types,
// CAN contain code that is invalid for integral types
}
template<Integral T>
T numeric_procedure(const T& x)
{
// valid code for integral types
}
This approach replaces the class or typename keyword inside the template< > brackets with a concept name describing the family of types that the code is supposed to work for. It can be seen as a generalization of the tag-dispatching and SFINAE techniques. Some compilers (gcc, Clang) have experimental support for this feature. The Lite adjective refers to the failed Concepts C++11 proposal.
To answer the title question about how compilers handle if(false):
They optimize away constant branch conditions (and the dead code)
The language standard does not of course require compilers to not be terrible, but the C++ implementations that people actually use are non-terrible in this way. (So are most C implementations, except for maybe very simplistic non-optimizing ones like tinycc.)
One of the major reasons C++ is designed around if(something) instead of the C preprocessor's #ifdef SOMETHING is that they're equally efficient. Many C++ features (like constexpr) only got added after compilers already implemented the necessary optimizations (inlining + constant propagation). (The reason we put up with all the undefined-behaviour pitfalls and gotchas of C and C++ is performance, especially with modern compilers that aggressively optimize on the assumption of no UB. The language design typically doesn't impose unnecessary performance costs.)
But if you care about debug-mode performance, the choice can be relevant depending on your compiler (e.g. for a game or other program with real-time requirements, where a debug build has to be fast enough to even be testable).
For example, clang++ -O0 ("debug mode") still evaluates an if(constexpr_function()) at compile time and treats it like if(false) or if(true). Some other compilers only evaluate at compile time if they're forced to (by template-matching).
There is no performance cost for if(false) with optimization enabled. (Barring missed-optimization bugs, which might depend on how early in the compile process the condition can be resolved to false and dead-code elimination can remove it before the compiler "thinks about" reserving stack space for its variables, or that the function may be non-leaf, or whatever.)
Any non-terrible compiler can optimize away dead code behind a compile-time-constant condition (Wikipedia: Dead Code Elimination). This is part of the baseline expectations people have for a C++ implementation to be usable in the real world; it's one of the most basic optimizations and all compilers in real use do it for simple cases like a constexpr.
Often constant-propagation (especially after inlining) will make conditions compile-time constants even if they weren't obviously so in the source. One of the more-obvious cases is optimizing away the compare on the first iteration of a for (int i=0 ; i<n ; i++) so it can turn into a normal asm loop with a conditional branch at the bottom (like a do{}while loop in C++) if n is constant or provably > 0. (Yes, real compilers do value-range optimizations, not just constant propagation.)
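The loop transformation described above can be pictured at the source level. This is only a conceptual sketch with hypothetical function names; real compilers do this on their internal representation, not on C++ source.

```cpp
#include <cassert>

// Loop as written: the condition is tested before every iteration,
// including the first one.
int sum_below(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += i;
    return total;
}

// Roughly the shape it is lowered to: one guard up front, then a loop
// with the conditional branch at the bottom.
int sum_below_rotated(int n)
{
    int total = 0;
    if (n > 0) {          // this guard disappears when n is provably > 0
        int i = 0;
        do {
            total += i;
            i++;
        } while (i < n);  // bottom-of-loop test
    }
    return total;
}

// Usage sketch: both forms compute the same result for any n.
inline void sum_below_demo()
{
    assert(sum_below(5) == 10);
    assert(sum_below_rotated(5) == 10);
    assert(sum_below_rotated(0) == 0);
}
```

When n is a constant or its range is known to exclude values below 1, the up-front guard is itself a compile-time-constant condition and is removed like any other if(true).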
Some compilers, like gcc and clang, remove dead code inside an if(false) even in "debug" mode, at the minimum level of optimization that's required for them to transform the program logic through their internal arch-neutral representations and eventually emit asm. (But debug mode disables any kind of constant-propagation for variables that aren't declared const or constexpr in the source.)
Some compilers only do it when optimization is enabled; for example MSVC really likes to be literal in its translation of C++ to asm in debug mode and will actually create a zero in a register and branch on it being zero or not for if(false).
For gcc debug mode (-O0), constexpr functions aren't inlined if they don't have to be. (In some places the language requires a constant, like an array size inside a struct. GNU C++ supports C99 VLAs, but does choose to inline a constexpr function instead of actually making a VLA in debug mode.)
But non-function constexpr variables do get evaluated at compile time, not stored in memory and tested.
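If debug-mode code quality matters, one way to take the decision away from the compiler is to use the constexpr function's result in a context that requires a constant expression. The sketch below shows two such contexts; the names always_false_t, always_false_v and pick are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

constexpr bool always_false() { return sizeof(char) == 2 * sizeof(int); }

// A template argument must be a constant expression, so the call is
// evaluated at compile time regardless of optimization level.
using always_false_t = std::integral_constant<bool, always_false()>;

// The initializer of a constexpr variable must be a constant expression too.
constexpr bool always_false_v = always_false();

static_assert(!always_false_t::value, "sizeof(char) is 1, so this is false");

int pick()
{
    // Branches on a constexpr variable rather than on a function call.
    if (always_false_v) return 1;
    else return 2;
}

// Usage sketch.
inline void pick_demo()
{
    assert(pick() == 2);
}
```

Whether the branch on always_false_v is also removed at -O0 still depends on the compiler; the sketch only guarantees that the value is computed at compile time.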
But just to reiterate, at any level of optimization above -O0, constexpr functions are fully inlined and optimized away, and then the if() on the resulting constant is removed as well.
Examples (from the Godbolt compiler explorer)
#include <type_traits>
void f1();   // declared only, so the calls can't be inlined away
void f2();
void baz() {
    if (std::is_integral<float>::value) f1(); // optimizes for gcc
    else f2();
}
All compilers with -O2 optimization enabled (for x86-64):
baz():
jmp f2() # optimized tailcall
Debug-mode code quality, normally not relevant
GCC with optimization disabled still evaluates the expression and does dead-code elimination:
baz():
push rbp
mov rbp, rsp # -fno-omit-frame-pointer is the default at -O0
call f2() # still an unconditional call, no runtime branching
nop
pop rbp
ret
To see gcc not inline something with optimization disabled
static constexpr bool always_false() { return sizeof(char)==2*sizeof(int); }
void baz() {
if (always_false()) f1();
else f2();
}
;; gcc9.1 with no optimization chooses not to inline the constexpr function
baz():
push rbp
mov rbp, rsp
call always_false()
test al, al # the bool return value
je .L9
call f1()
jmp .L11
.L9:
call f2()
.L11:
nop
pop rbp
ret
MSVC's braindead literal code-gen with optimization disabled:
void foo() {
if (false) f1();
else f2();
}
;; MSVC 19.20 x86-64 no optimization
void foo(void) PROC ; foo
sub rsp, 40 ; 00000028H
xor eax, eax ; EAX=0
test eax, eax ; set flags from EAX (which were already set by xor)
je SHORT $LN2@foo ; jump if ZF is set, i.e. if EAX==0
call void f1(void) ; f1
jmp SHORT $LN3@foo
$LN2@foo:
call void f2(void) ; f2
$LN3@foo:
add rsp, 40 ; 00000028H
ret 0
Benchmarking with optimization disabled is not useful
You should always enable optimization for real code; the only time debug-mode performance matters is when that's a pre-condition for debuggability. It's not a useful proxy to avoid having your benchmark optimize away; different code is slowed down more or less by debug mode depending on how it's written.
Unless that's a really big deal for your project, and you just can't find enough info about local vars or something with minimal optimization like g++ -Og, the headline of this answer is the full answer. Ignore debug mode, only bother thinking about quality of the asm in optimized builds. (Preferably with LTO enabled, if your project can enable that to allow cross-file inlining.)