why does long long 2147483647 + 1 = -2147483648?
This signed integer overflow is undefined behaviour, as signed-integer overflow always is in C/C++. See the LLVM blog post *What Every C Programmer Should Know About Undefined Behavior*.

That is, unless you compile with `gcc -fwrapv` or equivalent to make signed integer overflow well-defined as 2's complement wrap-around. With `gcc -fwrapv`, or on any other implementation that defines integer overflow = wraparound, the wrapping that you happened to see in practice is well-defined and follows from other ISO C rules for the types of integer literals and for evaluating expressions.
`T var = expression` only implicitly converts the expression to type `T` *after* evaluating the expression according to standard rules. It works like `(T)(expression)`, not like `(int64_t)2147483647 + (int64_t)1`.
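For example (a minimal sketch; the first initializer is exactly the expression from the question, which typical compilers wrap and warn about):

```c
#include <stdio.h>

int main(void) {
    long long a = 2147483647 + 1;    // UB: evaluated as int + int, overflows,
                                     // then the int result converts to long long
    long long b = 2147483647LL + 1;  // well-defined: evaluated as long long
    printf("%lld %lld\n", a, b);     // typically prints: -2147483648 2147483648
    return 0;
}
```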
A compiler could have chosen to assume that this path of execution is never reached and emitted an illegal instruction or something. Implementing 2's complement wraparound on overflow in constant expressions is just a choice that some/most compilers make.
The ISO C standard specifies that a numeric literal has type `int` unless the value is too large to fit (then it can be `long` or `long long`, or `unsigned` for hex literals), or unless a suffix overrides that. Then the usual integer promotion rules apply for binary operators like `+` and `*`, regardless of whether or not the expression is a compile-time constant.
This is a simple and consistent rule that's easy for compilers to implement, even in the early days of C when compilers had to run on limited machines.
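One quick way to see these rules in action (my illustration, using C11 `_Generic`; the comments assume a typical LP64 system like x86-64 Linux):

```c
#include <stdio.h>

#define TYPE_NAME(x) _Generic((x),            \
    int: "int",                               \
    unsigned int: "unsigned int",             \
    long: "long",                             \
    unsigned long: "unsigned long",           \
    long long: "long long",                   \
    unsigned long long: "unsigned long long", \
    default: "other")

int main(void) {
    printf("2147483647   : %s\n", TYPE_NAME(2147483647));   // int: it fits
    printf("2147483648   : %s\n", TYPE_NAME(2147483648));   // long on LP64 (long long on 32-bit / Windows)
    printf("0x80000000   : %s\n", TYPE_NAME(0x80000000));   // unsigned int: hex literals can go unsigned
    printf("2147483647LL : %s\n", TYPE_NAME(2147483647LL)); // long long: the suffix overrides
    return 0;
}
```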
Thus in ISO C/C++, `2147483647 + 1` is undefined behaviour on implementations with 32-bit `int`. Treating it as `int` (and thus wrapping the value to signed negative) follows naturally from the ISO C rules for what type the expression should have, and from normal evaluation rules for the non-overflow case. Current compilers don't choose to define the behaviour differently from that.
ISO C/C++ do leave it undefined, so an implementation could pick literally anything (including nasal demons) without violating the C/C++ standards. In practice this behaviour (wrap + warn) is one of the less objectionable ones, and follows from treating signed integer overflow as wrapping, which is what often happens in practice at runtime.
Also, some compilers have options to actually define that behaviour officially for all cases, not just compile-time constant expressions (`gcc -fwrapv`).
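A sketch of the difference that flag makes (assuming 32-bit `int`; without `-fwrapv` the addition below is UB at run-time):

```c
// Compile with: gcc -fwrapv wrap.c
#include <limits.h>
#include <stdio.h>

int plus_one(int x) { return x + 1; }  // overflows when x == INT_MAX

int main(void) {
    printf("%d\n", plus_one(INT_MAX));  // -2147483648 with -fwrapv; UB without it
    return 0;
}
```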
Compilers do warn about this
Good compilers will warn about many forms of UB when they're visible at compile time, including this one. GCC and clang warn even without `-Wall`. From the Godbolt compiler explorer:
clang:

```
<source>:5:20: warning: overflow in expression; result is -2147483648 with type 'int' [-Winteger-overflow]
    a = 2147483647 + 1;
                   ^
```
gcc:

```
<source>: In function 'void foo()':
<source>:5:20: warning: integer overflow in expression of type 'int' results in '-2147483648' [-Woverflow]
    5 |     a = 2147483647 + 1;
      |         ~~~~~~~~~~~^~~
```
GCC has had this warning enabled by default since at least GCC 4.1 in 2006 (the oldest version on Godbolt), and clang since 3.3.
MSVC only warns with `-Wall`, which for MSVC is unusably verbose most of the time, e.g. `stdio.h` results in tons of warnings like `'vfwprintf': unreferenced inline function has been removed`. MSVC's warning for this looks like:

MSVC -Wall:

```
<source>(5): warning C4307: '+': signed integral constant overflow
```
@HumanJHawkins asked why it was designed this way:

> To me, this question is asking, why doesn't the compiler also use the smallest data type that the result of a math operation will fit into? With integer literals, it would be possible to know at compile time that an overflow error was occurring. But the compiler does not bother to know this and handle it. Why is that?
"Doesn't bother to handle it" is a bit strong; compilers do detect the overflow and warn about it. But they follow ISO C rules that say int + int
has type int
, and that the numeric literals each have type int
. Compilers merely choose on purpose to wrap instead of to widening and giving the expression a different type than you'd expect. (Instead of bailing out entirely because of the UB.)
Wrapping is common when signed overflow happens at run-time, although in loops compilers do aggressively optimize `int i` / `array[i]` to avoid redoing sign-extension every iteration.
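For instance (my sketch, not from the original answer): in a loop like this, a compiler targeting x86-64 can assume `i` never wraps (that would be UB) and keep it in a 64-bit register for the indexing math:

```c
long sum(const int *array, int n) {
    long total = 0;
    for (int i = 0; i < n; i++)   // UB on overflow lets the compiler avoid
        total += array[i];        // re-sign-extending i every iteration
    return total;
}

int main(void) {
    int a[4] = {1, 2, 3, 4};
    return (int)sum(a, 4);  // 10
}
```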
Widening would bring its own (smaller) set of pitfalls, like `printf("%d %d\n", 2147483647 + 1, 2147483647);` having undefined behaviour (and failing in practice on 32-bit machines) because of a type mismatch with the format string. If `2147483647 + 1` implicitly promoted to `long long`, you'd need a `%lld` format string. (And it would break in practice because a 64-bit int is typically passed in two arg-passing slots on a 32-bit machine, so the 2nd `%d` would probably see the 2nd half of the first `long long`.)
To be fair, that's already a problem for `-2147483648`. As an expression in C/C++ source it has type `long` or `long long`: it's parsed as `2147483648` separately from the unary `-` operator, and `2147483648` doesn't fit in a 32-bit signed `int`, so it gets the next largest type that can represent the value.
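This is why `<limits.h>` implementations typically define `INT_MIN` as `(-2147483647 - 1)` or equivalent, rather than as `-2147483648`. A quick check (sizes assume LP64):

```c
#include <limits.h>
#include <stdio.h>

int main(void) {
    printf("%zu\n", sizeof(-2147483648));  // 8 on LP64: the literal is a long
    printf("%zu\n", sizeof(INT_MIN));      // 4: INT_MIN really is an int
    return 0;
}
```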
However, any program affected by that widening would have had UB (and probably wrapping) without it, and it's more likely that widening will make code happen to work. There's a design philosophy issue here: too many layers of "happens to work" and forgiving behaviour make it hard to understand exactly why something *does* work, and hard to verify that it will be portable to other implementations with other type widths. Unlike "safe" languages like Java, C is very unsafe and has different implementation-defined things on different platforms, but many developers only have one implementation to test on. (Especially before the internet and online continuous-integration testing.)
ISO C doesn't define the behaviour, so yes a compiler could define new behaviour as an extension without breaking compatibility with any UB-free programs. But unless every compiler supported it, you couldn't use it in portable C programs. I could imagine it as a GNU extension supported by gcc/clang/ICC at least.
Also, such an option would somewhat conflict with `-fwrapv`, which does define the behaviour. Overall I think it's unlikely to be adopted, because there's already convenient syntax for specifying the type of a literal: `0x7fffffffUL + 1` gives you an `unsigned long`, which is guaranteed to be at least 32 bits and thus wide enough for that value.
But let's consider this as a possible design for C in the first place, instead of the current rules.
One possible design would be to infer the type of a whole integer constant expression from its value, calculated with arbitrary precision. Why arbitrary precision instead of `long long` or `unsigned long long`? Those might not be large enough for intermediate parts of the expression if the final value is small because of `/`, `>>`, `-`, or `&` operators.
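For example (my sketch, assuming 32-bit `int`): the final value here is tiny, but the intermediate needs more than 32 bits, so only wider (or arbitrary-precision) intermediates could make it reachable:

```c
#include <stdio.h>

int main(void) {
    // printf("%d\n", (1 << 40) >> 39);        // UB: shift count >= width of int
    printf("%d\n", (int)((1LL << 40) >> 39));  // well-defined: prints 2
    return 0;
}
```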
Or a simpler design, like the C preprocessor, where constant integer expressions are evaluated at some fixed implementation-defined width of at least 64 bits. (But then would you assign a type based on the final value, or based on the widest temporary value in the expression?) That has the obvious downside for early C on 16-bit machines that it makes compile-time expressions slower to evaluate than if the compiler could use the machine's native integer width internally for `int` expressions.
Integer constant expressions are already somewhat special in C, required to be evaluated at compile time in some contexts, e.g. for `static int array[1024 * 1024 * 1024];` (where the multiplies would overflow on implementations with 16-bit `int`).
Obviously we can't efficiently extend the promotion rule to non-constant expressions: if `(a*b)/c` might have to evaluate `a*b` as `long long` instead of `int` on a 32-bit machine, the division would require extended precision. (For example, x86's 64-bit / 32-bit => 32-bit division instruction faults on overflow of the quotient instead of silently truncating the result, so even assigning the result to an `int` wouldn't let the compiler optimize well for some cases.)
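If you do want the widened intermediate product in run-time code, you can ask for it explicitly, so the compiler knows it has to use (or emulate) a wide division (a sketch, with hypothetical names):

```c
#include <stdio.h>

int scale(int a, int b, int c) {
    return (int)(((long long)a * b) / c);  // 64-bit product, then divide
}

int main(void) {
    printf("%d\n", scale(100000, 100000, 100000));  // 100000; a*b as int would overflow
    return 0;
}
```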
Also, do we really want the behaviour / definedness of `a * b` to depend on whether `a` and `b` are `static const` or not? Having compile-time evaluation rules match the rules for non-constant expressions seems good in general, even though it leaves these nasty pitfalls. But again, this is something good compilers can warn about in constant expressions.
Other more common cases of this C gotcha are things like `1 << 40` instead of `1ULL << 40` to define a bit flag, or writing 1T as `1024*1024*1024*1024`.
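A sketch of the fix for both (the `BAD_` variants are what people write by accident):

```c
#include <stdio.h>

#define FLAG_40  (1ULL << 40)                    // OK: shift done as unsigned long long
#define ONE_TiB  (1024ULL * 1024 * 1024 * 1024)  // OK: first factor widens the rest
// #define BAD_FLAG (1 << 40)                    // UB: shift count >= width of int
// #define BAD_TiB  (1024 * 1024 * 1024 * 1024)  // UB: int multiply overflows

int main(void) {
    printf("%llu %llu\n", FLAG_40, ONE_TiB);     // 1099511627776 1099511627776
    return 0;
}
```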
`2147483647 + 1` is evaluated as the sum of two `int`s and therefore overflows.
`2147483648` is too big to fit in an `int` and is therefore assumed by the compiler to be a `long` (or a `long long` in MSVC). It therefore does not overflow.
To perform the summation as a `long long`, use the appropriate constant suffix, i.e.

```c
a = 2147483647LL + 1;
```