Wrap around explanation for signed and unsigned variables in C?
Imagine you have a data type that's only 3 bits wide. This allows you to represent 8 distinct values, from 0 through 7. If you add 1 to 7, you will "wrap around" back to 0, because you don't have enough bits to represent the value 8 (1000).
This behavior is well-defined for unsigned types. It is not well-defined for signed types, because there are multiple methods for representing signed values, and the result of an overflow will be interpreted differently based on that method.
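To make the three-bit picture concrete, here is a minimal sketch that models such a type with an unsigned bit-field. The struct `three_bits` and the function `increment3` are names made up for this illustration; they are not part of any library.

```c
/* A hypothetical 3-bit unsigned "type", modelled with a bit-field. */
struct three_bits {
    unsigned int value : 3;   /* can hold 0..7 */
};

/* Add 1 to a 3-bit value. Storing into the unsigned bit-field
   reduces the result modulo 8, which is well-defined. */
unsigned int increment3(unsigned int start)
{
    struct three_bits t;
    t.value = start;          /* reduced modulo 8 on assignment */
    t.value = t.value + 1;    /* 7 + 1 = 8 does not fit, so 0 is stored */
    return t.value;
}
```

Calling `increment3(7)` gives 0, just like the 7 + 1 case described above, while `increment3(3)` gives 4.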
Sign-magnitude: the uppermost bit represents the sign; 0 for positive, 1 for negative. If my type is three bits wide again, then I can represent signed values as follows:
000 = 0
001 = 1
010 = 2
011 = 3
100 = -0
101 = -1
110 = -2
111 = -3
Since one bit is taken up for the sign, I only have two bits to encode a value from 0 to 3. If I add 1 to 3, I'll overflow with -0 as the result. Yes, there are two representations for 0, one positive and one negative. You won't encounter sign-magnitude representation all that often.
One's-complement: the negative value is the bitwise-inverse of the positive value. Again, using the three-bit type:
000 = 0
001 = 1
010 = 2
011 = 3
100 = -3
101 = -2
110 = -1
111 = -0
I have three bits to encode my values, but the range is [-3, 3]. If I add 1 to 3, I'll overflow with -3 as the result. This is different from the sign-magnitude result above. Again, there are two encodings for 0 using this method.
Two's-complement: the negative value is the bitwise inverse of the positive value, plus 1. In the three-bit system:
000 = 0
001 = 1
010 = 2
011 = 3
100 = -4
101 = -3
110 = -2
111 = -1
If I add 1 to 3, I'll overflow with -4 as a result, which is different from the previous two methods. Note that we have a slightly larger range of values [-4, 3] and only one representation for 0.
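The three tables can be summarized as three ways of reading the same bit pattern. The sketch below decodes the low three bits of a value under each scheme; the function names are invented for this example, and since a C `int` has no separate negative zero, -0 simply comes back as 0.

```c
/* Interpret the low 3 bits of `bits` under each signed representation.
   All function names here are illustrative. */

int decode_sign_magnitude(unsigned bits)
{
    int magnitude = (int)(bits & 3u);             /* low two bits */
    return (bits & 4u) ? -magnitude : magnitude;  /* top bit is the sign */
}

int decode_ones_complement(unsigned bits)
{
    bits &= 7u;
    if (bits & 4u)
        return -(int)(~bits & 7u);   /* negative: invert all three bits */
    return (int)bits;
}

int decode_twos_complement(unsigned bits)
{
    bits &= 7u;
    return (bits & 4u) ? (int)bits - 8 : (int)bits;  /* top bit weighs -4 */
}
```

Adding 1 to 3 produces the pattern 100 in all three systems. Feeding that pattern (the value 4) to the three functions returns 0 (that is, -0), -3 and -4 respectively, which is exactly the disagreement described above.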
Two's complement is probably the most common method of representing signed values, but it's not the only one, so the C standard can't guarantee what will happen when you overflow a signed integer type. It leaves the behavior undefined so that the compiler doesn't have to deal with interpreting multiple representations.
The undefined behavior comes from early portability issues when signed integer types could be represented either as sign & magnitude, one's complement or two's complement.
Nowadays, virtually all architectures represent integers in two's complement, and the hardware does wrap around. But be careful: since your compiler is entitled to assume that your program never invokes undefined behavior, you might encounter weird bugs when optimisation is on.
Signed integer variables do not have wrap-around behavior in the C language. Signed integer overflow during arithmetic computations produces undefined behavior. Note, by the way, that the GCC compiler you mentioned is known for implementing strict overflow semantics in its optimizations, meaning that it takes advantage of the freedom provided by such undefined behavior: GCC assumes that signed integer values never wrap around. That means GCC happens to be one of the compilers on which you cannot rely on wrap-around behavior of signed integer types.
For example, for a variable int i, GCC can assume that the condition
if (i > 0 && i + 1 > 0)
is equivalent to a mere
if (i > 0)
This is exactly what strict overflow semantics means.
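Because a test like i + 1 > 0 can be optimized away, an overflow check has to look at the operands before the addition rather than at the result afterwards. A minimal sketch of that idea follows; `checked_increment` is a hypothetical helper name, not a standard function.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true and stores i + 1 in *out if the increment cannot overflow;
   returns false and leaves *out untouched otherwise. */
bool checked_increment(int i, int *out)
{
    if (i == INT_MAX)   /* test the operand, not the possibly overflowed result */
        return false;
    *out = i + 1;       /* safe: i < INT_MAX here */
    return true;
}
```

Since the addition is only executed when it is known to fit, there is no undefined behavior for the optimizer to exploit.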
Unsigned integer types implement modulo arithmetic. The modulus is equal to 2^N, where N is the number of bits in the value representation of the type. For this reason unsigned integer types do indeed appear to wrap around on overflow.
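As a small illustration of the modulo 2^N rule, the two helpers below (names invented for this example) do nothing but add and subtract unsigned int values; the wrap-around comes from the language itself.

```c
#include <limits.h>

/* Unsigned arithmetic is reduced modulo 2^N, so both of these are
   well-defined for every pair of arguments. */
unsigned add_wrapping(unsigned a, unsigned b) { return a + b; }
unsigned sub_wrapping(unsigned a, unsigned b) { return a - b; }
```

With these, `add_wrapping(UINT_MAX, 1u)` is 0 and `sub_wrapping(0u, 1u)` is `UINT_MAX`, regardless of how wide unsigned int happens to be on the platform.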
However, the C language never performs arithmetic computations in domains smaller than that of int/unsigned int. The type unsigned short int that you mention in your question will typically be promoted to type int in expressions before any computations begin (assuming that the range of unsigned short fits into the range of int). This means that 1) computations with unsigned short int will be performed in the domain of int, with overflow happening when int overflows, and 2) overflow during such computations will lead to undefined behavior, not to wrap-around behavior.
For example, this code produces a wrap-around:
unsigned i = USHRT_MAX;
i *= INT_MAX; /* <- unsigned arithmetic, overflows, wraps around */
while this code
unsigned short i = USHRT_MAX;
i *= INT_MAX; /* <- signed arithmetic, overflows, produces undefined behavior */
leads to undefined behavior.
If no int overflow happens and the result is converted back to an unsigned short int type, it is again reduced modulo 2^N, which will appear as if the value has wrapped around.