Defending "U" suffix after Hex literals

With all integer-constants

Appending u/U ensures the integer-constant will be some unsigned type.

Without a u/U

For a decimal-constant, the integer-constant will be some signed type.
For a hexadecimal/octal-constant, the integer-constant will be some signed or unsigned type, depending on its value and the integer type ranges.

Note: All integer-constants have non-negative values. A leading minus sign is a unary operator applied to the constant, not part of it.

//      +-------- unary operator
//      |+-+----- integer-constant
int x = -123;
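These typing rules can be checked at compile time. The sketch below assumes a C11 compiler (for `_Generic`); the macro `IS_UNSIGNED` and the function `show_constant_signedness` are hypothetical names introduced here for illustration only.

```c
#include <stdio.h>

/* Evaluates to 1 if the expression has an unsigned standard integer type
   of rank int or higher, 0 otherwise (C11 _Generic). */
#define IS_UNSIGNED(x) _Generic((x), \
    unsigned int: 1,                 \
    unsigned long: 1,                \
    unsigned long long: 1,           \
    default: 0)

/* Print the signedness the compiler picked for a few constants. */
void show_constant_signedness(void) {
    printf("123   : %s\n", IS_UNSIGNED(123)   ? "unsigned" : "signed");
    printf("123u  : %s\n", IS_UNSIGNED(123u)  ? "unsigned" : "signed");
    printf("0x7F  : %s\n", IS_UNSIGNED(0x7F)  ? "unsigned" : "signed");
    printf("0x7Fu : %s\n", IS_UNSIGNED(0x7Fu) ? "unsigned" : "signed");
    /* -123 is the unary minus applied to the int constant 123. */
    printf("-123  : %s\n", IS_UNSIGNED(-123)  ? "unsigned" : "signed");
}
```

Calling `show_constant_signedness()` from `main` should report `unsigned` only for the two suffixed constants.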

"absence or presence of the suffix changes the outcome of an operation?"

When is this important?

With various expressions, the signedness and width of the math need to be controlled and preferably not surprising.

// Examples: assume 32-bit `unsigned`, `long`, 64-bit `long long`

// Bad: signed int overflow (UB)
unsigned a = 4000 * 1000 * 1000;
// OK: the arithmetic is done in `unsigned`
unsigned b = 4000u * 1000 * 1000;

// Undefined behavior: shifts a 1 into the sign bit of an `int`
unsigned c = 1 << 31;
// OK
unsigned d = 1u << 31;

printf("Size %zu\n", sizeof(4294967295));  // 8  type is `long long`
printf("Size %zu\n", sizeof(4294967295u)); // 4  type is `unsigned`
// (hex 0xFFFFFFFF is already `unsigned` here: size 4 with or without the suffix)

//              2 ** 63
long long e = -9223372036854775808;     // C99: bad "9223372036854775808" not representable
long long f = -9223372036854775807 - 1; // ok 
long long g = -9223372036854775808u;    // implementation defined behavior **

some_unsigned_type h_max = -1;  // OK, max value for the target type
some_unsigned_type i_max = -1u; // OK, but not the max value for types wider than `unsigned`

// when negating a negative `int`
unsigned j = 0  - INT_MIN;  // typically int overflow, which is UB
unsigned k = 0u - INT_MIN;  // Never UB

** or an implementation-defined signal is raised.
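The last two pairs of examples can be checked directly. This is a minimal sketch, assuming `unsigned long long` is wider than `unsigned` and a 2's complement `int`; the function names are hypothetical and chosen here only for illustration.

```c
#include <limits.h>

/* -1 is an int; converting it to any unsigned type wraps to that
   type's maximum value. */
unsigned long long max_from_minus_one(void) {
    unsigned long long h = -1;
    return h;                 /* ULLONG_MAX */
}

/* -1u is evaluated in unsigned int first (giving UINT_MAX) and only then
   widened, so the upper bits of the wider type stay zero. */
unsigned long long max_from_minus_one_u(void) {
    unsigned long long i = -1u;
    return i;                 /* UINT_MAX */
}

/* INT_MIN is converted to unsigned before the subtraction, and unsigned
   arithmetic wraps, so no signed overflow can occur. */
unsigned negate_int_min(void) {
    return 0u - INT_MIN;      /* (unsigned)INT_MAX + 1u on 2's complement */
}
```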

Appending a U suffix to all hexadecimal constants makes them unsigned as you already mentioned. This may have undesirable side-effects when these constants are used in operations along with signed values, especially comparisons.

Here is a pathological example:

#define MY_INT_MAX  0x7FFFFFFFU   // blindly applying the rule

if (-1 < MY_INT_MAX) {
    printf("OK\n");
} else {
    printf("OOPS!\n");
}

The C rules for signed/unsigned conversions are precisely specified but somewhat counter-intuitive: -1 is converted to unsigned before the comparison and becomes a very large value, so the above code will indeed print OOPS.
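To see the conversion at work, here is a minimal sketch comparing the suffixed and unsuffixed constant. The function names are hypothetical, introduced only for this illustration; compilers that warn about signed/unsigned comparisons are expected to flag the first function.

```c
#include <limits.h>

/* Mixed comparison: the usual arithmetic conversions turn -1 into an
   unsigned value before comparing. */
int minus_one_below_suffixed(void) {
    return -1 < 0x7FFFFFFFU;   /* 0: the comparison is false */
}

/* Same constant without the suffix: both operands are signed. */
int minus_one_below_plain(void) {
    return -1 < 0x7FFFFFFF;    /* 1: the comparison is true */
}

/* The value -1 takes once converted to unsigned int. */
unsigned minus_one_as_unsigned(void) {
    return (unsigned)-1;       /* UINT_MAX */
}
```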

The MISRA-C rule is precise, as it states: A “U” suffix shall be applied to all constants of unsigned type. The word unsigned has far-reaching consequences, and indeed most constants should not really be considered unsigned.

Furthermore, the C Standard makes a subtle distinction between decimal and hexadecimal constants:

A hexadecimal constant is considered unsigned if its value can be represented by the unsigned integer type and not the signed integer type of the same size for types int and larger.

This means that on 32-bit 2's complement systems, 2147483648 is a long or a long long, whereas 0x80000000 is an unsigned int. Appending a U suffix may make this more explicit in this case, but the real precaution against potential problems is to have the compiler reject signed/unsigned comparisons altogether: gcc -Wall -Wextra -Werror or clang -Weverything -Werror are life savers.
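The difference between the two spellings of the same value can be displayed with C11 `_Generic`. This is a sketch that assumes a 32-bit `int`; the `TYPE_NAME` macro and `show_boundary_types` function are hypothetical names used only here.

```c
#include <stdio.h>

/* Name of the type the compiler assigned to an expression (C11 _Generic). */
#define TYPE_NAME(x) _Generic((x),            \
    int: "int",                               \
    unsigned int: "unsigned int",             \
    long: "long",                             \
    unsigned long: "unsigned long",           \
    long long: "long long",                   \
    unsigned long long: "unsigned long long", \
    default: "other")

/* Same value, 2 to the power 31, written in decimal and in hexadecimal. */
void show_boundary_types(void) {
    printf("2147483648 : %s, %zu bytes\n",
           TYPE_NAME(2147483648), sizeof(2147483648));
    printf("0x80000000 : %s, %zu bytes\n",
           TYPE_NAME(0x80000000), sizeof(0x80000000));
}
```

The decimal spelling should report a signed type wider than `int` (which one depends on the width of `long`), while the hexadecimal spelling should report `unsigned int`.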

Here is how bad it can get:

if (-1 < 0x8000) {
    printf("OK\n");
} else {
    printf("OOPS!\n");
}

The above code should print OK on 32-bit systems and OOPS on 16-bit systems. To make things even worse, it is still quite common to see embedded projects use obsolete compilers which do not even implement the Standard semantics for this issue.
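On a host with 32-bit `int`, the 16-bit behavior can be reproduced one size up: 0x80000000 plays the role that 0x8000 plays on a 16-bit target. The sketch below assumes a 32-bit `int`, and the function names are hypothetical.

```c
/* 0x8000 fits in a 32-bit int, so both operands are signed. */
int minus_one_below_0x8000(void) {
    return -1 < 0x8000;        /* 1 when int is wider than 16 bits */
}

/* 0x80000000 does not fit a 32-bit int but fits unsigned int, so -1 is
   converted to unsigned: the same trap as 0x8000 with a 16-bit int. */
int minus_one_below_0x80000000(void) {
    return -1 < 0x80000000;    /* 0 when int is exactly 32 bits wide */
}
```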

For your specific question, the defined values for microprocessor registers, used specifically to set them via assignment (assuming these registers are memory-mapped), need not have the U suffix at all. The register lvalue should have an unsigned type, and the hex value will be signed or unsigned depending on its value, but the operation will proceed the same way. The opcode for storing a signed number or an unsigned number is the same on your target architecture and on every architecture I have ever seen.
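As a rough illustration, the sketch below writes suffixed and unsuffixed constants to a stand-in for a register. Everything named here (`fake_ctrl_reg`, the `CTRL_*` macros, `write_ctrl`) is hypothetical; a real register would be a volatile object at a fixed address taken from the device documentation.

```c
#include <stdint.h>

/* Stand-in for a memory-mapped 32-bit register. */
static volatile uint32_t fake_ctrl_reg;

#define CTRL_START_PLAIN  0x80000001    /* unsigned int when int is 32 bits */
#define CTRL_START_U      0x80000001U   /* unsigned int */
#define CTRL_STOP_PLAIN   0x00000002    /* int */
#define CTRL_STOP_U       0x00000002U   /* unsigned int */

/* Store a value in the stand-in register and read it back. */
uint32_t write_ctrl(uint32_t value) {
    fake_ctrl_reg = value;
    return fake_ctrl_reg;
}
```

With or without the suffix, the value is converted to the register's unsigned type on assignment, so `write_ctrl(CTRL_START_PLAIN)` and `write_ctrl(CTRL_START_U)` store the same bits.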
