reinterpret_cast between char* and std::uint8_t* - safe?

Ok, let's get truly pedantic. After reading this, this and this, I'm pretty confident that I understand the intention behind both Standards.

So, doing reinterpret_cast from std::uint8_t* to char* and then dereferencing the resulting pointer is safe and portable and is explicitly permitted by [basic.lval].
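As a minimal sketch of that permitted direction: the helper name bytes_to_string below is hypothetical and not from any library. It only reads the std::uint8_t buffer through a const char*, which is the access that [basic.lval] allows.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (name is an assumption for illustration).
// Views a std::uint8_t buffer through const char* and copies the bytes
// into a std::string. Accessing any object through a char glvalue is
// one of the accesses permitted by the aliasing rules.
std::string bytes_to_string(const std::uint8_t* data, std::size_t size)
{
    const char* chars = reinterpret_cast<const char*>(data);
    return std::string(chars, size);
}
```

Calling bytes_to_string(buffer, n) on an n-byte buffer should give a string of length n holding the same bytes.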

However, doing reinterpret_cast from char* to std::uint8_t* and then dereferencing the resulting pointer is a violation of the strict aliasing rule and is undefined behavior if std::uint8_t is implemented as an extended unsigned integer type.

There are two possible workarounds. First:

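#include <cstdint>
#include <type_traits>
// std::is_same_v requires C++17; use std::is_same<...>::value on older compilers.
static_assert(std::is_same_v<std::uint8_t, char> ||
    std::is_same_v<std::uint8_t, unsigned char>,
    "This library requires std::uint8_t to be implemented as char or unsigned char.");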

With this assert in place, your code will not compile on platforms on which it would result in undefined behavior otherwise.

Second:

std::memcpy(uint8buffer, charbuffer, size);

Cppreference says that std::memcpy accesses both objects as arrays of unsigned char, so it is safe and portable regardless of how std::uint8_t is implemented.
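A slightly fuller sketch of the memcpy route, in case it helps. The function name copy_to_uint8 and the choice of std::vector as the destination are my own assumptions, not part of the original code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (name and return type are assumptions).
// Copies `size` bytes from a char buffer into a freshly allocated
// std::uint8_t buffer. No char object is ever read through a
// std::uint8_t lvalue, so the aliasing question does not arise.
std::vector<std::uint8_t> copy_to_uint8(const char* src, std::size_t size)
{
    std::vector<std::uint8_t> dst(size);
    if (size != 0) {
        // Skip the call for empty input so memcpy never sees a null pointer.
        std::memcpy(dst.data(), src, size);
    }
    return dst;
}
```

The cost is an extra copy, which is the trade-off against the static_assert approach.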

To reiterate, in order to be able to reinterpret_cast between char* and std::uint8_t* and work with resulting pointers portably and safely in a 100% standard-conforming way, the following conditions must be true:

1. CHAR_BIT == 8.
2. std::uint8_t is defined.
3. std::uint8_t is implemented as char or unsigned char.
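The conditions listed above can be sketched as compile-time checks placed in front of the cast. This is only an illustration: the helper name as_uint8 is hypothetical, and the UINT8_MAX test relies on that macro being defined exactly when the implementation provides the type.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// std::uint8_t is optional; UINT8_MAX is defined only when it exists.
#ifndef UINT8_MAX
#error "std::uint8_t is not provided on this platform"
#endif

static_assert(CHAR_BIT == 8, "This code requires 8-bit bytes.");

// Written with ::value instead of _v so it also compiles as C++11.
static_assert(std::is_same<std::uint8_t, char>::value ||
              std::is_same<std::uint8_t, unsigned char>::value,
              "std::uint8_t must be char or unsigned char.");

// Hypothetical helper (name is an assumption). It is only reachable
// when all of the checks above have passed.
inline std::uint8_t* as_uint8(char* p)
{
    return reinterpret_cast<std::uint8_t*>(p);
}
```

If any check fails, the translation unit is rejected instead of compiling into code with undefined behavior.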

On a practical note, the above conditions are true on 99% of platforms and there is likely no platform on which the first 2 conditions are true while the 3rd one is false.

If uint8_t exists at all, essentially the only choice is that it's a typedef for unsigned char (or char if it happens to be unsigned). Nothing (but a bitfield) can represent less storage than a char, and the only other type that can be as small as 8 bits is a bool. The next smallest normal integer type is a short, which must be at least 16 bits.

As such, if uint8_t exists at all, you really only have two possibilities: you're either casting unsigned char to unsigned char, or casting signed char to unsigned char.

The former is an identity conversion, so obviously safe. The latter falls within the "special dispensation" given for accessing any other type as a sequence of char or unsigned char in §3.10/10, so it also gives defined behavior.

Since that includes both char and unsigned char, a cast to access it as a sequence of char also gives defined behavior.

Edit: As far as Luc's mention of extended integer types goes, I'm not sure how you'd manage to apply it to get a difference in this case. C++ refers to the C99 standard for the definitions of uint8_t and such, so the quotes throughout the remainder of this come from C99.

§6.2.6.1/3 specifies that unsigned char shall use a pure binary representation, with no padding bits. Padding bits are only allowed in 6.2.6.2/1, which specifically excludes unsigned char. That section, however, describes a pure binary representation in detail -- literally to the bit. Therefore, unsigned char and uint8_t (if it exists) must be represented identically at the bit level.

To see a difference between the two, we have to assert that some particular bits when viewed as one would produce results different from when viewed as the other -- despite the fact that the two must have identical representations at the bit level.

To put it more directly: a difference in result between the two requires that they interpret bits differently -- despite a direct requirement that they interpret bits identically.

Even on a purely theoretical level, this appears difficult to achieve. On anything approaching a practical level, it's obviously ridiculous.
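For what it's worth, the representation argument can be spot-checked on a given implementation. The sketch below is not a proof of what the standard requires; the function name same_value_for_same_bits is made up for illustration. It copies the object representation of an unsigned char into a std::uint8_t and compares the values.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Both types must occupy exactly one byte and have the same number of value bits.
static_assert(sizeof(std::uint8_t) == sizeof(unsigned char),
              "std::uint8_t and unsigned char differ in size.");
static_assert(std::numeric_limits<std::uint8_t>::digits ==
              std::numeric_limits<unsigned char>::digits,
              "std::uint8_t and unsigned char differ in value bits.");

// Hypothetical check (name is an assumption). Copies the bits of `byte`
// into a std::uint8_t and reports whether both objects hold the same value.
bool same_value_for_same_bits(unsigned char byte)
{
    std::uint8_t as_u8;
    std::memcpy(&as_u8, &byte, sizeof as_u8);
    return as_u8 == byte;
}
```

Running the check over all 256 byte values should return true every time if the two types really do interpret bits identically.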
