reinterpret_cast between char* and std::uint8_t* - safe?
Ok, let's get truly pedantic. After reading this, this and this, I'm pretty confident that I understand the intention behind both Standards.
So, doing reinterpret_cast from std::uint8_t* to char* and then dereferencing the resulting pointer is safe and portable and is explicitly permitted by [basic.lval].
However, doing reinterpret_cast from char* to std::uint8_t* and then dereferencing the resulting pointer violates the strict aliasing rule and is undefined behavior if std::uint8_t is implemented as an extended unsigned integer type.
There are two possible workarounds. First:
#include <cstdint>
#include <type_traits>

static_assert(std::is_same_v<std::uint8_t, char> ||
              std::is_same_v<std::uint8_t, unsigned char>,
              "This library requires std::uint8_t to be implemented as char or unsigned char.");
With this assert in place, your code will not compile on platforms where it would otherwise result in undefined behavior.
Second:
std::memcpy(uint8buffer, charbuffer, size);
Cppreference says that std::memcpy accesses objects as arrays of unsigned char, so it is safe and portable.
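A minimal sketch of the memcpy workaround, assuming the data may be copied rather than viewed in place. The helper name to_uint8_buffer is hypothetical and only there for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not from the original post): copies `size` bytes
// from a char buffer into a new std::uint8_t buffer. No std::uint8_t
// lvalue is ever used to access a char object, so no aliasing question arises.
std::vector<std::uint8_t> to_uint8_buffer(const char* charbuffer, std::size_t size) {
    assert(charbuffer != nullptr || size == 0);
    std::vector<std::uint8_t> uint8buffer(size);
    if (size != 0) {
        // memcpy copies the underlying bytes of the source into the destination.
        std::memcpy(uint8buffer.data(), charbuffer, size);
    }
    return uint8buffer;
}
```

The cost is an extra copy, so this suits cases where the buffer is small or a copy was needed anyway.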
To reiterate, in order to be able to reinterpret_cast between char* and std::uint8_t* and work with the resulting pointers portably and safely in a 100% standard-conforming way, the following conditions must be true:
1. CHAR_BIT == 8.
2. std::uint8_t is defined.
3. std::uint8_t is implemented as char or unsigned char.
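The three conditions can be checked at compile time before any cast is written. This is only a sketch, assuming C++17 for std::is_same_v; sum_bytes is a hypothetical function used to show the guarded cast:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Condition 1: a byte is exactly 8 bits wide.
static_assert(CHAR_BIT == 8, "CHAR_BIT must be 8");

// Condition 2 is implicit: naming std::uint8_t below fails to compile
// on an implementation that does not define it.
// Condition 3: std::uint8_t is char or unsigned char.
static_assert(std::is_same_v<std::uint8_t, char> ||
              std::is_same_v<std::uint8_t, unsigned char>,
              "std::uint8_t must be char or unsigned char");

// Hypothetical helper: sums a char buffer through a std::uint8_t pointer.
// The cast is only reached if the static_asserts above have passed.
unsigned sum_bytes(const char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    const auto* bytes = reinterpret_cast<const std::uint8_t*>(data);
    unsigned total = 0;
    for (std::size_t i = 0; i < size; ++i) {
        total += bytes[i];
    }
    return total;
}
```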
On a practical note, the above conditions are true on 99% of platforms and there is likely no platform on which the first 2 conditions are true while the 3rd one is false.
If uint8_t exists at all, essentially the only choice is that it's a typedef for unsigned char (or char if it happens to be unsigned). Nothing (but a bitfield) can represent less storage than a char, and the only other type that can be as small as 8 bits is a bool. The next smallest normal integer type is a short, which must be at least 16 bits.
As such, if uint8_t exists at all, you really only have two possibilities: you're either casting unsigned char to unsigned char, or casting signed char to unsigned char.
The former is an identity conversion, so obviously safe. The latter falls within the "special dispensation" given for accessing any other type as a sequence of char or unsigned char in §3.10/10, so it also gives defined behavior.
Since that includes both char and unsigned char, a cast to access it as a sequence of char also gives defined behavior.
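As a rough illustration of the second case, here is a signed char array read through an unsigned char pointer. The helper byte_at is hypothetical, and the values in the usage note assume 8-bit, two's complement chars:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: reads element `index` of a signed char array
// through an unsigned char lvalue, which the aliasing rules allow
// for an object of any type.
unsigned char byte_at(const signed char* data, std::size_t index) {
    assert(data != nullptr);
    const auto* bytes = reinterpret_cast<const unsigned char*>(data);
    return bytes[index];
}
```

With those assumptions, a stored -1 reads back as 255 and a stored 127 reads back as 127.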
Edit: As far as Luc's mention of extended integer types goes, I'm not sure how you'd manage to apply it to get a difference in this case. C++ refers to the C99 standard for the definitions of uint8_t and such, so the quotes throughout the remainder of this come from C99.
§6.2.6.1/3 specifies that unsigned char shall use a pure binary representation, with no padding bits. Padding bits are only allowed in §6.2.6.2/1, which specifically excludes unsigned char. That section, however, describes a pure binary representation in detail -- literally to the bit. Therefore, unsigned char and uint8_t (if it exists) must be represented identically at the bit level.
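The representation claim can be spot-checked on a given implementation. This sketch proves nothing about the Standard; it only confirms that one particular compiler agrees. The function representations_match is hypothetical:

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <cstring>
#include <limits>

// Both types occupy one byte and have the same number of value bits.
static_assert(sizeof(std::uint8_t) == sizeof(unsigned char),
              "std::uint8_t and unsigned char differ in size");
static_assert(std::numeric_limits<std::uint8_t>::digits ==
              std::numeric_limits<unsigned char>::digits,
              "std::uint8_t and unsigned char differ in value bits");

// Hypothetical check: copies the byte of every std::uint8_t value into an
// unsigned char and verifies that the value read back is unchanged.
bool representations_match() {
    for (unsigned v = 0; v <= 255; ++v) {
        const auto as_uint8 = static_cast<std::uint8_t>(v);
        unsigned char as_uchar = 0;
        std::memcpy(&as_uchar, &as_uint8, sizeof as_uchar);
        if (as_uchar != v) {
            return false;
        }
    }
    return true;
}
```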
To see a difference between the two, we have to assert that some particular bits when viewed as one would produce results different from when viewed as the other -- despite the fact that the two must have identical representations at the bit level.
To put it more directly: a difference in result between the two requires that they interpret bits differently -- despite a direct requirement that they interpret bits identically.
Even on a purely theoretical level, this appears difficult to achieve. On anything approaching a practical level, it's obviously ridiculous.