What is an unsigned char?
Because I feel it's really called for, I just want to state some rules of C and C++ (they are the same in this regard). First, all bits of an unsigned char participate in determining the value of any unsigned char object. Second, unsigned char is explicitly specified as unsigned.
Now, I had a discussion with someone about what happens when you convert the value -1 of type int to unsigned char. He rejected the idea that the resulting unsigned char has all its bits set to 1, because he was worried about sign representation. But he didn't have to be. It follows immediately from this rule that the conversion does what is intended:
If the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. (6.3.1.3p2 in a C99 draft)
That's a mathematical description. C++ describes the same conversion in terms of modulo arithmetic, which yields the same rule. Anyway, what is not guaranteed is that all bits in the integer -1 are one before the conversion. So what do we have that lets us claim the resulting unsigned char has all of its CHAR_BIT bits set to 1?
- All bits participate in determining its value - that is, no padding bits occur in the object.
- Adding UCHAR_MAX+1 to -1 just once yields a value in range, namely UCHAR_MAX.
That's enough, actually! So whenever you want an unsigned char with all of its bits set to one, you do

```c
unsigned char c = (unsigned char)-1;
```
It also follows that the conversion is not just a truncation of higher-order bits. For two's complement it happens to be exactly a truncation, but the same isn't necessarily true for other sign representations.
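As a quick sanity check, here is a minimal C program (assuming nothing beyond the standard limits.h and stdio.h headers) that performs the conversion and compares the result against UCHAR_MAX; by the conversion rule quoted above, the two must be equal on any conforming implementation:

```c
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* The conversion rule guarantees this value equals UCHAR_MAX,
       regardless of the platform's signed-integer representation. */
    unsigned char c = (unsigned char)-1;

    printf("c         = %u\n", (unsigned)c);
    printf("UCHAR_MAX = %u\n", (unsigned)UCHAR_MAX);
    return 0;
}
```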
This is implementation dependent, as the C standard does NOT define the signedness of char. Depending on the platform, char may be signed or unsigned, so you need to explicitly ask for signed char or unsigned char if your implementation depends on it. Just use char if you intend to represent characters from strings, as this will match what your platform puts in the string.
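If you want to see which choice your platform made, one portable way (a small sketch using only standard limits.h macros) is to inspect CHAR_MIN, which is 0 exactly when plain char is unsigned:

```c
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* CHAR_MIN is 0 when plain char is unsigned,
       and negative (typically -128) when it is signed. */
    if (CHAR_MIN < 0)
        printf("plain char is signed here (CHAR_MIN = %d)\n", CHAR_MIN);
    else
        printf("plain char is unsigned here (CHAR_MIN = %d)\n", CHAR_MIN);
    return 0;
}
```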
The difference between signed char and unsigned char is as you'd expect. On most platforms, signed char will be an 8-bit two's complement number ranging from -128 to 127, and unsigned char will be an 8-bit unsigned integer (0 to 255). Note that the standard does NOT require that char types have 8 bits, only that sizeof(char) be 1. You can get the number of bits in a char from CHAR_BIT in limits.h. There are few if any platforms today where this is anything other than 8, though.
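For example, this small program (again assuming only the standard limits.h macros) prints the actual width and ranges on whatever platform you compile it:

```c
#include <limits.h>
#include <stdio.h>

int main(void)
{
    printf("CHAR_BIT  = %d\n", CHAR_BIT);
    printf("SCHAR_MIN = %d, SCHAR_MAX = %d\n", SCHAR_MIN, SCHAR_MAX);
    printf("UCHAR_MAX = %u\n", (unsigned)UCHAR_MAX);
    return 0;
}
```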
As others have mentioned since I posted this, you're better off using int8_t and uint8_t if you really want to represent small integers.
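A sketch of that, with the caveat that int8_t and uint8_t are optional in the standard - they exist only where the implementation actually has 8-bit two's complement types, which in practice is virtually everywhere:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int8_t  s = -128;  /* exact-width type: range is exactly -128..127 */
    uint8_t u = 255;   /* exact-width type: range is exactly 0..255    */

    printf("s = %d, u = %u\n", (int)s, (unsigned)u);
    return 0;
}
```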
In C++, there are three distinct character types:

- char
- signed char
- unsigned char

If you are using character types for text, use the unqualified char:
- it is the type of character literals like 'a' or '0' (in C++ only; in C their type is int)
- it is the type that makes up C strings like "abcde"
It also works out as a number value, but it is unspecified whether that value is treated as signed or unsigned. Beware character comparisons through inequalities - although if you limit yourself to ASCII (0-127) you're just about safe.
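To illustrate the pitfall, here is a sketch - 0xE9 is just an arbitrary byte value above 127, and the signed-case value assumes the usual two's complement wraparound:

```c
#include <stdio.h>

int main(void)
{
    char c = (char)0xE9;  /* a byte value above 127 */

    /* If plain char is unsigned, c is 233 and the test is true;
       if it is signed, c is typically -23 and the test is false. */
    if (c > 127)
        printf("char is unsigned here\n");
    else
        printf("char is signed here\n");
    return 0;
}
```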
If you are using character types as numbers, use:

- signed char, which gives you at least the -127 to 127 range (-128 to 127 is common)
- unsigned char, which gives you at least the 0 to 255 range
"At least", because the C++ standard only gives the minimum range of values that each numeric type is required to cover. sizeof (char)
is required to be 1 (i.e. one byte), but a byte could in theory be for example 32 bits. sizeof
would still be report its size as 1
- meaning that you could have sizeof (char) == sizeof (long) == 1
.
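A sketch that makes the point concrete - it would print equal sizes only on such an exotic platform, but it compiles and runs anywhere:

```c
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* sizeof counts bytes, and a byte is CHAR_BIT bits wide, so
       sizeof(char) is 1 by definition - even if CHAR_BIT were 32. */
    printf("sizeof (char) = %zu\n", sizeof(char));
    printf("sizeof (long) = %zu\n", sizeof(long));
    printf("bits per byte = %d\n", CHAR_BIT);
    return 0;
}
```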