Is it better to use char or unsigned char array for storing raw data?
As far as the structure of the buffer is concerned, there is no difference: in both cases you get an element size of one byte, mandated by the standard.
Perhaps the most important difference is the behavior you see when accessing the individual elements of the buffer, for example when printing them. With char you get implementation-defined signed or unsigned behavior; with unsigned char you always see unsigned behavior. This becomes important if you want to print the individual bytes of your "raw data" buffer.
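To make the printing difference concrete, here is a minimal C++ sketch. The helper names to_hex and promoted_value are hypothetical, chosen only for illustration. The first formats an unsigned char buffer as hex; the second shows where plain char's signedness leaks in once an element is promoted to int.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats each byte of a raw buffer as two lowercase hex digits, space-separated.
// Taking unsigned char keeps every element in 0..255, so no sign extension occurs.
std::string to_hex(const unsigned char* data, std::size_t size) {
    std::string out;
    char digits[3];  // two hex digits plus the terminating '\0'
    for (std::size_t i = 0; i < size; ++i) {
        std::snprintf(digits, sizeof digits, "%02x", static_cast<unsigned>(data[i]));
        if (i != 0) out += ' ';
        out += digits;
    }
    return out;
}

// With plain char, a byte 0xFF promotes to -1 where char is signed and to 255
// where it is unsigned: the implementation-defined behavior described above.
int promoted_value(char c) {
    return static_cast<int>(c);
}
```

With a buffer {0x00, 0x7F, 0xFF}, to_hex produces "00 7f ff" on every platform, whereas printing the same bytes through promoted_value may show -1 or 255 for the last one.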
Another good alternative for buffers is the exact-width integer uint8_t. Where it is provided (it is optional in the standard), it has the same width as unsigned char, its name requires less typing, and it tells the reader that you do not intend to use the individual elements of the buffer as character data.
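A minimal sketch of a uint8_t buffer, assuming the platform provides std::uint8_t (it fails to compile where it does not). The checksum helper is hypothetical; it only illustrates that each element reads as a small non-negative number.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Where std::uint8_t exists, it is exactly as wide as unsigned char.
static_assert(sizeof(std::uint8_t) == sizeof(unsigned char),
              "uint8_t and unsigned char should have the same width");

// Hypothetical helper: a simple additive checksum over a raw byte buffer.
// Every element is a value in 0..255, so no element ever contributes a negative amount.
std::uint32_t checksum(const std::vector<std::uint8_t>& buffer) {
    std::uint32_t sum = 0;
    for (std::uint8_t b : buffer) {
        sum += b;
    }
    return sum;
}
```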
Internally, they are exactly the same: each element is a byte. The difference appears when you operate on those values.
If your values range over [0,255], you should use unsigned char, but if they range over [-128,127], you should use signed char.
Suppose you use the first range (unsigned char): then you can perform the operation 100+100 and store the result, 200. With signed char, the result exceeds the maximum of 127, so storing it back gives you an unexpected value.
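A minimal C++ sketch of the 100+100 case; the function names are hypothetical. Note that the addition itself is carried out in int because of integer promotion, so the trouble comes from storing the result back into a one-byte type.

```cpp
#include <limits>

// 100 + 100 = 200 fits in the 0..255 range of unsigned char.
unsigned char add_unsigned(unsigned char a, unsigned char b) {
    return static_cast<unsigned char>(a + b);
}

// 200 does not fit in signed char (maximum 127). Converting it is
// implementation-defined before C++20 and wraps modulo 2^N from C++20 on,
// typically producing -56.
signed char add_signed(signed char a, signed char b) {
    return static_cast<signed char>(a + b);
}

// Checks whether an int result can be stored in a signed char unchanged.
bool fits_in_signed_char(int value) {
    return value >= std::numeric_limits<signed char>::min() &&
           value <= std::numeric_limits<signed char>::max();
}
```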
Depending on your compiler or machine type, char may be unsigned or signed by default (see the question "Is char signed or unsigned by default?").
Thus plain char has one of the two ranges described above.
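You can check at compile time which of the two ranges plain char has. This is a minimal sketch with hypothetical function names, using std::numeric_limits:

```cpp
#include <limits>

// True if plain char has the range of signed char (e.g. [-128, 127] with 8-bit bytes).
constexpr bool char_has_signed_char_range() {
    return std::numeric_limits<char>::min() == std::numeric_limits<signed char>::min() &&
           std::numeric_limits<char>::max() == std::numeric_limits<signed char>::max();
}

// True if plain char has the range of unsigned char (e.g. [0, 255] with 8-bit bytes).
constexpr bool char_has_unsigned_char_range() {
    return std::numeric_limits<char>::min() == std::numeric_limits<unsigned char>::min() &&
           std::numeric_limits<char>::max() == std::numeric_limits<unsigned char>::max();
}

// Exactly one of the two holds on a conforming implementation.
static_assert(char_has_signed_char_range() != char_has_unsigned_char_range(),
              "plain char must match exactly one of the two ranges");
```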
If you are using this buffer just to store binary data without operating on it, there is no difference between using char or unsigned char.
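As a sketch of the "just storing" case: copying bytes into a plain char buffer and back out never interprets the elements as numbers, so their signedness never matters. The function name is hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Stores raw bytes in a plain-char buffer and copies them back out.
// No arithmetic or comparison is done on the elements, so the bytes come back unchanged.
std::vector<unsigned char> round_trip_via_char(const std::vector<unsigned char>& bytes) {
    std::vector<char> storage(bytes.size());
    if (!bytes.empty()) {
        std::memcpy(storage.data(), bytes.data(), bytes.size());
    }
    std::vector<unsigned char> restored(storage.size());
    if (!storage.empty()) {
        std::memcpy(restored.data(), storage.data(), storage.size());
    }
    return restored;
}
```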
EDIT
Note that you can even change the default signedness of char for the same machine and compiler using compiler flags (from the GCC documentation):
-funsigned-char Let the type char be unsigned, like unsigned char.
Each kind of machine has a default for what char should be. It is either like unsigned char by default or like signed char by default. Ideally, a portable program should always use signed char or unsigned char when it depends on the signedness of an object. But many programs have been written to use plain char and expect it to be signed, or expect it to be unsigned, depending on the machines they were written for. This option, and its inverse, let you make such a program work with the opposite default.
The type char is always a distinct type from each of signed char or unsigned char, even though its behavior is always just like one of those two.
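A minimal sketch of the "distinct type" point: whatever -fsigned-char or -funsigned-char selects for its behavior, char never becomes the same type as signed char or unsigned char, so all three can even be overloaded separately. The CharKind enum and kind_of function are hypothetical.

```cpp
#include <type_traits>

// These hold regardless of whether char behaves as signed or unsigned.
static_assert(!std::is_same<char, signed char>::value, "char is distinct from signed char");
static_assert(!std::is_same<char, unsigned char>::value, "char is distinct from unsigned char");

// Because the three types are distinct, each can get its own overload.
enum class CharKind { Plain, Signed, Unsigned };

CharKind kind_of(char) { return CharKind::Plain; }
CharKind kind_of(signed char) { return CharKind::Signed; }
CharKind kind_of(unsigned char) { return CharKind::Unsigned; }
```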
UPDATE: C++17 introduced std::byte, which is more suited to "raw" data buffers than any manner of char.
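A minimal C++17 sketch of a std::byte buffer. std::byte supports only bitwise operations, not arithmetic or character use, and reading a numeric value back requires std::to_integer. The helpers flip_high_bits and value_of are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: flips the high bit of every byte in a small buffer.
// Only bitwise operators such as ^= are defined for std::byte.
template <std::size_t N>
std::array<std::byte, N> flip_high_bits(std::array<std::byte, N> buffer) {
    for (std::byte& b : buffer) {
        b ^= std::byte{0x80};
    }
    return buffer;
}

// Converting a byte back to a number must be explicit.
unsigned int value_of(std::byte b) {
    return std::to_integer<unsigned int>(b);
}
```

By design, something like std::byte b = 1; or b + b does not compile, which keeps raw data from being treated as text or arithmetic values by accident.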
For earlier C++ versions:
- unsigned char emphasises that the data is not "just" text
- if you've got what's effectively "byte" data from e.g. a compressed stream, a database table backup file, an executable image, a jpeg... then unsigned is appropriate for the binary-data connotation mentioned above
- unsigned works better for some of the operations you might want to do on binary data, e.g. there are undefined and implementation-defined behaviours for some bit operations on signed types, and unsigned values can be used directly as indices in arrays
- you can't accidentally pass an unsigned char* to a function expecting char* and have it operated on as presumed text
- in these situations it's usually more natural to think of the values as being in the range 0..255, after all: why should the "sign" bit have a different kind of significance to the other bits in the data?
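Several of the points above can be sketched in a few lines of C++. The helper names are hypothetical. The histogram indexes a table directly with unsigned char values, the shift shows a bit operation on an unsigned value, and the static_assert confirms that unsigned char* does not implicitly convert to char*.

```cpp
#include <array>
#include <climits>
#include <cstddef>
#include <type_traits>

// An unsigned char* can't be passed where a char* is expected without a cast.
static_assert(!std::is_convertible<unsigned char*, char*>::value,
              "unsigned char* does not implicitly convert to char*");

// Counts how often each byte value occurs. Each unsigned char is already in
// 0..UCHAR_MAX, so it can index the table without a cast or a negative-index risk.
std::array<std::size_t, UCHAR_MAX + 1> byte_histogram(const unsigned char* data, std::size_t size) {
    std::array<std::size_t, UCHAR_MAX + 1> counts{};
    for (std::size_t i = 0; i < size; ++i) {
        ++counts[data[i]];
    }
    return counts;
}

// Right-shifting an unsigned value always shifts in zeros; for a negative
// signed value, >> is implementation-defined before C++20.
unsigned high_nibble(unsigned char b) {
    return static_cast<unsigned>(b) >> 4;
}
```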
If you're storing "raw data" that, at an application logic/design level, happens to be 8-bit numeric data, then by all means choose either unsigned or explicitly signed char as appropriate to your needs.
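As one hypothetical example of 8-bit numeric data where explicitly signed char is the right choice, consider signed 8-bit audio samples, where negative values are meaningful. This sketch computes the peak amplitude; the function name and the audio scenario are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical example: signed 8-bit PCM samples. Returns the largest absolute
// amplitude, computed in int so that |-128| = 128 does not overflow signed char.
int peak_amplitude(const signed char* samples, std::size_t count) {
    int peak = 0;
    for (std::size_t i = 0; i < count; ++i) {
        int magnitude = std::abs(static_cast<int>(samples[i]));
        if (magnitude > peak) peak = magnitude;
    }
    return peak;
}
```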