Is there a way to enforce specific endianness for a C or C++ struct?
A bit late to the party, but with current GCC (tested on 6.2.1, where it works; 4.9.2 does not implement it) there is finally a way to declare that a struct should be kept in a given byte order.
The following test program:
#include <stdio.h>
#include <stdint.h>

struct __attribute__((packed, scalar_storage_order("big-endian"))) mystruct {
    uint16_t a;
    uint32_t b;
    uint64_t c;
};

int main(int argc, char **argv) {
    struct mystruct bar = {.a = 0xaabb, .b = 0xff0000aa, .c = 0xabcdefaabbccddee};
    FILE *f = fopen("out.bin", "wb");
    size_t written = fwrite(&bar, sizeof(struct mystruct), 1, f);
    fclose(f);
    return 0;
}
creates a file "out.bin" which you can inspect with a hex editor (e.g. hexdump -C out.bin). If the scalar_storage_order attribute is supported, it will contain the expected 0xaabbff0000aaabcdefaabbccddee in this order and without holes. Sadly this is of course very compiler-specific, and per the GCC documentation the attribute is currently implemented only for C, not C++.
The way I usually handle this is like so:
#include <arpa/inet.h> // for ntohs() etc.
#include <stdint.h>

class be_uint16_t {
public:
    be_uint16_t() : be_val_(0) {
    }
    // Transparently cast from uint16_t
    be_uint16_t(const uint16_t &val) : be_val_(htons(val)) {
    }
    // Transparently cast to uint16_t
    operator uint16_t() const {
        return ntohs(be_val_);
    }
private:
    uint16_t be_val_;
} __attribute__((packed));
Similarly for be_uint32_t.
Then you can define your struct like this:
struct be_fixed64_t {
    be_uint32_t int_part;
    be_uint32_t frac_part;
} __attribute__((packed));
The point is that the compiler will almost certainly lay out the fields in the order you write them, so all you are really worried about is big-endian integers. The be_uint16_t object is a class that knows how to convert itself transparently between big-endian and machine-endian as required. Like this:
be_uint16_t x = 12;
x = x + 1; // Yes, this actually works
write(fd, &x, sizeof(x)); // writes 13 to file in big-endian form
In fact, if you compile that snippet with any reasonably good C++ compiler, you should find it emits a big-endian "13" as a constant.
With these objects, the in-memory representation is big-endian. So you can create arrays of them, put them in structures, etc. But when you go to operate on them, they magically cast to machine-endian. This is typically a single instruction on x86, so it is very efficient. There are a few contexts where you have to cast by hand:
be_uint16_t x = 37;
printf("x == %u\n", (unsigned)x); // Needs the cast: the conversion operator is not applied through printf's "..."
...but for most code, you can just use them as if they were built-in types.