How to serialize a struct in C?
The simplest way to do this may be to allocate a chunk of memory to hold everything. For instance, consider a struct as follows:
typedef struct A {
int v;
char* str;
} our_struct_t;
Now, the simplest way to do this is to create a defined format and pack it into an array of bytes. I will try to show an example:
int sLen = 0;
int tLen = 0;
char* serialized = 0;
char* metadata = 0;
char* xval = 0;
char* xstr = 0;
our_struct_t x;
x.v = 10;
x.str = "Our String";
sLen = strlen(x.str); // Assuming null-terminated (which ours is)
tLen = sizeof(int) + sLen; // Our struct has an int and a string - we want the whole string not a mem addr
serialized = malloc(sizeof(char) * (tLen + sizeof(int))); // We have an additional sizeof(int) for metadata - this will hold our string length
metadata = serialized;
xval = serialized + sizeof(int);
xstr = xval + sizeof(int);
*((int*)metadata) = sLen; // Pack our metadata
*((int*)xval) = x.v; // Our "v" value (1 int)
strncpy(xstr, x.str, sLen); // A full copy of our string
So this example copies the data into an array of size 2 * sizeof(int) + sLen, which allows us a single integer of metadata (i.e. string length) and the extracted values from the struct. To deserialize, you could imagine something as follows:
char* serialized = // Assume we have this
char* metadata = serialized;
char* yval = metadata + sizeof(int);
char* ystr = yval + sizeof(int);
our_struct_t y;
int sLen = *((int*)metadata);
y.v = *((int*)yval);
y.str = malloc((sLen + 1) * sizeof(char)); // +1 to null-terminate
strncpy(y.str, ystr, sLen);
y.str[sLen] = '\0';
As you can see, our array of bytes is well-defined. Below I have detailed the structure (assuming sizeof(int) is 4):
- Bytes 0-3 : Meta-data (string length)
- Bytes 4-7 : X.v (value)
- Bytes 8 - (8 + sLen - 1) : X.str (value)
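The layout above can be wrapped in a pair of functions. What follows is only a sketch under a few assumptions: the names serialize_struct and deserialize_struct are made up for illustration, the struct is repeated so the block compiles on its own, and memcpy is used in place of the pointer casts so that alignment is never an issue. The integers are still written in the host's size and byte order, so both sides have to agree on those.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct A {
    int v;
    char* str;
} our_struct_t;

/* Hypothetical helper: packs [string length][v][string bytes] into a newly
 * allocated buffer and stores the total size in *out_len.
 * Returns NULL if the allocation fails. The caller frees the buffer. */
char* serialize_struct(const our_struct_t* x, size_t* out_len)
{
    assert(x != NULL && x->str != NULL && out_len != NULL);

    int sLen = (int)strlen(x->str);
    size_t total = 2 * sizeof(int) + (size_t)sLen;
    char* buf = malloc(total);
    if (buf == NULL)
        return NULL;

    memcpy(buf, &sLen, sizeof(int));                     /* metadata */
    memcpy(buf + sizeof(int), &x->v, sizeof(int));       /* v */
    memcpy(buf + 2 * sizeof(int), x->str, (size_t)sLen); /* string, no '\0' */

    *out_len = total;
    return buf;
}

/* Hypothetical helper: rebuilds the struct from a buffer of len bytes.
 * Returns 0 on success and -1 if the buffer is malformed or allocation
 * fails. On success the caller frees y->str. */
int deserialize_struct(const char* buf, size_t len, our_struct_t* y)
{
    int sLen;

    assert(buf != NULL && y != NULL);
    if (len < 2 * sizeof(int))
        return -1;

    memcpy(&sLen, buf, sizeof(int));
    if (sLen < 0 || (size_t)sLen != len - 2 * sizeof(int))
        return -1; /* length field disagrees with the buffer size */

    memcpy(&y->v, buf + sizeof(int), sizeof(int));
    y->str = malloc((size_t)sLen + 1); /* +1 to null-terminate */
    if (y->str == NULL)
        return -1;
    memcpy(y->str, buf + 2 * sizeof(int), (size_t)sLen);
    y->str[sLen] = '\0';
    return 0;
}
```

Checking the length field against the buffer size before copying is my own addition; it keeps a corrupted packet from causing an out-of-bounds read.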
This kind of well-defined structure allows you to recreate the struct on any environment if you follow the defined convention (including the size and byte order of the integers). How you send this structure over the socket depends on how you develop your protocol. You can first send an integer packet containing the total length of the packet which you just constructed, or you can expect that the metadata is sent first/separately (logically separately, this technically can still all be sent at the same time) and then you know how much data to receive on the client-side. For instance, if I receive a metadata value of 10, then I can expect sizeof(int) + 10 bytes to follow to complete the struct. With a 4-byte int, that is 14 bytes.
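Here is a rough sketch of the receiving side of the "metadata first" variant. It assumes a POSIX system and reads from a plain file descriptor with read(), which also works on a connected socket. The names read_full and receive_packet are hypothetical, and a real protocol would also want an upper bound on the length it is willing to accept.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Reads exactly n bytes from fd, looping because read() may return fewer
 * bytes than requested. Returns 0 on success, -1 on error or early end of
 * stream (interrupted reads are treated as errors to keep this short). */
int read_full(int fd, char* buf, size_t n)
{
    size_t got = 0;
    while (got < n) {
        ssize_t r = read(fd, buf + got, n - got);
        if (r <= 0)
            return -1;
        got += (size_t)r;
    }
    return 0;
}

/* Hypothetical receiver: reads the metadata int (string length) first,
 * then the sizeof(int) + sLen bytes that complete the struct.
 * Returns a malloc'd buffer holding the whole packet, metadata included,
 * and stores its size in *out_len. Returns NULL on failure. */
char* receive_packet(int fd, size_t* out_len)
{
    int sLen;

    assert(out_len != NULL);
    if (read_full(fd, (char*)&sLen, sizeof(int)) != 0 || sLen < 0)
        return NULL;

    size_t rest = sizeof(int) + (size_t)sLen; /* v plus the string bytes */
    char* packet = malloc(sizeof(int) + rest);
    if (packet == NULL)
        return NULL;

    memcpy(packet, &sLen, sizeof(int));
    if (read_full(fd, packet + sizeof(int), rest) != 0) {
        free(packet);
        return NULL;
    }

    *out_len = sizeof(int) + rest;
    return packet;
}
```

The loop in read_full matters because a stream socket is free to deliver a packet in several pieces, so a single read is not guaranteed to return the whole struct.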
EDIT
I will list some clarifications as requested in the comments.
I do a full copy of the string so it is in (logically) contiguous memory. That is, all the data in my serialized packet is actually full data - there are no pointers. This way, we can send a single buffer (we call it serialized) over the socket. If you simply send the pointer, the user receiving the pointer would expect that pointer to be a valid memory address. However, it is unlikely that your memory addresses will be exactly the same. Even if they are, however, he will not have the same data at that address as you do (except in very limited and specialized circumstances).
Hopefully this point is made more clear by looking at the deserialization process (this is on the receiver's side). Notice how I allocate a struct to hold the information sent by the sender. If the sender did not send me the full string but instead only the memory address, I could not actually reconstruct the data which was sent (even on the same machine we have two distinct virtual memory spaces which are not the same). So in essence, a pointer is only a good mapping for the originator.
Finally, as far as "structs within structs" go, you will need to have several functions for each struct. That said, it is possible that you can reuse the functions. For instance, if I have two structs A and B where A contains B, I can have two serialize methods:
char* serializeB()
{
// ... Do serialization
}
char* serializeA()
{
char* B = serializeB();
// ... Either add on to serialized version of B or do some other modifications to combine the structures
}
So you should be able to get away with a single serialization method for each struct.
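To make the reuse concrete, here is one way it could look. This is a sketch with invented structs (the fields of A_t and B_t are placeholders), and it differs from the outline above in one respect: instead of returning a char*, each function writes into a buffer supplied by the caller and returns the number of bytes used, which makes chaining the calls simple.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical structs for illustration: A contains B. */
typedef struct B {
    int id;
    short flags;
} B_t;

typedef struct A {
    int v;
    B_t b;
} A_t;

/* Writes B field by field (no padding) and returns the bytes written. */
size_t serializeB(const B_t* b, char* out)
{
    size_t off = 0;
    assert(b != NULL && out != NULL);
    memcpy(out + off, &b->id, sizeof b->id);       off += sizeof b->id;
    memcpy(out + off, &b->flags, sizeof b->flags); off += sizeof b->flags;
    return off;
}

/* Reads B back and returns the bytes consumed. */
size_t deserializeB(const char* in, B_t* b)
{
    size_t off = 0;
    assert(in != NULL && b != NULL);
    memcpy(&b->id, in + off, sizeof b->id);       off += sizeof b->id;
    memcpy(&b->flags, in + off, sizeof b->flags); off += sizeof b->flags;
    return off;
}

/* Writes A's own fields, then delegates the embedded B to serializeB. */
size_t serializeA(const A_t* a, char* out)
{
    size_t off = 0;
    assert(a != NULL && out != NULL);
    memcpy(out + off, &a->v, sizeof a->v); off += sizeof a->v;
    off += serializeB(&a->b, out + off);
    return off;
}

/* Mirror image of serializeA. */
size_t deserializeA(const char* in, A_t* a)
{
    size_t off = 0;
    assert(in != NULL && a != NULL);
    memcpy(&a->v, in + off, sizeof a->v); off += sizeof a->v;
    off += deserializeB(in + off, &a->b);
    return off;
}
```

The caller has to provide a buffer that is large enough; here that is sizeof(int) + sizeof(int) + sizeof(short) bytes for an A_t.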
This answer is separate from the problems with your malloc.
Unfortunately, you cannot find a nice trick that would still be compatible with the standard. The only way of properly serializing a structure is to separately dissect each element into bytes, write them to an unsigned char array, send them over the network and put the pieces back together on the other end. In short, you would need a lot of shifting and bitwise operations.
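As a small illustration of the shifting involved, this is roughly what it looks like for a 32-bit integer. The helper names are made up, and I am assuming big-endian (network) order with exactly 4 bytes on the wire; any fixed order works as long as both ends use the same one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes value as 4 bytes, most significant byte first. */
void put_u32(unsigned char* out, uint32_t value)
{
    assert(out != NULL);
    out[0] = (unsigned char)((value >> 24) & 0xFF);
    out[1] = (unsigned char)((value >> 16) & 0xFF);
    out[2] = (unsigned char)((value >> 8) & 0xFF);
    out[3] = (unsigned char)(value & 0xFF);
}

/* Reassembles the 4 bytes written by put_u32. */
uint32_t get_u32(const unsigned char* in)
{
    assert(in != NULL);
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}

/* Signed values go through their two's complement bit pattern.
 * Converting signed to unsigned is well-defined in C. */
void put_i32(unsigned char* out, int32_t value)
{
    put_u32(out, (uint32_t)value);
}

/* Converts back without relying on implementation-defined casts. */
int32_t get_i32(const unsigned char* in)
{
    uint32_t u = get_u32(in);
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    return -(int32_t)(uint32_t)(~u) - 1;
}
```

Because the bytes are produced arithmetically rather than copied from memory, the result is the same on little-endian and big-endian hosts.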
In certain cases you would need to define a kind of protocol. In your case, for example, you need to be sure you always put the object p is pointing to right after struct A, so once recovered, you can set the pointer properly. Did everyone say enough already that you can't send pointers through the network?
Another protocol-like thing you may want to do is to write the size allocated for the flexible array member s in struct B. Whatever layout you choose for your serialized data, obviously both sides should respect it.
It is important to note that you cannot rely on anything machine-specific such as byte order, structure padding, or the size of basic types. This means that you should serialize each field separately and assign each one a fixed number of bytes.