Delimiting binary sequences
You've got five options:
- Use a delimiter character that is unlikely to occur. This runs the risk of you guessing incorrectly. I don't recommend this approach.
- Use a delimiter character and an escape sequence to include the delimiter. You may need to double the escape character, depending upon what makes for easier parsing. (Think of the C \0 to include an ASCII NUL in some content.)
- Use a delimiter phrase that you can determine does not occur. (Think of the MIME message boundaries.)
- Prepend a length field of some sort, so you know to read the following N bytes as data. This has the downside of requiring you to know this length before writing the data, which is sometimes difficult or impossible.
- Use something far more complicated, like ASN.1, to completely describe all your content for you. (I don't know if I'd actually recommend this unless you can make good use of it -- ASN.1 is awkward to use in the best of circumstances, but it does allow completely unambiguous binary data interpretation.)
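As an illustration of the length-prefix option, here is a minimal Python sketch. The 4-byte big-endian length field and the function names `pack_frames` and `unpack_frames` are assumptions made for this example, not part of any particular protocol.

```python
import struct

# Assumed layout for this sketch: a 4-byte unsigned big-endian length,
# followed by exactly that many payload bytes.
LENGTH_FORMAT = ">I"
LENGTH_SIZE = struct.calcsize(LENGTH_FORMAT)  # 4 bytes


def pack_frames(chunks):
    """Concatenate chunks, each prefixed with its length."""
    return b"".join(struct.pack(LENGTH_FORMAT, len(c)) + c for c in chunks)


def unpack_frames(buf):
    """Split a buffer of length-prefixed chunks back into a list of bytes."""
    chunks = []
    pos = 0
    while pos < len(buf):
        if pos + LENGTH_SIZE > len(buf):
            raise ValueError("truncated length field")
        (n,) = struct.unpack_from(LENGTH_FORMAT, buf, pos)
        pos += LENGTH_SIZE
        if pos + n > len(buf):
            raise ValueError("truncated payload")
        chunks.append(buf[pos:pos + n])
        pos += n
    return chunks


if __name__ == "__main__":
    data = pack_frames([b"a\x00b", b"", b"\x7e\x7d"])
    print(unpack_frames(data))
```

Because the reader never scans the payload for a marker, the data can contain any byte values, including NUL. The cost is the one already mentioned: the writer has to know each chunk's length before emitting it.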
Usually, you wrap your binary data in a well-known format, for example with a fixed header that describes the subsequent data. If you are trying to find delimiters in an unknown stream of data, you usually need an escape sequence. Take HDLC's asynchronous (byte-stuffed) framing, where 0x7E is the frame delimiter. The data must be encoded so that any 0x7E inside it is replaced with 0x7D followed by the original byte XORed with 0x20, so 0x7E becomes 0x7D 0x5E. A 0x7D in the data stream is escaped the same way and becomes 0x7D 0x5D.
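The HDLC-style escaping described above can be sketched in Python as follows. This assumes the 0x20 XOR mask used by asynchronous HDLC framing and leaves out everything else a real link layer needs (address, control and checksum fields). The names `stuff` and `unstuff` are made up for this example.

```python
FLAG = 0x7E      # frame delimiter
ESC = 0x7D       # escape byte
XOR_MASK = 0x20  # assumed mask, as in asynchronous HDLC framing


def stuff(payload: bytes) -> bytes:
    """Wrap payload in FLAG bytes, escaping any FLAG or ESC inside it."""
    out = bytearray([FLAG])
    for b in payload:
        if b in (FLAG, ESC):
            out.append(ESC)
            out.append(b ^ XOR_MASK)
        else:
            out.append(b)
    out.append(FLAG)
    return bytes(out)


def unstuff(frame: bytes) -> bytes:
    """Reverse stuff(): strip the FLAG bytes and undo the escaping."""
    if len(frame) < 2 or frame[0] != FLAG or frame[-1] != FLAG:
        raise ValueError("frame must start and end with the flag byte")
    out = bytearray()
    escaped = False
    for b in frame[1:-1]:
        if escaped:
            out.append(b ^ XOR_MASK)
            escaped = False
        elif b == ESC:
            escaped = True
        elif b == FLAG:
            raise ValueError("unescaped flag byte inside frame")
        else:
            out.append(b)
    if escaped:
        raise ValueError("frame ends with a dangling escape byte")
    return bytes(out)


if __name__ == "__main__":
    frame = stuff(b"\x01\x7e\x7d\x02")
    print(frame.hex(" "))
    print(unstuff(frame).hex(" "))
```

After stuffing, 0x7E can only appear as a frame boundary, so a receiver can resynchronize on the next 0x7E even if it joins the stream partway through. The trade-off is that the encoded size depends on the data and can be up to twice the payload length.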