Difference between Big Endian and Little Endian byte order
Ferdinand's answer (like the others) is correct, but incomplete.
Big Endian (BE) / Little Endian (LE) have nothing to do with UTF-16 or UTF-32 as such. They existed long before Unicode and affect how the bytes of numbers are stored in a computer's memory. Which order is used depends on the processor.
If you have a number with the value 0x12345678, then in memory it will be represented as 12 34 56 78 (BE) or 78 56 34 12 (LE).
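To make that concrete, here is a minimal Python sketch (assuming Python 3.8+ for bytes.hex with a separator) that packs the same value in both byte orders:

    import struct
    import sys

    value = 0x12345678

    # The same 32-bit value, laid out explicitly in each byte order.
    big_endian = struct.pack('>I', value)     # b'\x12\x34\x56\x78'
    little_endian = struct.pack('<I', value)  # b'\x78\x56\x34\x12'

    print(big_endian.hex(' '))     # 12 34 56 78
    print(little_endian.hex(' '))  # 78 56 34 12
    print(sys.byteorder)           # what the processor uses natively, e.g. 'little' on x86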
UTF-16 and UTF-32 code units happen to be 2 and 4 bytes wide, respectively, so the byte order of those units follows whatever ordering numbers follow on that platform.
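As a quick illustration (a sketch, not part of the original answer), encoding a single character shows that the UTF-16 and UTF-32 byte sequences are just the 2- and 4-byte numbers in the chosen order:

    ch = '\u0045'   # 'E', code point U+0045

    print(ch.encode('utf-16-be').hex(' '))  # 00 45
    print(ch.encode('utf-16-le').hex(' '))  # 45 00
    print(ch.encode('utf-32-be').hex(' '))  # 00 00 00 45
    print(ch.encode('utf-32-le').hex(' '))  # 45 00 00 00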
Big-Endian (BE) / Little-Endian (LE) are two ways to organize multi-byte words. For example, when using two bytes to represent a character in UTF-16, there are two ways to represent the character 0x1234 as a string of bytes (0x00-0xFF):
Byte Index:     0  1
---------------------
Big-Endian:    12 34
Little-Endian: 34 12
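You can verify that table with a short Python check (the '-be'/'-le' codec names encode without a BOM):

    print('\u1234'.encode('utf-16-be').hex(' '))  # 12 34
    print('\u1234'.encode('utf-16-le').hex(' '))  # 34 12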
In order to decide whether a text uses UTF-16BE or UTF-16LE, the specification recommends prepending a Byte Order Mark (BOM) to the string, representing the character U+FEFF. So, if the first two bytes of a UTF-16 encoded text file are FE, FF, the encoding is UTF-16BE. For FF, FE, it is UTF-16LE.
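A minimal sketch of that BOM check in Python (the function name detect_utf16_order is made up for this example; real code would also handle a missing BOM or a UTF-8 signature):

    def detect_utf16_order(data: bytes) -> str:
        # U+FEFF serialized big-endian is FE FF; little-endian is FF FE.
        if data[:2] == b'\xfe\xff':
            return 'utf-16-be'
        if data[:2] == b'\xff\xfe':
            return 'utf-16-le'
        return 'unknown (no BOM)'

    print(detect_utf16_order('\ufeffExample'.encode('utf-16-be')))  # utf-16-be
    print(detect_utf16_order('\ufeffExample'.encode('utf-16-le')))  # utf-16-le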
A visual example: The word "Example" in different encodings (UTF-16 with BOM):
Byte Index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
------------------------------------------------------------
ASCII:      45 78 61 6d 70 6c 65
UTF-16BE:   FE FF 00 45 00 78 00 61 00 6d 00 70 00 6c 00 65
UTF-16LE:   FF FE 45 00 78 00 61 00 6d 00 70 00 6c 00 65 00
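Those rows can be reproduced with a few lines of Python (illustrative only; the BOM is prepended by hand so the byte order stays explicit):

    text = 'Example'

    print(text.encode('ascii').hex(' '))                   # 45 78 61 6d 70 6c 65
    print(('\ufeff' + text).encode('utf-16-be').hex(' '))  # fe ff 00 45 00 78 00 61 ...
    print(('\ufeff' + text).encode('utf-16-le').hex(' '))  # ff fe 45 00 78 00 61 00 ...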
For further information, please read the Wikipedia pages on Endianness and/or UTF-16.
UTF-16 encodes Unicode as a sequence of 16-bit code units. Most modern filesystems operate on 8-bit bytes. So, to save a UTF-16 encoded file to disk, for example, you have to decide which part of each 16-bit value goes in the first byte and which goes into the second byte.
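For example, one way to split a single 16-bit code unit by hand (a sketch, not how any particular library does it):

    code_unit = 0x1234

    # Big-endian: write the high-order byte first.
    be = bytes([(code_unit >> 8) & 0xFF, code_unit & 0xFF])  # 12 34

    # Little-endian: write the low-order byte first.
    le = bytes([code_unit & 0xFF, (code_unit >> 8) & 0xFF])  # 34 12

    print(be.hex(' '), le.hex(' '))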
Wikipedia has a more complete explanation.