Byte order (big endian & little endian)

1. Function

Computer hardware stores multi-byte data in one of two ways: big endian or little endian.

Each byte order exists for its own reason:

Big-endian byte order: it matches the way humans read and write numbers, so it is commonly used for network transmission and file storage.
Little-endian byte order: the computer starts reading from the low address and processes the low-order byte first, so little endian is convenient for the computer when doing calculations.

Big-endian byte order: the high-order byte is stored at the low address in memory
Little-endian byte order: the high-order byte is stored at the high address in memory

The big-endian and little-endian layouts of 0x1234567 are shown in the figure below:
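As a rough illustration of the two layouts, the sketch below uses Python's standard struct module to list the bytes of 0x1234567 (that is, 0x01234567 when stored as a 32-bit value) in address order. The helper name byte_layout is made up for this example.

```python
import struct

def byte_layout(value, big_endian):
    """Return the bytes of a 32-bit unsigned value, lowest address first."""
    fmt = ">I" if big_endian else "<I"  # ">" = big endian, "<" = little endian
    return [f"{b:02x}" for b in struct.pack(fmt, value)]

value = 0x1234567  # same as 0x01234567 when stored in 4 bytes
print("big endian:   ", byte_layout(value, True))   # ['01', '23', '45', '67']
print("little endian:", byte_layout(value, False))  # ['67', '45', '23', '01']
```

In the big-endian list the high-order byte 01 comes first (lowest address); in the little-endian list it comes last.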

When to determine the byte order

Endianness needs to be distinguished only when reading.

When the CPU reads external data, it must know the endianness of the data before it can convert the bytes into the correct value. Once the correct value has been read, it can be used without considering endianness any further. When writing data to an external device, there is no need to distinguish the endianness (note that you do choose a byte order when writing, but you do not need to detect which one is in use; just store the data in the order you want), and the external device handles the endianness problem by itself. In short, whoever reads the data (that is, whoever uses it) is the one who distinguishes the byte order.
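A small sketch of why the reader has to know the byte order: the same four bytes decode to different values depending on which order the reader assumes. The byte string here is made up for the example.

```python
raw = bytes([0x12, 0x34, 0x56, 0x78])  # bytes as received, lowest address first

# The reader chooses the interpretation; the bytes themselves do not say.
as_big = int.from_bytes(raw, byteorder="big")
as_little = int.from_bytes(raw, byteorder="little")

print(hex(as_big))     # 0x12345678
print(hex(as_little))  # 0x78563412
```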
How to determine the byte order

As defined in the Unicode encoding specification, a character indicating the byte order can be added to the front of a file. This character is named "zero width no-break space" (ZERO WIDTH NO-BREAK SPACE), and its code is FEFF; the byte-swapped value FFFE is not a valid character in UCS. The UCS specification recommends transmitting the "zero width no-break space" character before transmitting the byte stream. FEFF is exactly 2 bytes, and FF is 1 larger than FE. If the first 2 bytes of a text file are FE FF, the file is big endian; if the first 2 bytes are FF FE, the file is little endian. Used this way, the "zero width no-break space" character is also called the BOM (Byte Order Mark).
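The check described above can be sketched as follows, assuming the input is UTF-16 text. The function name detect_utf16_byte_order is hypothetical; the two BOM constants come from Python's standard codecs module.

```python
import codecs

def detect_utf16_byte_order(data: bytes) -> str:
    """Guess the byte order of UTF-16 text from its first two bytes."""
    if data.startswith(codecs.BOM_UTF16_BE):  # b"\xfe\xff"
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):  # b"\xff\xfe"
        return "little"
    return "unknown"  # no BOM present, so the order must be known some other way

print(detect_utf16_byte_order(b"\xfe\xff\x00A"))  # big
print(detect_utf16_byte_order(b"\xff\xfeA\x00"))  # little
```

A file without a BOM returns "unknown" here, since the first two bytes alone say nothing about the order in that case.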

Network transmission

The UDP/TCP/IP protocols stipulate that the first byte received is treated as the high-order byte, so the first byte sent by the sender must be the high-order byte. When the sender transmits a value, the first byte sent is the byte at the value's starting address in memory, so the byte at the starting address must be the high-order byte (that is, the high-order byte is stored at the low address). A multi-byte value should therefore be stored in big-endian order in memory before it is sent; in other words, network byte order is big endian. For example, the integer 0x12345678 is stored in little-endian order on the 80x86 platform, so before sending it over the network it must be converted to big-endian storage with the system-provided byte order conversion function htonl().
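A minimal sketch of the conversion, using Python's standard socket and struct modules rather than the C htonl() call itself. No real network connection is made; the variable wire simply stands in for the bytes that would be sent.

```python
import socket
import struct
import sys

value = 0x12345678

# struct's "!" prefix means network (big-endian) byte order.
wire = struct.pack("!I", value)
print(wire.hex())  # 12345678

# socket.htonl converts a 32-bit integer from host to network order.
# On a little-endian host the bytes are swapped; on a big-endian host
# the value is returned unchanged.
converted = socket.htonl(value)
print(sys.byteorder, hex(converted))

# Receiver side: unpack with "!" again, so the value is recovered on any host.
(received,) = struct.unpack("!I", wire)
assert received == value
```

Packing with an explicit "!" avoids depending on the host's byte order at all, which is often simpler than calling htonl() and then copying the raw integer.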
CPU storage
- Big endian: PowerPC, IBM, Sun (also the Java virtual machine default), ARM (optional mode)
- Little endian: x86, DEC, ARM (default mode)
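To find out which byte order the current machine uses, a quick check along these lines may help. The helper name host_is_little_endian is made up for this example; sys.byteorder is part of Python's standard library.

```python
import struct
import sys

def host_is_little_endian() -> bool:
    """Check whether the least significant byte sits at the lowest address."""
    # "=H" packs a 16-bit value in the host's native byte order.
    return struct.pack("=H", 1)[0] == 1

print(sys.byteorder)            # "little" or "big"
print(host_is_little_endian())
```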

https://www.zhihu.com/question/20152853/answer/95576659 What should programmers know about character encoding? - Zhihu

https://www.cnblogs.com/gremount/p/8830707.html Understanding byte order: big endian and little endian - cnblogs

https://www.cnblogs.com/langzou/p/9010899.html The origin of big endian and little endian - cnblogs