A Preliminary Exploration of Big Endian Mode and Little Endian Mode
Origin of endian mode
The terms big endian and little endian come from an episode in Jonathan Swift's "Gulliver's Travels": the two great powers Lilliput and Blefuscu have been locked in a bitter war for the past 36 months. The cause of the war: everyone agrees that the original way to eat an egg is to break it at the larger end, but the grandfather of the current emperor cut his finger doing so as a child. His father, the emperor at the time, therefore ordered all subjects to break their eggs at the smaller end, with severe punishment for anyone who disobeyed. The people deeply resented this decree and rebelled several times; one emperor lost his life and another lost his throne. The rebellions were instigated by the monarchs of Blefuscu, and whenever one was put down, the exiles fled to that empire for refuge. It is estimated that more than 11,000 people chose, on several occasions, to die rather than break their eggs at the smaller end. The story satirized the ongoing conflict between Britain and France at the time. Danny Cohen, a pioneer of network protocols, was the first to use the two terms for byte order, and the usage has since been widely accepted.
What are big endian and little endian
Big-Endian
The high-order byte is placed at the low address end of the memory, and the low-order byte is placed at the high address end of the memory
For example, for the number 0x12345678 (0x12 is the high-order byte of the number and 0x78 is the low-order byte), the representation in big-endian mode is:
Low address >>>>>>>>>>>>>>>>> High address
0x12 | 0x34 | 0x56 | 0x78
Little-Endian
The low-order byte is placed at the low address end of the memory, and the high-order byte is placed at the high address end of the memory
For example, for the same number 0x12345678, the representation in little-endian mode is:
Low address >>>>>>>>>>>>>>>>> High address
0x78 | 0x56 | 0x34 | 0x12
Example
The 16-bit number 0x1234 is stored in memory in little-endian mode and in big-endian mode as follows (assuming it is stored starting at address 0x4000):
Memory address | Content in little-endian mode | Content in big-endian mode |
---|---|---|
0x4000 | 0x34 | 0x12 |
0x4001 | 0x12 | 0x34 |
The 32-bit number 0x12345678 is stored in memory in little-endian mode and in big-endian mode as follows (assuming it is stored starting at address 0x4000):
Memory address | Content in little-endian mode | Content in big-endian mode |
---|---|---|
0x4000 | 0x78 | 0x12 |
0x4001 | 0x56 | 0x34 |
0x4002 | 0x34 | 0x56 |
0x4003 | 0x12 | 0x78 |
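The tables above can be reproduced on a real machine by reading a value's bytes in ascending address order. The following is only a minimal sketch; the helper name format_bytes is made up for this illustration and is not part of any library.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes the n bytes of an object, lowest address first,
 * into out as two-digit hex values separated by single spaces.
 * out must have room for at least 3 * n characters (n > 0). */
static void format_bytes(const void *obj, size_t n, char *out)
{
    const unsigned char *p = (const unsigned char *)obj;
    size_t i;

    out[0] = '\0';
    for (i = 0; i < n; i++) {
        /* p[i] is the byte stored at (start address + i). */
        if (i + 1 < n)
            snprintf(out + 3 * i, 4, "%02x ", p[i]);
        else
            snprintf(out + 3 * i, 3, "%02x", p[i]);
    }
}
```

Called with a uint32_t holding 0x12345678 and a 12-character buffer, this should produce "78 56 34 12" on a little-endian host and "12 34 56 78" on a big-endian host, matching the two columns of the table.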
Advantages and disadvantages
Big-Endian mode: The sign bit is in the first byte of the data in memory, which makes it convenient to quickly judge the sign and the rough magnitude of the value.
Little-Endian:
- The low-order byte is stored at the low address, so no bytes need to be rearranged when a value is cast to a narrower type. (Note: for example, when a 4-byte int is converted to a 2-byte short, the first two bytes at the int's address can be handed to the short directly, because those two bytes are exactly the two lowest-order bytes, which matches the conversion logic.)
- When the CPU performs arithmetic, it can fetch the data from memory in order from the low-order byte to the high-order byte and finish by updating the sign bit in the highest-order byte. This order of operation tends to be more efficient.
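The cast note in the first bullet above can be checked by copying the first two bytes at a 32-bit value's address into a 16-bit value and comparing the result with an ordinary C conversion. This is a sketch only: the helper names are hypothetical, and fixed-width types stand in for int and short.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterprets the two lowest-addressed bytes of a
 * 32-bit value as a 16-bit value, without rearranging any bytes. */
static uint16_t first_two_bytes(uint32_t value)
{
    uint16_t half;

    memcpy(&half, &value, sizeof half);   /* bytes at offsets 0 and 1 */
    return half;
}

/* What a C conversion to a 16-bit type yields on every platform:
 * the low-order 16 bits of the value. */
static uint16_t truncate_to_16(uint32_t value)
{
    return (uint16_t)value;
}
```

For 0x12345678, truncate_to_16 always gives 0x5678. On a little-endian host first_two_bytes gives the same 0x5678, so the narrowing conversion can reuse the original address; on a big-endian host it gives 0x1234, so the hardware has to read from offset 2 instead.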
Why are there big-endian and little-endian modes?
Memory in a computer system is addressed in units of bytes, and each address corresponds to exactly one byte (8 bits). This is enough for the char data type, because a char is exactly 1 byte, but some types are several bytes long, such as the 2-byte short and the 4-byte int. This raises the question of how to order the bytes of a multi-byte value in memory: is the high-order byte placed at the low address, or is the low-order byte placed there? These two different placement orders are what give rise to the big-endian and little-endian modes.
How to determine the endianness of a machine
Since big-endian and little-endian modes are simply two different ways of ordering bytes in memory, a program can determine the endianness of the machine it runs on. The following program shows two ways to do so:
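#include <stdbool.h>
#include <stdio.h>

/*
 * Method 1
 * Cast the address of an int to char *, so that a single-byte char reads the
 * low-address byte of the 4-byte int, then determine the byte order from the
 * value of that byte. Assumes int is 4 bytes wide.
 */
bool is_big_endian_1()
{
    /* All 4 bytes are distinct; with 0x1234 the low-address byte of a
     * 4-byte int would be 0x34 or 0x00, never 0x12. */
    int a = 0x12345678;
    printf("%zu\n", sizeof(a));
    printf("%x\n", a);
    char b;
    b = *(char *)&a;
    if(0x12 == b)
        return true;
    return false;
}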
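/*
 * Method 2
 * All members of a union are stored starting at the same low address, so
 * reading the char member yields the low-address byte of the int, which
 * reveals the machine's byte order. Assumes int is 4 bytes wide.
 */
bool is_big_endian_2()
{
    union NUM
    {
        int a;
        char b;
    } num;
    num.a = 0x12345678;   /* high-order byte is 0x12 */
    if(0x12 == num.b)
        return true;
    return false;
}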
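The same low-address check can also be written with a fixed-width type, so that it does not depend on the size of int. The following is a variation on the two methods above rather than a replacement for them; the function name host_is_big_endian is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variant of the checks above: store a 32-bit probe value and
 * inspect the byte at its lowest address. */
static bool host_is_big_endian(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char first;

    memcpy(&first, &probe, 1);   /* byte at the lowest address */
    return first == 0x01;        /* high-order byte first means big endian */
}
```

Using memcpy into an unsigned char avoids both the pointer cast and the question of whether plain char is signed.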
Current status
Most mainstream operating systems run in little-endian mode, while communication protocols generally use big endian.
Endianness of common CPUs
- Big Endian : PowerPC, IBM, Sun
- Little Endian : x86, DEC
- ARM can work in either big endian or little endian mode.
Endianness of common files
- Adobe PS – Big Endian
- BMP – Little Endian
- DXF(AutoCAD) – Variable
- GIF – Little Endian
- JPEG – Big Endian
- MacPaint – Big Endian
- RTF – Little Endian
Note: Java and most network communication protocols, including the TCP/IP suite, use big-endian encoding.
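When a program parses one of these formats, it can decode multi-byte fields with shifts instead of relying on the host's own byte order. The sketch below uses two well-known fields as examples: the 32-bit file size at offset 2 of a BMP header (little endian) and the 16-bit segment length that follows a JPEG marker (big endian). The helper names read_le32 and read_be16 are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: decodes a 32-bit little-endian field.
 * p[0] is the low-order byte. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: decodes a 16-bit big-endian field.
 * p[0] is the high-order byte. */
static uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}
```

For a BMP header that begins with the bytes 42 4D 36 00 0C 00, read_le32 applied at offset 2 yields 0x000C0036. Because the functions only index bytes and shift, they give the same result on big-endian and little-endian hosts.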
Host byte order and network byte order
Host byte order
Different CPUs use different byte orders, and the byte order determines how data is laid out in memory; this is the host byte order. The two most common host byte orders are the ones described above: big endian and little endian.
Network byte order
Network byte order is the data representation format specified by TCP/IP. It is independent of any particular CPU type or operating system, so that data can be interpreted correctly when transmitted between different hosts. Network byte order is big endian.
In network programming on Linux, two functions are used frequently, htons and htonl; they convert values from host byte order to network byte order.
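A typical place where htons and htonl appear is the setup of an IPv4 socket address. The sketch below assumes a POSIX system; the helper name make_loopback_addr is hypothetical, while htons, htonl, struct sockaddr_in and INADDR_LOOPBACK come from the standard socket headers.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fills an IPv4 socket address for 127.0.0.1:port.
 * Both the port and the address are stored in network byte order. */
static void make_loopback_addr(struct sockaddr_in *addr, uint16_t port)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);                    /* host to network, 16 bit */
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* host to network, 32 bit */
}
```

After a call with port 8080 (0x1F90), the two bytes of sin_port in memory are 0x1F followed by 0x90 on any host, because the conversion either swaps the bytes or leaves them alone depending on the host byte order.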
endian conversion
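#include <stdint.h>

// Swap the byte order of a 16-bit value (big endian <-> little endian)
#define BLSWITCH16(A) ( (uint16_t)( ( ( (uint16_t)(A) & 0xff00 ) >> 8 ) | \
                                    ( ( (uint16_t)(A) & 0x00ff ) << 8 ) ) )
// Swap the byte order of a 32-bit value (big endian <-> little endian)
#define BLSWITCH32(A) ( ( ( (uint32_t)(A) & 0xff000000 ) >> 24 ) | \
                        ( ( (uint32_t)(A) & 0x00ff0000 ) >> 8 )  | \
                        ( ( (uint32_t)(A) & 0x0000ff00 ) << 8 )  | \
                        ( ( (uint32_t)(A) & 0x000000ff ) << 24 ) )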
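One caveat with the BLSWITCH16 and BLSWITCH32 macros above is that they evaluate their argument more than once, so an argument with side effects such as x++ would misbehave. As an alternative sketch (not the author's code), the same swaps can be written as inline functions; the names swap16 and swap32 are hypothetical.

```c
#include <stdint.h>

/* Hypothetical function form of the 16-bit swap: exchanges the two bytes. */
static inline uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Hypothetical function form of the 32-bit swap: reverses the four bytes. */
static inline uint32_t swap32(uint32_t v)
{
    return ((v & 0xff000000u) >> 24)
         | ((v & 0x00ff0000u) >> 8)
         | ((v & 0x0000ff00u) << 8)
         | ((v & 0x000000ffu) << 24);
}
```

With these, swap16(0x1234) is 0x3412 and swap32(0x12345678) is 0x78563412, and applying either function twice returns the original value.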
Network byte order and host byte order conversion
Because network byte order is always big endian, while most personal computers are little-endian x86 machines, network programming inevitably involves converting between network byte order and host byte order. The following conversion functions are provided by the socket API:
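#define ntohs(n) // 16-bit value: network byte order to host byte order
#define htons(n) // 16-bit value: host byte order to network byte order
#define ntohl(n) // 32-bit value: network byte order to host byte order
#define htonl(n) // 32-bit value: host byte order to network byte order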
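As an illustration of how these conversions are paired in practice, the sketch below writes a 32-bit message length into a buffer in network byte order and reads it back. It assumes a POSIX system that provides htonl and ntohl in arpa/inet.h; the helper names put_length and get_length are hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: stores a 4-byte length prefix in network byte order. */
static void put_length(unsigned char out[4], uint32_t length)
{
    uint32_t wire = htonl(length);      /* host to network byte order */

    memcpy(out, &wire, sizeof wire);
}

/* Hypothetical helper: reads a 4-byte length prefix back into host order. */
static uint32_t get_length(const unsigned char in[4])
{
    uint32_t wire;

    memcpy(&wire, in, sizeof wire);
    return ntohl(wire);                 /* network to host byte order */
}
```

For a length of 258 (0x00000102), put_length writes the bytes 00 00 01 02 on any host, and get_length returns 258 again. The sender always applies the host-to-network function and the receiver the network-to-host function, so neither side needs to know the other's endianness.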