A Preliminary Exploration of Big Endian Mode and Little Endian Mode

Origin of endian mode

The terms big endian and little endian have an interesting origin in Jonathan Swift's "Gulliver's Travels". In the story, the two great empires of Lilliput and Blefuscu have been at war for the past 36 months. The cause of the war: everyone agreed that the original way to eat an egg was to break it at the larger end, but the grandfather of the reigning emperor cut his finger doing so as a child. His father, the emperor at the time, then ordered all subjects to break the smaller end of their eggs, with severe punishment for anyone who disobeyed. The people resented this decree so much that rebellions broke out repeatedly; one emperor was killed and another lost his throne. The rebellions were stirred up by the monarchs of the neighboring empire of Blefuscu, and once they were put down, the exiles fled to that empire for refuge. It is estimated that, on several occasions, some 11,000 people chose to die rather than break their eggs at the smaller end. The story satirized the ongoing conflict between Britain and France at the time. Danny Cohen, a pioneer of network protocols, first used the two terms to refer to byte order, and they have since been widely accepted.

What are big endian and little endian

Big-Endian

The high-order byte is placed at the low address end of the memory, and the low-order byte is placed at the high address end of the memory.

For example, for the number 0x12345678 (0x12 is the high-order byte and 0x78 is the low-order byte), the representation in big-endian mode is:

Low address >>>>>>>>>>>>>>>>> High address

0x12 | 0x34 | 0x56 | 0x78

Little-Endian

The low-order byte is placed at the low address end of the memory, and the high-order byte is placed at the high address end of the memory.

For example, for the same number 0x12345678, the representation in little-endian mode is:

Low address >>>>>>>>>>>>>>>>> High address

0x78 | 0x56 | 0x34 | 0x12

Example

The 16-bit number 0x1234 is stored in memory as follows in little-endian mode and in big-endian mode (assuming it is stored starting at address 0x4000):

Memory address    Little-endian content    Big-endian content
0x4000            0x34                     0x12
0x4001            0x12                     0x34

The 32-bit number 0x12345678 is stored in memory as follows in little-endian mode and in big-endian mode (assuming it is stored starting at address 0x4000):

Memory address    Little-endian content    Big-endian content
0x4000            0x78                     0x12
0x4001            0x56                     0x34
0x4002            0x34                     0x56
0x4003            0x12                     0x78
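
The two tables above can be reproduced with a short sketch that writes a 32-bit value into a byte buffer in each order using shifts, so the result does not depend on the machine running it. The function names and the base address argument are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Write v to out[0..3] with the high-order byte at the lowest index. */
void store_big_endian32(uint32_t v, unsigned char out[4])
{
    assert(out != NULL);
    out[0] = (unsigned char)((v >> 24) & 0xFFu);
    out[1] = (unsigned char)((v >> 16) & 0xFFu);
    out[2] = (unsigned char)((v >> 8) & 0xFFu);
    out[3] = (unsigned char)(v & 0xFFu);
}

/* Write v to out[0..3] with the low-order byte at the lowest index. */
void store_little_endian32(uint32_t v, unsigned char out[4])
{
    assert(out != NULL);
    out[0] = (unsigned char)(v & 0xFFu);
    out[1] = (unsigned char)((v >> 8) & 0xFFu);
    out[2] = (unsigned char)((v >> 16) & 0xFFu);
    out[3] = (unsigned char)((v >> 24) & 0xFFu);
}

/* Print one row per byte: address, little-endian content, big-endian content.
 * base is an illustrative starting address, not a real pointer. */
void print_layouts32(uint32_t v, unsigned base)
{
    unsigned char le[4], be[4];
    store_little_endian32(v, le);
    store_big_endian32(v, be);
    for (unsigned i = 0; i < 4; i++)
        printf("0x%04X  0x%02X  0x%02X\n", base + i, (unsigned)le[i], (unsigned)be[i]);
}
```

Calling print_layouts32(0x12345678u, 0x4000u) prints the rows of the 32-bit table.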

Advantages and disadvantages

Big-Endian: The sign bit is in the first byte of the data in memory, which makes it quick to determine whether a value is positive or negative and to compare magnitudes.

Little-Endian:

  • The low-order byte is stored at the low address, so the bytes do not need to be rearranged when a value is cast to a narrower type (Note: for example, when a 4-byte int is cast to a 2-byte short, the first two bytes of the int's storage can be given to the short directly, because the first two bytes are exactly the two lowest-order bytes, which matches the conversion logic).
  • When the CPU performs arithmetic, it can fetch the data from memory in sequence from the low-order byte to the high-order byte, finishing with the highest-order byte that holds the sign bit. This matches the direction in which carries propagate, so this order can be more efficient.
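
The narrowing argument in the first bullet can be checked with a small sketch. It works on explicit byte images of 0x12345678 rather than on the host's own memory, so it behaves the same on any machine. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode two bytes starting at p as a little-endian 16-bit value. */
uint16_t read_le16(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode two bytes starting at p as a big-endian 16-bit value. */
uint16_t read_be16(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* The 32-bit value 0x12345678 laid out in each byte order. */
const unsigned char le_image[4] = { 0x78, 0x56, 0x34, 0x12 };
const unsigned char be_image[4] = { 0x12, 0x34, 0x56, 0x78 };

/* Narrowing to 16 bits keeps the low half, 0x5678.
 * Little-endian: the low half starts at the same address as the full value.
 * Big-endian: the low half starts two bytes further on. */
uint16_t narrow_le(const unsigned char image[4]) { return read_le16(image); }
uint16_t narrow_be(const unsigned char image[4]) { return read_be16(image + 2); }
```

Reading two bytes at offset 0 of the big-endian image would give 0x1234, the high half, which is why a big-endian layout needs an address adjustment for the same cast.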

Why are there two byte orders?

Memory in a computer system is addressed in units of bytes, and each address corresponds to exactly one byte (8 bits). This is enough for the char data type, because a char is exactly 1 byte, but some types span multiple bytes, such as the 2-byte short and the 4-byte int. This raises the question of how to order the bytes of a multi-byte value in memory: is the high-order byte placed at the low address, or is the low-order byte placed there? These two different placement orders are what gave rise to the big-endian and little-endian modes.

How to determine the endianness of a machine

Since big-endian and little-endian are simply two different ways of ordering bytes in memory, a program can determine the endianness of the machine it runs on by checking which byte of a multi-byte value sits at the lowest address. The following program shows two ways to do this:

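#include <stdbool.h>
#include <stdio.h>

/*
 * Method 1
 * Cast a pointer to the int to a char pointer, so that the single-byte char
 * reads the lowest-addressed byte of the int (assumed to be 4 bytes), then
 * determine the byte order from the value of that byte.
 */
bool is_big_endian_1(void)
{
    /* Use a full 4-byte pattern: with 0x1234 the lowest-addressed byte on a
     * big-endian machine would be 0x00, and the test below would never pass. */
    int a = 0x12345678;
    printf("%zu\n", sizeof(a));
    printf("%x\n", a);
    char b;
    b = *(char *)&a;
    /* Big-endian stores the high-order byte 0x12 at the lowest address. */
    if (0x12 == b)
        return true;
    return false;
}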

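/*
 * Method 2
 * All members of a union are stored starting at the same (lowest) address,
 * so reading the char member yields the lowest-addressed byte of the int
 * (assumed to be 4 bytes), which reveals the machine's byte order.
 */
bool is_big_endian_2(void)
{
    union NUM
    {
        int a;
        char b;
    } num;

    /* Use a full 4-byte pattern so that the high-order byte is non-zero. */
    num.a = 0x12345678;
    if (0x12 == num.b)
        return true;
    return false;
}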
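
As a variation on the two functions above, the sketch below copies the lowest-addressed byte with memcpy and cross-checks the answer against htonl, which leaves its argument unchanged only on a big-endian host. It assumes a POSIX system that provides <arpa/inet.h>. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Inspect the byte at the lowest address of a known 32-bit pattern. */
bool host_is_big_endian(void)
{
    const uint32_t probe = 0x12345678u;
    unsigned char first;
    memcpy(&first, &probe, 1);  /* copy only the lowest-addressed byte */
    /* Any other value would mean a mixed-endian layout, which this sketch
     * does not handle. */
    assert(first == 0x12 || first == 0x78);
    return first == 0x12;
}

/* Network byte order is big-endian, so htonl is the identity only on a
 * big-endian host. */
bool host_is_big_endian_by_htonl(void)
{
    return htonl(1u) == 1u;
}
```

The two helpers should always agree on the same machine, which makes the second one a convenient sanity check for the first.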

Current status

Most mainstream operating systems run on little-endian hardware, while communication protocols are generally big-endian.

Endianness of common CPUs

  • Big Endian: PowerPC, IBM, Sun
  • Little Endian: x86, DEC
  • ARM can work in either big endian or little endian mode.

Endianness of common files

  • Adobe PS – Big Endian
  • BMP – Little Endian
  • DXF(AutoCAD) – Variable
  • GIF – Little Endian
  • JPEG – Big Endian
  • MacPaint – Big Endian
  • RTF – Little Endian

Note: Java and the TCP/IP network protocols use big-endian encoding.
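
Because a file format fixes its own byte order, a reader should decode fields byte by byte instead of relying on the host's order. The sketch below illustrates this for BMP, whose header starts with the two characters "BM" followed by a 32-bit little-endian file size at offset 2. The helper names are hypothetical, and the function only reads the size recorded in the header without validating the rest of the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode four bytes starting at p as a little-endian 32-bit value. */
uint32_t load_le32(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Return the file size recorded in a BMP header, or 0 if the buffer is too
 * short or does not start with the "BM" signature. */
uint32_t bmp_recorded_size(const unsigned char *header, size_t len)
{
    if (header == NULL || len < 6 || header[0] != 'B' || header[1] != 'M')
        return 0;
    return load_le32(header + 2);
}
```

The same shift-based approach works unchanged on big-endian and little-endian hosts, since it never reinterprets the buffer as a native integer.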

Host byte order and network byte order

Host byte order

Different CPUs use different byte orders, and the byte order determines how data is laid out in memory. This is the host byte order. The two most common host byte orders are the ones described above: big-endian and little-endian.

Network byte order

Network byte order is a data representation format specified by TCP/IP. It is independent of any particular CPU type or operating system, so that data can be interpreted correctly when it is transmitted between different hosts. Network byte order is big-endian.

In network programming under Linux, two functions are used frequently, htons and htonl. They convert values from host byte order to network byte order.
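
A minimal sketch of how htons and htonl are typically used: convert the value, then copy its bytes into the outgoing buffer. It assumes a POSIX system with <arpa/inet.h>. The helper names put_u16 and put_u32 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Write a 16-bit host value (for example a port number) into a 2-byte wire
 * buffer in network byte order. */
void put_u16(uint16_t host_value, unsigned char wire[2])
{
    assert(wire != NULL);
    uint16_t net = htons(host_value);
    memcpy(wire, &net, sizeof net);
}

/* Write a 32-bit host value (for example an IPv4 address) into a 4-byte wire
 * buffer in network byte order. */
void put_u32(uint32_t host_value, unsigned char wire[4])
{
    assert(wire != NULL);
    uint32_t net = htonl(host_value);
    memcpy(wire, &net, sizeof net);
}
```

For port 8080 (0x1F90) the buffer holds 0x1F followed by 0x90 on any host, because the converted value's in-memory bytes are always in big-endian order.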

endian conversion

#include <stdint.h>

// Swap the byte order of a 16-bit value
#define BLSWITCH16(A)   ((uint16_t)((((uint16_t)(A) & 0xff00) >> 8) | \
                                    (((uint16_t)(A) & 0x00ff) << 8)))

// Swap the byte order of a 32-bit value
#define BLSWITCH32(A)   ((((uint32_t)(A) & 0xff000000) >> 24) | \
                         (((uint32_t)(A) & 0x00ff0000) >> 8)  | \
                         (((uint32_t)(A) & 0x0000ff00) << 8)  | \
                         (((uint32_t)(A) & 0x000000ff) << 24))
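
The same byte swaps can also be written as inline functions, which evaluate their argument only once and are type-checked by the compiler. This is a sketch of one possible alternative to the macros. The names swap16, swap32 and swap_self_check are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Exchange the two bytes of a 16-bit value. */
static inline uint16_t swap16(uint16_t v)
{
    return (uint16_t)(((v & 0xFF00u) >> 8) | ((v & 0x00FFu) << 8));
}

/* Reverse the four bytes of a 32-bit value. */
static inline uint32_t swap32(uint32_t v)
{
    return ((v & 0xFF000000u) >> 24) |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x000000FFu) << 24);
}

/* Swapping twice must give back the original value. */
static inline void swap_self_check(void)
{
    assert(swap16(swap16(0x1234u)) == 0x1234u);
    assert(swap32(swap32(0x12345678u)) == 0x12345678u);
}
```

For example, swap32(0x12345678) yields 0x78563412, which is the little-endian byte image of the value read back as a big-endian number.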

Network byte order and host byte order conversion

Since network byte order is always big-endian, and most PCs use little-endian x86 processors, converting between network byte order and host byte order is unavoidable in network programming. The following conversion functions are provided by the socket API:

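// Declared in <arpa/inet.h> (POSIX); may be implemented as functions or macros
uint16_t ntohs(uint16_t netshort);   // 16-bit value: network byte order to host byte order
uint16_t htons(uint16_t hostshort);  // 16-bit value: host byte order to network byte order
uint32_t ntohl(uint32_t netlong);    // 32-bit value: network byte order to host byte order
uint32_t htonl(uint32_t hostlong);   // 32-bit value: host byte order to network byte order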
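
The four conversions are usually used in pairs: the sender applies htonl/htons before writing to the wire, and the receiver applies ntohl/ntohs after reading. The sketch below assumes a hypothetical 6-byte message header (a 4-byte length followed by a 2-byte type). It is not taken from any real protocol, and it assumes a POSIX system with <arpa/inet.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical wire header: 4-byte length, then 2-byte type, both in
 * network byte order. */
enum { HEADER_SIZE = 6 };

/* Sender side: convert host values to network byte order and serialize. */
void encode_header(uint32_t length, uint16_t type, unsigned char out[HEADER_SIZE])
{
    assert(out != NULL);
    uint32_t net_length = htonl(length);
    uint16_t net_type = htons(type);
    memcpy(out, &net_length, sizeof net_length);
    memcpy(out + 4, &net_type, sizeof net_type);
}

/* Receiver side: deserialize and convert back to host byte order. */
void decode_header(const unsigned char in[HEADER_SIZE], uint32_t *length, uint16_t *type)
{
    assert(in != NULL && length != NULL && type != NULL);
    uint32_t net_length;
    uint16_t net_type;
    memcpy(&net_length, in, sizeof net_length);
    memcpy(&net_type, in + 4, sizeof net_type);
    *length = ntohl(net_length);
    *type = ntohs(net_type);
}
```

Using memcpy instead of casting the buffer to an integer pointer avoids alignment problems when the header does not start at an aligned address.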
