Big Endian and Little Endian in detail

1. The origin of Endian

Different computer architectures store bytes and words in different ways. This raises an important question in computer communication: in what order should the units of information (bits, bytes, words, double words and so on) be transmitted? If the two communicating parties do not agree on a rule, they cannot encode and decode the data correctly, and communication fails.

In 1980, Danny Cohen used this term in his famous paper "On Holy Wars and a Plea for Peace", written to quell a debate about the order in which bytes should be transmitted in a message. Borrowing the names from Jonathan Swift's Gulliver's Travels, Cohen called the camp that favors transmitting a message starting from its most significant bit the Big-Endians, and the camp that favors starting from the least significant bit the Little-Endians. The word endian has been in wide use ever since.

2. The byte order of Little-Endian & Big-Endian

First, to be clear: the smallest unit of storage a program normally deals with is the byte, so big endian and little endian only matter for sequences of more than one byte. In the communications field the unit is often the bit, but the principle is similar and is covered later.

For the storage format of byte sequences there are two camps, commonly represented by the PowerPC family (developed by Apple, IBM and Motorola) and Intel's x86 family. PowerPC traditionally stores data in big endian order, while x86 stores data in little endian order. So what is big endian and what is little endian?

1) Little-endian: the low-order byte is stored at the starting (lowest) address
2) Big-endian: the high-order byte is stored at the starting (lowest) address

For example, if we write 0x1234abcd into memory starting at address 0x0000, the result is:

address   big-endian   little-endian
0x0000    0x12         0xcd
0x0001    0x34         0xab
0x0002    0xab         0x34
0x0003    0xcd         0x12

Note: each address stores 1 byte, and a 2-digit hexadecimal number is exactly 1 byte (0xFF = 11111111).
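The table above can be reproduced with a few lines of code. The following is a minimal sketch in Python (the language is chosen here only for brevity, it is not used elsewhere in this article); the variable names are illustrative.

```python
# Lay out the 32-bit value 0x1234abcd in both byte orders.
value = 0x1234ABCD

big = value.to_bytes(4, byteorder="big")        # high-order byte at the lowest address
little = value.to_bytes(4, byteorder="little")  # low-order byte at the lowest address

print("address   big-endian   little-endian")
for offset in range(4):
    print(f"0x{offset:04x}    0x{big[offset]:02x}         0x{little[offset]:02x}")
```

The two layouts contain the same four bytes; one is simply the reverse of the other.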

Why pay attention to endianness, you may ask? If the program you write only runs on a single machine and never exchanges data with other people's programs, you can ignore endianness completely.

But what if your program needs to exchange data with someone else's program? Two languages are worth mentioning here. In a program written in C/C++, the order in which data is stored depends on the CPU of the platform the program is compiled for, while a program written in Java always handles multi-byte data in big endian order, whatever the underlying CPU is.
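Because the byte order seen by native code depends on the CPU, it can be useful to check what the current host uses. A minimal sketch in Python, assuming only the standard `struct` and `sys` modules; `host_byte_order` is a hypothetical helper name, not something from the original article.

```python
import struct
import sys

def host_byte_order() -> str:
    """Return 'little' or 'big' by looking at how the value 1 is stored natively."""
    # "=I" packs an unsigned 32-bit integer using the host's native byte order.
    first_byte = struct.pack("=I", 1)[0]
    # On a little endian host the low-order byte (0x01) comes first.
    return "little" if first_byte == 1 else "big"

# sys.byteorder reports the same information directly.
print(host_byte_order(), sys.byteorder)
```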

Just imagine what happens when a C/C++ program you wrote on the x86 platform exchanges data with someone else's Java program. Take 0x1234abcd from the example above: your program sends the four bytes of that value exactly as they sit in memory. Because the Java side interprets them as big endian, it reads the value as 0xcdab3412. Has it become another number? Yes, and that is the consequence. So the C program must convert the byte order of its data before sending it to the Java program.
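The mismatch can be simulated without two machines. The sketch below, in Python, uses the value 0x1234abcd from the table earlier; `swap32` is a hypothetical helper introduced only for illustration.

```python
value = 0x1234ABCD

# What a little endian sender (such as x86) emits if it copies memory as-is.
wire = value.to_bytes(4, "little")

# A receiver that assumes big endian reads the same four bytes differently.
misread = int.from_bytes(wire, "big")

def swap32(x: int) -> int:
    """Reverse the four bytes of a 32-bit unsigned integer."""
    return int.from_bytes(x.to_bytes(4, "little"), "big")

print(hex(misread))          # 0xcdab3412
print(hex(swap32(misread)))  # 0x1234abcd
```

Swapping the bytes on either side, but not both, restores the original value.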

As it happens, the TCP/IP protocols also transmit multi-byte values in big endian order, which is why big endian is also called network byte order. When two hosts with different byte orders communicate, each must convert its data to network byte order before sending it.
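As a rough illustration of converting to and from network byte order, the following Python sketch uses the standard `struct` and `socket` modules. The variable names are illustrative, and no real network connection is involved.

```python
import socket
import struct

value = 0x1234ABCD

# "!I" means: unsigned 32-bit integer in network (big endian) byte order.
packet = struct.pack("!I", value)          # sender side
(received,) = struct.unpack("!I", packet)  # receiver side

# htonl/ntohl convert a 32-bit integer between host and network order.
# On a big endian host they leave the value unchanged.
roundtrip = socket.ntohl(socket.htonl(value))

print(packet.hex(), hex(received), hex(roundtrip))
```

Because both sides name the byte order explicitly, the result does not depend on the CPU either program runs on.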

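At present, little endian is the mainstream. One practical advantage is that data type conversions (especially pointer conversions) need no address adjustment: the low-order byte of a value sits at the same address whether the value is read as an 8-, 16- or 32-bit quantity.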
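The address argument can be made concrete by treating a byte string as a small piece of memory. This is only a sketch in Python under that assumption; `read_at` is a hypothetical helper that mimics reading through a pointer of a given width.

```python
small = 0x42  # fits in one byte, but is stored as a 32-bit integer

little_mem = small.to_bytes(4, "little")  # 42 00 00 00
big_mem = small.to_bytes(4, "big")        # 00 00 00 42

def read_at(mem: bytes, offset: int, width: int, order: str) -> int:
    """Interpret `width` bytes of `mem` starting at `offset` as an integer."""
    return int.from_bytes(mem[offset:offset + width], order)

# Little endian: 8-, 16- and 32-bit reads from the same address all agree.
little_reads = [read_at(little_mem, 0, w, "little") for w in (1, 2, 4)]

# Big endian: a narrower read has to start at a higher address to agree.
big_reads = [read_at(big_mem, 4 - w, w, "big") for w in (1, 2, 4)]

# Reading one byte at the original address yields the high-order byte instead.
big_same_address = read_at(big_mem, 0, 1, "big")

print(little_reads, big_reads, big_same_address)
```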

3. The bit order of Little-Endian & Big-Endian

Some readers may still ask: when the CPU stores a single byte, do big endian and little endian also differ in the order of the 8 bits inside that byte? In other words, is there such a thing as bit order?

Bit order does exist. The following figure illustrates it with the number 0xB4 (10110100).

MSB stands for Most Significant Bit: the bit with the highest weight in a binary number, similar to the leftmost digit of a decimal number.

LSB stands for Least Significant Bit: the bit with the lowest weight in a binary number. When a binary number is written out, the MSB is usually at the far left and the LSB at the far right.

  • Big Endian
   msb --------------------------> lsb
   +---+---+---+---+---+---+---+---+
   | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 |
   +---+---+---+---+---+---+---+---+

  • Little Endian
   lsb --------------------------> msb
   +---+---+---+---+---+---+---+---+
   | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 |
   +---+---+---+---+---+---+---+---+

In fact, since the smallest unit the CPU reads or writes in memory is a byte, the bit order inside a byte is a black box to our programs. Suppose you give me a pointer to the number 0xB4. Conceptually, a big endian CPU reads the 8 bits of this number from left to right and a little endian CPU reads them from right to left, but either way the number our program gets through the pointer is 0xB4. The bit order inside a byte is invisible to the program. The same is true of byte order on a stand-alone computer, as long as the data never leaves the machine.
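The two bit orderings in the figure above can be reproduced in a few lines. This Python sketch is purely illustrative, since a program never observes the physical bit order; `reverse_bits8` is a hypothetical helper.

```python
def reverse_bits8(byte: int) -> int:
    """Return the 8-bit value whose bits are those of `byte` in reverse order."""
    result = 0
    for _ in range(8):
        # Move the current lowest bit of `byte` onto the low end of `result`.
        result = (result << 1) | (byte & 1)
        byte >>= 1
    return result

value = 0xB4
msb_first = f"{value:08b}"   # bits written MSB first, as in the first diagram
lsb_first = msb_first[::-1]  # the same bits written LSB first

print(msb_first, lsb_first, hex(reverse_bits8(value)))
```

Read as an ordinary number, the reversed bit pattern would be 0x2D, which is why both ends of a link must agree on the bit order as well.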

Then some people may ask: what about network transmission? Will there be a problem? Do I need some function to convert the bit order? It is a good question. Suppose a little endian CPU wants to send one byte to a big endian CPU. It reads the 8-bit value locally, and those 8 bits are then transmitted in the bit order the network defines, so nothing goes wrong at the receiving end. A 32-bit number is a different matter. It occupies 4 bytes in memory on the little endian side, and the network transmits data byte by byte, so the little endian CPU reads the first byte and sends it first. That byte is the least significant byte of the original number, but the receiver treats it as the most significant byte, and the value is corrupted.

Origin blog.csdn.net/yundanfengqing_nuc/article/details/110474591