Analysis of endianness and byte-alignment problems in IM development

Big-endian and little-endian

What is endianness?

Different computer architectures store and transmit data (bits, bytes, words, etc.) differently; today two byte-ordering schemes dominate: big-endian and little-endian.
Endianness (also called byte order) refers to the order in which the bytes of a multi-byte value are stored. The two typical cases are the storage of integers in memory (usually little-endian) and the transmission order on the network (big-endian). The term can also be applied to bit order.

Original: http://smilejay.com/2012/01/big-endian_little-endian/

  • Little-endian: most common operating systems are little-endian, and programming languages such as C/C++ store data in host order (little-endian on x86). The characteristic is: the low-order byte of the data sits at the low address and the high-order byte at the high address.
  • Big-endian: network byte order is big-endian. The characteristic is: the high-order byte sits at the low address and the low-order byte at the high address, i.e. the most significant byte comes first.

In which scenarios will you encounter endianness problems?

C/C++ network programming

Let's look at a concrete example (the key facts to remember: integers are stored little-endian on the common x86 CPU architecture, and C/C++ must convert them to big-endian before sending them to the network).

In VS 2017, here is what a uint32_t (a 4-byte unsigned integer) holding the value 7 looks like in memory: the bytes read 07 00 00 00, so this machine is little-endian.

What if we convert it to big-endian? The bytes become 00 00 00 07.

What is the difference between the two observations? The former directly prints the memory of the uint32_t; the latter goes through the AppendInt32 function of the Buffer class in evpp, which calls htonl to convert the value from host order (little-endian on x86) to network byte order (big-endian).

The details of the two conversion functions in evpp (C++ cross-platform network library) are as follows:

namespace evpp {

class EVPP_EXPORT Buffer {
public:
    // ...

    // Append a signed 32-bit integer to the buffer
    void AppendInt32(int32_t x) {
        int32_t be32 = htonl(x); // htonl: convert to network byte order (big-endian) before writing
        Write(&be32, sizeof be32);
    }

    // ...

    // When reading, call ntohl to convert back from big-endian
    int32_t PeekInt32() const {
        assert(length() >= sizeof(int32_t));
        int32_t be32 = 0;
        ::memcpy(&be32, data(), sizeof be32);
        return ntohl(be32);
    }

    // ...
};

} // namespace evpp

PS:
C/C++ provides four commonly used conversion functions. They only actually swap bytes on little-endian hosts; on a big-endian host they are no-ops, since the host byte order already matches the network byte order.
htons: converts an unsigned short from host order to network byte order
ntohs: converts an unsigned short from network byte order to host order
htonl: converts an unsigned long from host order to network byte order
ntohl: converts an unsigned long from network byte order to host order
To use these functions, include the corresponding header, as follows:

#if defined(linux) || defined(__linux__)
	#include <netinet/in.h>
#endif
 
#ifdef WIN32
	#include <WINSOCK2.H>
#endif
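For comparison, here is a minimal Go sketch of the same pattern as evpp's AppendInt32/PeekInt32. The Buffer type and method names below are hypothetical, chosen only to mirror the C++ code above; Go has no htonl, so binary.BigEndian plays that role:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Buffer is a minimal sketch mirroring evpp's Buffer; only the
// endianness handling is shown, not the real evpp API.
type Buffer struct {
	data []byte
}

// AppendInt32 writes x in network byte order (big-endian),
// the equivalent of htonl + Write in the C++ code above.
func (b *Buffer) AppendInt32(x int32) {
	var be [4]byte
	binary.BigEndian.PutUint32(be[:], uint32(x))
	b.data = append(b.data, be[:]...)
}

// PeekInt32 reads a big-endian int32 from the front of the buffer,
// the equivalent of memcpy + ntohl.
func (b *Buffer) PeekInt32() int32 {
	return int32(binary.BigEndian.Uint32(b.data[:4]))
}

func main() {
	var b Buffer
	b.AppendInt32(7)
	fmt.Printf("wire bytes: % x\n", b.data) // 00 00 00 07
	fmt.Println("peek:", b.PeekInt32())     // 7
}
```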

Endianness in Go

As mentioned above, common operating systems are little-endian, while network byte order is big-endian. Application-layer protocols carried over TCP usually adopt a TLV-style format: a fixed-size header stores the length and message type, for example:

type ImHeader struct {
	Length    uint32 // the whole pdu length
	Version   uint16 // pdu version number
	Flag      uint16 // not used
	ServiceId uint16 //
	CommandId uint16 //
	SeqNum    uint16 // packet sequence number
	Reversed  uint16 // reserved

	pbMessage proto.Message // message body
}

The header above is a fixed 16 bytes, and the first 4 bytes store the length. Suppose my server is written in Go: what happens if the client writes the length little-endian but the server parses it as big-endian?
The client sent the number 7 (the length of the data section) as the little-endian bytes 07 00 00 00; parsed as big-endian, those same bytes become 117,440,512 (0x07000000) instead of 7. After that there is no way to extract the data section correctly, and the packet can only be treated as corrupt.

Looking at memory again, the buffer begins with 07 00 00 00, which is indeed little-endian; the correct big-endian encoding of the same value would be 00 00 00 07.

Endianness in Java

Quote:

All binary data in Java is stored big-endian, which is also called network order. That means Java programs behave the same on every platform (Mac, PC, UNIX, etc.) and normally need not consider endianness at all. The trouble starts when programs written in different languages exchange data. For example, in a recent project of the author's, binary data was generated by C and sent in JSON format through a Redis message channel; the C side defaulted to little-endian, so endianness conversion was required. Some platforms (such as older Macs and IBM System/390) are natively big-endian, while others (such as Intel) are natively little-endian; Java shields you from these per-platform byte-order differences.

Therefore, Java generally does not need to deal with endianness, unless it has to communicate with other languages such as C/C++.

Conclusion

  • Big-endian: network byte order, and Java's binary data.
  • Little-endian: most common operating systems; C/C++ data lives in host order (little-endian on x86), while Go lets you choose either order explicitly via encoding/binary.

The Java virtual machine shields the endianness of the underlying platform, so pure Java-to-Java communication need not consider it.


Byte alignment

What is byte alignment?

To quote Baidu Encyclopedia:

Memory in modern computers is addressed in bytes. In theory, a variable of any type could start at any address, but in practice a variable of a particular type is accessed at particular addresses. This requires each type of data to be arranged in memory according to certain rules, rather than packed one after another; that arrangement is alignment.

Let's look at two more concepts below to deepen your understanding.

Memory access granularity

Definition

To a C/C++ programmer, char* ubiquitously means "a chunk of memory", and even Java has a byte[] type representing raw memory. Is it really that simple?

Programmers picture memory as a flat array of individually addressable bytes. The CPU sees it differently: a processor does not read and write memory one byte at a time. Instead, to improve performance, it accesses memory in chunks of 2, 4, 8, 16, or even 32 bytes. The size of these chunks is called the processor's memory access granularity.

So here is a question: suppose I define a structure whose fields add up to 5 bytes, which is not a multiple of the access granularity. What happens? Read through the alignment basics first; the answer comes in the Structure alignment section.

Alignment basics

Single byte memory access granularity

This matches the typical programmer's mental model of memory: reading from address 0 is no different from reading from address 1. Now look at what happens on a processor with two-byte granularity, such as the original Motorola 68000.

Double-byte memory access granularity

When reading from address 0, a processor with 2-byte granularity needs half as many memory accesses as a processor with 1-byte granularity. Since every memory access carries a fixed overhead, minimizing the number of accesses really does improve performance.

However, note what happens when reading from address 1. Because the address does not fall on the processor's memory access boundary, the processor has extra work to do. Such an address is called an unaligned address (which is why the #pragma pack directive can slow a program down). Because address 1 is unaligned, the two-byte-granularity processor must perform an additional memory access, slowing the operation.

Finally, consider a processor with four-byte memory access granularity, such as the 68030 or PowerPC 601.

Four-byte memory access granularity

A processor with four-byte granularity can fetch four bytes from an aligned address in a single read. Note also that reading from an unaligned address doubles the access count.

Precisely because the processor does not access memory one byte at a time, the compiler has to do some work behind the scenes to keep accesses aligned. Next, let's look at structure alignment to see what the compiler does for us.

Structure alignment

This time I'll experiment on a MacBook Pro (2017).

An innocent structure:

struct Header {
    char a; // 1 byte
    int b;  // 4 bytes
    char c; // 1 byte
};

What is the size (in bytes) of this structure? Intuitively, it should be 1 + 4 + 1 equal to 6. Is it true? Let's take a look at its layout in memory:
The memory layout produced by clang on macOS shows that the real size is not 6 bytes: the struct is padded to the alignment of its largest member (int = 4 bytes), giving 3 * 4 = 12 bytes. We can confirm this with CLion's debugger (hover over the variable, right-click, Show in Memory View).

The padding is real, and it causes two problems:

  • Wasted memory: 6 of the 12 bytes carry no data.
  • Ambiguity: if the two sides of a network connection understand the layout differently, they will parse the data differently.

What to do then?

  1. [Recommended practice] Reorder the fields and pad explicitly up to the struct's aligned size, so the compiler has nothing to fill in implicitly and both sides see the same layout.
struct Header {
    int b;  // 4 bytes, placed first
    char a; // 1 byte
    char c; // 1 byte; the total is now 6, short of the padded size of 8. What now?
    int16_t reserved; // define an explicit 2-byte reserved field
};
  2. Force the compiler to use 1-byte alignment (at the cost of unaligned accesses):
#pragma pack(1)


Why consider byte alignment

Quote:

If you do not understand and address alignment problems in your software, the following can happen, in increasing order of severity:

  • Your software will slow down.
  • Your application will lock up.
  • Your operating system will crash.
  • Your software will fail silently, producing incorrect results.

In short, incorrect alignment wastes memory, degrades performance, and can even cause crashes or silent parsing errors.

Conclusion

Verified:

  1. The self-alignment value of a structure or class is the largest self-alignment value among its members.
  2. A specified alignment value can be set with #pragma pack(value).

Unverified:

  1. The self-alignment value of a data type: 1 byte for char, 2 for short, 4 for int and float, 8 for double.
  2. The effective alignment value of a data member, structure, or class is the smaller of its self-alignment value and the specified alignment value.

The correct approach, example 1:

struct Header {
	int b;  // 4 bytes
	char a; // 1 byte
	char c; // 1 byte
	int16_t reserved; // 2 bytes of explicit padding
};

Example 2 (when we design a custom protocol, the field types in the business header cannot be chosen arbitrarily; fields should be laid out so the header size is an integer multiple of the largest member):

type ImHeader struct {
	Length    uint32 // the whole pdu length
	Version   uint16 // pdu version number
	Flag      uint16 // not used
	ServiceId uint16 //
	CommandId uint16 //
	SeqNum    uint16 // packet sequence number
	Reversed  uint16 // reserved

	pbMessage proto.Message // message body
}



Origin blog.csdn.net/xmcy001122/article/details/111215358