How to solve the sticky-packet and half-packet problems in TCP

Reprinted from: https://mp.weixin.qq.com/s/XqGCaX94hCvrYI_Tvfq_yQ

1. TCP is a stream protocol

If you have studied networking, you have probably heard the phrase "TCP is a stream protocol". What does it actually mean? A stream protocol delivers data as a continuous stream of bytes, like water in a pipe: there is no built-in boundary between one piece of content and the next, so the application has to define those boundaries itself.

For example, suppose A and B communicate over TCP, and A sends a 100-byte packet followed by a 200-byte packet to B. How does B receive them? B may receive 100 bytes and then 200 bytes; or 50 bytes and then 250 bytes; or 100 bytes, then 100 bytes, then 100 bytes; or 20 bytes, 20 bytes, 60 bytes, 100 bytes, 50 bytes, and 50 bytes; and so on.

Do you see the pattern? A sends a total of 300 bytes to B, and B may receive those 300 bytes in one or more reads of any sizes. Suppose the 100 bytes and the 200 bytes are each one data packet. The sender A can tell them apart, but unless the two sides agree on packet lengths or boundaries, B has no way of knowing how many of the received bytes make up one valid packet. Specifying how much data counts as one packet is therefore one of the things a protocol format must define.
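The delivery patterns listed above can be imitated without any sockets. The following is a minimal sketch, not real network code: `deliver` and `reassemble` are hypothetical helper names introduced only for this illustration. It shows that whatever the chunk sizes are, B ends up with the same 300 bytes and no trace of where A's two packets were separated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: cuts `stream` into pieces of the given sizes, the way
// successive recv() calls might return them. The sizes must add up to
// stream.size().
std::vector<std::string> deliver(const std::string& stream,
                                 const std::vector<std::size_t>& chunkSizes)
{
    std::vector<std::string> chunks;
    std::size_t offset = 0;
    for (std::size_t size : chunkSizes)
    {
        assert(offset + size <= stream.size());
        chunks.push_back(stream.substr(offset, size));
        offset += size;
    }
    assert(offset == stream.size());
    return chunks;
}

// Hypothetical helper: glues the received pieces back together, which is all
// the receiver can do when no framing rule has been agreed on.
std::string reassemble(const std::vector<std::string>& chunks)
{
    std::string all;
    for (const std::string& chunk : chunks)
        all += chunk;
    return all;
}
```

Calling `deliver` with the sizes {100, 200}, {50, 250}, or {20, 20, 60, 100, 50, 50} on the same 300-byte string gives different chunk lists, yet `reassemble` returns the identical string each time.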

Novices often write code like the following.

Sender:

// ... socket creation, connection setup, and other unrelated logic omitted ...
char buf[] = "the quick brown fox jumps over a lazy dog.";
int n = send(socket, buf, strlen(buf), 0);
// ... error handling omitted ...

Receiver:

// ... socket creation, connection setup, and other unrelated logic omitted ...
char recvBuf[50] = { 0 };
int n = recv(socket, recvBuf, 50, 0);
// ... error handling omitted ...
printf("recvBuf: %s", recvBuf);

To keep the focus on the problem itself, I have omitted the connection setup and error handling. The sender sends the string "the quick brown fox jumps over a lazy dog." to the receiver, and the receiver prints it after receiving it.

Code like this usually works fine on the local machine, and the receiver prints the expected string. In a LAN or public network environment, however, problems appear: the receiver may print an incomplete string, and if the sender sends the string several times in a row, the receiver may print incomplete strings or garbage.

The incomplete output is easy to understand: a particular recv call returned fewer bytes than the full string. Because recvBuf was zero-initialized, the byte after the partial string is still 0, so printf stops there. The garbage appears when one recv call returns not only a complete string but also part of the next one, so that all 50 bytes of recvBuf are filled. printf still looks for a terminating 0, finds none inside the array, and reads past the end of the array until it happens to hit one. The out-of-bounds memory may contain unprintable bytes, which show up as garbage.
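The out-of-bounds read can be avoided by trusting the byte count that recv returns instead of a terminating 0. The sketch below assumes only the documented meaning of that return value; `receivedText` is a hypothetical helper name, and `long` stands in for the platform's return type. It does not solve the framing problem, it only makes sure that exactly the received bytes are used.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: turns the result of one recv() call into a std::string.
// `n` is the value recv() returned: > 0 means n bytes were written into `buf`,
// 0 means the peer closed the connection, < 0 means an error occurred.
std::string receivedText(const char* buf, long n)
{
    if (n <= 0)
        return std::string();
    // Copy exactly n bytes; never search for a terminating '\0'.
    return std::string(buf, static_cast<std::size_t>(n));
}
```

With plain printf the same idea can be written as `printf("recvBuf: %.*s", n, recvBuf);` (guarded by `n > 0`), which prints at most n bytes even when the array is completely filled.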

I give this example so that you get an intuitive understanding of what it means for TCP to be a stream protocol. Because of this, the sender and receiver must agree on where each packet begins and ends in the byte stream, so that the receiver knows from which position to take how many bytes and parse them as one packet. Defining this is one of the tasks in designing a network communication protocol format.

2. How to solve the sticky-packet problem

In real network programming work or in technical interviews, a question interviewers often ask is: in network communication, how do you solve the sticky-packet problem?

Some interviewers ask instead: in network communication, how do you solve sticky packets, packet loss, or packet reordering? This question really tests the candidate's networking fundamentals. If the protocol is TCP, then in most scenarios the application sees no packet loss or reordering: TCP is a reliable protocol, and the TCP stack uses sequence numbers, acknowledgments, and retransmission to make sure data reaches the destination correctly and in order. If the protocol is UDP and even a small amount of packet loss is unacceptable, you have to build a TCP-like ordering and reliable-delivery mechanism on top of UDP yourself (RUDP is one example; RTP likewise runs over UDP and adds sequence numbers, though it leaves recovery from loss to the application). So once the question is broken down, only the sticky-packet problem remains.

First, what is a sticky packet? When two or more packets are sent to the peer back to back, the peer may receive more than one packet in a single read: either several complete packets glued together, or one or more complete packets plus part of the next one. The peer may also receive only part of a packet, which is generally called a half packet.

The half-packet and sticky-packet problems have the same root cause: as described above, TCP delivers data as a stream. The solution is therefore to find a way to tell where one packet ends and the next begins in the received data. There are three main methods:

Fixed-length packets

As the name suggests, every protocol packet has the same fixed length. For example, we can specify that every protocol packet is 64 bytes; each time 64 bytes have been received, they are taken out and parsed (if fewer have arrived, they are buffered first).

This format is simple but inflexible. If the content is shorter than the fixed length, the remaining space must be filled with a special padding value such as \0 (without a special padding value, how could normal content be told apart from padding?). If the content is longer than the fixed length, it has to be split into fragments, which requires extra logic: the sender splits the packet into fragments and the receiver reassembles them (splitting and fragments are covered in detail later).
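A fixed-length scheme might be sketched as follows. The names `kFrameSize`, `FixedLengthDecoder`, and `padFrame` are made up for this illustration, and a `std::string` stands in for a real receive buffer. The decoder buffers incoming bytes and hands out a frame only when a full 64 bytes are available.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumed frame size for this sketch, matching the 64-byte example in the text.
constexpr std::size_t kFrameSize = 64;

class FixedLengthDecoder
{
public:
    // Appends newly received bytes and returns every complete frame that is
    // now available. Leftover bytes stay buffered for the next call.
    std::vector<std::string> feed(const char* data, std::size_t len)
    {
        buffer_.append(data, len);
        std::vector<std::string> frames;
        while (buffer_.size() >= kFrameSize)
        {
            frames.push_back(buffer_.substr(0, kFrameSize));
            buffer_.erase(0, kFrameSize);
        }
        return frames;
    }

    // Number of bytes of an incomplete frame still waiting for more data.
    std::size_t pending() const { return buffer_.size(); }

private:
    std::string buffer_;
};

// Sender side: pads a short payload with '\0' up to the fixed frame size.
// Payloads longer than one frame would need fragmentation, not shown here.
std::string padFrame(const std::string& payload)
{
    assert(payload.size() <= kFrameSize);
    std::string frame = payload;
    frame.resize(kFrameSize, '\0');
    return frame;
}
```

Feeding the decoder 30 bytes yields nothing (a half packet); feeding it the remaining bytes of two padded frames in one call yields both frames at once (a sticky packet).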

Packets terminated by a specified character (or string)

This kind of protocol packet is more common: a packet is considered complete when a special marker appears in the byte stream. For example, in the familiar FTP and SMTP protocols, a command or a piece of data is followed by "\r\n" (the so-called CRLF) to mark the end of a packet. The receiver treats everything up to each "\r\n" as one packet.

This format is typically used by applications that exchange commands. Its drawback is that if the packet content itself contains the end marker, those characters must be transcoded or escaped so that the receiver does not mistake them for the end of the packet and parse the data incorrectly.
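A delimiter-based receiver could look roughly like the sketch below. `LineDecoder` and `frameLine` are hypothetical names, and the sketch assumes the payload never contains "\r\n" (escaping is left out). Because unconsumed bytes stay in the buffer, a "\r\n" that is split across two reads is still recognized.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class LineDecoder
{
public:
    // Appends newly received bytes and returns every complete packet
    // (the text before each "\r\n") that is now available.
    std::vector<std::string> feed(const std::string& data)
    {
        buffer_ += data;
        std::vector<std::string> lines;
        std::size_t pos;
        while ((pos = buffer_.find("\r\n")) != std::string::npos)
        {
            lines.push_back(buffer_.substr(0, pos));
            buffer_.erase(0, pos + 2);   // drop the packet and its "\r\n"
        }
        return lines;
    }

private:
    std::string buffer_;   // holds the incomplete tail of the stream
};

// Sender side: appends the end marker. The payload must not contain the
// marker itself; a real protocol would escape it instead of asserting.
std::string frameLine(const std::string& line)
{
    assert(line.find("\r\n") == std::string::npos);
    return line + "\r\n";
}
```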

Header + body format

A packet in this format has two parts, a header and a body. The header has a fixed size and must contain a field that states how large the body that follows is.

For example:

#include <cstdint>  // int32_t

struct msg_header
{
  int32_t bodySize;   // size of the body that follows the header
  int32_t cmd;
};

This is a typical header format; bodySize specifies the size of the body. Because the header size is fixed (here sizeof(int32_t) + sizeof(int32_t) = 8 bytes), the receiver first collects as many bytes as the header needs (if fewer have arrived, it buffers them until there are enough) and parses the header. It then collects as many body bytes as the header specifies; once the body is complete, it assembles a full packet and processes it. In some implementations, bodySize is replaced by a field called packageSize, which gives the size of the whole packet. In that case, subtracting the header size (here sizeof(msg_header)) from packageSize gives the body size; the principle is the same.
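On the sending side, the header + body format might be produced as in the following sketch. `packMessage` and `bodySizeFromPackageSize` are hypothetical helpers, and the header is copied in host byte order, which assumes both ends use the same byte order; a real protocol would normally fix the byte order explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

struct msg_header
{
    int32_t bodySize;   // size of the body that follows the header
    int32_t cmd;
};

// Builds one packet: the fixed-size header followed by the body.
// Assumption: sender and receiver share the same byte order.
std::string packMessage(int32_t cmd, const std::string& body)
{
    assert(body.size() <= static_cast<std::size_t>(INT32_MAX));
    msg_header header;
    header.bodySize = static_cast<int32_t>(body.size());
    header.cmd = cmd;

    std::string packet(sizeof(header), '\0');
    std::memcpy(&packet[0], &header, sizeof(header));
    packet += body;
    return packet;
}

// The packageSize variant from the text: the header carries the size of the
// whole packet, so the body size is that value minus the header size.
std::size_t bodySizeFromPackageSize(int32_t packageSize)
{
    assert(packageSize >= static_cast<int32_t>(sizeof(msg_header)));
    return static_cast<std::size_t>(packageSize) - sizeof(msg_header);
}
```

For a 5-byte body, `packMessage` produces 13 bytes on the wire, and `bodySizeFromPackageSize(13)` gives back 5.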

With most network libraries, you have to define packet boundaries and do the parsing yourself. General-purpose libraries usually do not provide this, because they need to support different protocols, and since the protocol is not known in advance they cannot ship concrete unpacking code. This is not absolute; some network libraries do provide it. The Java Netty framework provides FixedLengthFrameDecoder for fixed-length packets, DelimiterBasedFrameDecoder for packets terminated by a special delimiter, and ByteToMessageDecoder for custom formats (which can be used for the header + body format). When you subclass ByteToMessageDecoder, you must override the decode() method to unpack packets according to your own protocol format.

Readers should aim to understand these three packet formats in depth, including the basic principle and the advantages and disadvantages of each.

3. Unpacking and processing

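Now that we understand the three packet formats described earlier, let us look at how to handle them in code. The processing flow is the same for all three, so we use the header + body format to illustrate. The process is as follows: wait until the buffer holds a complete header, read the body size from it and validate it, wait until the complete packet has arrived, then take the packet out of the buffer and process it, and repeat with whatever data remains.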

We assume that the header format is as follows:

// Force 1-byte alignment
#pragma pack(push, 1)
// Protocol header
struct msg
{
    int32_t  bodysize;         // size of the body
};
#pragma pack(pop)

The code for the above process is as follows:

#include <cstring>  // memcpy
#include <memory>   // std::shared_ptr
#include <string>

// Maximum packet size: 10 MB (parenthesized so the macro expands safely)
#define MAX_PACKAGE_SIZE    (10 * 1024 * 1024)

void ChatSession::OnRead(const std::shared_ptr<TcpConnection>& conn, Buffer* pBuffer, Timestamp receivTime)
{
    while (true)
    {
        // Not enough data for a header yet
        if (pBuffer->readableBytes() < (size_t)sizeof(msg))
        {
            //LOGI << "buffer is not enough for a package header, pBuffer->readableBytes()=" << pBuffer->readableBytes() << ", sizeof(msg)=" << sizeof(msg);
            return;
        }

        // Peek at the header without removing it from the buffer
        msg header;
        memcpy(&header, pBuffer->peek(), sizeof(msg));

        // Invalid header: close the connection immediately
        if (header.bodysize <= 0 || header.bodysize > MAX_PACKAGE_SIZE)
        {
            // The client sent an illegal packet, so the server closes the connection.
            // bodysize is an int32_t, hence %d rather than %lld.
            LOGE("Illegal package, bodysize: %d, close TcpConnection, client: %s", header.bodysize, conn->peerAddress().toIpPort().c_str());
            conn->forceClose();
            return;
        }

        // The data received so far does not make up a complete packet
        if (pBuffer->readableBytes() < (size_t)header.bodysize + sizeof(msg))
            return;

        // A whole packet is available: now remove the header from the buffer
        pBuffer->retrieve(sizeof(msg));
        // inbuf holds the packet currently being processed
        std::string inbuf;
        inbuf.append(pBuffer->peek(), header.bodysize);
        pBuffer->retrieve(header.bodysize);
        // Unpack and run the business logic
        if (!Process(conn, inbuf.c_str(), inbuf.length()))
        {
            // The client sent an illegal packet, so the server closes the connection
            LOGE("Process package error, close TcpConnection, client: %s", conn->peerAddress().toIpPort().c_str());
            conn->forceClose();
            return;
        }
    }// end while-loop
}

The flow of this code matches the process described above. pBuffer is a custom receive buffer; the data already received has been placed in it, so the number of bytes received so far is obtained by calling the corresponding method on this object (readableBytes()). Some details of the code deserve emphasis:

  • When reading the header, copy a header-sized chunk of data out of pBuffer instead of taking it out (taking it out would remove the data from pBuffer). The reason is that after you read the body size from the header, the remaining data may not yet amount to a full body, and you would then have to put the header back into the buffer. To avoid this unnecessary work, remove data from the buffer only when it holds at least a whole packet (in the code: header.bodysize + sizeof(msg)). This is what the word peek in pBuffer->peek() means (a glance or a peep).

  • After obtaining the body size from the header, you must validate bodysize. Here I require bodysize to be greater than 0 and no greater than 10 * 1024 * 1024 (10 MB). In real development you should choose the limits according to your own needs (in some business scenarios packets with a 0-byte body are allowed). Remember to check both the upper and the lower limit. Suppose a client sends illegal data in which bodysize is set to a large value such as 1 * 1024 * 1024 * 1024 (1 GB): your logic would keep buffering the data this client sends, your server's memory would soon be exhausted, and once the process's memory use reaches some threshold the operating system would kill it, leaving the service unable to serve requests. If the bodysize field falls outside your limits, close the connection immediately. This is also a form of self-protection for the service, avoiding damage caused by illegal packets.

  • You may have noticed that the whole sequence of checking the header, checking the body, and processing the packet sits inside a while loop. This is necessary. Without the loop, when several packets arrive at once you would handle only one of them, and the next one would have to wait until a new batch of data arrives and triggers this logic again. The result would be that the peer sends you several requests, you answer only one, and the remaining answers are delayed until the peer sends more data. The loop is the correct way to handle sticky packets.
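The three points above (peek before consuming, validate the size, loop until no complete packet is left) can be tried out without the project's Buffer and TcpConnection classes. The following is a simplified sketch, not the author's implementation: a `std::string` plays the role of the receive buffer, and `extractPackets`, `makePacket`, and `kMaxPackageSize` are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

#pragma pack(push, 1)
struct msg
{
    int32_t bodysize;   // size of the body
};
#pragma pack(pop)

constexpr int32_t kMaxPackageSize = 10 * 1024 * 1024;   // 10 MB upper limit

// Moves every complete packet body from `buffer` into `out`.
// Returns false if an illegal header is seen (the caller should close the
// connection); returns true otherwise, leaving any half packet in `buffer`.
bool extractPackets(std::string& buffer, std::vector<std::string>& out)
{
    while (true)
    {
        if (buffer.size() < sizeof(msg))
            return true;                                   // not even a header yet

        msg header;
        std::memcpy(&header, buffer.data(), sizeof(msg));  // peek, do not consume

        if (header.bodysize <= 0 || header.bodysize > kMaxPackageSize)
            return false;                                  // illegal size

        const std::size_t bodySize = static_cast<std::size_t>(header.bodysize);
        const std::size_t total = sizeof(msg) + bodySize;
        if (buffer.size() < total)
            return true;                                   // half packet, wait for more

        out.push_back(buffer.substr(sizeof(msg), bodySize));
        buffer.erase(0, total);                            // consume header and body together
    }
}

// Test helper for this sketch: builds header + body in host byte order.
std::string makePacket(const std::string& body)
{
    assert(!body.empty() && body.size() <= static_cast<std::size_t>(kMaxPackageSize));
    msg header;
    header.bodysize = static_cast<int32_t>(body.size());
    std::string packet(sizeof(msg), '\0');
    std::memcpy(&packet[0], &header, sizeof(msg));
    return packet + body;
}
```

Appending the bytes of two packets to the buffer in arbitrary pieces and calling `extractPackets` after each piece yields nothing while only part of the first packet has arrived, and both bodies in a single call once the rest arrives together.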

The code above is the most basic mechanism for handling sticky packets and half packets, that is, unpacking at the technical level (unpacking at the business level is covered in later chapters). Once you understand it, you can extend the unpacking step in many ways. For example, to add compression support to our protocol packets, we change the header to the following:

#pragma pack(push, 1)
// Protocol header
struct msg
{
    char     compressflag;     // compression flag: 1 means the body is compressed, otherwise it is not
    int32_t  originsize;       // body size before compression
    int32_t  compresssize;     // body size after compression
    char     reserved[16];     // reserved for future extension
};
#pragma pack(pop)

The modified code is as follows:

void ChatSession::OnRead(const std::shared_ptr<TcpConnection>& conn, Buffer* pBuffer, Timestamp receivTime)
{
    while (true)
    {
        // Not enough data for a header yet
        if (pBuffer->readableBytes() < (size_t)sizeof(msg))
        {
            //LOGI << "buffer is not enough for a package header, pBuffer->readableBytes()=" << pBuffer->readableBytes() << ", sizeof(msg)=" << sizeof(msg);
            return;
        }

        // Peek at the header without removing it from the buffer
        msg header;
        memcpy(&header, pBuffer->peek(), sizeof(msg));

        // The packet is compressed (PACKAGE_COMPRESSED is defined elsewhere in the project)
        if (header.compressflag == PACKAGE_COMPRESSED)
        {
            // Invalid header: close the connection immediately
            if (header.compresssize <= 0 || header.compresssize > MAX_PACKAGE_SIZE ||
                header.originsize <= 0 || header.originsize > MAX_PACKAGE_SIZE)
            {
                // The client sent an illegal packet, so the server closes the connection.
                // Both sizes are int32_t, hence %d rather than %lld.
                LOGE("Illegal package, compresssize: %d, originsize: %d, close TcpConnection, client: %s",  header.compresssize, header.originsize, conn->peerAddress().toIpPort().c_str());
                conn->forceClose();
                return;
            }

            // The data received so far does not make up a complete packet
            if (pBuffer->readableBytes() < (size_t)header.compresssize + sizeof(msg))
                return;

            pBuffer->retrieve(sizeof(msg));
            std::string inbuf;
            inbuf.append(pBuffer->peek(), header.compresssize);
            pBuffer->retrieve(header.compresssize);
            // Decompress the body into destbuf
            std::string destbuf;
            if (!ZlibUtil::UncompressBuf(inbuf, destbuf, header.originsize))
            {
                LOGE("uncompress error, client: %s", conn->peerAddress().toIpPort().c_str());
                conn->forceClose();
                return;
            }

            // Business logic
            if (!Process(conn, destbuf.c_str(), destbuf.length()))
            {
                // The client sent an illegal packet, so the server closes the connection
                LOGE("Process error, close TcpConnection, client: %s", conn->peerAddress().toIpPort().c_str());
                conn->forceClose();
                return;
            }
        }
        // The packet is not compressed
        else
        {
            // Invalid header: close the connection immediately
            if (header.originsize <= 0 || header.originsize > MAX_PACKAGE_SIZE)
            {
                // The client sent an illegal packet, so the server closes the connection
                LOGE("Illegal package, compresssize: %d, originsize: %d, close TcpConnection, client: %s", header.compresssize, header.originsize, conn->peerAddress().toIpPort().c_str());
                conn->forceClose();
                return;
            }

            // The data received so far does not make up a complete packet
            if (pBuffer->readableBytes() < (size_t)header.originsize + sizeof(msg))
                return;

            pBuffer->retrieve(sizeof(msg));
            std::string inbuf;
            inbuf.append(pBuffer->peek(), header.originsize);
            pBuffer->retrieve(header.originsize);
            // Business logic
            if (!Process(conn, inbuf.c_str(), inbuf.length()))
            {
                // The client sent an illegal packet, so the server closes the connection
                LOGE("Process error, close TcpConnection, client: %s", conn->peerAddress().toIpPort().c_str());
                conn->forceClose();
                return;
            }
        }// end else

    }// end while-loop
}

The code first checks the compression flag in the header to determine whether the body is compressed. If it is, compresssize bytes are taken out of the buffer and decompressed; the decompressed data is the real business data. The flowchart for the whole procedure is as follows:

The code uses a receive buffer variable, pBuffer; how to design the receive buffer will be covered in detail in a later article.
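Returning to the compression-aware OnRead above: its two branches repeat the same size checks and differ only in which field gives the number of body bytes on the wire. One possible way to express that decision in a single place is sketched below. `bodyBytesOnWire`, `sizeInRange`, `kPackageCompressed`, and `kMaxPackageSize` are hypothetical names, and the flag value 1 is taken from the comment in the header definition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
struct msg
{
    char     compressflag;     // 1 means the body is compressed
    int32_t  originsize;       // body size before compression
    int32_t  compresssize;     // body size after compression
    char     reserved[16];     // reserved for future extension
};
#pragma pack(pop)

// With 1-byte packing the header is exactly 1 + 4 + 4 + 16 bytes.
static_assert(sizeof(msg) == 25, "unexpected header size");

constexpr char    kPackageCompressed = 1;                  // assumed flag value
constexpr int32_t kMaxPackageSize    = 10 * 1024 * 1024;   // 10 MB upper limit

// True if a size field lies within the allowed range (0, kMaxPackageSize].
static bool sizeInRange(int32_t size)
{
    return size > 0 && size <= kMaxPackageSize;
}

// Validates the header and reports how many body bytes follow it on the wire.
// Returns false for an illegal header (the caller should close the connection).
bool bodyBytesOnWire(const msg& header, std::size_t& bytes)
{
    if (!sizeInRange(header.originsize))
        return false;

    if (header.compressflag == kPackageCompressed)
    {
        if (!sizeInRange(header.compresssize))
            return false;
        bytes = static_cast<std::size_t>(header.compresssize);
    }
    else
    {
        bytes = static_cast<std::size_t>(header.originsize);
    }
    return true;
}
```

With such a helper, the receive loop would compare readableBytes() against `sizeof(msg) + bytes` once and decompress only when the flag is set.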

Origin www.cnblogs.com/testzcy/p/12536650.html