Detailed explanation of the sticky packet problem in Python socket network programming

First, details of the sticky packet problem

1. Only TCP has sticky packets; UDP never does

   Your program has no right to operate the network card directly. It reaches the network card through the interface that the operating system exposes to user programs. Every time your program sends data to a remote host, the data is first copied from user space into a kernel buffer, and this copy costs resources and time. Frequent switching between kernel mode and user mode reduces transmission efficiency, so to improve efficiency the sending side often waits until it has collected enough data before sending it to the other party in one go. If several consecutive sends each carry very little data, the TCP stack usually merges them into a single TCP segment according to its optimization algorithm and sends them together, so the receiver gets the data stuck together as a "sticky packet".
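The optimization algorithm mentioned above is, in most TCP stacks, Nagle's algorithm, and it can be switched off per socket with the TCP_NODELAY option. The following is only a minimal sketch of how that option is set and read back. Disabling it changes when small segments leave the sender, but the receiver can still read several messages in one recv, so it should not be treated as a fix for sticky packets.

```python
import socket

# Create a TCP socket and disable Nagle's algorithm on it.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read the option back; a non-zero value means Nagle's algorithm is disabled.
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print('TCP_NODELAY enabled:', bool(nodelay))
sock.close()
```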

2. First, you need to understand how a socket sends and receives messages

    The sender can send data 1k at a time, while the application on the receiving side can read it 2k at a time, or 3k, or any other amount. In other words, the way the sender split the data is invisible to the receiving application. This is why TCP is called a stream-oriented protocol, and it is also why sticky packets occur so easily. UDP, by contrast, is a connectionless, message-oriented protocol. Each UDP datagram is one message, and the application must extract data in units of whole messages. It cannot pull out an arbitrary number of bytes at a time, which is very different from TCP.

    How is a message defined? You can think of the data that the other party writes or sends in one call as one message. What needs to be understood is that when the other party sends a message, no matter how the lower layers fragment it, the TCP layer puts the segments that make up the message back in order before presenting the data in the kernel buffer.

    For example, a TCP-based socket client uploads a file to the server. The file content is sent as a byte stream, and from the receiver's point of view there is no way to tell where the file's byte stream begins and ends.
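The stream behaviour described above can be observed in a single process. This is a small sketch that assumes socket.socketpair() is an acceptable stand-in for a connected TCP client and server: the sender writes two "messages", and the receiver reads in 3-byte chunks with no regard for how the data was sent.

```python
import socket

# socketpair() returns two connected stream sockets in one process,
# standing in for a TCP client/server pair in this sketch.
sender, receiver = socket.socketpair()
receiver.settimeout(2)

# Two separate "messages" from the sender's point of view.
sender.sendall(b'hello')
sender.sendall(b'feng')
sender.close()  # signals end of stream to the receiver

# The receiver reads in 3-byte chunks, ignoring how the data was sent.
chunks = []
while True:
    chunk = receiver.recv(3)
    if not chunk:  # b'' means the sender closed the connection
        break
    chunks.append(chunk)
receiver.close()

print(chunks)
print(b''.join(chunks))
```

The chunk boundaries have nothing to do with the two sends; only the byte order is preserved.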
 

3. Causes of sticky packets

3-1 Direct cause

  The sticky packet problem arises mainly because the receiver does not know where the boundaries between messages are, so it does not know how many bytes to extract at a time.

3-2 Root cause

  Sticky packets on the sender side are caused by the TCP protocol itself. To improve transmission efficiency, the sender often collects enough data before sending a TCP segment. If several consecutive sends each carry very little data, TCP usually merges them into one segment according to its optimization algorithm and sends them together, so the receiver gets sticky-packet data.

3-3 Summary

  1. TCP (Transmission Control Protocol) is connection-oriented, stream-oriented, and provides a highly reliable service. Both ends (client and server) must hold a socket for the connection. To send many small packets to the receiver more efficiently, the sender uses an optimization (the Nagle algorithm) that merges several small pieces of data sent at short intervals into one larger block before packaging it. The receiver then cannot tell the pieces apart unless a sound unpacking mechanism is provided. In other words, stream-oriented communication has no message boundaries.
  2. UDP (User Datagram Protocol) is connectionless, message-oriented, and provides an efficient service. It does not use a block-merging optimization. Because UDP supports a one-to-many mode, the skbuff (socket buffer) at the receiving end uses a chained structure to record each arriving UDP packet, and every packet carries a message header (source address, port, and so on), so the receiver can easily distinguish and process them. In other words, message-oriented communication preserves message boundaries.
  3. TCP is based on a data stream, so an empty message cannot be sent or received: sending empty content transmits nothing, and the receiver keeps waiting. Both the client and the server therefore need to handle empty input to keep the program from getting stuck. UDP is based on datagrams, so even if you enter empty content (just press Enter), the message is not empty on the wire, because the UDP protocol still wraps it in a header. The experiment is omitted here.

  UDP's recvfrom is blocking. Each recvfrom(x) call matches exactly one sendto(y) and completes once it has read that datagram. If y > x, the bytes beyond x are lost. This means UDP never produces sticky packets, but it can lose data and is unreliable.

  TCP does not lose data. If one recv does not take the whole message, the next recv continues from where the last one stopped. The sending side only clears data from its buffer after it receives the ACK. The data is reliable, but sticky packets can occur.
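To make the contrast with UDP concrete, here is a minimal sketch using two UDP sockets on the loopback interface (the use of port 0, which lets the operating system pick a free port, is a choice made for this example). Each recvfrom returns exactly one datagram, even though it asks for up to 1024 bytes.

```python
import socket

# Two UDP sockets on the loopback interface; port 0 lets the OS pick a free port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(('127.0.0.1', 0))
receiver.settimeout(2)
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

addr = receiver.getsockname()
sender.sendto(b'hello', addr)
sender.sendto(b'feng', addr)

# Each recvfrom returns one whole datagram; the two sends are never merged.
messages = []
for _ in range(2):
    data, _peer = receiver.recvfrom(1024)
    messages.append(data)

sender.close()
receiver.close()
print(messages)
```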

Second, sticky packets occur in two cases:

1. The sender waits until enough data has accumulated in its local buffer before sending, which causes sticky packets (the sends are very close together and each carries very little data, so TCP's optimization algorithm merges them into one segment)

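client

# -*- coding: utf-8 -*-
import socket

ip_port = ('127.0.0.1', 8080)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(ip_port)  # raises an exception if the server is not running

# Two small sends in quick succession; TCP may deliver them together
s.send('hello'.encode('utf-8'))
s.send('feng'.encode('utf-8'))
s.close()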
Server

# -*- coding: utf-8 -*-
import socket

ip_port = ('127.0.0.1', 8080)

tcp_socket_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp_socket_server.bind(ip_port)
tcp_socket_server.listen(5)

conn, addr = tcp_socket_server.accept()

# Each recv asks for up to 10 bytes; the first call may already return b'hellofeng'
data1 = conn.recv(10)
data2 = conn.recv(10)

print('----->', data1.decode('utf-8'))
print('----->', data2.decode('utf-8'))

conn.close()
tcp_socket_server.close()

  

2. The receiver does not take the data out of its buffer in time, so one message is read across several recv calls (the client sends a piece of data, the server reads only a small part of it, and on the next recv the server first gets the data left over in the buffer, which produces a sticky packet)

client

# -*- coding: utf-8 -*-
import socket

ip_port = ('127.0.0.1', 8080)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(ip_port)  # raises an exception if the server is not running

s.send('hello feng'.encode('utf-8'))
s.close()

Server

# -*- coding: utf-8 -*-
import socket

ip_port = ('127.0.0.1', 8080)

tcp_socket_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp_socket_server.bind(ip_port)
tcp_socket_server.listen(5)

conn, addr = tcp_socket_server.accept()

data1 = conn.recv(2)   # only part of the message is read here
data2 = conn.recv(10)  # the next recv first returns the leftover data, then any new data

print('----->', data1.decode('utf-8'))
print('----->', data2.decode('utf-8'))

conn.close()
tcp_socket_server.close()

  

Third, a sticky packet example:

Server:

import socket

din = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
ip_port = ('127.0.0.1', 8080)
din.bind(ip_port)
din.listen(5)
conn, addr = din.accept()
# Both client sends may arrive in data1, leaving data2 empty
data1 = conn.recv(1024)
data2 = conn.recv(1024)
print(data1)
print(data2)

Client:

import socket

din = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
ip_port = ('127.0.0.1', 8080)
din.connect(ip_port)
din.send('helloworld'.encode('utf-8'))
din.send('sb'.encode('utf-8'))

  

Fourth, when unpacking occurs

  When the data in the sender's buffer is larger than what fits in a single packet (the network card's MTU minus the IP and TCP headers), TCP splits the data from that send into several packets and sends them one after another.
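The splitting itself happens inside the kernel and is not directly visible from Python. What an application can observe is a related effect: one large send is usually handed to the receiver across several recv calls. The sketch below assumes socket.socketpair() as a stand-in for a TCP connection and caps each recv at 65536 bytes, so a payload of about 1 MB necessarily arrives in many pieces.

```python
import socket
import threading

sender, receiver = socket.socketpair()
payload = b'x' * 1_000_000  # one logical message of about 1 MB

# Send in a background thread, because sendall blocks once the
# kernel buffers are full and the receiver has not read yet.
def send_payload():
    sender.sendall(payload)
    sender.close()

t = threading.Thread(target=send_payload)
t.start()

sizes = []  # size of each piece handed to the application
while True:
    piece = receiver.recv(65536)
    if not piece:  # b'' means the sender closed the connection
        break
    sizes.append(len(piece))
t.join()
receiver.close()

print('recv calls:', len(sizes), 'total bytes:', sum(sizes))
```

The receiver has to add the pieces back together itself, which is why every solution later in this article reads in a loop.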

Supplementary question 1: Why is TCP reliable and UDP unreliable?

  For TCP transmission, please refer to: http://www.cnblogs.com/wj-1314/p/8298025.html

  When TCP transmits data, the sender first places the data in its own buffer, and the protocol then sends the buffered data to the peer. When the peer acknowledges the data with an ACK, the sender clears it from the buffer. If no acknowledgment arrives, the sender retransmits the data, so TCP is reliable.

  When UDP sends data, the peer does not return any acknowledgment, so UDP is not reliable.

Supplementary question 2: What do send(byte stream), recv(1024) and sendall mean?

  The 1024 passed to recv means that at most 1024 bytes are taken from the buffer in one call; fewer may be returned.

  The bytes passed to send are first placed in the local send buffer, and the protocol then sends the buffer's content to the peer. If the data is larger than the free space in the buffer, send may accept only part of it and returns the number of bytes it actually took; a caller that ignores the return value loses the rest. sendall calls send in a loop until everything has been sent, so no data is lost.
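The relationship between send and sendall can be sketched as a loop. The helper name send_all below is hypothetical and exists only for illustration; in real code the built-in sendall method does the same job.

```python
import socket

def send_all(sock, data):
    """Keep calling send() until every byte has been handed to the kernel."""
    view = memoryview(data)
    total_sent = 0
    while total_sent < len(view):
        sent = sock.send(view[total_sent:])  # may accept only part of the data
        if sent == 0:
            raise ConnectionError('socket connection broken')
        total_sent += sent
    return total_sent

# Small demonstration over a local socket pair.
left, right = socket.socketpair()
count = send_all(left, b'hello feng')
received = right.recv(1024)
left.close()
right.close()
print(count, received)
```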

Fifth, how to solve the sticky packet problem?

  The root of the problem is that the receiver does not know the length of the byte stream the sender is about to transmit. The solution is therefore for the sender to tell the receiver the total size of the byte stream before sending the data, and for the receiver to then read in a loop until it has received all of it.

5-1 Simple solution (treats only the symptom):

  Add a time.sleep between the client's sends to avoid sticky packets. The server should also sleep between its receives so that the packets stay apart.

Client:

# client
import socket
import time

din = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
ip_port = ('127.0.0.1', 8080)
din.connect(ip_port)
din.send('helloworld'.encode('utf-8'))
time.sleep(3)  # pause so the two sends are not merged
din.send('sb'.encode('utf-8'))

Server:

# server
import socket
import time

din = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
ip_port = ('127.0.0.1', 8080)
din.bind(ip_port)
din.listen(5)
conn, addr = din.accept()
data1 = conn.recv(1024)
time.sleep(4)  # wait for the second send to arrive
data2 = conn.recv(1024)
print(data1)
print(data2)

  

This solution has many flaws. You do not know when a transmission ends, and choosing the pause length is a problem: too long is inefficient, and too short does not prevent sticking. This method is therefore not suitable.

5-2 Common solution (addresses the root of the problem):

  As described above, the receiver does not know the length of the byte stream the sender will transmit, so the sender must tell the receiver the total size before sending the data, and the receiver then reads in a loop until all of it has arrived.

  Prepend a custom fixed-length header to the byte stream. The header contains the length of the byte stream, and the header and the data are sent to the peer in that order. On receipt, the peer first takes the fixed-length header from its buffer and then fetches the real data.

  Use the struct module to pack the length into a fixed 4 or 8 bytes. With the format "i", struct.pack produces 4 bytes and can only hold values up to 2147483647 (a 10-digit number). For larger sizes you can instead put the length into a JSON string first and then pack the length of that string.
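The fixed-length header idea can be condensed into three small helpers. This is a sketch under stated assumptions rather than the article's own code: the names recv_exact, send_msg and recv_msg are hypothetical, the format '!I' (4-byte unsigned length in network byte order) is used instead of the article's 'i', and socket.socketpair() stands in for a real client and server.

```python
import socket
import struct

HEADER = struct.Struct('!I')  # 4-byte unsigned length, network byte order

def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('connection closed before message was complete')
        buf.extend(chunk)
    return bytes(buf)

def send_msg(sock, payload):
    """Send the length header followed by the payload."""
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_msg(sock):
    """Read one header, then exactly as many payload bytes as it announces."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)

# Two messages sent back to back are read as two separate messages.
client, server = socket.socketpair()
send_msg(client, b'helloworld')
send_msg(client, b'sb')
first, second = recv_msg(server), recv_msg(server)
client.close()
server.close()
print(first, second)
```

Because recv_exact never reads past the announced length, bytes belonging to the next message stay in the buffer for the next recv_msg call.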

normal client

# -*- coding: utf-8 -*-
import socket
import struct

phone = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
phone.connect(('127.0.0.1', 8880))  # connect to the server
while True:
    # send the command
    cmd = input('Please enter the command >>:').strip()
    if not cmd:
        continue  # an empty message would leave the server waiting
    phone.send(cmd.encode('utf-8'))
    # receive the 4-byte header first
    header_struct = phone.recv(4)
    unpack_res = struct.unpack('i', header_struct)
    total_size = unpack_res[0]  # total length of the data that follows
    # then receive the data itself
    recv_size = 0
    total_data = b''
    while recv_size < total_size:  # loop until everything has arrived
        recv_data = phone.recv(1024)  # 1024 is only an upper limit per call
        recv_size += len(recv_data)
        total_data += recv_data
    # 'gbk' assumes command output from Chinese-language Windows; use 'utf-8' on Linux or macOS
    print('Returned message: %s' % total_data.decode('gbk'))
phone.close()

normal server

# -*- coding: utf-8 -*-
import socket
import subprocess
import struct

phone = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # the "phone"
phone.bind(('127.0.0.1', 8880))  # bind the address (insert the "SIM card")
phone.listen(5)  # backlog: maximum number of queued connections
print('start running.....')
while True:  # connection loop
    coon, addr = phone.accept()  # wait for a "call"
    print(coon, addr)
    while True:  # communication loop
        cmd = coon.recv(1024)  # receive at most 1024 bytes
        if not cmd:
            break  # the client closed the connection
        print('Received: %s' % cmd.decode('utf-8'))
        # run the command
        res = subprocess.Popen(cmd.decode('utf-8'), shell=True,
                               stdout=subprocess.PIPE,  # standard output
                               stderr=subprocess.PIPE)  # standard error
        stdout = res.stdout.read()
        stderr = res.stderr.read()
        # send the header first: struct turns the total length into a fixed 4 bytes
        header = struct.pack('i', len(stdout) + len(stderr))
        coon.sendall(header)
        # then send the result of the command
        coon.sendall(stdout)
        coon.sendall(stderr)
    coon.close()
phone.close()

5-3 Optimized solution (addresses the root of the problem)

  The optimized approach improves the header: the server uses a dictionary to describe the content it is about to send. A dictionary cannot be transmitted over the network directly, so it is serialized into a JSON string and then encoded to bytes. Because the length of those header bytes is not fixed, the struct module packs that length into a fixed-size value, which is sent first. The client reads the fixed-size length, then the header, and then knows exactly how many bytes of data to read, so it gets the complete packet.
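The three-part layout described above (fixed-size length, JSON header, data) can be sketched in one process. The helper names send_with_header and recv_with_header are hypothetical, '!I' is an assumed choice of format, and socket.socketpair() again stands in for the client and server connection.

```python
import json
import socket
import struct

def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('connection closed early')
        buf.extend(chunk)
    return bytes(buf)

def send_with_header(sock, header, body):
    """Send: 4-byte header length, JSON header (with total_size), then the body."""
    header = dict(header, total_size=len(body))
    header_bytes = json.dumps(header).encode('utf-8')
    sock.sendall(struct.pack('!I', len(header_bytes)) + header_bytes + body)

def recv_with_header(sock):
    """Receive the three parts in the same order and return (header, body)."""
    (header_len,) = struct.unpack('!I', recv_exact(sock, 4))
    header = json.loads(recv_exact(sock, header_len).decode('utf-8'))
    body = recv_exact(sock, header['total_size'])
    return header, body

client, server = socket.socketpair()
send_with_header(server, {'filename': None, 'md5': None}, b'command output')
header, body = recv_with_header(client)
client.close()
server.close()
print(header, body)
```

The JSON header can carry any extra fields (file name, checksum, and so on) without changing the fixed 4-byte prefix.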

Ultimate client

# -*- coding: utf-8 -*-
import socket
import struct
import json

phone = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
phone.connect(('127.0.0.1', 8080))  # connect to the server
while True:
    # send the command
    cmd = input('Please enter the command >>:').strip()
    if not cmd:
        continue
    phone.send(cmd.encode('utf-8'))
    # first receive the length of the header (4 bytes unpacked into an int)
    header_len = struct.unpack('i', phone.recv(4))[0]
    # then receive the header itself
    header_bytes = phone.recv(header_len)  # received as bytes
    header_json = header_bytes.decode('utf-8')  # the JSON string
    header_dic = json.loads(header_json)  # deserialize to get the dictionary
    total_size = header_dic['total_size']  # total length of the data
    # finally receive the data
    recv_size = 0
    total_data = b''
    while recv_size < total_size:  # loop until everything has arrived
        recv_data = phone.recv(1024)  # 1024 is only an upper limit per call
        # recv may return fewer than 1024 bytes, so count what actually arrived
        recv_size += len(recv_data)
        total_data += recv_data  # accumulate the final result
    # 'gbk' assumes command output from Chinese-language Windows; use 'utf-8' on Linux or macOS
    print('Returned message: %s' % total_data.decode('gbk'))
phone.close()

Ultimate server

# -*- coding: utf-8 -*-
import socket
import subprocess
import struct
import json

phone = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # the "phone"
phone.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)  # allow quick restarts on the same port
phone.bind(('127.0.0.1', 8080))  # bind the address (insert the "SIM card")
phone.listen(5)  # backlog: maximum number of queued connections
print('start running.....')
while True:  # connection loop
    coon, addr = phone.accept()  # wait for a "call"
    print(coon, addr)
    while True:  # communication loop
        cmd = coon.recv(1024)  # receive at most 1024 bytes
        if not cmd:
            break  # the client closed the connection
        print('Received: %s' % cmd.decode('utf-8'))
        # run the command
        res = subprocess.Popen(cmd.decode('utf-8'), shell=True,
                               stdout=subprocess.PIPE,  # standard output
                               stderr=subprocess.PIPE)  # standard error
        stdout = res.stdout.read()
        stderr = res.stderr.read()
        # build the header
        header_dic = {
            'total_size': len(stdout) + len(stderr),  # total size
            'filename': None,
            'md5': None
        }
        header_json = json.dumps(header_dic)  # str
        header_bytes = header_json.encode('utf-8')  # bytes, but of variable length
        # first send the length of the header as a fixed 4 bytes
        coon.sendall(struct.pack('i', len(header_bytes)))
        # then send the header
        coon.sendall(header_bytes)
        # finally send the result of the command
        coon.sendall(stdout)
        coon.sendall(stderr)
    coon.close()
phone.close()

Sixth, the struct module

  Anyone who knows C will know the role of a struct in that language: it defines a structure containing different types of data (int, char, bool, and so on), which makes it convenient to handle an object as a unit. In network communication most transmitted data is binary. Passing strings raises few problems, but passing basic types such as int and char requires a mechanism that packs specific structured values into a binary byte string for transmission, and the receiver needs a matching mechanism to unpack it and restore the original data. The struct module in Python provides this. Its main function is to convert between Python values and C structs represented as Python bytes objects. The struct module offers a few very simple functions; some examples follow.

1. Basic pack and unpack

  struct provides packing and unpacking of data by format specifier. E.g:

#This module can convert a type, such as a number, into a fixed-length bytes type
import struct
# res = struct.pack('i',12345)
# print(res,len(res),type(res)) #length is 4

res2 = struct.pack('i',12345111)
print(res2,len(res2),type(res2)) #The length is also 4

unpack_res =struct.unpack('i',res2)
print(unpack_res)  #(12345111,)
# print(unpack_res[0]) #12345111

  The same functions handle several values at once. Take a tuple holding an int, a string, and a float, and a struct object with the format 'I3sf', where I means an unsigned int, 3s means a byte string of length three, and f means a float. Packing converts the values into a binary byte string, and unpack converts the byte string back into a tuple. Note that the float's precision changes: f is stored as a 32-bit single-precision value, while a Python float is 64-bit. The number of bytes occupied by the packed data follows the same layout rules as a struct in C.
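A short sketch of the 'I3sf' case follows. The sample values (1, b'abc', 2.7) are chosen for illustration only and are not taken from the original example.

```python
import struct

values = (1, b'abc', 2.7)       # an int, a 3-byte string, and a float
packer = struct.Struct('I3sf')  # I = unsigned int, 3s = 3 bytes, f = 32-bit float

packed = packer.pack(*values)
unpacked = packer.unpack(packed)

print('packed bytes:', packed)
print('packed size :', packer.size)
# The float comes back rounded to single precision, so it is close to 2.7 but not equal
print('unpacked    :', unpacked)
```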

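2. To define the format, refer to the format character table in the official struct documentation. For example, i is a 4-byte signed int, q is an 8-byte signed integer, s is a byte string, and f is a 4-byte float.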
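The sizes and limits of a few format characters can be checked directly. This sketch uses the '!' prefix (network byte order, standard sizes), which is an assumption made here so that the results do not depend on the platform, and it reuses the 1073741824000-byte file size from this article as a value that is too large for 'i'.

```python
import struct

# Sizes of a few common formats in standard (network byte order) mode.
sizes = {fmt: struct.calcsize(fmt) for fmt in ('!i', '!I', '!q', '!f', '!d')}
print(sizes)

# 'i' is a 4-byte signed int, so it cannot hold a length above 2**31 - 1.
file_size = 1073741824000
try:
    struct.pack('!i', file_size)
    overflowed = False
except struct.error:
    overflowed = True
print('too large for i:', overflowed)

# 'q' is an 8-byte signed integer and has room for it.
packed = struct.pack('!q', file_size)
print(len(packed), struct.unpack('!q', packed)[0])
```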

3. Basic usage

import json, struct

# Assume a file a.txt of about 1 TB (1073741824000 bytes) is being uploaded.
# conn and s stand for the already connected sockets of the sender and the receiver,
# and file_content stands for the bytes of the file.

# To avoid sticky packets, define a custom header
header = {'file_size': 1073741824000, 'file_name': '/a/b/c/d/e/a.txt', 'md5': '8f6fbf8347faa4924a76856701edb0f3'}  # size, file path and md5 value

# Serialize the header and convert it to bytes so it can be transmitted
head_bytes = json.dumps(header).encode('utf-8')

# Use struct to pack the header length into a fixed 4 bytes so the receiver knows how much to read
head_len_bytes = struct.pack('i', len(head_bytes))  # these 4 bytes hold one number: the header length

# Sender starts sending
conn.sendall(head_len_bytes)  # first the 4-byte header length
conn.sendall(head_bytes)      # then the header itself
conn.sendall(file_content)    # then the real content as bytes

# Receiver starts receiving
head_len_bytes = s.recv(4)                 # first 4 bytes: the packed header length
x = struct.unpack('i', head_len_bytes)[0]  # extract the header length

head_bytes = s.recv(x)                           # read the header bytes
header = json.loads(head_bytes.decode('utf-8'))  # extract the header

# Finally read the real data in a loop until file_size bytes have arrived
recv_size = 0
while recv_size < header['file_size']:
    data = s.recv(1024)
    if not data:
        break  # the sender closed the connection early
    recv_size += len(data)
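For a file of this size the data cannot be read with a single recv or held in memory, so the receiver normally writes each chunk to disk as it arrives. The following is a minimal sketch of that loop; the helper name recv_to_file is hypothetical, and a socket pair plus an in-memory file are used here only so the example runs on its own.

```python
import io
import socket

def recv_to_file(sock, fileobj, size, chunk_size=65536):
    """Copy exactly `size` bytes from the socket into fileobj, chunk by chunk."""
    remaining = size
    while remaining > 0:
        # Never ask for more than is left, so the next message stays in the buffer.
        chunk = sock.recv(min(chunk_size, remaining))
        if not chunk:
            raise ConnectionError('connection closed with %d bytes missing' % remaining)
        fileobj.write(chunk)
        remaining -= len(chunk)

# Demonstration with a socket pair and an in-memory file.
sender, receiver = socket.socketpair()
content = b'file content ' * 100
sender.sendall(content)
sender.close()

target = io.BytesIO()
recv_to_file(receiver, target, len(content), chunk_size=256)
receiver.close()
print(len(target.getvalue()))
```

In a real transfer, size would come from the file_size field of the header and fileobj would be a file opened in 'wb' mode.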

  
