First, details of the sticky packet problem
1. Only TCP produces sticky packets; UDP never does
Your program has no right to operate the network card directly; it goes through the interface that the operating system exposes to user programs. Every time your program sends data to a remote host, the data is first copied from user space into kernel space, and that copy costs time and resources. Frequent switches between kernel mode and user mode inevitably reduce transmission efficiency. To improve efficiency, the sending side therefore tends to collect enough data and send it in one go. If several consecutive sends each carry very little data, TCP will usually merge them into a single TCP segment according to its optimization algorithm and send them at once, so the receiver gets the data stuck together.
2. First, understand how a socket sends and receives messages
The sender can send data 1 KB at a time while the receiving application extracts it 2 KB at a time, or 3 KB, or any other amount. In other words, message boundaries are invisible to the application, which is why TCP is called a stream-oriented protocol and why sticky packets occur so easily. UDP, by contrast, is a connectionless, message-oriented protocol. Each UDP datagram is one message, and the application must extract data in units of whole messages; it cannot extract an arbitrary number of bytes at a time. This is very different from TCP. How is a message defined? Treat the data that the peer writes or sends in one call as one message. What needs to be understood is that, no matter how the lower layers segment and fragment that message in transit, the TCP layer puts the segments that make it up back in order before presenting the data in the kernel buffer. For example, when a TCP-based socket client uploads a file to a server, the file content is sent as a byte stream, and the receiver, without further information, cannot tell where the file's byte stream begins or ends.
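The stream behaviour described above can be observed on the loopback interface. The following is only a minimal sketch: the helper names `tcp_pair` and `read_in_chunks` are made up for this illustration, and the chunk sizes you see may differ from run to run.

```python
import socket


def tcp_pair():
    """Return a connected (client, server_side) pair of TCP sockets on loopback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    server_side, _ = listener.accept()
    listener.close()
    return client, server_side


def read_in_chunks(sock, total, chunk_size):
    """Read `total` bytes, asking for at most `chunk_size` bytes per recv call."""
    chunks = []
    received = 0
    while received < total:
        data = sock.recv(min(chunk_size, total - received))
        if not data:  # the peer closed the connection early
            break
        chunks.append(data)
        received += len(data)
    return chunks


client, server_side = tcp_pair()
client.send(b'hello')  # first "message"
client.send(b'feng')   # second "message"

# The receiver chooses its own read size; it has nothing to do with the two sends
chunks = read_in_chunks(server_side, total=9, chunk_size=3)
print(chunks)            # how the bytes happened to be sliced
print(b''.join(chunks))  # the byte stream itself, complete and in order

client.close()
server_side.close()
```

The bytes always arrive complete and in order, but nothing in the chunks tells the receiver that the sender made two separate calls.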
3. Causes of sticky packets
3-1 Direct Cause
The so-called sticky packet problem is mainly because the receiver does not know the boundaries between messages and does not know how many bytes of data to extract at one time.
3-2 The root cause
Sticky packets caused by the sender come from the TCP protocol itself. To improve transmission efficiency, the sender often collects enough data before sending a TCP segment. If several consecutive sends each carry very little data, TCP usually merges them into one TCP segment according to its optimization algorithm and sends them together, so the receiver gets sticky packet data.
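The merging optimization on the sending side is commonly the Nagle algorithm, and the standard socket option `TCP_NODELAY` turns it off. The sketch below only shows how the option is set and read back. It is not a fix for sticky packets: even with Nagle disabled, the receiver can still read several small sends in one recv call.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Read the current setting; a value of 0 means the Nagle algorithm is active
before = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)

# Ask TCP to send small writes immediately instead of merging them
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
after = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)

print('TCP_NODELAY before:', before)
print('TCP_NODELAY after: ', after)

sock.close()
```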
3-3 Summary
- TCP (Transmission Control Protocol) is connection-oriented and stream-oriented, and provides a highly reliable service. Both ends (client and server) must hold one socket of a connected pair. To send many small packets more efficiently, the sender uses an optimization (the Nagle algorithm) that merges several small pieces of data sent at short intervals into one larger block before packaging it. The receiver then has a hard time telling the pieces apart, so the application must provide a sound unpacking mechanism. In other words, stream-oriented communication has no message boundaries.
- UDP (User Datagram Protocol) is connectionless and message-oriented, and provides an efficient service. It does not use the block-merging optimization. Because UDP supports a one-to-many mode, the skbuff (socket buffer) at the receiving end uses a chained structure to record each arriving UDP packet, and every packet carries a header (source address, port, and so on), which makes it easy for the receiver to tell packets apart. In other words, message-oriented communication preserves message boundaries.
- TCP is based on a data stream, so sending an empty message transmits nothing and the peer's recv keeps waiting. Both the client and the server therefore need to handle empty input to keep the program from hanging. UDP is based on datagrams: even if you send empty content (just pressing Enter), the message is not empty on the wire, because UDP still wraps it in a header. The experiment is omitted here.
UDP's recvfrom blocks. Each recvfrom(x) corresponds to exactly one sendto(y) and completes as soon as it has received that datagram. If y > x, the bytes beyond x are lost. So UDP never produces sticky packets, but it can lose data and is unreliable.
TCP does not lose data. If a recv does not take everything, the next recv continues where the previous one stopped, and the sending side clears its buffer only once it receives the ACK. The data is reliable, but packets can stick.
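The UDP behaviour described above can be tried out on the loopback interface. This is a rough sketch under a few assumptions: it relies on Linux or macOS behaviour (on Windows, a too-small buffer raises an error instead of silently truncating), and it relies on loopback delivering these few datagrams in order, which UDP itself does not promise.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
receiver.settimeout(2)           # do not block forever if a datagram goes missing
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b'hello', addr)
sender.sendto(b'feng', addr)
sender.sendto(b'', addr)            # an empty datagram is still a datagram
sender.sendto(b'helloworld', addr)  # will be read with a buffer that is too small
sender.sendto(b'next', addr)

first, _ = receiver.recvfrom(1024)   # one recvfrom returns exactly one datagram
second, _ = receiver.recvfrom(1024)
empty, _ = receiver.recvfrom(1024)   # returns b'' without blocking
truncated, _ = receiver.recvfrom(4)  # y > x: the bytes beyond 4 are discarded
after, _ = receiver.recvfrom(1024)   # the lost tail does not reappear here

print(first, second, empty, truncated, after)

sender.close()
receiver.close()
```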
Second, sticky packets will occur in two cases:
1. The sender waits for its local buffer to fill before sending, which produces sticky packets (the interval between sends is very short and each piece of data is very small, so TCP's optimization algorithm merges them into one sticky packet)
Client
# -*- coding: utf-8 -*-
import socket

BUFSIZE = 1024
ip_port = ('127.0.0.1', 8080)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
res = s.connect_ex(ip_port)

s.send('hello'.encode('utf-8'))
s.send('feng'.encode('utf-8'))

Server
# -*- coding: utf-8 -*-
from socket import *

ip_port = ('127.0.0.1', 8080)

tcp_socket_server = socket(AF_INET, SOCK_STREAM)
tcp_socket_server.bind(ip_port)
tcp_socket_server.listen(5)

conn, addr = tcp_socket_server.accept()

data1 = conn.recv(10)
data2 = conn.recv(10)

print('----->', data1.decode('utf-8'))
print('----->', data2.decode('utf-8'))

conn.close()
2. The receiver does not take the packets out of its buffer in time, so several packets are received together (the client sends a piece of data, the server reads only a small part of it, and on its next recv the server first takes the data left over in the buffer, which produces a sticky packet)
Client
# -*- coding: utf-8 -*-
import socket

BUFSIZE = 1024
ip_port = ('127.0.0.1', 8080)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
res = s.connect_ex(ip_port)

s.send('hello feng'.encode('utf-8'))

Server
# -*- coding: utf-8 -*-
from socket import *

ip_port = ('127.0.0.1', 8080)

tcp_socket_server = socket(AF_INET, SOCK_STREAM)
tcp_socket_server.bind(ip_port)
tcp_socket_server.listen(5)

conn, addr = tcp_socket_server.accept()

data1 = conn.recv(2)   # Not everything is received in one call
data2 = conn.recv(10)  # The next recv takes the leftover old data first, then any new data

print('----->', data1.decode('utf-8'))
print('----->', data2.decode('utf-8'))

conn.close()
Third, a sticky packet example:
Server:
# Server
import socket
import subprocess

din = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
ip_port = ('127.0.0.1', 8080)
din.bind(ip_port)
din.listen(5)

conn, deer = din.accept()

data1 = conn.recv(1024)
data2 = conn.recv(1024)

print(data1)
print(data2)
Client:
# Client
import socket
import subprocess

din = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
ip_port = ('127.0.0.1', 8080)
din.connect(ip_port)

din.send('helloworld'.encode('utf-8'))
din.send('sb'.encode('utf-8'))
Fourth, when packets get split
When the data in the sender's buffer is longer than the MTU of the network card (more precisely, longer than the maximum segment size derived from the MTU), TCP splits the data from this send into several packets and sends them one after another.
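One visible consequence for the application is that a large send rarely arrives in a single recv. The sketch below pushes one large payload through a loopback TCP connection and records how many bytes each recv returned. The helper `tcp_pair` is made up for this illustration, and the chunk sizes depend on the operating system and its buffer sizes, so no particular split should be expected.

```python
import socket
import threading


def tcp_pair():
    """Return a connected (client, server_side) pair of TCP sockets on loopback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    server_side, _ = listener.accept()
    listener.close()
    return client, server_side


client, server_side = tcp_pair()
payload = b'x' * 1000000

# sendall blocks once the kernel buffers are full, so it runs in a helper thread
sender = threading.Thread(target=client.sendall, args=(payload,))
sender.start()

sizes = []
received = 0
while received < len(payload):
    data = server_side.recv(len(payload))  # ask for everything at once
    if not data:  # the peer closed the connection early
        break
    sizes.append(len(data))
    received += len(data)

sender.join()
print('bytes received:', received)
print('number of recv calls:', len(sizes))

client.close()
server_side.close()
```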
Supplementary question 1: Why is tcp reliable transmission and udp unreliable transmission?
For tcp transmission, please refer to: http://www.cnblogs.com/wj-1314/p/8298025.html
When TCP transmits data, the sender first copies the data into its own buffer, and the protocol then sends the buffered data to the peer. When the peer acknowledges the data with an ACK, the sender clears it from the buffer; if no acknowledgment arrives in time, the sender retransmits the data. That is why TCP is reliable.
When UDP sends data, however, the peer returns no acknowledgment, so UDP is not reliable.
Supplementary question 2: What do send (byte stream), recv(1024) and sendall mean?
The 1024 in recv(1024) means taking at most 1024 bytes from the buffer per call.
send first puts the byte stream into the local send buffer, and the protocol then transmits the buffer contents to the peer. If the byte stream is larger than the remaining buffer space, send may accept only part of it and returns the number of bytes it actually took, so the rest stays unsent unless the caller sends it again. sendall calls send in a loop until everything has been handed over, so no data is left behind.
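To make the difference concrete, here is a rough sketch of the kind of loop that sendall performs for you. The function name `send_all` is made up for this illustration, and `socket.socketpair()` stands in for a real TCP connection so that the example runs on its own.

```python
import socket


def send_all(sock, data):
    """Call send repeatedly until every byte has been accepted by the kernel."""
    view = memoryview(data)
    total_sent = 0
    while total_sent < len(view):
        # send returns how many bytes it actually took, possibly fewer than offered
        sent = sock.send(view[total_sent:])
        total_sent += sent
    return total_sent


left, right = socket.socketpair()  # two connected stream sockets
message = b'a' * 5000

sent_bytes = send_all(left, message)
left.close()  # closing lets the reader see the end of the stream

received = b''
while True:
    data = right.recv(4096)
    if not data:  # empty bytes: the sender closed the connection
        break
    received += data
right.close()

print(sent_bytes, len(received))
```

In real code, calling `sock.sendall(data)` is the simpler choice; the loop is spelled out here only to show why a bare send is not enough.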
Fifth, how do we solve the sticky packet problem?
The root of the problem is that the receiver does not know how long the byte stream from the sender will be. The solution is therefore for the sender to tell the receiver the total size of the byte stream before sending the data, and for the receiver to then loop until it has received all of it.
5-1 Simple solution (solved from the surface):
Add a time.sleep after each send on the client to avoid sticky packets. The server should also sleep between its receives for this to work.
Client:
# Client
import socket
import time
import subprocess

din = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
ip_port = ('127.0.0.1', 8080)
din.connect(ip_port)

din.send('helloworld'.encode('utf-8'))
time.sleep(3)
din.send('sb'.encode('utf-8'))
Server:
# Server
import socket
import time
import subprocess

din = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
ip_port = ('127.0.0.1', 8080)
din.bind(ip_port)
din.listen(5)

conn, deer = din.accept()

data1 = conn.recv(1024)
time.sleep(4)
data2 = conn.recv(1024)

print(data1)
print(data2)
This solution clearly has many flaws: you do not know when a transmission is over, and the length of the pause is hard to choose. Too long is inefficient, and too short does not prevent sticking, so this method is not suitable.
5-2 Common solution (to see the problem from the root):
As noted above, the receiver does not know how many bytes the sender is going to transmit, so the sender has to announce the total size before the data itself. To do that, add a custom fixed-length header in front of the byte stream. The header contains the length of the byte stream, and the header and the data are sent to the peer one after the other. The peer first takes the fixed-length header from its buffer, reads the length from it, and then fetches exactly that much real data.
Use the struct module to pack the length into a fixed 4 or 8 bytes. With the format 'i', struct.pack can only handle numbers of up to 10 digits (a signed 32-bit integer, at most 2147483647). For larger values, you can put the length into a JSON string first and then pack the length of that string.
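The limit of the 'i' format is easy to check directly. The short sketch below packs the largest value that fits, shows that the next value is rejected, and packs the 1 TB size used later in this article with the 8-byte 'q' format instead. The variable names are only illustrative.

```python
import struct

# The largest value 'i' can hold still packs into 4 bytes
four_bytes = struct.pack('i', 2147483647)
print(len(four_bytes))

# One more than that no longer fits and raises struct.error
try:
    struct.pack('i', 2147483648)
    overflowed = False
except struct.error as exc:
    overflowed = True
    print('too large for "i":', exc)

# The 8-byte format 'q' has room for much larger lengths
eight_bytes = struct.pack('q', 1073741824000)
restored = struct.unpack('q', eight_bytes)[0]
print(len(eight_bytes), restored)
```

Both 'i' and 'q' use the byte order of the local machine here. When the two ends may run on different architectures, a prefix such as '!' (network byte order) keeps them in agreement.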
Normal client
# -*- coding: utf-8 -*-
import socket
import struct

phone = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
phone.connect(('127.0.0.1', 8880))  # Connect to the server

while True:
    # Send and receive messages
    cmd = input('Please enter the command >>:').strip()
    if not cmd:
        continue
    phone.send(cmd.encode('utf-8'))  # Send the command

    # Receive the fixed-length header first
    header_struct = phone.recv(4)  # The header is four bytes
    unpack_res = struct.unpack('i', header_struct)
    total_size = unpack_res[0]  # Total length of the data

    # Then receive the data
    recv_size = 0
    total_data = b''
    while recv_size < total_size:  # Keep receiving in a loop
        recv_data = phone.recv(1024)  # 1024 is just an upper limit
        if not recv_data:  # The server closed the connection
            break
        recv_size += len(recv_data)
        total_data += recv_data

    # Assumes the command output is GBK-encoded (for example on Chinese-locale Windows)
    print('Returned message: %s' % total_data.decode('gbk'))

phone.close()

Normal server
# -*- coding: utf-8 -*-
import socket
import subprocess
import struct

phone = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # The "phone"
phone.bind(('127.0.0.1', 8880))  # Bind the address (insert the SIM card)
phone.listen(5)  # Backlog of pending connections
print('start running.....')

while True:  # Connection loop
    coon, addr = phone.accept()  # Wait for a call
    print(coon, addr)

    while True:  # Communication loop
        # Send and receive messages
        cmd = coon.recv(1024)  # Receive at most 1024 bytes
        if not cmd:  # Empty bytes: the client has disconnected
            break
        print('Received: %s' % cmd.decode('utf-8'))

        # Run the command
        res = subprocess.Popen(cmd.decode('utf-8'), shell=True,
                               stdout=subprocess.PIPE,  # Standard output
                               stderr=subprocess.PIPE)  # Standard error
        stdout = res.stdout.read()
        stderr = res.stderr.read()

        # Send the header first: struct turns the total length into a fixed 4 bytes
        header = struct.pack('i', len(stdout) + len(stderr))
        coon.sendall(header)

        # Then send the result of the command
        coon.sendall(stdout)
        coon.sendall(stderr)

    coon.close()

phone.close()
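One detail deserves care in the fixed-length header scheme: recv(4) is allowed to return fewer than 4 bytes, just like any other recv. A small helper that keeps reading until it has exactly the requested number of bytes closes that gap. The names `recv_exact`, `send_msg` and `recv_msg` below are made up for this sketch, the '!I' format (unsigned, network byte order) is a choice made here rather than the 'i' used above, and `socket.socketpair()` stands in for a real client/server connection.

```python
import socket
import struct

HEADER = struct.Struct('!I')  # 4-byte unsigned length in network byte order


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('peer closed before %d bytes arrived' % n)
        buf.extend(chunk)
    return bytes(buf)


def send_msg(sock, payload):
    """Send one message: fixed-length header with the size, then the payload."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_msg(sock):
    """Receive one message by reading the header, then exactly that many bytes."""
    (size,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, size)


left, right = socket.socketpair()  # two connected stream sockets

# Two back-to-back sends that would normally stick together
send_msg(left, b'helloworld')
send_msg(left, b'sb')

first = recv_msg(right)
second = recv_msg(right)
print(first, second)

left.close()
right.close()
```

Even though both messages sit in the receive buffer at the same time, each recv_msg call returns exactly one of them.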
5-3 Optimized solution (solving the problem at the root)
The optimized approach is for the server to enrich the header and describe the content to be sent with a dictionary. A dictionary cannot be sent over the network directly, so it is first serialized into a JSON string and then encoded to bytes. Because the length of that bytes-format JSON string is not fixed, the struct module is used to pack its length into a fixed-length prefix. The server sends the prefix, then the JSON header, then the data, and the client reads them in the same order to obtain the complete packet.
Ultimate client
# -*- coding: utf-8 -*-
import socket
import struct
import json

phone = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
phone.connect(('127.0.0.1', 8080))  # Connect to the server

while True:
    # Send and receive messages
    cmd = input('Please enter the command >>:').strip()
    if not cmd:
        continue
    phone.send(cmd.encode('utf-8'))  # Send the command

    # First receive the length of the header
    header_len = struct.unpack('i', phone.recv(4))[0]  # Unpack the 4 bytes back into an int

    # Then receive the header itself
    header_bytes = phone.recv(header_len)  # What arrives is still bytes
    header_json = header_bytes.decode('utf-8')  # The dictionary as a JSON string
    header_dic = json.loads(header_json)  # Deserialize to get the dictionary
    total_size = header_dic['total_size']  # Total length of the data

    # Finally receive the data
    recv_size = 0
    total_data = b''
    while recv_size < total_size:  # Keep receiving in a loop
        recv_data = phone.recv(1024)  # 1024 is just an upper limit
        if not recv_data:  # The server closed the connection
            break
        # recv may return fewer than 1024 bytes, so add the length actually received
        recv_size += len(recv_data)
        total_data += recv_data

    # Final result; assumes the command output is GBK-encoded
    print('Returned message: %s' % total_data.decode('gbk'))

phone.close()
Ultimate server
# -*- coding: utf-8 -*-
import socket
import subprocess
import struct
import json

phone = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # The "phone"
phone.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
phone.bind(('127.0.0.1', 8080))  # Bind the address (insert the SIM card)
phone.listen(5)  # Backlog of pending connections
print('start running.....')

while True:  # Connection loop
    coon, addr = phone.accept()  # Wait for a call
    print(coon, addr)

    while True:  # Communication loop
        # Send and receive messages
        cmd = coon.recv(1024)  # Receive at most 1024 bytes
        if not cmd:  # Empty bytes: the client has disconnected
            break
        print('Received: %s' % cmd.decode('utf-8'))

        # Run the command
        res = subprocess.Popen(cmd.decode('utf-8'), shell=True,
                               stdout=subprocess.PIPE,  # Standard output
                               stderr=subprocess.PIPE)  # Standard error
        stdout = res.stdout.read()
        stderr = res.stderr.read()

        # Build the header
        header_dic = {
            'total_size': len(stdout) + len(stderr),  # Total size of the data
            'filename': None,
            'md5': None
        }
        header_json = json.dumps(header_dic)  # A string
        header_bytes = header_json.encode('utf-8')  # Bytes, but of variable length

        # First send the length of the header as a fixed 4 bytes
        coon.sendall(struct.pack('i', len(header_bytes)))
        # Then send the header
        coon.sendall(header_bytes)
        # Finally send the result of the command
        coon.sendall(stdout)
        coon.sendall(stderr)

    coon.close()

phone.close()
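The three-part layout (4-byte header length, JSON header, data) can also be exercised without running a separate client and server. The following is a self-contained sketch: the function names `send_with_header`, `recv_with_header` and `recv_exact` are made up for this illustration, `socket.socketpair()` stands in for the real connection, and the payload and file name are placeholder values.

```python
import hashlib
import json
import socket
import struct


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('peer closed before %d bytes arrived' % n)
        buf.extend(chunk)
    return bytes(buf)


def send_with_header(sock, payload, filename=None):
    """Send the header length, the JSON header, then the payload."""
    header = {
        'total_size': len(payload),
        'filename': filename,
        'md5': hashlib.md5(payload).hexdigest(),
    }
    header_bytes = json.dumps(header).encode('utf-8')
    sock.sendall(struct.pack('i', len(header_bytes)))  # fixed 4 bytes
    sock.sendall(header_bytes)                         # variable length
    sock.sendall(payload)                              # the real data


def recv_with_header(sock):
    """Read the three parts back in the same order."""
    header_len = struct.unpack('i', recv_exact(sock, 4))[0]
    header = json.loads(recv_exact(sock, header_len).decode('utf-8'))
    payload = recv_exact(sock, header['total_size'])
    return header, payload


left, right = socket.socketpair()  # two connected stream sockets
data = b'result of a command\n' * 100

send_with_header(left, data, filename='output.txt')  # placeholder file name
header, payload = recv_with_header(right)

print(header)
print(len(payload))

left.close()
right.close()
```

Because the size lives inside the JSON header rather than in the packed integer, it is not limited by the range of 'i'; only the header's own length has to fit in 4 bytes.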
Sixth, the struct module
Anyone who knows C will know the role of the struct in that language: it defines a structure that groups data of different types (int, char, bool, and so on), so that an object with that layout is convenient to handle. In network communication, most transmitted data is binary. Passing strings raises few problems, but passing basic types such as int and char requires a mechanism that packs specific structure types into a binary byte string for transmission, and the receiving end needs a matching mechanism to unpack it and restore the original structure. Python's struct module provides exactly this. Its main job is to convert between Python values and C structs represented as Python byte strings. The struct module offers a handful of very simple functions; a few examples follow.
1. Basic pack and unpack
struct packs and unpacks data according to a format specifier. For example:
# This module converts a value, such as a number, into a fixed-length bytes object
import struct

# res = struct.pack('i', 12345)
# print(res, len(res), type(res))  # The length is 4

res2 = struct.pack('i', 12345111)
print(res2, len(res2), type(res2))  # The length is also 4

unpack_res = struct.unpack('i', res2)
print(unpack_res)  # (12345111,)
# print(unpack_res[0])  # 12345111
As a further example, take a tuple that holds three types of data (an int, a string, and a float) and create a struct object with the format 'I3sf', where I means an unsigned int, 3s means a string of three bytes, and f means a float. The tuple is then packed and unpacked with the object's pack and unpack methods. Packing converts the values into a binary byte string, and unpack converts the byte string back into a tuple. Note that the float comes back with slightly different precision: the f format stores a 32-bit single-precision value, whereas a Python float is double precision. The number of bytes occupied by the packed data closely matches the corresponding struct in C.
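A minimal sketch of that example might look like the following. The values in the tuple are arbitrary and chosen only for illustration.

```python
import struct

values = (1, b'abc', 2.7)    # an int, a 3-byte string and a float
s = struct.Struct('I3sf')    # unsigned int, 3 bytes, single-precision float

packed = s.pack(*values)     # the values as one binary byte string
unpacked = s.unpack(packed)  # and back to a tuple

print('packed bytes:', packed)
print('packed size: ', len(packed), 'struct size:', s.size)
print('unpacked:    ', unpacked)

# The float went through 32 bits, so it is close to 2.7 but not identical
print('float unchanged?', unpacked[2] == values[2])
```

The packed size can be larger than the sum of the individual fields, because the default (native) mode inserts padding bytes for alignment, as a C compiler would.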
2. For the available format characters, refer to the table in the official struct documentation.
3. Basic usage
import json, struct

# Assume a 1 TB file a.txt (1073741824000 bytes) is uploaded through the client
# To avoid sticky packets, you must define a custom header
header = {'file_size': 1073741824000, 'file_name': '/a/b/c/d/e/a.txt', 'md5': '8f6fbf8347faa4924a76856701edb0f3'}  # size of the data, file path and md5 value

# For the header to be transmitted, serialize it and convert it to bytes
head_bytes = bytes(json.dumps(header), encoding='utf-8')

# So that the peer knows how long the header is, use struct to pack the header length into a fixed 4 bytes
head_len_bytes = struct.pack('i', len(head_bytes))  # These 4 bytes hold a single number: the header length

# Client starts sending (conn is the connected socket)
conn.send(head_len_bytes)   # First the length of the header, 4 bytes
conn.send(head_bytes)       # Then the header itself, as bytes
conn.sendall(file_content)  # Then the real content, as bytes (file_content is a placeholder)

# Server starts receiving (s is the socket at the other end)
head_len_bytes = s.recv(4)  # Receive the 4-byte header length first
x = struct.unpack('i', head_len_bytes)[0]  # Extract the length of the header
head_bytes = s.recv(x)  # Receive the header bytes according to the length x
header = json.loads(head_bytes.decode('utf-8'))  # Extract the header

# Finally receive the real data according to the header, looping until file_size bytes have arrived
recv_size = 0
while recv_size < header['file_size']:
    data = s.recv(1024)
    if not data:  # The peer closed the connection
        break
    recv_size += len(data)