1. Why do sticky packets occur?
Let us first build a TCP-based remote command execution program and try three cases: (1) a command that fails, (2) ls, and (3) ifconfig.
Note:
res=subprocess.Popen(cmd.decode('utf-8'),
shell=True,
stderr=subprocess.PIPE,
stdout=subprocess.PIPE)
The encoding of the result depends on the system the command runs on. On Windows (with a Chinese locale), res.stdout.read() returns GBK-encoded bytes, so the receiving end must decode them with GBK.
The sender may send data 1 KB at a time while the receiving application reads it 2 KB at a time. The receiver could just as well read 3 KB or 6 KB in one call, or only a few bytes. In other words, the application sees the data as one continuous whole, a stream; how many bytes make up one message is invisible to it. TCP is therefore a stream-oriented protocol, and this is why the sticky-packet problem arises so easily. UDP, by contrast, is a message-oriented protocol: each UDP datagram is one message, and the application must read data one message at a time rather than an arbitrary number of bytes, which is very different from TCP. How is a message defined? You can think of whatever the peer passes to a single write/send call as one message. Note that when the peer sends a message, no matter how the lower layers segment or fragment it, the TCP layer puts the segments that make up the message back in order before presenting the data in the kernel buffer.
For example, when a TCP socket client uploads a file to a server, the file content is sent as a series of byte chunks. The receiver sees only a byte stream and has no way of knowing where the file starts or where it ends.
The sticky-packet problem arises mainly because the receiver does not know where one message ends and the next begins, so it does not know how many bytes to read at a time.
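The missing message boundary can be observed without a network. The following is a minimal sketch, assuming that socket.socketpair() (a connected pair of stream sockets) is an acceptable stand-in for a TCP connection; stream_demo is a name made up for this illustration.

```python
import socket


def stream_demo():
    """Write two messages to a stream socket, then read whatever arrives."""
    left, right = socket.socketpair()  # assumption: stands in for a TCP connection
    try:
        left.sendall(b"hello")
        left.sendall(b"feng")
        left.close()  # signal end of stream so the reader loop can stop
        chunks = []
        while True:
            chunk = right.recv(1024)
            if not chunk:  # empty bytes means the peer closed the stream
                break
            chunks.append(chunk)
        return chunks
    finally:
        left.close()
        right.close()


if __name__ == "__main__":
    # The number of chunks is up to the system; only the byte order is guaranteed.
    print(stream_demo())
```

How many chunks come back is not determined by the number of send calls. The receiver can rely only on the order of the bytes, which is why it needs some other way to find the boundary between "hello" and "feng".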
Sticky packets caused on the sending side come from TCP itself. To improve transmission efficiency, the sender often waits until it has collected enough data before sending a TCP segment. If several consecutive send calls each carry very little data, TCP's optimization algorithm usually merges them into a single segment and sends them together, so the receiver gets the data stuck together.
- TCP (Transmission Control Protocol) is connection-oriented and stream-oriented, and provides a highly reliable service. Both ends (client and server) must hold a one-to-one pair of sockets. To deliver many small packets to the receiver more efficiently, the sender uses an optimization (the Nagle algorithm) that merges several small pieces of data sent at short intervals into one larger block before packaging it. The receiver then cannot tell the pieces apart and needs a well-defined unpacking mechanism. In other words, stream-oriented communication has no protected message boundaries.
- UDP (User Datagram Protocol) is connectionless and message-oriented, and provides an efficient service. It does not use a block-merging optimization. Because UDP supports a one-to-many mode, the receiver's skbuff (socket buffer) uses a chained structure to record each arriving UDP packet, and every UDP packet carries a message header (source address, port, and so on), so the receiver can easily tell the packets apart. In other words, message-oriented communication has protected message boundaries.
- TCP is stream-based, so a message that is sent or received cannot be empty; both client and server need to handle empty input to keep the program from hanging. UDP is datagram-based: even if you enter nothing (just press Enter), the message is not empty, because the UDP protocol still wraps it in a message header. The experiment is omitted here.
A UDP recvfrom call blocks. Each recvfrom(x) corresponds to exactly one sendto(y) and is complete once it has received up to x bytes; if y > x, the excess data is lost. This means UDP never produces sticky packets, but it can lose data and is unreliable.
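The message-oriented behavior of UDP described above, including the empty message, can be tried on the loopback interface. This is only a sketch: udp_demo is a hypothetical helper, port 0 asks the operating system for any free port, and the two-second timeout is an arbitrary safety net.

```python
import socket


def udp_demo():
    """Send three datagrams over loopback and read them back one by one."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2)           # do not block forever if a datagram is lost
        addr = receiver.getsockname()
        for payload in (b"hello", b"", b"feng"):
            sender.sendto(payload, addr)
        # Each recvfrom returns one datagram, even the empty one.
        return [receiver.recvfrom(1024)[0] for _ in range(3)]
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(udp_demo())
```

Each sendto is matched by one recvfrom, so the two words never run together, and the empty payload still arrives as a datagram of its own.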
TCP does not lose data. If a packet is not fully read, the next receive continues where the previous one left off, and the sender clears its buffer only after receiving the ACK. The data is reliable, but sticky packets can occur.
Sticky packets occur in two situations.
First, the sender waits for its buffer to fill before sending, which sticks data together (when the pieces are small and sent at short intervals, they are merged into one packet).
Server:
#_*_coding:utf-8_*_
__author__ = 'Linhaifeng'
from socket import *

ip_port = ('127.0.0.1', 8080)
tcp_socket_server = socket(AF_INET, SOCK_STREAM)
tcp_socket_server.bind(ip_port)
tcp_socket_server.listen(5)

conn, addr = tcp_socket_server.accept()

data1 = conn.recv(10)
data2 = conn.recv(10)

print('----->', data1.decode('utf-8'))
print('----->', data2.decode('utf-8'))

conn.close()
Client:
#_*_coding:utf-8_*_
__author__ = 'Linhaifeng'
import socket

BUFSIZE = 1024
ip_port = ('127.0.0.1', 8080)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
res = s.connect_ex(ip_port)

# Two small sends in quick succession may arrive as a single chunk.
s.send('hello'.encode('utf-8'))
s.send('feng'.encode('utf-8'))
Second, the receiver does not read the packets in its buffer promptly, so several packets are read together (the client sends a piece of data and the server reads only a small part of it; the next time the server reads, it first gets the data left over in the buffer, which produces a sticky packet).
Server:
#_*_coding:utf-8_*_
__author__ = 'Linhaifeng'
from socket import *

ip_port = ('127.0.0.1', 8080)
tcp_socket_server = socket(AF_INET, SOCK_STREAM)
tcp_socket_server.bind(ip_port)
tcp_socket_server.listen(5)

conn, addr = tcp_socket_server.accept()

data1 = conn.recv(2)   # does not receive the complete message
data2 = conn.recv(10)  # the next read first takes the leftover old data, then the new

print('----->', data1.decode('utf-8'))
print('----->', data2.decode('utf-8'))

conn.close()
Client:
#_*_coding:utf-8_*_
__author__ = 'Linhaifeng'
import socket

BUFSIZE = 1024
ip_port = ('127.0.0.1', 8080)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
res = s.connect_ex(ip_port)

s.send('hello feng'.encode('utf-8'))
When packets are split
When the data in the sender's buffer is longer than the MTU of the network interface, TCP splits the data for that send into several packets.
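As a rough piece of arithmetic for the splitting described above: TCP cuts data by its maximum segment size (MSS), which is derived from the MTU. The sketch below assumes IPv4 with a 20-byte IP header and a 20-byte TCP header and no options; segments_needed and its default values are illustrative, not taken from the article.

```python
import math


def segments_needed(payload_len, mtu=1500, ip_header=20, tcp_header=20):
    """Estimate how many TCP segments a payload is split into.

    Assumes fixed 20-byte IP and TCP headers (no options), so with the
    common Ethernet MTU of 1500 the MSS is 1460 bytes.
    """
    mss = mtu - ip_header - tcp_header
    return math.ceil(payload_len / mss)


if __name__ == "__main__":
    for size in (100, 1460, 1461, 4000):
        print(size, "bytes ->", segments_needed(size), "segment(s)")
```

The receiver never sees these segments as separate units; they are reassembled into the byte stream, which is one more reason not to equate one send with one recv.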
Supplementary question 1: why is TCP reliable while UDP is unreliable?
For TCP-based data transmission, please refer to my other article (http://www.cnblogs.com/linhaifeng/articles/5937962.html). When TCP transfers data, the sender first puts the data into its own buffer, and the protocol then sends the buffered data to the peer. When the peer acknowledges the data (ack = 1), the sender clears it from its buffer; if no acknowledgment comes back (ack = 0), the sender retransmits the data. This is why TCP is reliable.
When UDP sends data, the peer returns no acknowledgment, so UDP is unreliable.
Supplementary question 2: send(byte stream), recv(1024), and sendall
The 1024 passed to recv means that at most 1024 bytes are taken out of the buffer in one call.
The byte stream passed to send is first placed in the sender's own buffer, and the protocol then transmits the buffered content to the peer. If the byte stream is larger than the remaining buffer space, send may accept only part of it and returns the number of bytes it actually took, so the rest is lost unless the program sends it again. sendall calls send in a loop until everything has been sent, so no data is lost.
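The difference between send and sendall can be made concrete by writing the loop by hand. This is a sketch of the idea only, not the actual implementation inside the socket module; send_all is a hypothetical name.

```python
import socket


def send_all(sock, data):
    """Keep calling send until every byte of data has been accepted."""
    view = memoryview(data)  # slicing a memoryview avoids copying the remainder
    total = 0
    while total < len(view):
        sent = sock.send(view[total:])  # send returns how many bytes it took
        if sent == 0:
            raise ConnectionError("socket connection broken")
        total += sent
    return total


if __name__ == "__main__":
    a, b = socket.socketpair()
    print(send_all(a, b"abc" * 100))
    a.close()
    b.close()
```

The return value of send is the important part: a program that ignores it and calls send only once has no guarantee that the whole byte stream left the process.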
The root of the problem is that the receiver does not know how long the sender's byte stream will be. Every solution to sticky packets therefore revolves around one thing: before sending the data, the sender tells the receiver the total size of the byte stream it is about to send, and the receiver then loops until it has received all of it.
A program runs far faster than the network transmits. If the sender first uses a separate send to transmit the length and only then sends the byte stream, the extra exchange amplifies the performance cost of network latency.
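A sketch of this simple length-first approach might look as follows. It is an illustration under assumptions rather than the author's code: the names naive_send, naive_receive, and demo are made up, the "ready" reply is an arbitrary acknowledgment, and socket.socketpair() replaces a real client and server.

```python
import socket
import threading


def naive_receive(conn):
    """Read the announced length, acknowledge it, then read the body."""
    # Assumes the length text arrives alone and in one piece, which holds
    # here only because the sender waits for the reply before sending more.
    length = int(conn.recv(1024).decode("utf-8"))
    conn.sendall(b"ready")  # the extra round trip
    body = b""
    while len(body) < length:
        chunk = conn.recv(1024)
        if not chunk:
            break
        body += chunk
    return body


def naive_send(conn, body):
    """Announce the length, wait for the reply, then send the body."""
    conn.sendall(str(len(body)).encode("utf-8"))
    conn.recv(1024)  # block until the receiver answers
    conn.sendall(body)


def demo(body=b"x" * 5000):
    sender, receiver = socket.socketpair()
    received = []
    worker = threading.Thread(
        target=lambda: received.append(naive_receive(receiver)))
    worker.start()
    naive_send(sender, body)
    worker.join()
    sender.close()
    receiver.close()
    return received[0]


if __name__ == "__main__":
    print(len(demo()))
```

Every message costs one additional network round trip while the sender waits for the reply, which is the latency penalty mentioned above.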
2. Solving the sticky-packet problem
Add a custom fixed-length header to the byte stream. The header contains the length of the byte stream, and everything is sent to the peer in turn. When receiving, the peer first takes the fixed-length header out of its buffer and then reads the real data.
The struct module
This module converts a value of some type, such as a number, into bytes of a fixed length.
>>> struct.pack('i', 1111111111111)
...
struct.error: 'i' format requires -2147483648 <= number <= 2147483647  # this is the allowed range

import json, struct

# Suppose a client uploads a.txt, a file of about 1 TB (1073741824000 bytes).
# To avoid sticky packets, we must define a custom header.
header = {'file_size': 1073741824000, 'file_name': '/a/b/c/d/e/a.txt', 'md5': '8f6fbf8347faa4924a76856701edb0f3'}  # data size, file path, and md5 value

# To transmit the header, serialize it and convert it to bytes.
head_bytes = bytes(json.dumps(header), encoding='utf-8')

# So that the receiver knows the header length, use struct to pack that number into a fixed length: 4 bytes.
head_len_bytes = struct.pack('i', len(head_bytes))  # these 4 bytes hold one number: the length of the header

# The client begins sending
conn.send(head_len_bytes)  # first the header length, 4 bytes
conn.send(head_bytes)      # then the header itself
# ... and finally the file content

# The server begins receiving
head_len_bytes = s.recv(4)  # first receive 4 bytes: the header length in byte form
x = struct.unpack('i', head_len_bytes)[0]  # extract the header length

head_bytes = s.recv(x)  # receive the header bytes according to the length x
header = json.loads(head_bytes.decode('utf-8'))  # extract the header

# Finally, use the header contents to receive the real data, for example:
real_data_len = header['file_size']
s.recv(real_data_len)  # in practice, call recv in a loop until real_data_len bytes have arrived
We can make the header a dictionary that holds the details of the real data to be sent, serialize it with json, and then use struct to pack the length of the serialized data into 4 bytes (4 bytes are enough for this).
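Why 4 bytes are enough for the header length, but not for the file size itself, can be checked directly. This is a small sketch: fits_in_int32 is a hypothetical helper, and the 'q' format (an 8-byte integer) is shown only for comparison.

```python
import struct


def fits_in_int32(n):
    """Return True if n can be packed with the 4-byte 'i' format."""
    try:
        struct.pack("i", n)
        return True
    except struct.error:
        return False


small = struct.pack("i", 17)            # a short header length
large = struct.pack("i", 2147483647)    # the largest value 'i' accepts
huge = struct.pack("q", 1073741824000)  # a file size needs the 8-byte 'q' format

if __name__ == "__main__":
    print(len(small), len(large), len(huge))
    print(fits_in_int32(1073741824000))
```

The packed length depends only on the format character, never on the value. That lets the receiver read a fixed number of bytes first, while the large file size travels inside the JSON header rather than in the 4-byte prefix.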
When sending:
First send the header length.
Then encode the header and send its content.
Finally send the real content.
When receiving:
First receive the header length and unpack it with struct.
Then receive the header according to that length, decode it, and deserialize it.
Take the details of the data from the deserialized result, then receive the real content.
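The send and receive steps above can be wrapped in a pair of helper functions. This is a minimal sketch under assumptions: send_message, recv_message, and recv_exactly are names invented here, the header key data_size is illustrative, and socket.socketpair() stands in for a real client and server.

```python
import json
import socket
import struct


def recv_exactly(sock, n):
    """Read exactly n bytes, looping because recv may return fewer."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before all bytes arrived")
        buf += chunk
    return buf


def send_message(sock, body, **extra):
    """Send the header length, then the JSON header, then the body."""
    header = dict(extra, data_size=len(body))
    header_bytes = json.dumps(header).encode("utf-8")
    sock.sendall(struct.pack("i", len(header_bytes)))  # 4-byte length prefix
    sock.sendall(header_bytes)
    sock.sendall(body)


def recv_message(sock):
    """Receive one framed message and return (header, body)."""
    (header_len,) = struct.unpack("i", recv_exactly(sock, 4))
    header = json.loads(recv_exactly(sock, header_len).decode("utf-8"))
    body = recv_exactly(sock, header["data_size"])
    return header, body


if __name__ == "__main__":
    a, b = socket.socketpair()
    send_message(a, b"hello", name="first")
    send_message(a, b"feng")
    print(recv_message(b))
    print(recv_message(b))
    a.close()
    b.close()
```

Because the receiver always knows how many bytes to read next, two messages sent back to back are separated correctly even if they arrive stuck together in one buffer.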
# Server (a slightly more elaborate custom header)
# Note: struct.pack("i", "abc") raises struct.error, because the second argument must be an integer.
import json
import socket
import struct
import subprocess

phone = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
ip_port = ("127.0.0.1", 8080)
back_log = 5

phone.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
phone.bind(ip_port)
phone.listen(back_log)

while True:
    conn, addr = phone.accept()
    while True:
        cmd = conn.recv(1024)
        if not cmd:
            break
        print("cmd: %s" % cmd)

        res = subprocess.Popen(cmd.decode("utf-8"),
                               shell=True,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
                               stdin=subprocess.PIPE)
        err = res.stderr.read()
        print(err)
        if err:
            back_msg = err
        else:
            back_msg = res.stdout.read()

        headers = {'Data_Size': len(back_msg)}
        head_json = json.dumps(headers)  # serialize the header to a string
        print(type(head_json))
        head_json_bytes = bytes(head_json, encoding="utf-8")

        # struct.pack("i", n) packs an integer into 4 bytes; the second argument must be a number
        conn.send(struct.pack("i", len(head_json_bytes)))  # first send the header length
        conn.send(head_json_bytes)                         # then send the header
        conn.sendall(back_msg)                             # finally send the real content
    conn.close()
# Client for the sticky-packet solution
import json
import socket
import struct

ip_port = ("127.0.0.1", 8080)
tcp_client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp_client.connect_ex(ip_port)

while True:
    cmd = input(">> ")
    if not cmd:
        continue
    tcp_client.send(cmd.encode('utf-8'))

    data = tcp_client.recv(4)          # receive the header length
    num = struct.unpack("i", data)[0]  # unpack returns a tuple; take its first item
    print(num)

    header = json.loads(tcp_client.recv(num).decode("utf-8"))  # receive the header using its length
    data_len = header["Data_Size"]     # length of the message that follows

    recv_size = 0
    recv_data = b''
    while recv_size < data_len:
        # never ask for more than what is left of this message
        recv_data += tcp_client.recv(min(1024, data_len - recv_size))
        recv_size = len(recv_data)

    print(recv_data.decode("gbk"))  # the default encoding on (Chinese) Windows is GBK