Advanced Python - Socket Programming

Table of contents

What is Socket programming

TCP Socket Programming

Application message format

Why define the message format

Example 1

Example 2

Support for multiple TCP clients

UDP socket programming

UDP Protocol Features

UDP socket programming


What is Socket programming

Almost all software developed today needs network communication.

Whether it is traditional desktop software, mobile phone apps, or IoT embedded software, it must communicate with other systems over the network.

Today's networks basically all communicate using the TCP/IP protocol. Every application, whether web browsing, WeChat, Alipay, Douyin or the programs we develop ourselves, communicates through TCP/IP.

The TCP/IP protocol is a scheme for transmitting data.

In software development, the programs (processes) that send and receive information are like the sender and the recipient of a parcel;

The information being sent and received is like the item being delivered;

The specific transmission path of the information (which routers it passes through along the way) and the transmission method (which protocols are used) are like the courier company's transport process.

Similarly, when we write a program that sends information and a program that receives it, we do not need to know every detail of the transmission, such as which routers the data passes through or how each router forwards it.

As programmers, we only need to know how our program hands the information to be sent to the 'recipient', and how it obtains received information from the 'sender'.

So who exactly are the 'recipient' and the 'sender' that our application deals with directly?

It is the socket programming interface provided by the operating system.

The sending application passes the information, through the socket programming interface, to the operating system's TCP/IP protocol stack communication module.

The communication module passes it down layer by layer to other modules (the network card driver, etc.), and it is finally sent onto the network by hardware such as the network card.

After being forwarded by routers on the network, possibly many times, it reaches the computer (or mobile phone or other device) where the target program runs, and is passed up layer by layer through the TCP/IP protocol stack of that device's operating system.

Finally, the receiving program obtains the transmitted information through the socket programming interface.

This process can be represented by the following figure

You may have used the requests library to send HTTP requests. Under the hood, requests also sends them through the socket programming interface.

HTTP messages are themselves carried over TCP/IP. HTTP adds extra rules on top, such as the format of the messages being transmitted.
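To make this concrete, here is a minimal sketch that sends an HTTP GET request using nothing but the socket module. It is an illustration, not how requests is implemented: the helper names build_get_request and http_get are hypothetical, it speaks plain HTTP only (no HTTPS), and it assumes the server closes the connection after replying because the request asks for Connection: close.

```python
import socket

def build_get_request(host: str, path: str = "/") -> bytes:
    # An HTTP/1.1 request is plain text: request line, headers, then an empty line
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close the connection after replying
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def http_get(host: str, port: int = 80, path: str = "/") -> bytes:
    # create_connection() does socket() + connect() in one call
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:  # empty bytes: the server closed the connection, reply complete
                break
            chunks.append(data)
    # Raw status line, headers and body, exactly as they came over TCP
    return b"".join(chunks)
```

For example, http_get("example.com") returns the raw response bytes; requests additionally parses them and handles redirects, TLS and much more.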

TCP Socket Programming

For socket programming and sending network messages, we can use Python's built-in socket library.

Today, socket programming is mostly used for network communication over the TCP protocol.

A TCP communication involves two sides: a server and a client.

Two parties communicating over TCP must first establish a virtual connection; only then can both programs send business data.

A TCP virtual connection is established through the well-known three-way handshake.

For the details of the three-way handshake, see the Zhihu question "How to gain an in-depth understanding of TCP's three-way handshake and four-way wave?"

Let's now look at a socket server program and a client program that communicate over TCP.

The main socket API calls involved are:

  1. Create a socket: socket()
  2. Bind the local address: bind()
  3. Listen for connections (server side): listen()
  4. Establish a connection: connect() (client side) and accept() (server side)
  5. Transfer data: send() and recv()
  6. Multiplex input/output: select()
  7. Close the socket: close() (called closesocket() in the Windows Winsock C API)

The following is the TCP server program server.py

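#  === TCP server program server.py ===

# Import the socket library
from socket import *

# An empty host string means binding to the IP addresses of all local network interfaces,
# waiting for clients to connect
IP = ''
# Port number
PORT = 50000
# Read at most 512 bytes from the socket buffer at a time
BUFLEN = 512

# Create a socket object
# AF_INET means the socket uses IP at the network layer
# SOCK_STREAM means the socket uses TCP at the transport layer
listenSocket = socket(AF_INET, SOCK_STREAM)

# Bind the socket to the address and port
listenSocket.bind((IP, PORT))


# Put the socket into the listening state, waiting for client connection requests
# The argument 8 is the backlog: the maximum number of pending connections not yet accepted
listenSocket.listen(8)
print(f'Server started, waiting for client connections on port {PORT}...')

# accept() blocks until a client connects, then returns a new socket for data exchange
dataSocket, addr = listenSocket.accept()
print('Accepted a client connection:', addr)

while True:
    # Try to read a message sent by the peer
    # BUFLEN is the maximum number of bytes to read from the receive buffer
    recved = dataSocket.recv(BUFLEN)

    # Empty bytes means the peer closed the connection:
    # exit the loop and stop sending/receiving
    if not recved:
        break

    # The data read is of type bytes and must be decoded into a string
    info = recved.decode()
    print(f'Received from peer: {info}')

    # Data to send must be bytes, so encode it; sendall() keeps sending until all of it is written
    dataSocket.sendall(f'Server received: {info}'.encode())

# The server also calls close() to close its sockets
dataSocket.close()
listenSocket.close()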

The following is the TCP client program client.py

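#  === TCP client program client.py ===

from socket import *

IP = '127.0.0.1'
SERVER_PORT = 50000
BUFLEN = 1024

# Create a socket object, specifying the protocols
dataSocket = socket(AF_INET, SOCK_STREAM)

# Connect to the server socket
dataSocket.connect((IP, SERVER_PORT))

while True:
    # Read a string typed by the user in the terminal
    toSend = input('>>> ')
    if toSend == 'exit':
        break
    # Sending empty bytes transmits nothing, so the server would never reply: skip empty input
    if not toSend:
        continue
    # The message to send must also be encoded into bytes
    dataSocket.sendall(toSend.encode())

    # Wait for the server's reply
    recved = dataSocket.recv(BUFLEN)
    # Empty bytes means the peer closed the connection
    if not recved:
        break
    # Print the received information
    print(recved.decode())

dataSocket.close()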
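To try the call sequence from the API list without opening two terminals, here is a minimal sketch that runs a server thread and a client in one script on 127.0.0.1. It is not part of the original server.py/client.py: the function name echo_once is hypothetical, port 0 lets the operating system pick a free port, and the client calls shutdown(SHUT_WR) so that the server's recv() sees end-of-stream after one message.

```python
import socket
import threading

def echo_once(listen_sock: socket.socket) -> None:
    # accept() blocks until a client connects, then returns a new data socket
    data_sock, addr = listen_sock.accept()
    with data_sock:  # closes the data socket when the block ends
        while True:
            recved = data_sock.recv(512)
            if not recved:  # empty bytes: the client has finished sending
                break
            data_sock.sendall(b"Server received: " + recved)

# Server side: socket() -> bind() -> listen()
listen_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listen_sock.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
listen_sock.listen(8)
port = listen_sock.getsockname()[1]
server_thread = threading.Thread(target=echo_once, args=(listen_sock,))
server_thread.start()

# Client side: socket() -> connect() -> send()/recv() -> close()
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
client.sendall("hello".encode())
client.shutdown(socket.SHUT_WR)  # tell the server we will send nothing more
chunks = []
while True:
    data = client.recv(1024)
    if not data:  # the server closed its data socket
        break
    chunks.append(data)
client.close()
reply = b"".join(chunks).decode()

server_thread.join()
listen_sock.close()
print(reply)
```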

Application message format

Why define the message format

In the example above, the message we send is just the content to be delivered, such as a plain string.

In practice, communication between the programs we develop at work usually follows a defined message format. In the OSI network model, format definitions belong to the presentation layer.

For example, a message may be defined to consist of a message header and a message body.

The message header stores metadata about the message, such as its length, type and status, while the message body stores the actual data being transmitted.

For programs that transmit information over TCP, the format definition must clearly specify the message boundaries.

Because TCP transmits a byte stream, if a message does not specify its boundary or length, the receiver cannot tell where one complete message starts and ends within the stream.

There are two ways to mark the boundaries of a message:

  • Use a special byte sequence to mark the end of the message

A byte string that can never appear in the message content (for example, the hexadecimal bytes FFFFFF) can be used as the message terminator.

  • Specify the length of the message directly at a fixed position at the beginning of the message

For example, the first 2 bytes of a message indicate its length.
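Both approaches can be sketched for a TCP socket with the standard socket and struct modules. This is a minimal illustration, not code from the original article: send_msg, recv_msg, recv_exact and DelimitedReader are hypothetical names, the length prefix is assumed to be an unsigned 16-bit big-endian integer, and the terminator is assumed to be the three bytes FF FF FF.

```python
import socket
import struct

# --- Length prefix: 2 bytes at the start of each message ---
def send_msg(sock: socket.socket, payload: bytes) -> None:
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a 2-byte length field")
    # '!H' = network byte order (big-endian), unsigned 16-bit integer
    sock.sendall(struct.pack("!H", len(payload)) + payload)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Keep calling recv() until exactly n bytes have arrived."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf += chunk
    return bytes(buf)

def recv_msg(sock: socket.socket) -> bytes:
    (length,) = struct.unpack("!H", recv_exact(sock, 2))
    return recv_exact(sock, length)

# --- Terminator: a byte sequence that never appears inside a message ---
class DelimitedReader:
    def __init__(self, sock: socket.socket, terminator: bytes = b"\xff\xff\xff"):
        self.sock = sock
        self.terminator = terminator
        self.buffer = b""  # bytes received but not yet returned

    def read_message(self) -> bytes:
        while self.terminator not in self.buffer:
            chunk = self.sock.recv(4096)
            if not chunk:
                raise ConnectionError("peer closed the connection mid-message")
            self.buffer += chunk
        # Split off the first complete message; keep the rest for the next call
        message, _, self.buffer = self.buffer.partition(self.terminator)
        return message
```

Because one recv() may return part of a message or several messages at once, recv_exact() loops until the announced number of bytes has arrived, and DelimitedReader keeps leftover bytes in its buffer for the next call.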

UDP usually does not need explicit message boundaries, because UDP is a datagram protocol: each time the application reads from the socket, it gets one complete message exactly as the sender sent it (provided the receive buffer is large enough).

Example 1

We are now going to develop a monitoring system for lab workstations, consisting of:

  • The data collector RUS, installed on each workstation in the computer room

    This program acts as a TCP server and collects resource usage data. It is called RUS (Resource Usage Stat) for short.

  • The management console AT, installed in the monitoring room

    This program acts as a TCP client and displays the resource usage data to the administrator. It is called AT (Admin Terminal) for short.

  • As the designer of this system, you can design the data transmission specification between RUS and AT yourself, including the message data format.

The following is a reference specification:

  • AT and RUS communicate over a long-lived TCP connection

    If the connection drops, AT, as the TCP client, must reconnect

  • Overall message

    Each message is a UTF-8 encoded string

    It consists of a message header and a message body.

    The header and body are separated by a newline character (the byte 0A after UTF-8 encoding).

    There are the following types of messages:

    • Control commands

      Sent by AT to RUS to issue management control commands.

      For example:

      • pause: pause data collection
      • resume: resume data collection

      After receiving a control command, RUS must carry it out and then reply with a response message telling AT that the command has been received and completed.

    • Data reports

      Sent by RUS to AT to report the collected resource data. After receiving the data, AT should reply with a response message acknowledging the report.

  • Header

    The message header contains only one piece of information: the length of the message body

    The length, an integer, is written as a decimal string

  • Message body

    The message body is a JSON string that carries the data, as follows

    • Data report, RUS -> AT
    {
        "type" : "report",
        "info" : {
            "CPU Usage" : "30%",
            "Mem usage" : "53%"
        }
    }

    • Data report response, AT -> RUS
    {
        "type" : "report-ack"
    }

    • Pause data reporting command, AT -> RUS
    {
        "type" : "pause",
        "duration" : 200
    }

    Here, duration is how long to pause reporting, in seconds.

    • Resume data reporting command, AT -> RUS
    {
        "type" : "resume"
    }

    • Command processing response, RUS -> AT
    {
        "type" : "cmd-ack",
        "code" : 200,
        "info" : "Processed successfully"
    }

    Here, code is the processing result code, where 200 means success, and info is a text description of the result.
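A minimal sketch of an encoder and an incremental decoder for this reference specification. It assumes the length in the header counts the UTF-8 bytes of the body (the specification does not say whether it counts bytes or characters), and the names encode_message and MessageDecoder are hypothetical.

```python
import json

def encode_message(body_obj: dict) -> bytes:
    # Body: JSON text encoded as UTF-8
    body = json.dumps(body_obj, ensure_ascii=False).encode("utf8")
    # Header: the body length in bytes, written as a decimal string
    header = str(len(body)).encode("utf8")
    # Header and body are separated by a newline (byte 0A)
    return header + b"\n" + body

class MessageDecoder:
    """Accumulates raw TCP bytes and returns every complete message."""

    def __init__(self):
        self.buffer = b""

    def feed(self, data: bytes) -> list:
        self.buffer += data
        messages = []
        while True:
            newline = self.buffer.find(b"\n")
            if newline == -1:
                break  # header not complete yet
            length = int(self.buffer[:newline].decode("utf8"))
            start = newline + 1
            if len(self.buffer) - start < length:
                break  # body not complete yet
            body = self.buffer[start:start + length]
            # Drop the consumed message, keep any following bytes
            self.buffer = self.buffer[start + length:]
            messages.append(json.loads(body.decode("utf8")))
        return messages
```

The decoder is fed whatever recv() returns, so it copes with messages that are split across, or merged within, TCP reads. In AT or RUS you would call decoder.feed(sock.recv(4096)) in a loop and dispatch each returned dict on its "type" field.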

Example 2

In Example 1, the reference interface puts each transmitted message into one large string, then encodes the string into bytes for transmission.

The advantage of this interface design is that byte encoding is simple and convenient when sending: the header and body are each UTF-8 encoded, and the byte strings are concatenated.

Processing on the receiving side is also simple: split the header from the body and UTF-8 decode each.

When designing communication between ordinary applications, this is good enough: simple is beautiful, easy to develop and easy to maintain.

However, if the message interface is used between computing nodes that must handle tens of thousands of messages every second, the drawbacks of such an interface show: messages are long, and encoding and decoding consume a lot of processor resources.

A typical example is telecom equipment, such as a service processing node in a 4G core network. Such nodes often process tens of thousands of authentication, authorization, billing and other messages per second, and the approach above would place a huge burden on the equipment.

First, representing data as characters wastes bandwidth.

For example, if the return code is represented by the string "200", it takes 3 bytes, or 24 bits. If the result is only success or failure, 1 bit is enough: 1 for success, 0 for failure.

Second, encoding and decoding a format with complex syntax like JSON requires the program to do a lot of complicated processing (see the code of Python's built-in json library), which consumes CPU resources.

A simpler data representation can be defined, such as this:

  • The first 2 bytes of the message header indicate the length of the message

  • The third byte of the message header indicates the message type:

    0: pause command, 1: resume command, 2: command response, 3: statistics report, 4: statistics report response

  • Message body data definition

    A scheme similar to the Attribute-Value Pairs (AVP) of RADIUS/Diameter can be used:

    Attribute: one byte indicating the type of data.

    For example

    1: CPU usage, 2: memory usage

    Length: one byte indicating the length of the value

    Value: the actual data value

    Thus, the example information from before

     {
            "CPU Usage" : "30%",
            "Mem usage" : "53%"
     }

    is encoded as follows:

    "CPU Usage" : "30%" becomes the hexadecimal bytes 01011E (attribute 01, length 01, value 0x1E = 30)

    "Mem usage" : "53%" becomes the hexadecimal bytes 020135 (attribute 02, length 01, value 0x35 = 53)

    Put together: 01011E020135
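A minimal sketch of this binary format using the standard struct module. Two points the specification above leaves open are assumptions here: the 2-byte length field counts only the body bytes that follow the 3-byte header, and multi-byte fields are big-endian (network byte order). The names encode_report and decode_message are hypothetical.

```python
import struct

MSG_TYPE_REPORT = 3       # "3: statistics report" from the type list
ATTR_CPU, ATTR_MEM = 1, 2  # attribute codes from the AVP example

def encode_report(stats: dict) -> bytes:
    body = b""
    for attr, value in stats.items():
        # One AVP: attribute (1 byte), length of value (1 byte), value (1 byte here)
        body += struct.pack("!BBB", attr, 1, value)
    # Header: 2-byte body length + 1-byte message type
    header = struct.pack("!HB", len(body), MSG_TYPE_REPORT)
    return header + body

def decode_message(data: bytes):
    length, msg_type = struct.unpack("!HB", data[:3])
    body = data[3:3 + length]
    avps = {}
    pos = 0
    while pos < len(body):
        attr, vlen = body[pos], body[pos + 1]
        value = body[pos + 2:pos + 2 + vlen]
        avps[attr] = int.from_bytes(value, "big")
        pos += 2 + vlen  # skip attribute byte, length byte and value
    return msg_type, avps
```

For the CPU 30% / memory 53% report, encode_report returns the 9 bytes 000603 01011E 020135 (header plus the two AVPs), while the JSON body of Example 1 alone is several dozen bytes.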

Compared with the first encoding method, this one has the following trade-offs:

Advantages: it saves transmission bandwidth, and data is encoded and decoded more efficiently.

Disadvantages: it is hard for humans to read, and the data representation is less flexible.

Support for multiple TCP clients

The server code above can only communicate with one client.

If we run several clients at the same time, we find that the later clients cannot communicate with the server (their connection requests wait in the listen backlog but are never accepted). Why?

Because the server program must keep calling accept() on the listening socket in order to keep accepting new client connection requests.

In addition, it must run extra code to send and receive data on the multiple data sockets returned by accept() as clients connect.

Obviously, our program above does none of this.

Because sockets are blocking by default, if no client connection is pending when accept() is called, the program blocks there and none of the following code runs.

Similarly, calling recv() also blocks if the socket's receive buffer has no data.

Therefore, a single thread generally cannot keep calling accept() on the listening socket while also sending and receiving messages on multiple data sockets.

So how can one server program connect and communicate with multiple clients at the same time?

If one thread is not enough, use multiple threads.

Modify the server code as follows

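#  === TCP server program server.py, supporting multiple clients ===

# Import the socket library
from socket import *
from threading import Thread

IP = ''
PORT = 50000
BUFLEN = 512

# Function run by each new thread; each thread communicates with one client
def clientHandler(dataSocket, addr):
    while True:
        recved = dataSocket.recv(BUFLEN)
        # recv() returns empty bytes when the peer closes the connection
        if not recved:
            print(f'Client {addr} closed the connection')
            break

        # The data read is of type bytes and must be decoded into a string
        info = recved.decode()
        print(f'Received from {addr}: {info}')

        dataSocket.sendall(f'Server received: {info}'.encode())

    dataSocket.close()

# Create a socket object to listen for client connection requests
listenSocket = socket(AF_INET, SOCK_STREAM)

# Bind the socket to the address and port
listenSocket.bind((IP, PORT))

listenSocket.listen(8)
print(f'Server started, waiting for client connections on port {PORT}...')

while True:
    # Keep accepting new connection requests in the loop
    dataSocket, addr = listenSocket.accept()     # Establish connection with client.
    addr = str(addr)
    print(f'Client {addr} connected')

    # Create a new thread to handle sending/receiving with this client
    # daemon=True lets the process exit (e.g. on Ctrl+C) without waiting for client threads
    th = Thread(target=clientHandler, args=(dataSocket, addr), daemon=True)
    th.start()

listenSocket.close()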

The multithreaded approach has a drawback.

If a server must handle a large number of client connections at the same time, say 10,000, it needs to create 10,000 threads.

An operating system usually cannot give one process that many threads efficiently: each thread needs its own stack memory and adds scheduling overhead.

In fact, our server program is idle most of the time, waiting for connection requests and messages, and does not need that many threads.

Such a program is usually called an IO-bound program, meaning that it spends most of its time waiting on IO.

For a program like this, one thread is actually enough.

The key is that this thread must divide its time well: when a connection request arrives, it runs the code that handles the request, and when a message arrives in a socket buffer, it runs the code that reads and processes the message.

This approach is called asynchronous IO.

Python 3 added the asyncio library (in Python 3.4), which we can use to handle data from multiple clients concurrently in a single thread.

The sample code is as follows:

#  === TCP server program server.py, async support for multiple clients ===
# Requires Python 3.7 or later (asyncio.run, serve_forever)
import asyncio

IP = ''
PORT = 50000
BUFLEN = 512

# Coroutine that handles sending/receiving for one client connection
async def handle_echo(reader, writer):
    addr = writer.get_extra_info('peername')
    while True:
        data = await reader.read(BUFLEN)
        # Empty bytes means the client closed the connection
        if not data:
            print(f'Client {addr} closed the connection')
            writer.close()
            await writer.wait_closed()
            break

        message = data.decode()
        print(f'Received from {addr}: {message}')

        # Echo the data back; drain() waits until the send buffer has room
        writer.write(data)
        await writer.drain()

async def main():
    server = await asyncio.start_server(handle_echo, IP, PORT)
    port = server.sockets[0].getsockname()[1]
    print(f'Server started, waiting for client connections on port {port}...')
    # Serve requests until Ctrl+C is pressed; the server is closed on exit
    async with server:
        await server.serve_forever()

try:
    asyncio.run(main())
except KeyboardInterrupt:
    pass
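An asyncio server like the one above is built on the operating system's I/O multiplexing, the select() family from the API list near the start of this article. For comparison, here is a minimal sketch of the same single-threaded echo idea written directly against the standard selectors module. The function names make_server and run_once are hypothetical; for simplicity the data sockets are kept blocking, so sendall() could stall the loop if a client stopped reading, which a production server would avoid by buffering outgoing data.

```python
import selectors
import socket

# One selector watches every socket; DefaultSelector uses the most efficient
# mechanism available on the platform (epoll, kqueue, or select itself)
sel = selectors.DefaultSelector()

def accept(listen_sock: socket.socket) -> None:
    # The listening socket is readable: a connection is waiting, so accept() won't block
    conn, addr = listen_sock.accept()
    # Make the data socket blocking explicitly (inheriting non-blocking mode varies by OS)
    conn.setblocking(True)
    print(f'Client {addr} connected')
    sel.register(conn, selectors.EVENT_READ, handle)

def handle(conn: socket.socket) -> None:
    # The data socket is readable: recv() returns data or b'' without blocking
    data = conn.recv(512)
    if data:
        conn.sendall(data)  # echo back
    else:
        # Empty bytes: the client closed the connection
        sel.unregister(conn)
        conn.close()

def make_server(ip: str = '', port: int = 50000) -> socket.socket:
    listen_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listen_sock.bind((ip, port))
    listen_sock.listen(8)
    listen_sock.setblocking(False)
    # The callback to run when this socket is ready is stored as the registration data
    sel.register(listen_sock, selectors.EVENT_READ, accept)
    return listen_sock

def run_once(timeout=None) -> None:
    # Wait until at least one registered socket is ready, then dispatch its callback
    for key, _events in sel.select(timeout):
        callback = key.data
        callback(key.fileobj)
```

To run it as a standalone server, call make_server() once and then call run_once() inside an endless while True loop.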

UDP socket programming

UDP Protocol Features

UDP (User Datagram Protocol) is, like TCP, a transport-layer protocol.

Its biggest differences from TCP are:

  1. It is a connectionless protocol

That is to say, you can send a message directly to the other party's address without first establishing a virtual connection.

The address of the communicating party also consists of an IP address and a port number.

So compared with TCP, it is simpler and faster.

  2. No guarantee of message reliability

If a message sent over UDP is lost on the network, it is simply lost: UDP itself has no retransmission mechanism.

TCP, underneath, checks whether the data has arrived, and if it was lost, the sender retransmits it.

Therefore UDP suits applications that either do not mind losing some messages or implement their own reliability mechanism at the application layer.

  3. Data is sent as separate messages

In TCP communication, the data exchanged by the two parties flows as if through a pipe, in a strict order.

Whatever the sending application sends first, the receiving application receives first.

UDP, however, sends independent messages one by one, and the order in which the receiving application gets them is not necessarily the order in which they were sent.

For application development, one point needs special attention: the system design must define the maximum message length at the application level.

Then the code can use a receive buffer of the corresponding length, so that a message is never only partially received.

TCP is a stream protocol. If the application's receive buffer is too small and only part of the data is received, that does not matter: keep receiving, then find the message boundaries and join the pieces.

With UDP, a datagram protocol, if the socket receives only part of a datagram, the rest of that datagram is discarded. The next receive call returns the contents of the next datagram.
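The message-boundary behaviour can be observed on the local machine. A minimal sketch, assuming a hypothetical helper send_datagram and an application-defined limit MAX_DATAGRAM of 400 bytes (an assumed value, not a UDP rule): two sendto() calls produce two separate datagrams, and each recvfrom() returns exactly one of them.

```python
import socket

MAX_DATAGRAM = 400  # assumed application-defined maximum message length, in bytes

def send_datagram(sock: socket.socket, data: bytes, addr) -> None:
    # Refuse to send anything longer than the agreed maximum, so a receiver
    # using a MAX_DATAGRAM-sized buffer always gets the whole datagram
    if len(data) > MAX_DATAGRAM:
        raise ValueError(f'message is {len(data)} bytes, limit is {MAX_DATAGRAM}')
    sock.sendto(data, addr)

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
receiver.settimeout(2)
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

send_datagram(sender, b'first', receiver.getsockname())
send_datagram(sender, b'second', receiver.getsockname())

# Each recvfrom() returns exactly one datagram; the two sends are never merged
msg1, _ = receiver.recvfrom(MAX_DATAGRAM)
msg2, _ = receiver.recvfrom(MAX_DATAGRAM)
print(msg1, msg2)

sender.close()
receiver.close()
```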

UDP socket programming

The following is sample code for UDP socket communication.

It implements a client asking the server to return user information.

The action and name parameters in the client's request message specify the purpose of the request and the username.

Client code

import socket, json

BUFF_LEN     = 400                   # maximum message length
SERVER_ADDR  = ("127.0.0.1", 18000)  # server address

# Create a UDP socket
client_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Set the socket timeout, in seconds
client_socket.settimeout(2)

# The message object to send
message = {
    'action' : 'get_info',
    'name' : '白月黑羽'
}
# Data sent must be bytes, so serialize to JSON first, then encode
sendbytes = json.dumps(message).encode('utf8')
client_socket.sendto(sendbytes, SERVER_ADDR)
try:
    recvbytes, server = client_socket.recvfrom(BUFF_LEN)
    # Received data is bytes, so decode it first, then deserialize
    message = json.loads(recvbytes.decode('utf8'))
    print(message)
except socket.timeout:
    print('Timed out waiting for a reply')

Server code

import socket, json

BUFF_LEN = 400          # maximum message length
ADDR     = ("", 18000)  # server address; an empty IP means all local IP addresses

# Create a UDP socket
server_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Bind the address
server_socket.bind(ADDR)

while True:
    try:
        recvbytes, client_addr = server_socket.recvfrom(BUFF_LEN)
    except socket.timeout:
        continue

    print(f'Request from {client_addr}')

    # Received data is bytes, so decode it first, then deserialize
    message = json.loads(recvbytes.decode('utf8'))
    print(message)
    if message['action'] == 'get_info':
        # The user's information could be looked up in a database here
        username = message['name']

        # The message object to send back
        message = {
            'action' : 'return_info',
            'info' : f"{username}'s info is: xxxxxxxx"
        }
        # Data sent must be bytes, so serialize to JSON first, then encode
        sendbytes = json.dumps(message).encode('utf8')
        server_socket.sendto(sendbytes, client_addr)

As you can see, the socket of a UDP server also needs to be bound to a port number.

However, unlike TCP, the server needs only one socket for communication, rather than separate sockets for listening and for data transfer.

A UDP client socket usually does not need to bind a specific port; the operating system automatically picks one for it.

When the UDP socket is no longer needed, it can be closed with the socket object's close() method, as shown in the following code.

server_socket.close()

After the socket is closed, the port it was bound to is released and can be bound again by a socket in this process or in another process.
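Python sockets also work as context managers, which call close() automatically when the with block ends. A minimal sketch (the helper bind_then_release is hypothetical) that binds a UDP socket to a free port, lets the with block close it, and then binds a new socket to the same port, which succeeds because close() released it:

```python
import socket

def bind_then_release() -> int:
    # The with block calls close() automatically, even if an exception occurs
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
        port = s.getsockname()[1]
    return port  # the socket is closed here, so its port has been released

port = bind_then_release()
# The released port can be bound again by a new socket
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as again:
    again.bind(('127.0.0.1', port))
    print('re-bound port', port)
```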

 


Origin blog.csdn.net/weixin_47649808/article/details/126328689