Python Series - (select, poll, epoll)

The select function operates on collections of objects. Each element must either be a file descriptor itself or an object whose fileno() method returns one.
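As a minimal sketch of that requirement, the snippet below passes select() first an object that merely exposes fileno() and then the integer descriptor itself. The `Wrapper` class is hypothetical and exists only for illustration; `socket.socketpair()` is used so that no network setup is needed.

```python
import select
import socket


class Wrapper:
    """Hypothetical wrapper; select() accepts it because it has fileno()."""

    def __init__(self, sock):
        self.sock = sock

    def fileno(self):
        # select() calls this to obtain the underlying descriptor
        return self.sock.fileno()


# A connected pair of sockets, so the example needs no server
left, right = socket.socketpair()
right.send(b"ping")
fd = left.fileno()

wrapped = Wrapper(left)
# Pass an object that has a fileno() method ...
ready_objects, _, _ = select.select([wrapped], [], [], 1.0)
# ... or pass the integer descriptor itself
ready_fds, _, _ = select.select([fd], [], [], 1.0)

# select() hands back the very objects it was given
print(ready_objects == [wrapped], ready_fds == [fd])

left.close()
right.close()
```

In both calls the returned list contains exactly what was passed in, which is why a server can keep its own connection objects in the lists it gives to select().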

I/O multiplexing gets the effect of multithreading inside a single thread, so that one thread can handle I/O on many connections concurrently. First, look at a simple socket example:

Server:

import socket  
  
SOCKET_FAMILY = socket.AF_INET  
SOCKET_TYPE = socket.SOCK_STREAM  
  
sockServer = socket.socket(SOCKET_FAMILY, SOCKET_TYPE)  
sockServer.bind(('0.0.0.0', 8888))  
sockServer.listen(5)  
  
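while True:
    # accept() blocks until a client connects, so only one client is served at a time
    cliobj, addr = sockServer.accept()
    while True:
        # recv() blocks until data arrives; an empty result means the client closed
        recvdata = cliobj.recv(1024)
        if recvdata:
            print(recvdata.decode())
        else:
            cliobj.close()
            break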

Client:

import socket  
  
socCli = socket.socket()  
socCli.connect(('127.0.0.1', 8888))  
while True:  
    data = input("input str:")  
    socCli.send(data.encode())

The example above is a simple piece of socket communication in which a client sends its input to a server. The server is single-threaded and blocking. How can several clients connect to it at once? We could use multiple threads, and that certainly works. With threads and blocking sockets the code is straightforward, but the approach has several drawbacks. It is hard to make sure that threads share resources without problems, and this style of programming can be less efficient on a computer with only one CPU. In addition, if the number of threads a user may open is limited, say to 1024, then the 1025th client connection will still block.
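For comparison, the thread-per-client approach described above might look roughly like the following sketch. It is not the author's code: the names `handle_client`, `serve`, `received` and `lock` are made up for illustration, the server binds to an ephemeral port, and it stops after two clients so that the script terminates.

```python
import socket
import threading

received = []            # shared between threads, so it needs a lock
lock = threading.Lock()


def handle_client(cliobj):
    # Each client gets its own thread running this blocking loop
    with cliobj:
        while True:
            data = cliobj.recv(1024)
            if not data:         # empty read: the client closed
                break
            with lock:           # protect the shared list
                received.append(data)


def serve(server, max_clients):
    # Accept a fixed number of clients, one thread each
    threads = []
    for _ in range(max_clients):
        cliobj, addr = server.accept()
        worker = threading.Thread(target=handle_client, args=(cliobj,))
        worker.start()
        threads.append(worker)
    for worker in threads:
        worker.join()


server = socket.socket()
server.bind(('127.0.0.1', 0))    # port 0: let the OS choose a free port
server.listen(5)
port = server.getsockname()[1]

acceptor = threading.Thread(target=serve, args=(server, 2))
acceptor.start()

# Two clients are connected at the same time
c1 = socket.create_connection(('127.0.0.1', port))
c2 = socket.create_connection(('127.0.0.1', port))
c1.sendall(b"one")
c2.sendall(b"two")
c1.close()
c2.close()

acceptor.join()
server.close()
print(sorted(received))
```

Even in this small sketch the shared list has to be guarded by a lock, which hints at the resource-sharing problems mentioned above.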
Is there a better way? Of course: one is to use asynchronous sockets. Such a socket does not block until some event occurs. Instead, the program performs an action on the asynchronous socket and is told immediately whether the action succeeded. Based on that information the program decides how to continue. Because asynchronous sockets are non-blocking, multiple threads are no longer needed, and all the work can be done in one thread. This single-threaded model has its own challenges, but it is a good choice for many programs. It can also be combined with multithreading: a single thread with asynchronous sockets handles the network part of the server, while multiple blocking threads access other resources such as databases.
The Linux 2.6 kernel has several mechanisms for managing asynchronous sockets, three of which have a corresponding Python API: select, poll and epoll. epoll and poll are better than select because the Python program does not have to inspect every socket for the events it is interested in; it can rely on the operating system to tell it which sockets may have those events. epoll is better than poll because it does not require the operating system to check every socket for events of interest each time the Python program asks. Instead, Linux tracks the events as they occur and returns a list when Python asks for it. So for large numbers (thousands) of concurrent socket connections, epoll is the more efficient and scalable mechanism.
Asynchronous I/O processing model
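The "told immediately" behaviour of a non-blocking socket can be seen in a few lines. This is only a sketch under simple assumptions: it uses `socket.socketpair()` instead of a real client and server, and the variable names are illustrative.

```python
import select
import socket

left, right = socket.socketpair()
left.setblocking(False)          # switch the socket to non-blocking mode

try:
    left.recv(1024)              # nothing has been sent yet
    first_attempt = "got data"
except BlockingIOError:
    # Instead of waiting, the call reports at once that it would block
    first_attempt = "would block"

right.send(b"hi")
# Wait (at most one second) until the data is readable, then read it
select.select([left], [], [], 1.0)
second_attempt = left.recv(1024)

print(first_attempt, second_attempt)

left.close()
right.close()
```

The program keeps control after the failed read and can do other work, which is what lets a single thread look after many sockets.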

select first appeared in 1983, in 4.2BSD. It monitors an array of file descriptors through the select() system call. When select() returns, the kernel has set the flag bits of the ready descriptors in that array, so the process can find those descriptors and carry out the subsequent reads and writes.
select is now supported on almost every platform, and good cross-platform support is one of its advantages. In fact, from today's point of view it is one of the few advantages it has left.
One drawback of select is that there is an upper limit on the number of file descriptors a single process can monitor. On Linux the limit is typically 1024, although it can be raised by changing the macro definition or even recompiling the kernel.
In addition, the data structure that select() maintains stores a large number of file descriptors, and as their number grows, the cost of copying it grows linearly. At the same time, network latency leaves many TCP connections inactive, yet every call to select() scans all the sockets linearly, so some overhead is wasted there too.
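A small sketch of the behaviour just described: the whole set of descriptors is handed to select() on every call, and only the ready subset comes back. The three socket pairs and the variable names are assumptions made for the example.

```python
import select
import socket

# Three connected pairs; we monitor the first socket of each pair
pairs = [socket.socketpair() for _ in range(3)]
readers = [reader for reader, writer in pairs]

# Nothing has been sent, so with a zero timeout nothing is reported
idle, _, _ = select.select(readers, [], [], 0)

# Make exactly one of the monitored sockets readable
pairs[1][1].send(b"data")

# The full list is passed in again; only the ready socket is returned
ready, _, _ = select.select(readers, [], [], 1.0)

print(len(idle), ready == [readers[1]])

for reader, writer in pairs:
    reader.close()
    writer.close()
```

With three sockets the repeated hand-over is negligible, but the same full list would be passed and scanned on every call with thousands of mostly idle connections.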
Comparing select, poll and epoll

1 Features
select	select essentially works by setting or checking a data structure that holds the fd flags. This brings the following drawbacks: (1) the number of fds a single process can monitor is limited; (2) a data structure holding a large number of fds has to be maintained, which makes copying it between user space and kernel space expensive; (3) the sockets are scanned linearly.
poll	poll is essentially no different from select. It copies the array passed in by the user into kernel space and then queries the device state of each fd. If a device is ready, an entry is added to the device's wait queue and the traversal continues. If no ready device is found after all fds have been traversed, the current process is suspended until a device becomes ready or the call times out; once woken, it traverses the fds again. This process involves many unnecessary traversals. poll has no limit on the maximum number of connections, because it stores the fds in a linked-list-based structure, but it shares one drawback: the large fd array is copied in full between user space and the kernel address space, whether or not the copy serves any purpose. poll is also level-triggered: if an fd is reported and not handled, the next poll reports it again.
epoll	epoll supports both level-triggered and edge-triggered modes. Its most distinctive feature is edge triggering: it tells the process only which fds have just become ready, and it notifies only once. As for the copying problem described above, epoll reduces the overhead because each fd is registered once through epoll_ctl instead of being passed in on every call (epoll is often said to use mmap for this, but the mainline Linux implementation does not). Another feature is that epoll uses event-based readiness notification: fds are registered with epoll_ctl, and once an fd becomes ready the kernel uses a callback-like mechanism to activate it, so that epoll_wait is notified.
2 Maximum number of connections one process can open
select	The maximum number of connections a single process can open is defined by the FD_SETSIZE macro, described here as the size of 32 integers (32*32 on a 32-bit machine and, likewise, 32*64 on a 64-bit machine); on Linux with glibc it is typically 1024. It can be changed by modifying the definition and recompiling, but performance may suffer, and that needs further testing.
poll	poll is essentially no different from select, but it has no limit on the maximum number of connections, because it stores the fds in a linked-list-based structure.
epoll	There is an upper limit on the number of connections, but it is very large: roughly 100,000 connections on a machine with 1 GB of memory and about 200,000 on a machine with 2 GB.
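3 I/O efficiency as the number of FDs grows sharply
select	Every call traverses the connections linearly, so as the number of FDs grows the traversal gets slower, causing a "linear performance degradation" problem.
poll	Same as above.
epoll	The kernel implementation of epoll is based on a callback attached to each fd, and only active sockets invoke the callback. With few active sockets, epoll therefore does not suffer the linear degradation of the other two; if all sockets are very active, however, there may be performance problems.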
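4 Message passing
select	The kernel has to pass the messages to user space, and this requires a copy by the kernel.
poll	Same as above.
epoll	epoll_wait copies only the events of the ready fds to user space, so far less data is copied. (epoll is often described as sharing a block of memory between the kernel and user space, but the mainline Linux implementation does not do this.)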
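The level-triggered behaviour of poll noted in the comparison can be checked with a short sketch. It assumes a platform where `select.poll` is available (for example Linux); the variable names are illustrative.

```python
import select
import socket

left, right = socket.socketpair()
fd = left.fileno()
right.send(b"x")

poller = select.poll()
poller.register(fd, select.POLLIN)

# poll() takes its timeout in milliseconds
first = poller.poll(1000)
# The byte has not been read, so the same fd is reported again
second = poller.poll(1000)

left.recv(1)                 # consume the data
third = poller.poll(0)       # nothing left to report

print(first, second, third)

left.close()
right.close()
```

Both `first` and `second` contain the descriptor, while `third` is empty once the data has been read.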

Next we rework the socket example above and look at a select-based version:

import socket  
import queue  
from select import select  
  
SERVER_IP = ('127.0.0.1', 9999)  
  
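# Holds the messages sent by each client; every client gets its own queue
message_queue = {}
input_list = []
output_list = []

if __name__ == "__main__":
    server = socket.socket()
    server.bind(SERVER_IP)
    server.listen(10)
    # Put the listening socket in non-blocking mode
    server.setblocking(False)

    # Start by adding the server socket to the list being monitored
    input_list.append(server)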
  
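    while True:
        # Start select; it blocks until one of the monitored sockets is ready
        stdinput, stdoutput, stderr = select(input_list, output_list, input_list)

        # Go through the readable sockets; select fires when a client connects or sends data
        for obj in stdinput:
            # If the socket that fired is the server socket, a new client is connecting
            if obj == server:
                # Accept the connection and get the client socket and its address
                conn, addr = server.accept()
                print("Client {0} connected! ".format(addr))
                # Monitor the client socket too, so select fires when the client sends a message
                input_list.append(conn)
                # Create a separate message queue for this client to hold what it sends
                message_queue[conn] = queue.Queue()

            else:
                # Any other readable socket is a client that was added to input_list when it connected
                try:
                    recv_data = obj.recv(1024)
                except ConnectionResetError:
                    # Treat a reset like an ordinary close
                    recv_data = b''

                if recv_data:
                    # The client is still connected; look up its own address rather than reusing addr
                    print("received {0} from client {1}".format(recv_data.decode(), obj.getpeername()))
                    # Put the received message into this client's queue
                    message_queue[obj].put(recv_data)

                    # Add the socket to the output list so select watches it for writing
                    if obj not in output_list:
                        output_list.append(obj)
                else:
                    # An empty read means the client disconnected; stop monitoring it
                    print("\n[input] Client on fd {0} disconnected".format(obj.fileno()))
                    input_list.remove(obj)
                    if obj in output_list:
                        output_list.remove(obj)
                    # Remove the client's message queue and close the socket
                    del message_queue[obj]
                    obj.close()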
  
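        # Then handle the sockets select reported as writable and send any queued messages
        for sendobj in stdoutput:
            try:
                # If this client still has queued messages, take the next one and send it
                if sendobj in message_queue and not message_queue[sendobj].empty():
                    send_data = message_queue[sendobj].get()
                    sendobj.sendall(send_data)
                elif sendobj in output_list:
                    # Nothing left to send; stop watching for writes until the client sends again
                    output_list.remove(sendobj)

            except ConnectionResetError:
                # The client connection was dropped; forget everything about it
                print("\n[output] Client on fd {0} disconnected".format(sendobj.fileno()))
                message_queue.pop(sendobj, None)
                if sendobj in output_list:
                    output_list.remove(sendobj)
                if sendobj in input_list:
                    input_list.remove(sendobj)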
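To watch a select loop like the one above serve several clients from one thread, here is a condensed, self-contained sketch. It is a simplification rather than the listing itself: it echoes data straight back instead of queueing it, binds to an ephemeral port, creates its own two clients, and bounds the loop so the script always ends. All names are illustrative.

```python
import select
import socket

server = socket.socket()
server.bind(('127.0.0.1', 0))      # port 0: let the OS choose a free port
server.listen(5)
server.setblocking(False)

# Two clients connect and send before the server loop even starts;
# the kernel keeps the pending connections in the listen backlog
clients = [socket.create_connection(server.getsockname()) for _ in range(2)]
for number, client in enumerate(clients):
    client.settimeout(2)
    client.sendall("hello from {0}".format(number).encode())

inputs = [server]
echoed = 0
for _ in range(100):               # bounded loop instead of while True
    if echoed == len(clients):
        break
    readable, _, _ = select.select(inputs, [], [], 1.0)
    for obj in readable:
        if obj is server:
            # New client: start monitoring its socket as well
            conn, addr = server.accept()
            inputs.append(conn)
        else:
            data = obj.recv(1024)
            if data:
                obj.sendall(data)  # echo the message back
                echoed += 1

replies = [client.recv(1024) for client in clients]
print(replies)

for sock in clients + inputs:
    sock.close()
```

Both clients are accepted and answered by the same thread, with select() deciding which socket to service next.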

An epoll implementation example:

#!/usr/bin/env python  
import select  
import socket  
  
serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serversocket.bind(('0.0.0.0', 8080))
serversocket.listen(1)
# Sockets block by default, so switch to non-blocking (asynchronous) mode.
serversocket.setblocking(0)

# Create an epoll object
epoll = select.epoll()
# Register interest in read events on the server socket. A read event fires whenever the server socket can accept a connection
epoll.register(serversocket.fileno(), select.EPOLLIN)
  
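try:
    # The connections dict maps file descriptors (integers) to their network connection objects
    connections = {}
    # The requests dict maps file descriptors to the bytes received and not yet sent back
    requests = {}
    while True:
        # Query the epoll object for any events of interest. The argument "1" means we wait up to 1 second for an event.
        # If any event of interest occurred before this query, it returns immediately with a list of those events
        events = epoll.poll(1)
        # Events are returned as a sequence of (fileno, event code) tuples. fileno is another name for file descriptor and is always an integer.
        for fileno, event in events:
            # An event on the server socket means a new connection is coming in
            if fileno == serversocket.fileno():
                connection, address = serversocket.accept()
                print('client connected:', address)
                # Put the new socket in non-blocking mode
                connection.setblocking(0)
                # Register interest in read (EPOLLIN) events on the new socket
                epoll.register(connection.fileno(), select.EPOLLIN)
                connections[connection.fileno()] = connection
                # Initialize the received data
                requests[connection.fileno()] = b''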
  
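            # A read event occurred: read the new data sent by the client
            elif event & select.EPOLLIN:
                print("------recvdata---------")
                # Receive the data sent by the client
                recv_data = connections[fileno].recv(1024)
                # An empty read means the client has gone: stop monitoring it, then close the connection
                if not recv_data:
                    # Unregister before closing; a closed descriptor can no longer be modified
                    epoll.unregister(fileno)
                    connections[fileno].close()
                    # Remove the connection object from the connections dict
                    del connections[fileno]
                    # Remove the matching entry from the received-data dict
                    del requests[fileno]
                    print(connections, requests)
                else:
                    requests[fileno] += recv_data
                    # Once data has been received, stop watching for reads and watch for write (EPOLLOUT) events instead. The reply is sent to the client when the write event fires
                    epoll.modify(fileno, select.EPOLLOUT)
                    # Print the whole request, showing that although communication with clients is interleaved, the data can be assembled and handled as a unit
                    print('-' * 40 + '\n' + requests[fileno].decode())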
  
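            # A write event on a client socket means it can accept new data to send to the client
            elif event & select.EPOLLOUT:
                print("-------send data---------")
                # Send the reply (here, the received data echoed back) a piece at a time until all of it has been handed to the operating system for transmission
                byteswritten = connections[fileno].send(requests[fileno])
                requests[fileno] = requests[fileno][byteswritten:]
                if len(requests[fileno]) == 0:
                    # Once the whole reply has been sent, stop watching for write events and watch for reads again
                    epoll.modify(fileno, select.EPOLLIN)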
  
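            # A HUP (hang-up) event means the client socket has disconnected (closed), so the server side closes too.
            # There is no need to register interest in HUP events; epoll always reports them for sockets
            elif event & select.EPOLLHUP:
                print("end hup------")
                # Stop monitoring this socket connection
                epoll.unregister(fileno)
                # Close the socket connection
                connections[fileno].close()
                del connections[fileno]
                # Drop the received-data entry as well
                requests.pop(fileno, None)
finally:
    # Open socket connections need not be closed here, because Python closes them when the program ends. Closing them explicitly is still good practice
    epoll.unregister(serversocket.fileno())
    epoll.close()
    serversocket.close()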
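The server above uses epoll in its default level-triggered mode. The edge-triggered mode mentioned in the comparison can be contrasted with it in a short sketch. It assumes Linux, since `select.epoll` exists only there, and registers the same socket with two separate epoll objects purely for the demonstration.

```python
import select
import socket

left, right = socket.socketpair()
left.setblocking(False)
fd = left.fileno()

# One epoll object in the default level-triggered mode ...
level = select.epoll()
level.register(fd, select.EPOLLIN)
# ... and one in edge-triggered mode (EPOLLET)
edge = select.epoll()
edge.register(fd, select.EPOLLIN | select.EPOLLET)

right.send(b"x")

# Level-triggered: reported on every poll while unread data remains
level_first = level.poll(0.1)
level_second = level.poll(0.1)

# Edge-triggered: reported once, when the socket becomes readable
edge_first = edge.poll(0.1)
edge_second = edge.poll(0.1)

print(len(level_first), len(level_second), len(edge_first), len(edge_second))

level.close()
edge.close()
left.close()
right.close()
```

Because an edge-triggered notification arrives only once, a handler in that mode normally has to keep reading until recv() would block; otherwise leftover data is never announced again.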
