Python IO multiplexing (select, poll, and epoll)

1. What is IO multiplexing

  Conventional socket communication has two basic modes.

  The first is synchronous blocking IO. The thread is suspended when it performs an IO operation and resumes only after the data has been copied from kernel space to user space. In CPython, many socket-related functions are closely tied to kernel functions (system calls) such as ioctl and fcntl, so this model makes poor use of the CPU.

  

  The second is non-blocking IO (often loosely called asynchronous): a call on a non-blocking socket returns immediately and the thread is not suspended. This is a polling model. With the Windows Sockets API, for example, a call on a non-blocking socket returns immediately, and in most cases it "fails" with the WSAEWOULDBLOCK error code, which means the requested operation could not complete during the call. The application usually has to repeat the call until it returns successfully.
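
  The polling pattern described above can be sketched in Python roughly as follows. This is a minimal illustration, not code from the original post: it assumes socket.socketpair() as a stand-in for a real client/server connection, and try_recv is a hypothetical helper name. In Python, the "would block" condition surfaces as BlockingIOError.

```python
import socket

# Assumption: socketpair() gives two already-connected sockets,
# so the demo needs no separate server process
left, right = socket.socketpair()
right.setblocking(False)


def try_recv(sock, size=1024):
    """Return the received bytes, or None if the call would have blocked."""
    try:
        return sock.recv(size)
    except BlockingIOError:
        # Python's counterpart of EWOULDBLOCK / WSAEWOULDBLOCK
        return None


# Nothing has been sent yet, so the call "fails" immediately instead of blocking
first_attempt = try_recv(right)

left.send(b"ping")

# Polling: repeat the call until it succeeds
reply = None
while reply is None:
    reply = try_recv(right)

print(first_attempt, reply)

left.close()
right.close()
```

  The loop keeps the CPU busy while it waits, which is the cost of this model that the multiplexing functions discussed next are meant to avoid.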

  

  Comparing the two modes, non-blocking IO requires more error and exception handling, but it still performs better when send and receive times are irregular, the amount of data is uneven, and the number of connections is large.

 

  So how can multiple network connections be handled more efficiently in a single process? The answer is the IO multiplexing model.

  The IO multiplexing model uses the select, poll, and epoll functions. These functions also block the process, but unlike blocking IO they can block on many IO operations at once. They can watch multiple read operations and multiple write operations at the same time, and the actual IO function is called only once there is data to be read or written.

  

  In this model the operating system manages the socket connections. When data is ready, the operating system notifies the program, and only then does the program perform the IO operation and receive a success result. This can be summarized as two calls and two returns.
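
  The "two calls, two returns" flow can be sketched with Python's select.select. This is only an illustration under stated assumptions: the two socketpair() connections and the names a_recv, b_recv and so on are made up for the demo.

```python
import select
import socket

# Assumption: two socketpairs stand in for two independent client connections
a_send, a_recv = socket.socketpair()
b_send, b_recv = socket.socketpair()

# Nothing has been written yet: with a zero timeout select returns at once
# and reports no readable socket
idle_readable, _, _ = select.select([a_recv, b_recv], [], [], 0)

b_send.send(b"hello")

# First call and return: wait (up to 1 second) until some socket is ready
readable, _, _ = select.select([a_recv, b_recv], [], [], 1.0)

# Second call and return: the actual IO, performed only on the ready sockets
received = [sock.recv(1024) for sock in readable]

print(idle_readable, received)

for sock in (a_send, a_recv, b_send, b_recv):
    sock.close()
```

  A single select call watches both connections, and recv is called only on the one that actually has data.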

 

2. select, poll, epoll

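  select:

  System library function: int select(int maxfdp1, fd_set *readset, fd_set *writeset, fd_set *exceptset, const struct timeval *timeout)

  Note: the number of descriptors a single select call can watch is capped by the FD_SETSIZE macro, which is 1024 on typical Linux systems. /proc/sys/fs/file-max is a different limit: the system-wide maximum number of open file handles.

  select essentially works by setting or checking a data structure that holds fd flag bits and then acting on the result. This has the following drawbacks:

  1. The number of fds a single process can monitor is limited (by FD_SETSIZE, see the note above), so the number of connections it can watch is bounded.

  2. Sockets are scanned linearly, that is, by polling, which is inefficient. When there are many sockets, every select() call has to iterate over up to FD_SETSIZE sockets, regardless of which ones are active. This wastes a lot of CPU time. If a callback could be registered for each socket and run automatically when the socket becomes active, the polling would be avoided. This is exactly what epoll and kqueue do.

  3. A data structure holding a large number of fds has to be maintained, and copying it between user space and kernel space is expensive.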

   

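  poll:

  poll is essentially the same as select. It copies the array passed in by the user into kernel space and then queries the device state of each fd. If a device is ready, an entry is added to the device wait queue and the traversal continues. If no ready device is found after all fds have been traversed, the current process is suspended until a device becomes ready or the timeout expires, and after being woken up it traverses the fds again. This process involves many needless traversals.

  It has no limit on the number of connections, because the fds are stored in a linked list, but it still has shortcomings:

  1. The whole array of fds is copied between user space and kernel address space on every call, whether or not the copy is useful.

  2. poll is also "level-triggered": if an fd has been reported and not handled, the next poll call reports that fd again.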
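
  The level-triggered behaviour can be observed with Python's select.poll. This is a small sketch under assumptions: a socketpair() stands in for a real connection, and select.poll is only available on platforms that provide poll (not on Windows).

```python
import select
import socket

# Assumption: a socketpair stands in for a real client connection
writer, reader = socket.socketpair()
reader_fd = reader.fileno()

poller = select.poll()
poller.register(reader_fd, select.POLLIN)

writer.send(b"data")

# poll() takes a timeout in milliseconds and returns (fd, event mask) pairs
first = poller.poll(1000)

# The data has not been read yet, so the same fd is reported again
second = poller.poll(0)

# Once the data is consumed, the fd is no longer reported
reader.recv(1024)
third = poller.poll(0)

print(first, second, third)

poller.unregister(reader_fd)
writer.close()
reader.close()
```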

 

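  epoll:

  epoll supports both level-triggered and edge-triggered modes. Its most distinctive feature is edge triggering: it tells the process only which fds have just become ready, and it notifies only once. Another feature is that epoll uses event-based readiness notification: an fd is registered with epoll_ctl, and once that fd is ready the kernel uses a callback-like mechanism to activate it, so that epoll_wait receives the notification.

  Advantages of epoll:

  1. No practical limit on concurrent connections: the upper bound on open fds is far greater than 1024 (roughly 100,000 ports can be monitored with 1 GB of memory).

  2. Better efficiency: it does not poll, so efficiency does not drop as the number of fds grows. Only active fds invoke the callback. In other words, the biggest advantage of epoll is that it deals only with your "active" connections, regardless of the total number of connections, so in real network environments epoll is far more efficient than select and poll.

  3. Less memory copying: fds are registered once with epoll_ctl, so the whole set is not copied into the kernel on every call, and epoll_wait copies only the ready events back to user space.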
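
  Edge triggering can be seen in a short sketch with Python's select.epoll. The assumptions here are a Linux system (select.epoll does not exist elsewhere) and a socketpair() in place of a real connection.

```python
import select
import socket

# Assumption: Linux only; a socketpair stands in for a real connection
writer, reader = socket.socketpair()
reader.setblocking(False)
reader_fd = reader.fileno()

ep = select.epoll()
# EPOLLET requests edge-triggered notification for this fd
ep.register(reader_fd, select.EPOLLIN | select.EPOLLET)

writer.send(b"one")
first = ep.poll(1)    # timeout in seconds; the fd has just become readable

# The data is still unread, but no new event has occurred,
# so edge-triggered mode does not report the fd again
second = ep.poll(0)

writer.send(b"two")
third = ep.poll(1)    # new data arrived, so the fd is reported once more

print(first, second, third)

ep.unregister(reader_fd)
ep.close()
writer.close()
reader.close()
```

  Because a notification is delivered only once per change, edge-triggered code normally reads from a non-blocking socket until BlockingIOError is raised, so that no data is left behind.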
 
 
  
  Summary of the differences between select, poll, and epoll (from: https://blog.csdn.net/u013408431/article/details/67632468#t3)

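  1. Maximum number of connections a single process can open

  select: The maximum is defined by the FD_SETSIZE macro, typically 1024 on Linux (the fd set is a bitmap stored in an array of integers; 32 integers of 32 bits each give 32*32 = 1024 bits). The macro can be changed and the code recompiled, but performance may suffer, and this needs further testing.

  poll: Essentially the same as select, but without a limit on the number of connections, because the fds are stored in a linked list.

  epoll: There is an upper limit, but it is very large: about 100,000 connections on a machine with 1 GB of memory, and about 200,000 on a machine with 2 GB.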

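  2. IO efficiency as the number of fds grows sharply

  select: Every call traverses the connections linearly, so as the number of fds grows the traversal slows down and performance degrades linearly.

  poll: Same as select.

  epoll: The kernel implementation is driven by a callback on each fd, and only active sockets invoke the callback. When few sockets are active, epoll does not suffer the linear degradation of the other two. When all sockets are very active, however, there may still be performance problems.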

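  3. How messages are passed

  select: The kernel has to pass the result to user space, which requires a copy on every call.

  poll: Same as select.

  epoll: fds are registered with the kernel once through epoll_ctl, and epoll_wait copies only the ready events back to user space.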

 

3. Client/server communication with select

  

  Server side (in the write path, calling get_nowait on the Queue object may raise queue.Empty, so the exception has to be handled):

import socket
# import threading
import select
import queue

HOST, PORT = "localhost", 8020
address = (HOST, PORT)

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(address)
server.listen(10)
print("server is listening...")

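# The server socket accepts connections; the other sockets handle reads
inputs = [server, ]
outputs = []
exceptions = []
# Per-socket queues of received messages waiting to be sent back
msg = {}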

def handle_read(readable: list):
    """Handle new socket connections and incoming data."""
    for read_socket in readable:
        if read_socket is server:
            # New socket connection (a new client has joined)
            sock, addr = read_socket.accept()
            print("({}):connect successfully...".format(addr))
            sock.setblocking(False)
            inputs.append(sock)
            msg[sock] = queue.Queue()
        else:
            # An already connected socket has data to receive.
            # The socket is already registered, so just read the data.
            data = read_socket.recv(1024)
            if data:
                print("({0}) message: {1}".format(read_socket.getpeername(), data.decode("utf8")))
                # Put the message on this socket's message queue
                msg[read_socket].put(data)
                if read_socket not in outputs:
                    outputs.append(read_socket)
            else:
                # Empty data means the peer closed the connection
                print("({0}):close successfully...".format(read_socket.getpeername()))

                # Remove the socket from the input and output lists and drop its queue
                inputs.remove(read_socket)
                if read_socket in outputs:
                    outputs.remove(read_socket)
                read_socket.close()
                del msg[read_socket]


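def handle_write(writable: list):
    """Send queued messages on sockets that are ready for writing."""
    for write_socket in writable:
        cur_writable_queue = msg.get(write_socket)
        if cur_writable_queue is None:
            # The socket was closed earlier in this iteration; nothing to send
            if write_socket in outputs:
                outputs.remove(write_socket)
            continue
        try:
            # get_nowait raises queue.Empty when no message is pending
            cur_w_data = cur_writable_queue.get_nowait()
        except queue.Empty:
            # Nothing left to send: stop watching this socket for writability,
            # otherwise select would report it as writable on every pass
            if write_socket in outputs:
                outputs.remove(write_socket)
        else:
            # A message is pending: take it off the queue and send it back
            write_socket.send(cur_w_data)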


def handle_exception(exceptional: list):
    """Clean up sockets that select reported as being in an error state."""
    for e in exceptional:
        print("({0}) connect failed...".format(e.getpeername()))
        inputs.remove(e)
        if e in outputs:
            outputs.remove(e)
        # Drop the message queue if one exists for this socket
        msg.pop(e, None)
        e.close()


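# Event loop: keep listening as long as there are sockets to watch
while inputs:
    # Start the select call; the input sockets are also watched for
    # exceptional conditions so that handle_exception can be reached
    readable, writable, exceptional = select.select(inputs, outputs, inputs)
    handle_read(readable)
    handle_write(writable)
    handle_exception(exceptional)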

  Client (non-blocking IO with polling):

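import socket

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Connect in blocking mode so the connection is established before the first send,
# then switch the socket to non-blocking mode
client.connect(("localhost", 8020))
client.setblocking(False)

while True:
    text = input("Reply to server: ")
    client.send(text.encode("utf8"))
    # Compare the str, not the encoded bytes, otherwise the check never matches
    if text == "exit":
        break

    # Non-blocking IO, polling style
    while True:
        try:
            data = client.recv(1024)
        except BlockingIOError:
            # No data yet, try again
            continue
        # Empty data means the server closed the connection
        data = data.decode("utf8")
        break

    if not data:
        print("Server closed the connection")
        break
    print("Message from server: %s" % data)

client.close()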

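  Results:

  The server establishes a connection with the first client:

  The server communicates with the first client:

  The server communicates with the second client: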

  

  

  

 

4. Using DefaultSelector to adapt to the operating system's default IO multiplexing mode
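
  The body of this section is not present here, so what follows is only a minimal sketch of the idea, not the author's code. selectors.DefaultSelector in the standard library picks the most efficient mechanism available on the platform (epoll, kqueue, poll, or select). The names accept, echo and run_once, the address 127.0.0.1, and the in-process test client are assumptions made for the demo.

```python
import selectors
import socket

# DefaultSelector resolves to epoll, kqueue, poll, or select depending on the OS
sel = selectors.DefaultSelector()

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # assumption: port 0 lets the OS pick a free port
server.listen(10)
server.setblocking(False)


def accept(sock):
    """Accept a new connection and watch it for incoming data."""
    conn, _addr = sock.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, echo)


def echo(conn):
    """Echo received data back, or clean up when the peer has closed."""
    data = conn.recv(1024)
    if data:
        conn.send(data)
    else:
        sel.unregister(conn)
        conn.close()


# The callback is stored in the key's data field and looked up on each event
sel.register(server, selectors.EVENT_READ, accept)


def run_once(timeout=1.0):
    """Run one pass of the event loop and dispatch the ready sockets."""
    for key, _mask in sel.select(timeout):
        key.data(key.fileobj)


# Demo: drive the loop by hand with a client in the same process
client = socket.create_connection(server.getsockname())
run_once()                  # the server socket is readable: accept the client
client.send(b"hi")
run_once()                  # the connection is readable: echo the data
reply = client.recv(1024)
print(reply)

client.close()
run_once()                  # the peer closed: unregister and close the connection
sel.close()
server.close()
```

  Compared with the select-based server in section 3, the same event loop runs unchanged on Linux, macOS and Windows, and the callback stored at registration replaces the manual inputs and outputs lists.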

 

 


Origin www.cnblogs.com/kisun168/p/11279287.html