Python Full Stack Road Series: IO Multiplexing

io-multiplexing-01

What is IO Multiplexing?

IO multiplexing means that as soon as the kernel finds that one or more of the IO conditions specified by a process are ready (for example, readable), it notifies the process.

For example:

Suppose you are a teacher (a thread) holding a class (the thread has started). It is a self-study session: the students work on their own while you sit in the classroom doing nothing, just watching them (a sleep state). Midway through, student A (a socket) suddenly gets an upset stomach, raises his hand, and says, "I need to go to the toilet" (a read event), so you let him go. A while later, student B (another socket) runs into a question she doesn't quite understand and asks you to come over and answer it (a write event), so you go over and help her.

The situation above is IO multiplexing. You are the single IO thread, and you handle both student A's request and student B's request. That is the multiplexing: multiple network connections share one IO thread.

Compared with multi-process and multi-thread techniques, the biggest advantage of I/O multiplexing is low system overhead: the system does not need to create extra processes/threads, nor maintain them, which greatly reduces overhead.

At present, the common system calls that support I/O multiplexing are select, poll, and epoll. I/O multiplexing is a mechanism through which a single process can monitor multiple descriptors; once a descriptor becomes ready (generally read-ready or write-ready), the kernel notifies the program so it can perform the corresponding read or write operation.

What is select?

select monitors file descriptors in three categories: readfds, writefds and exceptfds. After the program starts, the select call blocks until a descriptor becomes ready (readable, writable, or in an exceptional condition) or the timeout expires (timeout specifies how long to wait; a zero timeout makes the call return immediately, and a null timeout blocks indefinitely). When select returns, you find the ready descriptors by traversing the fd sets.

  • Features
  1. The biggest drawback of select is that the number of FDs a single process can open is limited by FD_SETSIZE, which defaults to 1024;
  2. Scanning the sockets is a linear scan, that is, polling, which is inefficient;
  3. select has to maintain a data structure holding a large number of fds, and copying this structure between user space and kernel space on every call adds overhead.

Python implementation of the select model

#!/usr/bin/env python
# _*_coding:utf-8 _*_

import select
import socket

sk1 = socket.socket()
sk1.bind(('127.0.0.1', 8002, ))
sk1.listen(5)

demo_li = [sk1]       # sockets watched for readability (listener + clients)
outputs = []          # sockets with a reply queued, watched for writability
message_dict = {}     # per-connection queue of received strings

while True:
    # Bug fix: the original passed sk1 itself as the read list;
    # select expects the full list of sockets to monitor.
    r_list, w_list, e_list = select.select(demo_li, outputs, [], 1)

    print(len(demo_li), r_list)

    for sk1_or_conn in r_list:

        if sk1_or_conn == sk1:
            # The listening socket is readable: a new client is connecting.
            conn, address = sk1_or_conn.accept()
            demo_li.append(conn)
            message_dict[conn] = []
        else:
            # A client socket is readable: receive its data.
            try:
                data_bytes = sk1_or_conn.recv(1024)
                if not data_bytes:
                    # An empty read means the peer closed the connection.
                    raise ConnectionResetError('client disconnected')
            except Exception as e:
                demo_li.remove(sk1_or_conn)
                del message_dict[sk1_or_conn]
            else:
                data_str = str(data_bytes, encoding="utf-8")
                message_dict[sk1_or_conn].append(data_str)
                outputs.append(sk1_or_conn)

    for conn in w_list:
        # The client socket is writable: send back the queued message.
        recv_str = message_dict[conn].pop(0)
        conn.sendall(bytes(recv_str + "Good", encoding="utf-8"))
        outputs.remove(conn)
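To see the server logic above in action without two terminals, the same pattern can be exercised in one process. The sketch below is a test harness, not part of the original code: it runs a stripped-down version of the select loop on an OS-chosen port in a background thread, then drives it with a client. The "Good" suffix mirrors the example above.

```python
import select
import socket
import threading

def serve_once(server_sock):
    """Accept one client, echo one message back with a 'Good' suffix."""
    inputs = [server_sock]
    while True:
        r_list, _, _ = select.select(inputs, [], [], 1)
        for ready in r_list:
            if ready is server_sock:
                conn, _ = ready.accept()
                inputs.append(conn)
            else:
                data = ready.recv(1024)
                ready.sendall(data + b"Good")
                ready.close()
                return

server = socket.socket()
server.bind(('127.0.0.1', 0))        # port 0: let the OS pick a free port
server.listen(5)
port = server.getsockname()[1]

t = threading.Thread(target=serve_once, args=(server,))
t.start()

client = socket.socket()
client.connect(('127.0.0.1', port))
client.sendall(b"hello")
reply = client.recv(1024)
print(reply)                          # b'helloGood'
client.close()
t.join()
server.close()
```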

What is poll?

Principle:

poll is essentially no different from select. It copies the array passed in by the user into kernel space and then queries the device status for each fd. If a device is ready, it adds an item to the device's wait queue and continues traversing. If no ready device is found after traversing all fds, the current process is suspended until a device becomes ready or the call times out; after being woken up, it traverses the fds all over again. This process involves many unnecessary traversals.

poll has no limit on the maximum number of connections, because it stores fds in a linked list, but it still has disadvantages:

  1. Large arrays of fds are copied wholesale between user space and the kernel address space, whether or not the copy is necessary.
  2. Another characteristic of poll is "level triggering": if a reported fd is not processed, it will be reported again on the next poll.

What is epoll?

epoll was introduced in the 2.6 kernel as an enhanced version of the earlier select and poll. Compared with them, epoll is more flexible and has no descriptor limit. epoll uses one file descriptor to manage many others, storing the events for the user's file descriptors in an event table inside the kernel, so the copy between user space and kernel space only needs to happen once.

Principle:

epoll supports both level triggering and edge triggering. Its hallmark is edge triggering, which only tells the process which fds have just become ready, and notifies it only once. Another feature is that epoll uses an "event" readiness-notification mechanism: fds are registered through epoll_ctl, and once an fd becomes ready, the kernel uses a callback mechanism to activate it, so epoll_wait receives the notification.
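The register-then-wait flow just described can be sketched with Python's select.epoll object (Linux-only). This is a minimal, assumed rewrite of the select server above in epoll terms, not code from the original article; the in-process client thread and port 0 are test-harness conveniences:

```python
import select
import socket
import threading

server = socket.socket()
server.bind(('127.0.0.1', 0))    # port 0: let the OS choose
server.listen(5)
port = server.getsockname()[1]

ep = select.epoll()
# The epoll_ctl step: register the listener for read-readiness.
ep.register(server.fileno(), select.EPOLLIN)
conns = {}

def client():
    global reply
    c = socket.socket()
    c.connect(('127.0.0.1', port))
    c.sendall(b"hi")
    reply = c.recv(1024)
    c.close()

t = threading.Thread(target=client)
t.start()

done = False
while not done:
    # The epoll_wait step: block until some registered fd is ready.
    for fd, event in ep.poll(1):
        if fd == server.fileno():
            conn, addr = server.accept()
            conns[conn.fileno()] = conn
            ep.register(conn.fileno(), select.EPOLLIN)
        else:
            data = conns[fd].recv(1024)
            conns[fd].sendall(data + b"Good")
            ep.unregister(fd)
            conns.pop(fd).close()
            done = True

t.join()
ep.close()
server.close()
print(reply)    # b'hiGood'
```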

Advantages of epoll:

  1. There is no limit on the maximum number of concurrent connections; the upper limit on open FDs is far greater than 1024 (roughly 100,000 ports can be monitored per 1 GB of memory).
  2. Efficiency does not degrade as the number of FDs grows, because epoll does not poll: only active, ready FDs invoke the callback function. In other words, epoll's biggest advantage is that it only cares about your "active" connections, regardless of the total connection count, so in real network environments epoll is far more efficient than select and poll.
  3. Memory copying: epoll uses mmap() to map memory shared with the kernel, speeding up message passing with kernel space; that is, epoll uses mmap to reduce copy overhead.

epoll operates on file descriptors in two modes: LT (level-triggered) and ET (edge-triggered). LT is the default mode. The differences between the two are as follows:

LT mode: when epoll_wait detects a descriptor event and notifies the application, the application does not have to process the event immediately; the next time epoll_wait is called, it will be notified of the event again.

ET mode: when epoll_wait detects a descriptor event and notifies the application, the application must process the event immediately; if it does not, the next time epoll_wait is called it will not be notified of the event again.

LT (level-triggered) is the default working mode and supports both blocking and non-blocking sockets. In this mode, the kernel tells you whether a file descriptor is ready, and you can then perform IO on the ready fd. If you do nothing, the kernel will keep notifying you.

ET (edge-triggered) is the high-speed working mode and only supports non-blocking sockets. In this mode, the kernel tells you via epoll when a descriptor transitions from not-ready to ready. It then assumes you know the file descriptor is ready, and will not send any more readiness notifications for it until you do something that makes the descriptor not ready again (for example, a send or recv that returns EWOULDBLOCK, or that transfers less than the requested amount of data). Note that if you perform no IO on the fd (so that it never becomes not-ready again), the kernel will not send further notifications; you get exactly one. ET mode greatly reduces the number of times an epoll event is triggered repeatedly, so it is more efficient than LT mode. When epoll works in ET mode, non-blocking sockets must be used, to avoid a blocking read or write on one file handle starving the task of processing the other file descriptors.
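The LT/ET contrast can be demonstrated directly with select.epoll on a socketpair. This is an illustrative sketch (Linux-only; only select.epoll, EPOLLIN and EPOLLET come from the standard library, the rest is harness):

```python
import select
import socket

a, b = socket.socketpair()
a.setblocking(False)             # ET mode requires non-blocking sockets

# --- Level-triggered (the default): unread data is reported every time ---
ep = select.epoll()
ep.register(a.fileno(), select.EPOLLIN)
b.sendall(b"x")
lt_first = ep.poll(0.1)          # one event for the pending byte
lt_second = ep.poll(0.1)         # reported again: data is still unread
print(len(lt_first), len(lt_second))   # 1 1
ep.close()
a.recv(1)                        # drain before the ET demo

# --- Edge-triggered: only the transition to 'readable' is reported ---
ep = select.epoll()
ep.register(a.fileno(), select.EPOLLIN | select.EPOLLET)
b.sendall(b"y")
first = ep.poll(0.1)             # one event for the new edge
second = ep.poll(0.1)            # nothing: no new edge, even though
print(len(first), len(second))   # 1 0    data is still buffered
a.recv(1)
ep.close()
a.close()
b.close()
```

Note that epoll.poll() takes its timeout in seconds, while poll.poll() takes milliseconds.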

In select/poll, the kernel scans all monitored file descriptors only after the process calls the corresponding method, whereas epoll registers each file descriptor in advance via epoll_ctl(). Once a descriptor becomes ready, the kernel activates it quickly through a callback mechanism, and the process is notified when it calls epoll_wait(). (Instead of traversing file descriptors, epoll listens for callbacks; this is the charm of epoll.)

 
