Synchronous IO, asynchronous IO, and IO multiplexing

Copyright: Creative Commons license. Attribution is required; others may create derivative works based on this article, but must distribute them under the same license as the original (Creative Commons).

1. Key Concepts

1.1 Synchronous and asynchronous

When a function or method is called, the question is whether the caller gets the final result. If the call returns the final result directly, it is a synchronous call; if it does not, it is an asynchronous call.

1.2 Blocking and non-blocking

When a function or method is called, the question is whether it returns immediately. A call that returns immediately is non-blocking; a call that does not return immediately is blocking.

Synchronous/asynchronous and blocking/non-blocking are unrelated concepts. Synchronous versus asynchronous is about whether the caller gets the final result; blocking versus non-blocking is about time, that is, whether the caller has to wait.

The difference between synchronous and asynchronous is whether the caller gets the final result it wants. A synchronous call runs to completion and returns the final result. An asynchronous call returns right away, but what it returns is not the final result; the caller cannot get the result from the call itself, so the callee has to deliver it some other way, such as a notification or a callback. The difference between blocking and non-blocking is whether the caller can do other things in the meantime. With a blocking call the caller can only wait; with a non-blocking call the caller can go off and do other work instead of waiting.
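The two distinctions can be seen side by side in a minimal sketch. It assumes a thread pool from concurrent.futures as the asynchronous mechanism, and slow_square is a hypothetical stand-in for any slow call; neither appears in the original text.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    """Hypothetical slow task standing in for any long-running call."""
    time.sleep(0.2)
    return x * x


# Synchronous call: the return value is the final result.
sync_result = slow_square(3)

with ThreadPoolExecutor(max_workers=1) as pool:
    # Asynchronous call: submit() returns a Future right away,
    # which is not the final result.
    future = pool.submit(slow_square, 4)

    # Non-blocking check: done() returns at once, finished or not.
    done_immediately = future.done()

    # Blocking wait: result() does not return until the final result exists.
    async_result = future.result()

print(sync_result, done_immediately, async_result)
```

Here submit() is asynchronous and non-blocking, while result() is how the caller later retrieves the final result, and that retrieval blocks.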

1.3 Operating system background

x86 CPUs have four privilege levels (rings):

Ring 0 is the highest level: it can execute privileged instructions, access data at all levels, and access IO devices. Ring 3 is the lowest level and can only access data at its own level. Kernel code runs in Ring 0; user code runs in Ring 3.

The operating system kernel runs independently at the higher privilege level. It resides in protected memory and has full access to the hardware. This memory is called kernel space (kernel mode).

Ordinary applications run in user space (user mode). An application that wants to access hardware resources must go through the system calls the operating system provides. A system call runs in kernel space and may use privileged instructions, so the process switches to kernel mode for its duration. When the system call completes, the process returns to user mode and continues executing user-space code.
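As a small illustration of user code crossing into the kernel, the sketch below uses the low-level os.pipe, os.write and os.read functions, which are thin wrappers over the corresponding system calls. The pipe is only an assumed stand-in for a real device.

```python
import os

# pipe() asks the kernel for a pair of connected file descriptors.
read_fd, write_fd = os.pipe()

# write() enters the kernel, which copies the bytes into a kernel buffer.
written = os.write(write_fd, b"hello")

# read() enters the kernel again, which copies the bytes back to user space.
data = os.read(read_fd, 1024)

os.close(read_fd)
os.close(write_fd)
print(written, data)
```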

1.4 Synchronous IO, asynchronous IO, and IO multiplexing

1.4.1 The two stages of IO

  1. Data preparation stage: the kernel reads data from the device into a kernel-space buffer
  2. Copy stage: the data is copied from the kernel-space buffer into the process's user-space buffer

When an IO read occurs:

  1. The kernel reads the data from the IO device
  2. The process copies the data from the kernel

1.5 IO models

1.5.1 Synchronous IO

The synchronous IO models are blocking IO, non-blocking IO, and IO multiplexing.

Blocking IO:

The process waits (blocks) until the read or write is complete. (It waits through both stages.)
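A minimal sketch of blocking IO, assuming a socket.socketpair() in place of a real network connection and a helper thread (send_later, a hypothetical name) that delays the data by 0.3 seconds:

```python
import socket
import threading
import time

# Two connected sockets; sockets are blocking by default.
reader, writer = socket.socketpair()


def send_later():
    time.sleep(0.3)  # simulates the wait for data to become ready
    writer.send(b"hello")


threading.Thread(target=send_later).start()

start = time.monotonic()
data = reader.recv(1024)  # blocks until data has arrived and been copied
waited = time.monotonic() - start

print(data, round(waited, 1))
reader.close()
writer.close()
```

The caller can do nothing else between calling recv and getting the data back.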

Non-blocking IO:

The process calls recvfrom. If the IO device does not have data ready, the call returns an error immediately and the process is not blocked. The user can issue the system call again later (polling). Once the kernel has data ready, the call blocks while the data is copied to user space.

In the first stage, while the data is not ready, the process can go do something else and check back later; this stage is non-blocking. The second stage is blocking: the copy of data between kernel space and user space blocks the process.
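The same behaviour can be sketched in Python, again assuming a socket.socketpair() as the IO source. In Python the immediate error surfaces as a BlockingIOError exception.

```python
import socket
import time

reader, writer = socket.socketpair()
reader.setblocking(False)  # switch the reading side to non-blocking mode

try:
    reader.recv(1024)  # nothing has been sent yet
    first_attempt = "got data"
except BlockingIOError:
    first_attempt = "not ready"  # the call returned immediately with an error

writer.send(b"hello")

# Poll: retry until the data is ready, doing other work in between.
data = None
while data is None:
    try:
        data = reader.recv(1024)
    except BlockingIOError:
        time.sleep(0.01)  # placeholder for other useful work

print(first_attempt, data)
reader.close()
writer.close()
```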

IO multiplexing:

IO multiplexing means monitoring multiple IO channels at the same time; as soon as any one of them is ready, processing can start without waiting, which improves the ability to handle many IOs at once. select is supported on almost every operating system; poll is an upgrade of select. epoll, supported since Linux kernel 2.5, is an upgrade of select and poll that adds a callback mechanism on top of monitoring. BSD and Mac have kqueue; Windows has IOCP.

Taking select as an example: the process passes the IO objects it is interested in to the select function, and the call blocks. The kernel monitors those file descriptors (fds), and as soon as the data for any one of them is ready, select returns. The process then uses read to copy the data into user space.

Normally select can monitor at most 1024 fds, and because select works by polling, it has to traverse the whole fd set every time, which is inefficient when many IOs are managed. epoll has no such upper limit on the number of fds it manages, and it uses a callback mechanism, so no traversal is needed and it is efficient.
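A minimal sketch of the select flow just described, assuming three socket.socketpair() connections as the monitored IO objects, of which only one has data:

```python
import select
import socket

# Three independent connections; we monitor the reading side of each.
pairs = [socket.socketpair() for _ in range(3)]
readers = [r for r, _ in pairs]

pairs[2][1].send(b"ready")  # only the third connection has data

# Blocks (for at most 1 second here) until at least one reader is ready.
readable, _, _ = select.select(readers, [], [], 1.0)

# Second stage: copy the data to user space with recv.
received = [sock.recv(1024) for sock in readable]
print(len(readable), received)

for r, w in pairs:
    r.close()
    w.close()
```

One call watches all three sockets, and only the ready one is returned.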

Signal-driven IO:

The process first uses the sigaction system call to install a signal handler; the call returns immediately and the process is not blocked. When the kernel has the data ready, it generates a SIGIO signal and delivers it to the signal handler. The handler can call recvfrom to copy the data from kernel space to user space; this copy blocks.

Asynchronous IO: (Note: the callback is made by the callee, not the caller)

The process issues an asynchronous IO request, which returns immediately. The kernel completes both stages of the IO and then sends a signal to the process. The process is free to do other work the whole time and comes back to the data once it is ready.

Linux provides the aio system calls, supported by the kernel since version 2.6.

1.6 IO multiplexing in Python

IO multiplexing:

  • Most operating systems support select and poll
  • Linux 2.5+ supports epoll
  • BSD and Mac support kqueue
  • Solaris implements /dev/poll
  • Windows has IOCP

Python's select library implements the select and poll system calls, which most operating systems support. It implements epoll only on some platforms. It is the low-level IO multiplexing module.
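Which mechanisms the select module exposes depends on the platform. A quick way to check, as a small sketch (the mechanisms list is just an illustrative name):

```python
import select

# Attribute names the select module may expose, depending on the platform.
mechanisms = ["select", "poll", "epoll", "kqueue", "devpoll"]
available = [name for name in mechanisms if hasattr(select, name)]
print(available)
```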

Development options:

  1. Be fully cross-platform by using select or poll, at the cost of performance.
  2. Choose the mechanism each operating system supports best, which improves IO handling performance.

select maintains a data structure of file descriptors with a per-process upper limit, typically 1024, and scans it linearly, which is inefficient. poll differs from select in that its internal data structure is a linked list with no maximum limit, but it still has to traverse the list to find out which devices are ready. epoll uses an event notification mechanism based on callbacks, which improves efficiency. select and poll also have to copy the whole fd set between user space and kernel space on every call, whereas epoll reduces this copying: fds are registered with the kernel once, and each wait returns only the ready events.
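The register-once, wait-many pattern can be sketched as follows. The code uses select.epoll where it exists and falls back to select.poll or select.select elsewhere; the fallback chain is an assumption made for portability. Note that epoll's timeout is in seconds while poll's is in milliseconds.

```python
import select
import socket

pairs = [socket.socketpair() for _ in range(3)]
readers = [r for r, _ in pairs]
pairs[1][1].send(b"ping")  # only the second connection has data

if hasattr(select, "epoll"):  # Linux
    poller = select.epoll()
    for r in readers:
        poller.register(r.fileno(), select.EPOLLIN)  # registered once
    ready = poller.poll(1.0)  # timeout in seconds
    poller.close()
elif hasattr(select, "poll"):  # most other Unix systems
    poller = select.poll()
    for r in readers:
        poller.register(r.fileno(), select.POLLIN)
    ready = poller.poll(1000)  # timeout in milliseconds
else:  # e.g. Windows
    readable, _, _ = select.select(readers, [], [], 1.0)
    ready = [(r.fileno(), None) for r in readable]

# Each entry is (fd, event mask); only ready descriptors are reported.
ready_fds = [fd for fd, _ in ready]
expected_fd = readers[1].fileno()
print(ready_fds == [expected_fd])

for r, w in pairs:
    r.close()
    w.close()
```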

1.6.1 The selectors library

Python 3.4 added the selectors library, a high-level IO multiplexing library.

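Class hierarchy:

BaseSelector
+-- SelectSelector
+-- PollSelector
+-- EpollSelector
+-- DevpollSelector
+-- KqueueSelector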

selectors.DefaultSelector is the most efficient implementation available on the current platform. IOCP is not implemented for Windows, however, so on Windows it falls back to select.

# The following code is at the bottom of the selectors module source
# Choose the best implementation, roughly:
# epoll|kqueue|devpoll > poll > select.
# select() also can't accept a FD > FD_SETSIZE (usually around 1024)
if 'KqueueSelector' in globals():
    DefaultSelector = KqueueSelector
elif 'EpollSelector' in globals():
    DefaultSelector = EpollSelector
elif 'DevpollSelector' in globals():
    DefaultSelector = DevpollSelector
elif 'PollSelector' in globals():
    DefaultSelector = PollSelector
else:
    DefaultSelector = SelectSelector
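To see which implementation was chosen on a given machine, a short check like the following can be run. The printed name varies by platform, so no particular output is assumed here.

```python
import selectors

sel = selectors.DefaultSelector()
name = type(sel).__name__  # e.g. an epoll-, kqueue- or select-based class
print(name)

# Whatever was chosen, it implements the common BaseSelector interface.
is_selector = isinstance(sel, selectors.BaseSelector)
sel.close()
```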

Event Registration:

class SelectSelector(_BaseSelectorImpl):
    """Select-based selector."""
    def register(self, fileobj, events, data=None) -> SelectorKey:
        pass
  • Registers a file object with the selector to monitor it for IO events; returns a SelectorKey object.
  • fileobj: the file object to monitor, e.g. a socket object
  • events: the events to wait for on this file object
  • data: optional opaque data associated with this file object, for example a per-client session ID or an associated method. Through this parameter the selector can do something once an event of interest has occurred.

EVENT_READ =  (1 << 0)

EVENT_WRITE =  (1 << 1)

The benefit of defining the constants this way is that they are bit flags and are easy to combine, e.g. EVENT_READ | EVENT_WRITE.
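A short sketch of how the two flags combine and how a returned mask can be tested; describe is a hypothetical helper, not part of the selectors module.

```python
import selectors

# 0b01 | 0b10 == 0b11: wait for readability and writability together.
both = selectors.EVENT_READ | selectors.EVENT_WRITE


def describe(mask):
    """Hypothetical helper: list which events are set in a mask."""
    names = []
    if mask & selectors.EVENT_READ:
        names.append("read")
    if mask & selectors.EVENT_WRITE:
        names.append("write")
    return names


print(both, describe(both), describe(selectors.EVENT_READ))
```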

selectors.SelectorKey has four attributes:

  1. fileobj: the registered file object
  2. fd: the underlying file descriptor
  3. events: the events to wait for on this file object
  4. data: the data associated at registration time
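The four attributes can be inspected directly. This sketch assumes a socket.socketpair() as the registered object and a small dict as the associated data; both are illustrative choices.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
a, b = socket.socketpair()

# register() returns the SelectorKey describing this registration.
key = sel.register(a, selectors.EVENT_READ, data={"client_id": 1})
summary = (key.fileobj is a, key.fd == a.fileno(), key.events, key.data)

b.send(b"hi")  # make `a` readable
ready = sel.select(timeout=1.0)  # list of (SelectorKey, event mask) pairs
ready_key, mask = ready[0]

print(summary, ready_key == key, mask)

sel.unregister(a)
sel.close()
a.close()
b.close()
```

select() hands back the same key that register() returned, so whatever was stored in data is available when the event fires.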

 

A TCP server implemented with IO multiplexing:

import selectors
import socket


s = selectors.DefaultSelector()  # 1. Get a selector

# Prepare the file-like object
server = socket.socket()
server.bind(('127.0.0.1', 9997))
server.listen()

# Non-blocking IO is the officially recommended mode
server.setblocking(False)


def accept(sock: socket.socket, mask: int):
    conn, r_address = sock.accept()
    print(mask)
    conn.setblocking(False)
    # Register the new connection; rec is stored as key.data
    key1 = s.register(conn, selectors.EVENT_READ, rec)
    print(key1)


def rec(conn: socket.socket, mask: int):
    print(mask)
    data = conn.recv(1024)
    print(data)

    if not data:  # Empty read: the client closed the connection
        s.unregister(conn)
        conn.close()
        return

    msg = 'Your msg = {} from {}'.format(data.decode(), conn.getpeername())
    conn.send(msg.encode())


# 2. Register the file-like object and the events of interest
key = s.register(server, selectors.EVENT_READ, accept)  # socket file object
print(key)

try:
    while True:
        events = s.select()  # epoll/select; blocks by default
        # Returns once at least one registered object has an event of interest ready
        print(events)  # The ready objects, with their ready events and data

        for key, mask in events:  # each event => (key, mask)
            # Each event is one monitored object that is ready
            print(type(key), type(mask))
            # <class 'selectors.SelectorKey'> <class 'int'>
            print(key.data)
            # <function accept at 0x0000000001EA3A60>
            key.data(key.fileobj, mask)  # mask is the event bitmask
except KeyboardInterrupt:
    pass  # Ctrl+C ends the loop so the cleanup below can run
finally:
    server.close()
    s.close()
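To try the same dispatch pattern end to end without a second terminal, here is a self-contained sketch. It makes several assumptions that are not in the original: the server binds to port 0 so the OS picks a free port, the selector loop runs in a background thread with a short timeout, and the handler is a plain echo. The names echo, serve and stop are illustrative.

```python
import selectors
import socket
import threading

sel = selectors.DefaultSelector()
stop = threading.Event()


def accept(server_sock):
    conn, _ = server_sock.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, echo)  # echo becomes key.data


def echo(conn):
    data = conn.recv(1024)
    if data:
        conn.sendall(b"echo: " + data)
    else:  # empty read: the peer closed the connection
        sel.unregister(conn)
        conn.close()


def serve():
    while not stop.is_set():
        # The timeout lets the loop notice the stop flag.
        for key, _ in sel.select(timeout=0.1):
            key.data(key.fileobj)  # dispatch to accept or echo


server = socket.socket()
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)
port = server.getsockname()[1]

worker = threading.Thread(target=serve)
worker.start()

with socket.create_connection(("127.0.0.1", port), timeout=2) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)

stop.set()
worker.join()
for key in list(sel.get_map().values()):  # release whatever is still registered
    sel.unregister(key.fileobj)
    key.fileobj.close()
sel.close()
print(reply)
```

Storing the handler in data and calling key.data(key.fileobj) is the same dispatch idea as in the listing above; only the port selection and the shutdown path differ.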

 

A group chat implemented with IO multiplexing:

# IO multiplexing: a TCP group chat
import socket
import threading
import selectors
import logging


FORMAT = "%(threadName)s %(thread)d %(message)s"
logging.basicConfig(format=FORMAT, level=logging.INFO)


class ChatServer:

    def __init__(self, ip='127.0.0.1', port=9992):
        self.sock = socket.socket()
        self.address = ip, port
        self.event = threading.Event()

        self.selector = selectors.DefaultSelector()

    def start(self):
        self.sock.bind(self.address)
        self.sock.listen()
        self.sock.setblocking(False)

        key = self.selector.register(self.sock, selectors.EVENT_READ, self.accept)  # there is only one of these
        logging.info(key)  # there is only one of these
        # self.accept_key = key
        # self.accept_fd = key.fd

        threading.Thread(target=self.select, name='select', daemon=True).start()

    def select(self):

        while not self.event.is_set():
            events = self.selector.select()  # blocks
            for key, _ in events:
                key.data(key.fileobj)  # runs in the select thread

    def accept(self, sock: socket.socket):  # runs in the select thread
        new_sock, r_address = sock.accept()
        new_sock.setblocking(False)
        print('~' * 30)

        key = self.selector.register(new_sock, selectors.EVENT_READ, self.rec)  # there are n of these
        logging.info(key)

    def rec(self, conn: socket.socket):  # runs in the select thread
        data = conn.recv(1024)
        logging.info(data.decode(encoding='cp936'))

        if data.strip() == b'quit' or data.strip() == b'':
            self.selector.unregister(conn)  # unregister before closing; like removing the socket object from the dict in earlier versions
            conn.close()
            return

        for key in self.selector.get_map().values():
            s = key.fileobj
            # if key.fileobj is self.sock:  # option 1
            #     continue
            # if key == self.accept_key:  # option 2
            #     continue
            # if key.fd == self.accept_fd:  # option 3
            #     continue
            # msg = 'Your msg = {} from {}'.format(data.decode(encoding='cp936'), conn.getpeername())
            # s.send(msg.encode(encoding='cp936'))
            # print(key.data)
            # print(self.rec)
            # print(1, key.data is self.rec)  # False
            # print(2, key.data == self.rec)  # True
            if key.data == self.rec:  # option 4
                msg = 'Your msg = {} from {}'.format(data.decode(encoding='cp936'), conn.getpeername())
                s.send(msg.encode(encoding='cp936'))

    def stop(self):  # runs in the main thread
        self.event.set()
        fs = set()
        for k in self.selector.get_map().values():
            fs.add(k.fileobj)
        for f in fs:
            self.selector.unregister(f)  # equivalent to releasing resources in earlier versions
            f.close()
        self.selector.close()


if __name__ == "__main__":
    cs = ChatServer()
    cs.start()

    while True:
        cmd = input(">>>").strip()
        if cmd == 'quit':
            cs.stop()
            break
        logging.info(threading.enumerate())
        logging.info(list(cs.selector.get_map().keys()))
        # for fd, ke in cs.selector.get_map().items():
        #     logging.info(fd)
        #     print(ke)
        #     print()

 

Summary:

IO multiplexing (select, epoll) does not necessarily perform better than multithreading with synchronous blocking IO; its biggest advantage is that it can handle more connections. In the multithreading + synchronous blocking IO model, too many threads get created, and creating and destroying threads is expensive, although a thread pool can help with that. Each thread also uses a considerable amount of memory, and every thread switch must save and restore context, so with too many threads a lot of time goes to switching.

With few connections, the multithreading + synchronous blocking IO model is appropriate and its efficiency is not low. If the server has a very large number of connections and IO concurrency is fairly high, creating that many threads is not worthwhile, and IO multiplexing may be the better choice.

 

 


Origin blog.csdn.net/sqsltr/article/details/92762279