From IO Multiplexing to the Redis Threading Model

Unix IO Model Taxonomy

  • Blocking IO
  • Non-blocking IO
  • IO multiplexing
  • Signal-driven IO
  • Asynchronous IO

Blocking IO

The most traditional IO model: the thread blocks during the process of reading and writing data.

When a user thread issues an IO request, the kernel checks whether the data is ready. If it is not, the kernel waits for the data, the user thread stays blocked, and it yields the CPU. Once the data is ready, the kernel copies it to the user thread and returns the result, and the user thread leaves the blocked state.

Some argue that multithreading plus blocking IO can solve the efficiency problem. But in that model each socket is served by its own thread, which consumes a great deal of resources; for long-lived connections in particular, those threads are never released. With many concurrent connections, this becomes a performance bottleneck.
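As a concrete illustration, here is a minimal Python sketch, using a local socketpair in place of a real network connection:

```python
# A minimal sketch of blocking IO: recv() parks the calling thread until the
# kernel has data ready and has copied it into user space.
import socket
import threading
import time

r, w = socket.socketpair()   # a connected pair standing in for a TCP connection

def producer():
    time.sleep(0.1)          # the "data not ready" phase
    w.sendall(b"hello")      # now the kernel has data for r

threading.Thread(target=producer).start()

data = r.recv(1024)          # blocks here: the thread yields the CPU until data arrives
print(data)                  # b'hello'
```

The call to `recv` returns only after both stages finish: waiting for data, then copying it to user space.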

Non-blocking IO

When a user thread initiates an IO operation, it does not need to wait; it gets a result immediately. If the result is an error, the thread knows the data is not ready yet, so it can issue the IO operation again. Once the kernel data is ready and the kernel receives another request from the user thread, it immediately copies the data to the user thread and returns.

In the non-blocking IO model, the user thread has to keep asking whether the kernel data is ready; in other words, non-blocking IO does not yield the CPU but occupies it continuously.

The difference from blocking IO:

  • Whether the call returns immediately when there is no data
  • Whether the thread yields the CPU

Non-blocking IO has a serious problem: the while loop must constantly ask whether the kernel data is ready, which drives CPU usage very high. In practice, therefore, a bare polling loop is rarely used to read data.

Non-blocking IO core process:

  • The non-blocking behavior shows up when the user process issues the recvfrom system call: if the kernel has not yet received a datagram, it immediately returns an error to the user process, effectively saying "no datagram is available yet, come back later"

  • The user process receives that error but does not know when a datagram will arrive, so it starts polling: it repeatedly issues the recvfrom system call to ask whether the data has come, and the kernel keeps returning an error until it has

  • The user process keeps polling with recvfrom until a datagram is available. It then waits while the kernel copies the datagram into the user process's buffer; once the copy completes, the call returns success.
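The polling loop above can be sketched in Python, where a non-blocking recv() raises EWOULDBLOCK (surfaced as BlockingIOError) instead of parking the thread:

```python
# Sketch of the non-blocking polling loop: each recv() returns at once,
# raising BlockingIOError (EWOULDBLOCK) while no data is ready.
import socket
import time

r, w = socket.socketpair()
r.setblocking(False)         # switch the descriptor to non-blocking mode

attempts = 0
data = None
while data is None:
    try:
        data = r.recv(1024)  # returns immediately; no data -> error, not a block
    except BlockingIOError:
        attempts += 1        # "come back later" - this spin is what burns CPU
        if attempts == 3:
            w.sendall(b"ready")  # simulate the datagram finally arriving
        time.sleep(0.01)

print(attempts, data)
```

Every failed attempt is a full user/kernel round trip, which is exactly the CPU waste described above.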

IO Multiplexing

IO multiplexing is a synchronous IO model in which one thread monitors multiple file handles. Once a file handle becomes ready, the application is notified to perform the corresponding read or write; if no handle is ready, the application blocks and yields the CPU. Here "multi" refers to the multiple network connections, and "plexing" (reuse) refers to reusing the same thread.

Neither synchronous blocking nor synchronous non-blocking IO improves system performance much. With multiplexing, however, one thread (or a thread pool) can handle many TCP connections. IO multiplexing uses two system calls (select/poll/epoll plus recvfrom), whereas blocking IO calls only recvfrom. The point of select/poll/epoll is that it can watch many connections at once, not that any single one is faster; with a small number of connections, its performance may not beat multithreading plus blocking IO.

IO multiplexing arose from the drawbacks of non-blocking IO. In the multiplexed IO model, the kernel continuously polls the status of multiple sockets, and the actual IO read or write is invoked only when a read or write event actually occurs. Because a single thread can manage many sockets, the system does not need to create or maintain extra processes or threads, and IO resources are used only when real read and write events are in progress, which greatly reduces CPU resource usage.
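A minimal sketch of this idea in Python, using select() to watch two connections from one thread (the socketpairs stand in for real TCP connections):

```python
# Sketch of IO multiplexing: one thread watches two sockets with a single
# select() call and only performs recv() on descriptors the kernel reports ready.
import select
import socket

r1, w1 = socket.socketpair()
r2, w2 = socket.socketpair()

w2.sendall(b"from-2")        # only the second connection has data

readable, _, _ = select.select([r1, r2], [], [], 1.0)  # one call, many sockets
messages = {s.fileno(): s.recv(1024) for s in readable}

print(messages)              # only r2 shows up as ready
```

The thread blocks once, inside select, instead of spinning on each socket individually.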


Signal-driven IO

In the signal-driven IO model, when a user thread initiates an IO request, it registers a signal handler for the corresponding socket and then continues executing. When the kernel data is ready, the kernel sends a signal to the user thread; on receiving it, the thread performs the actual IO read or write inside the signal handler. This model is generally used with UDP and is almost useless for TCP sockets, because signals would be generated far too frequently, and a signal's arrival does not tell us which request it concerns.

The user process can rely on signals: when a descriptor becomes ready, the kernel sends SIGIO to user space, and the process then issues the recvfrom system call and waits for it to return success. The process is as follows:

  • First, enable signal-driven IO on the socket and install a signal handler via the sigaction system call; the call returns immediately;
  • Next, once the kernel has received a datagram from the network, it delivers a "data is ready" signal to the process's signal handler in user space;
  • When the signal handler receives the notification, it issues the recvfrom system call and waits while the kernel copies the datagram into the user-space buffer;
  • After the successful return indicating that the copy is complete, the application process can start processing the data.
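The steps above can be sketched on Linux with Python's fcntl and signal modules. This is an illustrative, Linux-specific sketch (O_ASYNC/F_SETOWN behavior varies across platforms), not production code:

```python
# Hedged sketch of signal-driven IO on Linux: F_SETOWN + O_ASYNC ask the kernel
# to deliver SIGIO when the descriptor becomes readable; the process then issues
# the actual read.
import fcntl
import os
import signal
import socket
import time

received = []

def on_sigio(signum, frame):
    received.append(signum)   # the kernel's "data is ready" notification

signal.signal(signal.SIGIO, on_sigio)

r, w = socket.socketpair()
fcntl.fcntl(r, fcntl.F_SETOWN, os.getpid())        # deliver SIGIO to this process
flags = fcntl.fcntl(r, fcntl.F_GETFL)
fcntl.fcntl(r, fcntl.F_SETFL, flags | os.O_ASYNC)  # enable signal-driven mode

w.sendall(b"ping")            # data arrival should raise SIGIO

for _ in range(100):          # give the signal a moment to be delivered
    if received:
        break
    time.sleep(0.01)

data = r.recv(1024)           # the recvfrom step: copy the ready data out
print(received, data)
```

Note how the notification (SIGIO) and the data copy (recv) remain two separate steps, which is why this model is still synchronous IO.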

Asynchronous IO

The first four IO models are all in fact synchronous IO; only the last is truly asynchronous. Whether in the multiplexed or the signal-driven model, the second stage of the IO operation, the kernel copying data, still blocks the user thread.

  • As defined by the POSIX specification: tell the kernel to start an operation, and let the kernel notify the user process only after the entire operation, including waiting for the data and completing the data copy, has finished and the data is ready to be read;
  • The difference from the signal-driven IO model above: asynchronous IO notifies us when the IO operation has completed, whereas signal-driven IO notifies us when an IO operation can be started

Definition of synchronous and asynchronous

  • Synchronous: initiate a function call and wait for its result to return. The result is either the expected value or a thrown exception; the call behaves like an atomic operation (it returns either success or failure)
  • Asynchronous: initiate a function call and return immediately without waiting for the result. Only after the callee has run its handler is the caller notified, through some "wake-up" mechanism (a callback, an event notification, etc.), so that it can obtain the result
  • Summary: synchronous vs. asynchronous is concerned with how programs communicate with each other

Definition of blocking and non-blocking

  • Blocking: when there is no data, the call blocks and waits until data arrives before returning
  • Non-blocking: when there is no data, the call returns "no data" immediately, without waiting
  • Summary: blocking vs. non-blocking is concerned with the state the program is in while waiting for a result
  • In other words, synchronous/asynchronous and blocking/non-blocking are orthogonal concepts: they focus on different things.
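These definitions can be made concrete with a small Python sketch, where the asynchronous variant is simulated with a worker thread and a callback (the helper names here are illustrative):

```python
# Illustrative sketch of the definitions above: a synchronous call waits for its
# result, while an asynchronous call returns immediately and delivers the
# result later through a callback ("wake-up").
import threading
import time

def compute():
    time.sleep(0.05)
    return 42

# Synchronous: the caller is parked until the result comes back.
sync_result = compute()

# Asynchronous: initiate, return at once, get notified via callback.
async_result = []
def async_compute(callback):
    def worker():
        callback(compute())   # callee finishes, then "wakes up" the caller
    threading.Thread(target=worker).start()

done = threading.Event()
async_compute(lambda v: (async_result.append(v), done.set()))
# ... the caller is free to do other work here ...
done.wait(1.0)                # only so this example can check the result
print(sync_result, async_result)
```

Whether either call *blocks* while no result is available is a separate question from whether the communication style is synchronous or asynchronous.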

What are the implementations of IO multiplexing

  • select
  • poll
  • epoll

Rough implementation of IO multiplexing

select is a multiplexing function provided by the kernel; it avoids the polling-and-waiting problem of synchronous non-blocking IO.


The user first adds the sockets on which IO is needed to select, then blocks waiting for the select system call to return. When data arrives on a socket, that socket is activated and the select function returns; the user thread then formally issues a read request, reads the data, and continues.

Seen this way, this approach is not very different from synchronous blocking IO; it even adds the extra steps of registering the monitored sockets and calling the select function, making a single request less efficient. But with select, a user can handle multiple socket IO requests in one thread at the same time, and that is its biggest advantage. The user registers multiple sockets and then repeatedly calls select to read from whichever sockets are activated, so the same thread can serve multiple IO requests simultaneously. In the synchronous blocking model, that goal can only be reached with multithreading. The design purpose of IO multiplexing is therefore not speed, but relieving the pressure that an excessive number of threads or processes puts on the server.

Although this approach serves multiple IO requests in a single thread, each request's flow is still blocking (blocked on the select function), and the average latency is even longer than in the synchronous blocking model. If the user thread could simply register the sockets it cares about, go do its own work, and handle the data once it arrives, CPU utilization would improve.


With the Reactor approach, the work of polling IO status is handed over uniformly to the handle_events event loop. After registering an event handler, the user thread can continue with other work (asynchronously), while the Reactor thread calls the kernel's select function to check socket status. When a socket is activated, the corresponding user thread is notified (or its callback is invoked) and handle_event runs to read and process the data.

Because the select function itself blocks, the multiplexed IO model is sometimes called the asynchronous blocking IO model, where "blocking" does not refer to the sockets. With IO multiplexing the sockets are set to non-blocking, but that has no effect here: by the time the user thread issues its IO request the data has already arrived, so the user thread is not blocked.

IO multiplexing is the most commonly used IO model, but its asynchrony is incomplete because it relies on the thread-blocking select system call. It can therefore only be called asynchronous blocking IO, not true asynchronous IO.

select

The select function monitors three classes of file descriptors: readfds, writefds, and exceptfds. After it is called, the function blocks until some descriptor becomes ready (readable, writable, or with an exceptional condition) or until the timeout expires (timeout specifies how long to wait; pass NULL to block indefinitely, or a zero timeout to return immediately), and then returns. When select returns, the ready descriptors can be found by traversing the fdset.

Advantages: good cross-platform support.

Disadvantages: there is a maximum limit on the number of file descriptors a single process can monitor, 1024 on Linux (FD_SETSIZE). The limit can be raised by modifying the macro definition or even recompiling the kernel, but that reduces efficiency. When a descriptor becomes ready, the whole fd set must be traversed to find it, and the traversal time grows with the number of descriptors.

poll

Where select uses three bitmaps to represent the fd sets, poll uses an array of pollfd structures. Each pollfd records both the events to monitor and the events that occurred, so the sets are not passed by value on every call as with select. There is also no hard limit on the number of pollfd entries (though performance degrades when there are too many). Like select, poll requires traversing the pollfd array after it returns to find the ready descriptors.

Advantages: no need to rebuild and pass the descriptor sets on every call; no hard limit on the number of file descriptors.

Disadvantages: the file descriptor collection still has to be traversed.
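Python's select.poll wrapper makes the contrast with select visible; a minimal sketch:

```python
# Sketch of the poll() interface: register interest per descriptor in a
# pollfd-like table instead of passing three fd_set bitmaps on every call,
# then traverse the returned (fd, event) pairs.
import select
import socket

r, w = socket.socketpair()
w.sendall(b"data")

poller = select.poll()
poller.register(r, select.POLLIN)      # interest set lives in the poller, not in bitmaps

events = poller.poll(1000)             # timeout in milliseconds
ready = {fd: ev for fd, ev in events}  # still have to walk the returned list

data = r.recv(1024) if ready.get(r.fileno(), 0) & select.POLLIN else None
print(data)
```

The final dictionary traversal is exactly the remaining disadvantage noted above: poll still hands back a list that must be walked.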

epoll

epoll is an enhanced version of select and poll: compared with the two, it is more flexible and has no descriptor limit. epoll manages multiple descriptors through an event table kept in the kernel, storing the events for the relevant user file descriptors there, so the copy between user space and kernel space only needs to happen once.

The user registers a request in the event table and can then go handle other work. The kernel maintains the event list and, when a file event becomes ready, notifies the corresponding application according to the registered event information, and the application performs the corresponding operation. This removes the blocking that select and poll incur at the start of a request, but the data-copy step of the request itself still blocks.
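A minimal Linux-only sketch with Python's select.epoll wrapper, showing one-time registration followed by waiting:

```python
# Linux-only sketch of epoll: the interest list is registered once in the
# kernel's event table (epoll_ctl under the hood), so it is not copied in on
# every wait, and the wait call returns only descriptors that are actually ready.
import select
import socket

r, w = socket.socketpair()

ep = select.epoll()
ep.register(r.fileno(), select.EPOLLIN)   # one-time registration in the kernel table

w.sendall(b"event")                        # make r readable

events = ep.poll(timeout=1.0)              # [(fd, event_mask)] for ready fds only
data = r.recv(1024) if events else None

ep.close()
print(events, data)
```

Unlike select and poll, the returned list contains only ready descriptors, so there is no need to scan the full registered set.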

Redis threading model

Why use I/O multiplexing in Redis

Why does Redis use I/O multiplexing? Redis runs on a single thread, and all operations are performed sequentially and linearly. But because read and write operations block while waiting for user input or output, an I/O operation usually cannot return right away; when I/O on one file blocks, the whole process can serve no other client. I/O multiplexing solves exactly this problem: to let a single-threaded (single-process) server application handle events from multiple clients at the same time, Redis uses an IO multiplexing mechanism.

Redis thread model implementation

Based on the Reactor pattern, Redis developed its own network event processor on top of epoll, called the file event handler. It consists of four parts: multiple sockets, the IO multiplexing program, the file event dispatcher, and the event handlers. Because the file event dispatcher consumes its queue on a single thread, Redis is described as a single-threaded model.


message processing flow

Multiple sockets may concurrently generate different operations, each corresponding to a different file event. The IO multiplexing program listens on all of these sockets and places the sockets that generate events into a queue one at a time; only after one event has been handled is the next one released. The event dispatcher takes one socket from the queue at a time and hands it to the event handler matching that socket's event type.

Although multiple file events may occur concurrently, the I/O multiplexing program always pushes all event-producing sockets into one queue and then sends them through this queue to the file event dispatcher in an ordered, synchronous way, one socket at a time: only when the event produced by the previous socket has been processed (that is, the socket has been handled by the event handler associated with that event) will the I/O multiplexing program pass the next socket to the file event dispatcher.
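The queue-and-dispatch behavior described above can be modeled with a toy sketch (the handler bodies and socket names are illustrative stand-ins, not Redis's actual identifiers):

```python
# Toy model of the flow described above: the multiplexer pushes every socket
# that produced an event into a queue, and the dispatcher consumes it one
# socket at a time, routing each to the handler registered for its event type.
from collections import deque

AE_READABLE, AE_WRITABLE = "readable", "writable"

log = []
handlers = {
    AE_READABLE: lambda sock: log.append(f"read from {sock}"),
    AE_WRITABLE: lambda sock: log.append(f"write to {sock}"),
}

# The multiplexer noticed events on several sockets "at once"...
event_queue = deque([
    ("sock-1", AE_READABLE),
    ("sock-2", AE_WRITABLE),
    ("sock-3", AE_READABLE),
])

# ...but the dispatcher drains them strictly one at a time, in order.
while event_queue:
    sock, event_type = event_queue.popleft()
    handlers[event_type](sock)   # next socket is handled only after this returns

print(log)
```

Concurrent arrivals thus become a strictly serial sequence of handler invocations, which is what makes the single-threaded model safe.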

Implementation of an I/O multiplexer

All the functionality of Redis's I/O multiplexing program is implemented by wrapping the common I/O multiplexing libraries select, epoll, evport, and kqueue. Each library corresponds to a separate file in the Redis source, such as ae_select.c, ae_epoll.c, and ae_kqueue.c. Because Redis implements the same API over each library, the underlying multiplexing implementations are interchangeable.

Types of file events

The I/O multiplexing program can listen for two kinds of events on multiple sockets, ae.h/AE_READABLE and ae.h/AE_WRITABLE. These map to socket operations as follows:

  • When a socket becomes readable (the client writes to the socket, or closes it), or when a new acceptable connection appears (a client performs a connect operation on the server's listening socket), the socket generates an AE_READABLE event.

  • When a socket becomes writable (the client is ready to read from the socket), the socket generates an AE_WRITABLE event. The I/O multiplexing program allows the server to listen for a socket's AE_READABLE and AE_WRITABLE events at the same time. If a socket generates both events at once, the file event dispatcher gives priority to the AE_READABLE event and handles the AE_WRITABLE event only after the AE_READABLE one has been processed. In other words, if a socket is both readable and writable, the server reads from it first and writes to it afterwards.

Handlers for file events

Redis provides multiple handlers for file events, each serving a different network communication need. The commonly used ones are:

  • To respond to each client connecting to the server, the server associates a connection response handler with the listening socket.
  • To receive command requests from clients, the server associates a command request handler with the client socket.
  • To return the result of executing a command to the client, the server associates a command reply handler with the client socket.

connection response handler

The connection response handler answers clients connecting to the server's listening socket; concretely, it is a wrapper around the sys/socket.h/accept function.

When the Redis server initializes, it associates the connection response handler with the AE_READABLE event of the listening socket. When a client connects to that socket with the sys/socket.h/connect function, the socket generates an AE_READABLE event, triggering the connection response handler, which performs the corresponding accept operation.

command request handler

The command request handler reads the command request a client has sent from the socket; concretely, it is a wrapper around the unistd.h/read function.

After a client has successfully connected through the connection response handler, the server associates the AE_READABLE event of the client socket with the command request handler. When the client sends a command request, the socket generates an AE_READABLE event, triggering the command request handler, which performs the corresponding socket read.

For the whole lifetime of the client's connection, the server keeps the command request handler associated with the client socket's AE_READABLE event.

command reply handler

The command reply handler sends the reply produced by executing a command back to the client through the socket; concretely, it is a wrapper around the unistd.h/write function.

When the server has a command reply to send, it associates the AE_WRITABLE event of the client socket with the command reply handler. When the client is ready to receive the reply, the socket generates an AE_WRITABLE event, triggering the command reply handler, which performs the corresponding socket write.

Once the full reply has been sent, the server disassociates the command reply handler from the client socket's AE_WRITABLE event.

An example of a complete client-server connection event

Assume the Redis server is running: the AE_READABLE event of the listening socket is being monitored, and the handler associated with that event is the connection response handler.

When a Redis client initiates a connection, the listening socket generates an AE_READABLE event, triggering the connection response handler: it accepts the client's connection request, creates the client socket and the client state, and associates the AE_READABLE event of the client socket with the command request handler, so that the client can send command requests to the server.

When the client then sends a command request, the client socket generates an AE_READABLE event, triggering the command request handler, which reads the client's command and passes it to the relevant program for execution.

Executing the command produces a reply. To send it back, the server associates the AE_WRITABLE event of the client socket with the command reply handler: when the client tries to read the reply, the client socket generates an AE_WRITABLE event, triggering the command reply handler. Once the handler has written the entire reply to the socket, the server removes the association between the client socket's AE_WRITABLE event and the command reply handler.
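The complete lifecycle above can be imitated in a toy Python reactor: an accept step for the listening socket, a read step for command requests, and a write step registered only while a reply is pending. This is a sketch of the pattern, not Redis's actual implementation:

```python
# Toy single-threaded reactor imitating the lifecycle above, with select() as
# the multiplexer. The "client" half also lives in this process for the demo.
import select
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 0))
server.listen()

client = socket.create_connection(server.getsockname())
client.sendall(b"PING\r\n")

pending = {}            # conn -> reply bytes awaiting an AE_WRITABLE-style event
conns = []

for _ in range(20):     # bounded event loop for the example
    readable, writable, _ = select.select([server] + conns, list(pending), [], 1.0)
    for sock in readable:
        if sock is server:                       # connection response handler
            conn, _addr = server.accept()
            conns.append(conn)
        else:                                    # command request handler
            request = sock.recv(1024)
            if request.strip() == b"PING":
                pending[sock] = b"+PONG\r\n"     # now watch for writability
    for sock in writable:                        # command reply handler
        sock.sendall(pending.pop(sock))          # then stop watching writability
    if not pending and conns and not readable and not writable:
        break

reply = client.recv(1024)
print(reply)
```

One select loop drives all three handlers, and the reply handler is registered (via the `pending` map) only while output is queued, mirroring how Redis attaches and detaches the AE_WRITABLE association.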




Origin blog.csdn.net/SO_zxn/article/details/130999391