Detailed explanation of Linux IO model evolution

Basic concepts

IO plays a central role in computing, and programmers have never stopped chasing better IO efficiency. In this article we look at how the Linux IO working model has evolved, step by step, into what it is today. By IO we mean the IO performed by an application process, which falls into two broad types: file IO and network IO; this article focuses on network IO. The way an application process exchanges data with the kernel has kept evolving, and we will walk through each form of that interaction below. Before we do, let us clarify a few concepts: kernel space and user space, synchronous and asynchronous, blocking and non-blocking.

Kernel space

Kernel space is the region of memory reserved for the operating system alone. It is independent of the memory of application processes; no application other than the operating system is allowed to access it, while the operating system can operate on both kernel space and user space.

User space

User space is the region of memory allocated to a user application process. Both the operating system and the owning application can access this memory.

Synchronous

After the calling thread issues a synchronous request, the call does not return until the result is obtained. Synchronous calls are serial: the next synchronous call can only be processed after the previous one has completed.

Asynchronous

After the calling thread issues an asynchronous request, the call returns before the result is available. The actual result is delivered to the caller, via a signal or a callback, once the processing has completed.

Blocking

After the calling thread sends a request, the thread is suspended until the result is available. While suspended, the CPU allocates no time to the thread and it is in a non-runnable state; only when the result comes back is the thread woken up to continue running. Key point: a thread in the blocked state does not consume CPU resources.

Non-blocking

After the calling thread sends the request, the call returns before the result is obtained, and the calling thread is never suspended during the whole procedure.

Synchronous blocking IO

Synchronous blocking IO is the most common IO model in Linux; all socket communication uses it by default. In this model, after the application thread calls the kernel's IO interface it is blocked until the kernel has prepared the data and copied it into the application's user-space memory. Java BIO and sockets in blocking mode both transfer network data this way.

We can run the man read command to view the read IO function provided by the Linux kernel:

ssize_t read(int fd, void *buf, size_t count);

When an application thread initiates a read request, the kernel enters the waiting-for-data phase after receiving it, and the application thread is blocked during this time. Once the data is ready, the kernel copies it from kernel space into the user-space memory and then returns the result to the application thread, at which point the thread is unblocked.
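As a minimal sketch (not from the original article), the blocking read loop below shows how an application thread consumes data from an already-connected socket; sockfd is an assumed, hypothetical descriptor and the socket setup is omitted:

#include <stdio.h>
#include <unistd.h>

// Read from a connected socket in the default blocking mode.
// Each read() suspends the calling thread until the kernel has received
// data and copied it into buf, or until the peer closes the connection.
static ssize_t blocking_consume(int sockfd)
{
    char buf[4096];
    ssize_t n;

    while ((n = read(sockfd, buf, sizeof(buf))) > 0) {
        // Process the n bytes the kernel just copied into user space.
        fwrite(buf, 1, (size_t)n, stdout);
    }
    return n;   // 0 on orderly shutdown, -1 on error
}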

Synchronous blocking mode is simple and direct, with none of the thread switching, callback, or notification overhead of the modes described below. It is the best choice for network communication scenarios with little concurrency.

But in large-scale network communication scenarios, where a huge number of requests and connections must be handled, having a thread block per connection is unacceptable.

  • Advantages: efficient in low-concurrency network scenarios, and application development is simple.
  • Disadvantages: not suitable for network scenarios with a large number of concurrent connections.

Synchronous non-blocking IO

Synchronous non-blocking IO is a variant of synchronous blocking IO. The difference is that after the application thread sends an IO request to the kernel, if the kernel's IO data is not yet ready, the kernel immediately returns an error code (EAGAIN or EWOULDBLOCK) to the application thread. Once the kernel's IO data is ready and the application thread issues the IO request again, the kernel copies the data from kernel space to user space and then returns a normal response. Sockets used in non-blocking mode are a common example of synchronous non-blocking IO.

When a user thread initiates a read operation, it is not blocked if the kernel's IO data is not ready; instead it immediately receives an EAGAIN/EWOULDBLOCK error. From the user thread's point of view it gets a result right away, sees that the result is EAGAIN/EWOULDBLOCK, and simply initiates the read again. Repeating the call based on the return value like this is called polling, and it obviously wastes a lot of CPU time. Once the IO data in the kernel is ready and another request arrives from the user process, the kernel immediately copies the data to user memory and returns.
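A minimal sketch of this polling style, under the assumption that sockfd is an already-connected socket: the descriptor is switched to non-blocking mode with fcntl, and the thread keeps retrying while read reports EAGAIN/EWOULDBLOCK:

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

// Put the socket in non-blocking mode, then poll it by hand.
static ssize_t nonblocking_read(int sockfd, char *buf, size_t len)
{
    int flags = fcntl(sockfd, F_GETFL, 0);
    fcntl(sockfd, F_SETFL, flags | O_NONBLOCK);   // read() will no longer block

    for (;;) {
        ssize_t n = read(sockfd, buf, len);
        if (n >= 0)
            return n;                             // data copied to user space (or EOF)
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                            // a real error
        // Kernel data not ready yet: spinning here burns CPU, which is
        // exactly the drawback described above.
    }
}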

  • Advantages: the application thread is not blocked while the kernel is preparing the IO data, which suits network applications that are sensitive to thread blocking.
  • Disadvantages: polling the kernel for the IO data status consumes a lot of CPU and is inefficient.

Multiplexing

Multiplexing is currently the most common IO model in large-scale Internet applications. Put simply, the application process has an IO state manager: multiple network IO channels are registered with this manager, which uses a single thread to call a kernel API that monitors the status of all registered connections. As soon as the status of one of them changes, the application is notified to perform the corresponding read or write. Because many network IO channels share (multiplex) this one state manager, the model is called multiplexing. Multiplexing is still essentially synchronous blocking, but compared with the traditional synchronous-blocking-plus-multithreading model its biggest advantage is that a single thread can manage the state of a large number of network IO channels in high-concurrency scenarios, so the system resource overhead is small. Java NIO and Nginx both use multiplexing for network transfer. The basic workflow of multiplexing:

  1. The application registers the network IO to the state manager;
  2. The state manager confirms the state of the managed network IO by calling the kernel API;
  3. After the state manager detects that the state of a network IO channel has changed, it notifies the application to perform the actual (synchronous, blocking) read and write operations.

Linux currently offers three such state managers: select, poll, and epoll. Epoll is the preferred model for large-scale concurrent network programs on Linux, and in most cases its performance far exceeds select and poll; the popular high-performance web server Nginx relies on the efficient socket polling service epoll provides. However, with few concurrent connections, multithreading plus blocking IO may still perform better. The three are also products of different eras:

  • select appeared in 4.2BSD in 1983;
  • poll arrived fourteen years later, in 1997. The long gap was not really an efficiency problem: the hardware of that era was simply too weak, a server handling more than 1,000 connections was a god-like existence, and select met the demand for quite a while;
  • in 2002, Davide Libenzi implemented epoll.

These three state managers monitor the state of the network connection through different kernel APIs. Different APIs provide different capabilities, leading to differences in performance. Let's analyze them one by one.

select

select is the oldest multiplexing model; before version 2.6 Linux offered only select, and for a long time it was the mainstream network IO model. select polls periodically by handing the file descriptors of all the network IO it manages to the kernel for a status query. The API the kernel exposes to applications is shown below (man select):

int select(int maxfdp1, fd_set *readset, fd_set *writeset, fd_set *exceptset, struct timeval *timeout);

fd_set is a data structure based on an array of longs, used as a bitmap of file descriptors. The three key parameters readset, writeset, and exceptset are all value-result arguments: on input they hold the descriptor sets the caller wants monitored for readability, writability, and exceptional conditions, and after the kernel has polled all the passed-in descriptors it overwrites each set so that it contains only the descriptors whose state has changed. maxfdp1 is the highest-numbered descriptor plus one. Finally, the kernel returns the number of ready descriptors to the caller.
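A minimal usage sketch of select (an illustration, not the article's code): fds[] is a hypothetical array of descriptors the application wants to watch for readability. Note that the set must be rebuilt before every call because the kernel overwrites it on return:

#include <sys/select.h>

// Block until at least one of the nfds descriptors in fds[] is readable.
static int wait_readable_select(const int fds[], int nfds)
{
    fd_set readset;
    int i, maxfd = -1;

    FD_ZERO(&readset);
    for (i = 0; i < nfds; i++) {
        FD_SET(fds[i], &readset);     // register interest in readability
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    // Copies readset into the kernel; on return it holds only ready descriptors.
    int ready = select(maxfd + 1, &readset, NULL, NULL, NULL);
    if (ready <= 0)
        return ready;                 // 0 on timeout (not used here), -1 on error

    for (i = 0; i < nfds; i++) {
        if (FD_ISSET(fds[i], &readset)) {
            // fds[i] is readable: the application can now perform the
            // actual (blocking) read without waiting for data.
        }
    }
    return ready;
}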

Select workflow:

  1. The application thread registers the network IO file descriptors to be monitored with the select state monitor;
  2. The select state monitor's worker thread periodically calls the kernel API, passing all the descriptors it manages through the readset/writeset/exceptset parameters;
  3. The kernel polls the status of every descriptor passed in, rewrites the sets to mark the descriptors whose state has changed, and returns the number of ready descriptors to the caller;
  4. The select worker thread informs the application, which then performs the actual synchronous blocking read and write operations.

Characteristic analysis of select mechanism:

  1. Every call to select copies the readset/writeset collections from user space into kernel space; if the collections are large, this overhead is significant;
  2. Every call to select makes the kernel traverse all the descriptors passed in, a linear scan with O(n) time complexity, which is also expensive when the collection is large;
  3. The size of the monitored descriptor set is limited: FD_SETSIZE defaults to 1024 on Linux.

poll

The poll model is very similar to select. Its state monitor also manages a batch of network IO, and the kernel still linearly polls all of the file descriptors passed in to determine their status. The only difference is that the size of the descriptor array the application hands to the kernel is not limited, which solves the third problem of select; the other two problems remain.

int poll(struct pollfd *fds, nfds_t nfds, int timeout);

struct pollfd {
    int fd;          // file descriptor to be monitored
    short events;    // events of interest on fd, set by the caller
    short revents;   // events that actually occurred on fd, filled in by the kernel
};

This is the poll API provided by the kernel. fds is an array of struct pollfd used to hold the network IO file descriptors whose status needs to be checked, and the array is not cleared by the poll call, so it can be reused. Each pollfd structure represents one monitored descriptor: its events field is the event mask of interest, set by the caller, and its revents field is the event mask of what actually happened, set by the kernel when the call returns.
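The equivalent loop with poll might look like the sketch below (again an illustration; fds is a hypothetical, caller-built array). Unlike the select sets, the pollfd array can be reused across calls because results are reported in revents while events is left untouched:

#include <poll.h>

// Block until at least one of the nfds descriptors in fds[] is readable.
static int wait_readable_poll(struct pollfd *fds, nfds_t nfds)
{
    nfds_t i;

    for (i = 0; i < nfds; i++)
        fds[i].events = POLLIN;          // interested in readability only

    int ready = poll(fds, nfds, -1);     // -1 timeout: block until something happens
    if (ready <= 0)
        return ready;

    for (i = 0; i < nfds; i++) {
        if (fds[i].revents & POLLIN) {
            // fds[i].fd is readable: perform the actual read here.
        }
    }
    return ready;
}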

epoll

epoll was formally introduced in the Linux 2.6 kernel. It is an event-driven IO mechanism. Compared with select, epoll places no limit on the number of file descriptors and stores the events the application cares about in an event table inside the kernel, so the copy between user space and kernel space is needed only once per registration. The epoll implementation and the network devices establish a subscription/callback mechanism: once the status of a connection registered in the kernel event table changes, the kernel is notified by the network device. This callback mechanism replaces the polling of select/poll and reduces the time complexity from O(n) to O(1), greatly improving IO efficiency, especially when only a few of a very large number of concurrent connections are active.

Three epoll APIs provided by Linux:

int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event * events, int maxevents, int timeout);

epoll splits the single large API of select/poll into three APIs. The idea is to register the descriptors to be monitored with the kernel once, through epoll_ctl, instead of passing the whole set on every status query. epoll builds a red-black tree in kernel memory to store the connections added via epoll_ctl, and it also maintains a doubly linked list, rdllist, holding the descriptors whose network status has changed. When epoll_wait is called it only looks at this rdllist: if the list has entries it returns them, and if it is empty epoll_wait sleeps, returning when the timeout expires even if the list is still empty. Because epoll does not poll every connection the way select/poll do, but simply watches one ready list, epoll_wait remains very efficient even with a huge number of connections.

Every network connection added to epoll establishes a callback relationship with the device driver (for example the network card driver): when the corresponding connection's status changes, the device notifies the kernel by invoking a callback. In the kernel this callback is ep_poll_callback, and it places the status-change event on the rdllist doubly linked list described above.

When epoll_wait is called to check whether any connection has a status-change event, it only checks whether the rdllist in the eventpoll object is non-empty. If it is not empty, the events on the list are copied to user-space memory and the number of events is returned to the caller. When epoll_ctl adds, modifies, or deletes a monitored descriptor on the epoll object, looking it up in the rbr red-black tree is also very fast. All of this makes epoll efficient enough to comfortably handle millions of concurrent connections.
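A minimal sketch of the three-call workflow described above (listenfd is an assumed, already-listening socket; error handling is omitted): interest is registered once with epoll_ctl, and every epoll_wait call returns only the descriptors that are already on the ready list:

#include <sys/epoll.h>

#define MAX_EVENTS 64

static void epoll_loop(int listenfd)
{
    struct epoll_event ev, events[MAX_EVENTS];

    int epfd = epoll_create(1);                       // size is ignored on modern kernels, must be > 0
    ev.events  = EPOLLIN;                             // level-triggered by default; OR in EPOLLET for edge-triggered
    ev.data.fd = listenfd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listenfd, &ev);    // stored once in the kernel's red-black tree

    for (;;) {
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);   // reports only ready descriptors
        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == listenfd) {
                // accept() the new connection, then epoll_ctl(ADD) it as well
            } else {
                // read/write on events[i].data.fd; it is known to be ready
            }
        }
    }
}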

epoll_wait has two trigger modes, level-triggered (LT) and edge-triggered (ET, enabled with the EPOLLET flag). LT is the default and the safer mode; ET is the "high-speed" mode and more efficient:

  • Level-triggered (LT): the default working mode. When epoll_wait detects a status change on a connection and notifies the application, the application does not have to process the event immediately; the next call to epoll_wait will report the event again;
  • Edge-triggered (ET): when epoll_wait detects a status change and notifies the application, the application must process the event right away. If it does not, the event will not be reported again on the next epoll_wait call.

Characteristic analysis of epoll mechanism:

  1. There is no hard limit on the number of concurrent connections; the upper limit on open FDs is far above 1024 (roughly 100,000 connections can be monitored per 1 GB of memory);
  2. Network connections are registered with the kernel once via epoll_ctl, instead of handing the kernel every file descriptor on every status query, which greatly improves efficiency;
  3. The kernel and the network devices establish an event subscription mechanism, so connection status is monitored without polling and efficiency does not drop as the number of descriptors grows; only active, ready descriptors trigger the callback. epoll's biggest advantage is that it only cares about your "active" connections, regardless of the total number of connections;
  4. On each epoll_wait call, only the events for ready descriptors are passed between kernel space and user space, keeping the copy overhead small.

At present most mainstream high-performance applications are built on this IO model, including Nginx, NodeJS, and the Netty framework. To summarize the characteristics of multiplexing:

  • Advantages: one thread can monitor the status of many network connections with good performance; epoll, the final form of this evolution, is well suited to scenarios with very large numbers of connections.
  • Disadvantages: more complex, and applications are harder to develop.

Signal-driven IO

Signal-driven mode uses the Linux signal mechanism: the SIGIO read/write signal and its handler callback are registered with the kernel through the sigaction function. After registration the application process is not blocked and can go on with other work. When the network IO status changes, a SIGIO interrupt is raised and the kernel notifies the application that the IO is ready by invoking the application's handler. The first half of signal-driven operation is asynchronous, but the subsequent read or write of the network data is still a synchronous blocking operation.
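A minimal sketch of this setup (an illustration under the assumption that sockfd is an open socket): the SIGIO handler is installed with sigaction, the socket's owner is set with F_SETOWN, and the O_ASYNC flag turns on signal delivery:

#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t io_ready = 0;

// Handler the kernel invokes when SIGIO is delivered for the socket.
static void sigio_handler(int signo)
{
    (void)signo;
    io_ready = 1;    // only set a flag; the actual (blocking) read happens outside the handler
}

// Arm signal-driven IO on an existing socket.
static void enable_sigio(int sockfd)
{
    struct sigaction sa = {0};
    sa.sa_handler = sigio_handler;
    sigaction(SIGIO, &sa, NULL);              // register the handler with the kernel

    fcntl(sockfd, F_SETOWN, getpid());        // deliver SIGIO for this fd to our process
    int flags = fcntl(sockfd, F_GETFL, 0);
    fcntl(sockfd, F_SETFL, flags | O_ASYNC);  // enable signal-driven notification
}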

  • Advantages: the asynchronous callback avoids the resource waste of having the user or the kernel actively poll the device.
  • Disadvantages: the handler runs in signal (interrupt-like) context, it is unstable with multithreading and has poor platform compatibility, so it is not a complete and reliable solution and has few real application scenarios.

Asynchronous IO

Asynchronous IO is implemented through a set of asynchronous APIs and is the only truly asynchronous mode among the five IO models; Java's AIO is built on it. A read in asynchronous mode is performed by calling the kernel's aio_read function: the application thread calls aio_read, handing the kernel a buffer located in user space, and the kernel returns immediately without blocking the thread. When the data arrives from the network device, the kernel copies it on its own from kernel space into the user-space buffer submitted via aio_read, and once the copy completes it notifies the user thread with a signal. The user thread can then carry on with the received data.
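A minimal sketch using the POSIX AIO interface (aio.h); fd and buf are assumed to be an open descriptor and a caller-owned buffer, and a real program would rely on a signal or callback instead of the busy wait shown here (on older glibc, link with -lrt):

#include <aio.h>
#include <errno.h>
#include <string.h>

// Submit an asynchronous read, then wait for its completion.
static ssize_t async_read(int fd, char *buf, size_t len)
{
    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;            // user-space buffer handed over with the request
    cb.aio_nbytes = len;
    cb.aio_offset = 0;

    if (aio_read(&cb) < 0)          // returns immediately; the copy happens later
        return -1;

    // The caller is free to do other work here; for the sketch we simply
    // poll until the operation leaves the EINPROGRESS state.
    while (aio_error(&cb) == EINPROGRESS)
        ;

    return aio_return(&cb);         // number of bytes actually read
}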

The difference between asynchronous IO and signal-driven IO is that with signal-driven IO the kernel tells the application when it may start an IO operation, whereas with asynchronous IO the kernel tells the application when the IO operation has completed. Asynchronous IO actively copies the data into user space, so there is no need to call recvfrom to pull data from kernel space; asynchronous IO pushes the data, which is more efficient than the pull model of signal-driven IO.

Asynchronous IO is still a relatively new IO mode and requires operating-system support. Linux first provided asynchronous IO in version 2.5, and from Linux 2.6 onward the asynchronous IO APIs are part of the standard kernel. There are still not many application scenarios for asynchronous IO.

  • Advantages: purely asynchronous, efficient, high performance.
  • Disadvantages: efficiency is not qualitatively better than the multiplexing model, so mature applications have had little incentive to migrate, and no large-scale mature application has emerged to back it.

Summary

Each of the five Linux IO modes has its own characteristics and its own reason to exist, along with its own application scenarios. For simple, low-concurrency socket communication, most people still use multithreading with synchronous blocking IO: the efficiency is similar to the other modes and the implementation is much simpler.

The popular high-concurrency network communication stacks on the market today, Nginx, the Java NIO-based Netty framework, and NodeJS, all use the multiplexing model, which a large number of real projects have shown to be the most mature high-concurrency network IO model at present, with epoll the best of the multiplexing family. Systems based on Linux 2.6 and above expose the standard epoll APIs, and Java NIO uses the epoll implementation by default on Linux 2.6 and above, falling back to a poll-based implementation on older versions. Windows does not support epoll, only select, but that hardly matters, since almost nobody uses Windows as a network application server.

Signal-driven IO feels immature, and I have basically never seen it used in practice. The pure asynchronous mode, where the kernel does all the work, looks very attractive, and Java also offers a reactive-style implementation, but because its efficiency is not qualitatively better than multiplexing there is little incentive for mature applications to migrate, and no large-scale mature application has emerged to back it.

