Five I/O models in Linux


1. Blocking I/O model

When a process reads data with a blocking call, it blocks and waits inside the system call: the call does not return until the data has arrived and been copied into the application's buffer, or an error occurs.

Blocking in I/O generally covers two stages: waiting for the kernel data to become ready, and copying the data from kernel space to user space.

The blocking I/O model is easy to implement and application development is relatively simple. However, since each I/O request blocks the calling process, a dedicated process or thread must be allocated per request to ensure timely responses. The system overhead is therefore high, and the model is unsuitable for highly concurrent applications.
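As an illustration (not from the original article), here is a minimal Python sketch: `os.read()` on a pipe behaves like a blocking `recv()` on a socket, putting the process to sleep until the kernel has data for it.

```python
import os
import threading
import time

# A pipe read blocks exactly like a blocking socket recv():
# the reader sleeps until the kernel has data to hand over.
r, w = os.pipe()

def writer():
    time.sleep(0.2)          # simulate data arriving later
    os.write(w, b"hello")

threading.Thread(target=writer).start()

start = time.monotonic()
data = os.read(r, 5)         # blocks here until the writer delivers data
elapsed = time.monotonic() - start
# the read returned only after ~0.2 s, once the kernel had the data
```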


2. Non-blocking I/O model

When a process reads non-blockingly (for example with the MSG_DONTWAIT flag on recv, or by setting O_NONBLOCK on the descriptor), and there is no data in the buffer, the call returns immediately with an EAGAIN or EWOULDBLOCK error instead of putting the process to sleep. The flip side is that the process must keep reissuing the read request until the data is finally read.

The non-blocking I/O model is likewise easy to implement, but the busy polling consumes considerable CPU time, so it too is unsuitable for highly concurrent applications.
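A small Python sketch of the same idea (an illustration, not from the original article): with O_NONBLOCK set, an empty buffer makes the read fail instantly with EAGAIN, and the process has to retry in a loop.

```python
import errno
import os
import time

r, w = os.pipe()
os.set_blocking(r, False)    # equivalent to setting O_NONBLOCK via fcntl

# With no data in the kernel buffer, the read fails immediately with EAGAIN
try:
    os.read(r, 5)
    got_eagain = False
except BlockingIOError as e:
    got_eagain = (e.errno == errno.EAGAIN)

# The process must poll: keep retrying until the kernel has data ready
os.write(w, b"ready")
while True:
    try:
        data = os.read(r, 5)
        break                # read succeeded, data was finally available
    except BlockingIOError:
        time.sleep(0.01)     # this busy retry loop is what wastes CPU
```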


3. I/O multiplexing model

The idea of the I/O multiplexing model is to register many file descriptors with a single multiplexer (select, poll or epoll), which then monitors all of them for readable data.

Take select as an example: if none of the monitored descriptors has readable data in its kernel buffer, the process calling select blocks. As soon as any descriptor becomes readable, the select call returns, and the process can then read the prepared data from the kernel.

It can be seen that with the I/O multiplexing model, although many descriptors are registered, only the single process calling select blocks.

The implementation methods of I/O multiplexing in Linux mainly include select, poll and epoll.

1. select

select detects ready descriptors by linear scanning: on every call the kernel must traverse the entire descriptor set passed in, so performance degrades sharply when a large number of descriptors are registered.

In addition, every call to select copies the descriptor set from user space into the kernel, an overhead that becomes significant when there are many descriptors.

select is also capped in how many descriptors it can monitor: the fd_set size is fixed at compile time by FD_SETSIZE, typically 1024.
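To make this concrete, here is a small Python sketch (an illustration, not from the original article) using the `select` module, which wraps the select(2) syscall: the call blocks until at least one of the monitored descriptors becomes readable, then returns exactly which ones are ready.

```python
import os
import select
import threading
import time

r1, w1 = os.pipe()
r2, w2 = os.pipe()

def writer():
    time.sleep(0.1)
    os.write(w2, b"x")       # only the second descriptor becomes readable

threading.Thread(target=writer).start()

# select blocks until at least one monitored descriptor is readable,
# then returns the subset that is ready
readable, _, _ = select.select([r1, r2], [], [])
```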

2. poll

Like select, poll monitors descriptors by linear scanning; the difference is that it replaces the fixed-size fd_set with an array of pollfd structures, so there is no hard limit on the number of descriptors monitored.

However, the overhead of scanning every descriptor and of copying the whole set on each call remains unresolved.
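For comparison with select, a minimal sketch (an illustration, not from the original article) of the poll interface via Python's `select.poll`, which wraps poll(2): descriptors are registered individually in pollfd style rather than packed into a fixed-size fd_set.

```python
import os
import select

r, w = os.pipe()
os.write(w, b"y")

p = select.poll()
p.register(r, select.POLLIN)   # pollfd-style registration, no FD_SETSIZE cap
events = p.poll(0)             # timeout 0 ms: check readiness without blocking
# events is a list of (fd, eventmask) pairs for the ready descriptors
```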

3. epoll

epoll has been supported since Linux 2.6. Internally it keeps the registered descriptors in a red-black tree, so adding, removing, and modifying entries is efficient. epoll also replaces active polling with passive notification: after registering a socket, the program can do other work and handle events only when the kernel notifies it. Because epoll_wait returns only the descriptors that are actually ready, rather than requiring a scan of the whole set, it can monitor a large number of descriptors efficiently.

epoll can be read as "event poll": it tells the application which I/O event occurred on which stream, so epoll is event-driven.

epoll has two trigger modes, level trigger and edge trigger:

  • Level trigger (LT): the default mode. As long as a descriptor still has unread data, every call to epoll_wait returns its event again, reminding the application to process it.
  • Edge trigger (ET): the high-speed mode. An event is reported only once, when the descriptor's state changes; regardless of whether unread data remains, it is not reported again until new data arrives. Under ET, therefore, the application should drain all available data in one go.
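The LT/ET difference can be observed directly with Python's `select.epoll` (Linux only; an illustration, not from the original article): after one write to each pipe, a second epoll_wait still reports the level-triggered descriptor, but the edge-triggered one stays silent because no new data has arrived.

```python
import os
import select

r_lt, w_lt = os.pipe()
r_et, w_et = os.pipe()

ep = select.epoll()
ep.register(r_lt, select.EPOLLIN)                    # default: level-triggered
ep.register(r_et, select.EPOLLIN | select.EPOLLET)   # edge-triggered

os.write(w_lt, b"a")
os.write(w_et, b"b")

first = ep.poll(timeout=0)    # both descriptors report EPOLLIN
second = ep.poll(timeout=0)   # LT fires again (data still unread); ET is silent
lt_only = {fd for fd, _ in second}
```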

4. Signal-driven I/O model

When a process uses signal-driven I/O, it first registers a signal handler with the kernel via the sigaction system call and then returns without blocking. When data becomes ready in the kernel, the kernel sends a signal (SIGIO) to the process, and the process then issues the I/O call to read the data inside its signal-handling callback.
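A minimal Linux-only sketch of this flow (an illustration, not from the original article): Python's `signal` and `fcntl` modules are used to install a SIGIO handler and enable O_ASYNC on a pipe, so the kernel signals the process when data becomes readable and the handler performs the read.

```python
import fcntl
import os
import signal
import time

r, w = os.pipe()
got = []

# The handler performs the actual read when the kernel signals readiness
signal.signal(signal.SIGIO, lambda signum, frame: got.append(os.read(r, 4)))

# Direct SIGIO for this descriptor to our process and enable O_ASYNC
fcntl.fcntl(r, fcntl.F_SETOWN, os.getpid())
fcntl.fcntl(r, fcntl.F_SETFL, fcntl.fcntl(r, fcntl.F_GETFL) | os.O_ASYNC)

os.write(w, b"ping")   # data becomes ready; the kernel raises SIGIO
time.sleep(0.2)        # the sleep is interrupted so the handler can run
# got now holds the data that was read inside the signal handler
```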

The signal-driven I/O model is based on a callback mechanism; application development is harder, and it is rarely used in practice.


5. Asynchronous I/O model

When a process initiates an asynchronous I/O operation, it tells the kernel to perform the whole I/O and returns immediately without blocking, so the result is not available yet. Once the kernel has finished the entire operation, it notifies the process; if the operation succeeded, the process obtains the data directly.

The main difference between the asynchronous I/O model and the signal-driven model is this: with signal-driven I/O the kernel tells the application when it may start an I/O operation, whereas with asynchronous I/O the kernel tells the application when the operation has completed, and the process does not need to perform the read itself.

Both blocking I/O and non-blocking I/O are synchronous calls, because the second stage of a read, copying the data from kernel space to user space, still has to be waited for; that stage is synchronous, and if the kernel copies slowly, the wait can be long. True asynchronous I/O waits neither for the kernel data to become ready nor for the copy from kernel space to user space.

Asynchronous I/O can take full advantage of DMA, allowing I/O operations to overlap with computation. But true asynchronous I/O requires a great deal of support from the operating system. At present, Windows provides real asynchronous I/O through IOCP; Linux introduced AIO only in 2.6, and it remains incomplete. For this reason, I/O multiplexing is still the mainstream way to implement high-concurrency network programming on Linux.

Built on the synchronous/asynchronous distinction are two network patterns, Reactor and Proactor:

  • Reactor is a non-blocking, synchronous network pattern that senses readable and writable readiness events. Whenever a readiness event is sensed, the application process must actively complete the read itself, i.e. copy the data the socket received into its own memory; this step is synchronous, and only after reading can the process handle the data.
  • Proactor is an asynchronous network pattern that senses completed read and write events. When issuing an asynchronous read or write request, the application passes in the address of the data buffer (among other things) so that the kernel can perform the transfer on its behalf. The whole read or write is done by the operating system; unlike Reactor, the application does not actively read or write the data, and is simply notified when the operation has finished so it can process the result directly.
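The Reactor pattern described above can be sketched with Python's `selectors` module (an illustration, not from the original article): the selector senses readiness, and the registered callback, run by the application itself, performs the actual read, which is exactly what makes Reactor synchronous I/O.

```python
import selectors
import socket

# Minimal Reactor loop: the selector senses readiness; the application
# itself then performs the read (hence Reactor is synchronous I/O).
sel = selectors.DefaultSelector()
server, client = socket.socketpair()
server.setblocking(False)

received = []

def on_readable(sock):
    received.append(sock.recv(1024))   # application actively reads the data

# Register the socket with its callback as the attached data
sel.register(server, selectors.EVENT_READ, on_readable)

client.send(b"event")
for key, _ in sel.select(timeout=1):   # wait for a readiness event
    key.data(key.fileobj)              # dispatch to the registered callback
```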

Origin blog.csdn.net/qq_43686863/article/details/129916544