[Linux System Programming] A deep understanding of the 5 IO models

Table of contents

1. Basic knowledge of IO

1. Direct IO and non-direct IO

2. What is DMA technology?

2. Five models of IO

1. Blocking I/O

2. Non-blocking I/O

3. IO multiplexing

4. Signal driven IO

5. Asynchronous I/O


1. Basic knowledge of IO

1. Direct IO and non-direct IO

Disk IO is very slow, so to reduce the number of disk operations, Linux uses DMA to copy data from the disk into a kernel buffer first after a system call. This buffer is the page cache. Actual disk IO is initiated only when the cache requires it, for example on a cache miss during a read, or when dirty pages must be written back.

Whether the kernel cache is used is what distinguishes direct IO from non-direct IO:

  • Direct IO involves no copy between the kernel cache and user space; data goes straight to the disk through the file system.
  • With non-direct IO, a read copies data from the kernel to the application process, and a write copies data from the application process into kernel space, where the kernel decides when to actually write it to disk.

Similarly, socket communication also establishes buffers in the kernel so that processes on two hosts can communicate.
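The contrast between the two paths can be sketched in C. This is a minimal sketch, not the article's own code: the file paths are made up for illustration, non-direct IO is an ordinary `write()` that lands in the page cache, and direct IO uses the Linux `O_DIRECT` flag, which requires an aligned buffer (512 bytes is typical, not universal) and is not supported by every filesystem.

```c
#define _GNU_SOURCE          /* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Non-direct IO: write() copies the data into the kernel page cache;
   the kernel writes it back to disk later. */
ssize_t write_buffered(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    ssize_t n = write(fd, msg, strlen(msg));   /* lands in the page cache */
    close(fd);
    return n;
}

/* Direct IO: O_DIRECT bypasses the page cache. Buffer, offset and
   length must be aligned (512 here). Returns -1 if unsupported
   (e.g. tmpfs rejects O_DIRECT with EINVAL). */
ssize_t write_direct(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0) return -1;
    void *buf = NULL;
    if (posix_memalign(&buf, 512, 512) != 0) { close(fd); return -1; }
    memset(buf, 'x', 512);
    ssize_t n = write(fd, buf, 512);   /* goes to disk without the cache */
    free(buf);
    close(fd);
    return n;
}
```

Note that `O_DIRECT` trades the safety net of the page cache for control: databases that maintain their own caches use it to avoid double caching.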

2. What is DMA technology?

DMA transfers copy data from one address space to another without CPU intervention, providing buffered data transfer between peripherals and memory, or between two regions of memory.

DMA was designed to solve exactly this problem: transferring large amounts of data consumes too many CPU cycles. With DMA, the CPU is freed to do more complex work.

2. Five models of IO

1. Blocking I/O

  • When the user program calls read, the thread blocks while the underlying DMA engine copies the data from the device into the kernel.
  • The thread does nothing until the data is ready and has been copied from the kernel buffer into the user buffer.
  • Only when that copy completes does read return.
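The three steps above can be sketched with a pipe standing in for a slow device. This is an illustrative sketch, not code from the article: the child process plays the role of the device that takes a while to produce data, and the parent's `read()` blocks until both the wait and the kernel-to-user copy are done.

```c
#include <sys/wait.h>
#include <unistd.h>

/* Blocking IO demo: the parent's read() sleeps until the child writes,
   then returns the number of bytes copied into the user buffer. */
ssize_t blocking_read_demo(char *out, size_t cap)
{
    int fds[2];
    if (pipe(fds) < 0) return -1;
    pid_t pid = fork();
    if (pid == 0) {                 /* child: the "slow device" */
        close(fds[0]);
        sleep(1);                   /* data is not ready yet */
        write(fds[1], "ready", 5);
        _exit(0);
    }
    close(fds[1]);
    /* read() blocks here until the kernel buffer has data,
       then copies it into `out` and returns. */
    ssize_t n = read(fds[0], out, cap);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```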

2. Non-blocking I/O

  • If the data in the kernel is not ready, read returns immediately with the EWOULDBLOCK error code. Although read returns at once, DMA continues copying the data into the kernel in the background.
  • If the data in the kernel is ready, the kernel copies it to the application process.
  • Non-blocking IO is usually paired with a polling loop that repeatedly checks whether the kernel data is ready. During this loop the application process gets no useful work done, which wastes a lot of CPU.
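The polling loop can be sketched as follows. Again a hypothetical example using a pipe: the read end is switched to `O_NONBLOCK`, so `read()` fails with `EAGAIN`/`EWOULDBLOCK` (the same value on Linux) until the child finally writes. The return value counts how many polls came back empty, making the wasted CPU work visible.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

/* Non-blocking IO demo: read() returns EWOULDBLOCK immediately while
   data is not ready, so the caller must poll. Returns the number of
   empty polls before the data arrived, or -1 on error. */
long nonblocking_poll_demo(void)
{
    int fds[2];
    if (pipe(fds) < 0) return -1;
    /* mark the read end non-blocking */
    fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL) | O_NONBLOCK);
    pid_t pid = fork();
    if (pid == 0) {
        close(fds[0]);
        sleep(1);                  /* data not ready for a while */
        write(fds[1], "x", 1);
        _exit(0);
    }
    close(fds[1]);
    long polls = 0;
    char c;
    for (;;) {
        ssize_t n = read(fds[0], &c, 1);
        if (n == 1) break;         /* kernel data was finally ready */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            polls++;               /* not ready: spin and try again */
            usleep(10000);
            continue;
        }
        polls = -1;                /* unexpected error or EOF */
        break;
    }
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return polls;
}
```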

3. IO multiplexing

What is IO multiplexing?

IO multiplexing lets a process wait on multiple file descriptors at the same time, while DMA copies data into the kernel for all of them concurrently. When an event (a read event or a write event) occurs on some file descriptor, that descriptor is recorded and eventually returned to the application process.

Why is there IO multiplexing?

In the examples of blocking IO and non-blocking IO, a thread waits on only one file descriptor at a time, but an application process is likely to perform many IO interactions at once: it may do data IO with several files on disk and communicate over several sockets at the same time, so a thread is constantly alternating between IO and data processing. To improve a process's event handling we need to improve both IO efficiency and data-processing efficiency; here we focus on IO efficiency first. The essence of IO multiplexing is to improve a process's IO efficiency, mainly by reducing the time spent blocked in IO and the number of system calls, as follows:

Suppose a process needs to exchange IO data with file descriptors fd 1–32. Let us walk through how blocking IO, non-blocking IO, and IO multiplexing each handle this exchange.

Blocking IO waiting on multiple file descriptors:

  • It can only wait on one file descriptor at a time. When read is called for fd=1, the thread blocks while DMA copies the data from the disk into the kernel, until the kernel has the data ready.
  • The kernel then copies the data to the application process and read returns.
  • The application process finishes processing that data, then calls read again for fd=2, and so on.
  • Blocking IO blocks for a stretch of time on every read call, which is clearly very inefficient.

Non-blocking IO waits for multiple file descriptors:

  • When a file descriptor's data is not ready, read returns immediately and the process moves on to scan the next file descriptor.
  • When a file descriptor has data, the kernel copies it to the application process, which then processes it.
  • This is clearly more efficient than blocking IO, because the thread never blocks inside the kernel.
  • However, non-blocking IO calls read frequently even when no data is ready, which still burns a certain amount of CPU.

IO multiplexing waits on multiple file descriptors:

  • IO multiplexing lets the kernel watch all the file descriptors. While a descriptor is being watched, DMA copies arriving data into the kernel. If a file's buffer contains data, that file descriptor is recorded; if not, the kernel simply skips to the next one.
  • After one pass, all file descriptors whose data is ready are returned to the application process together, so the process knows exactly which descriptors are ready.
  • With those fds in hand, the application process can call read on each in turn without ever blocking: the data in each descriptor's buffer is already ready, so there is nothing to wait for.
  • IO multiplexing lets multiple file descriptors copy data into the kernel via DMA at the same time, shrinking the application process's total blocking time.
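The scan-then-read pattern above maps directly onto `select()`, the oldest of the Linux multiplexing calls. This is a minimal sketch with two pipes standing in for the many descriptors of the fd 1–32 scenario: one `select()` call waits on both at once, and the subsequent `read()` calls cannot block because the kernel has already reported the descriptors as ready.

```c
#include <sys/select.h>
#include <sys/wait.h>
#include <unistd.h>

/* IO multiplexing demo: one select() call waits on two descriptors
   at once; the kernel reports which are readable. Returns how many
   descriptors delivered data (expected: 2). */
int select_demo(void)
{
    int a[2], b[2];
    if (pipe(a) < 0 || pipe(b) < 0) return -1;
    pid_t pid = fork();
    if (pid == 0) {                  /* child feeds both descriptors */
        close(a[0]); close(b[0]);
        write(a[1], "A", 1);
        write(b[1], "B", 1);
        _exit(0);
    }
    close(a[1]); close(b[1]);
    int got = 0;
    while (got < 2) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(a[0], &rfds);         /* watch both read ends */
        FD_SET(b[0], &rfds);
        int maxfd = (a[0] > b[0] ? a[0] : b[0]) + 1;
        /* blocks once for ALL watched descriptors, not once per fd */
        if (select(maxfd, &rfds, NULL, NULL, NULL) < 0) return -1;
        char c;
        /* these reads do not block: select said the data is ready */
        if (FD_ISSET(a[0], &rfds) && read(a[0], &c, 1) == 1) got++;
        if (FD_ISSET(b[0], &rfds) && read(b[0], &c, 1) == 1) got++;
    }
    close(a[0]); close(b[0]);
    waitpid(pid, NULL, 0);
    return got;
}
```

`poll()` and `epoll` follow the same idea; `epoll` scales better because the kernel keeps the watched set between calls instead of rescanning it.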

  • Blocking IO must block for a while on every read, and during that time it cannot wait on any other file descriptor, so its IO efficiency is low.
  • Non-blocking IO returns immediately when there is no data and moves on to read other file descriptors, but it calls read frequently even when the corresponding file buffer is empty, which still costs some CPU.
  • Multiplexing waits on many file descriptors at the same time. The watched descriptors have their data copied into the kernel by DMA; descriptors whose buffers hold data are recorded, empty ones are skipped, and at the end of the scan all data-ready descriptors are returned to the application process, which then calls read to fetch the data from each file buffer. IO multiplexing thus reduces both blocking time and the number of read system calls, improving the process's IO efficiency.

4. Signal driven IO

  • The application process registers a signal and returns immediately, while the bottom layer uses DMA to copy the data into the kernel buffer in the background.
  • When the kernel data is ready, the kernel notifies the application process with the signal, and the application process then actively initiates read to copy the kernel data into its own space.
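On Linux this is the `O_ASYNC`/`SIGIO` mechanism. The sketch below is illustrative (a pipe stands in for the device, and the parent spins on a flag where a real program would do other work): `F_SETOWN` tells the kernel which process to signal, `O_ASYNC` enables signal-driven notification, and only after `SIGIO` arrives does the process call `read()`, so the read itself never waits.

```c
#define _GNU_SOURCE          /* for F_SETOWN / O_ASYNC on glibc */
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t data_ready = 0;

static void on_sigio(int sig) { (void)sig; data_ready = 1; }

/* Signal-driven IO demo: register SIGIO on the descriptor, keep
   working, and call read() only after the kernel signals that data
   has arrived. Returns the bytes read, or -1 on error. */
ssize_t sigio_demo(void)
{
    int fds[2];
    if (pipe(fds) < 0) return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGIO, &sa, NULL);

    fcntl(fds[0], F_SETOWN, getpid());   /* deliver SIGIO to this process */
    fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL) | O_ASYNC);

    pid_t pid = fork();
    if (pid == 0) {
        close(fds[0]);
        sleep(1);
        write(fds[1], "hi", 2);          /* triggers SIGIO in the parent */
        _exit(0);
    }
    close(fds[1]);
    while (!data_ready)
        usleep(1000);    /* the process could do unrelated work here */
    char buf[8];
    ssize_t n = read(fds[0], buf, sizeof buf);  /* data is ready: no wait */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```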

5. Asynchronous I/O

Signal-driven IO, blocking IO, non-blocking IO, and IO multiplexing are all synchronous IO. The synchrony here refers to the data-copy step: each of them actively calls read to copy the data from the kernel to the application process, and while the kernel performs that copy the application process is blocked and can do nothing else.

True asynchronous IO covers both phases: the kernel prepares the data and copies it to the application process, and the application process never waits. When the data is ready, the kernel actively copies it to the process, and once the copy completes, the kernel notifies the application process.
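The POSIX AIO interface illustrates this shape. A hedged sketch, with a made-up file path and a simple completion poll (a real program would use a completion notification instead of spinning, and older glibc needs `-lrt` to link the `aio_*` functions): `aio_read()` returns immediately, the wait and the copy into the user buffer both happen in the background, and the process only collects the result afterwards.

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Asynchronous IO demo: aio_read() returns at once; the kernel/library
   performs both the wait and the copy into `buf`, and the process
   checks completion later instead of blocking in read().
   Returns the bytes read, or -1 on error. */
ssize_t aio_demo(const char *path)
{
    /* prepare a file to read back */
    int wfd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (wfd < 0) return -1;
    write(wfd, "async!", 6);
    close(wfd);

    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    static char buf[16];
    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof buf;
    cb.aio_offset = 0;

    if (aio_read(&cb) < 0) { close(fd); return -1; }  /* returns at once */

    /* the process is free to do other work here ... */
    while (aio_error(&cb) == EINPROGRESS)
        usleep(1000);            /* completion happens in the background */

    ssize_t n = aio_return(&cb); /* bytes already copied into buf */
    close(fd);
    return n;
}
```

The same two-phase, no-wait contract is what newer interfaces such as `io_uring` provide with far less overhead.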


 


Origin blog.csdn.net/sjp11/article/details/126234498