Redis Principles - IO Explained in Detail

The original article is kept up to date and is easier to read at its source:

Redis Principles - IO Explained in Detail | CoderMast: https://www.codermast.com/database/redis/redis-IO.html

User Space and Kernel Space

Every Linux distribution, whatever its flavor, has the Linux kernel at its core. All of our applications have to interact with the hardware through the Linux kernel.

 

To prevent user applications from causing conflicts or even kernel panics, they are separated from the kernel:

  • The memory addressing space is divided into two parts: kernel space and user space

For a 32-bit operating system, the addressable range is 0 ~ 2^32.

  • User space can only execute unprivileged instructions (Ring 3); it cannot access system resources directly and must go through the interfaces the kernel provides
  • Kernel space can execute privileged instructions (Ring 0) and access all system resources

When a process runs in user space, it is called user mode, and when it runs in kernel space, it is called kernel mode.

To improve IO efficiency, Linux adds buffers in both user space and kernel space:

  • When writing data, the user buffer is first copied into the kernel buffer and then written to the device
  • When reading data, the device's data is first read into the kernel buffer and then copied into the user buffer (a sketch of these copies follows this list)
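
A minimal sketch of how these copies look from user space, assuming an ordinary file at the placeholder path "data.txt": read() and write() each hand the kernel a user-space buffer, and the kernel moves the data through its own buffer on the way to or from the device.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    char buf[4096];                       // user-space buffer
    int fd = open("data.txt", O_RDONLY);  // "data.txt" is a placeholder path
    if (fd < 0) { perror("open"); return 1; }

    // read(): device -> kernel buffer -> buf in user space
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n > 0) {
        // write(): buf in user space -> kernel buffer -> device (here: stdout)
        write(STDOUT_FILENO, buf, n);
    }
    close(fd);
    return 0;
}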

 

5 IO models

  1. Blocking IO
  2. Non-blocking IO
  3. IO Multiplexing
  4. Signal-driven IO
  5. Asynchronous IO

# Blocking IO

As the name implies, blocking IO blocks and waits during both stages: waiting for the data and copying the data to user space.

 

  1. The user thread issues an IO request
  2. The kernel checks whether the data is ready; if not, it keeps waiting, and the user thread stays blocked
  3. When the data is ready, the kernel copies it into user space and returns the result, and the user thread is unblocked

It can be seen that in the blocking IO model, the user process is blocked in both stages.
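
A minimal sketch of the blocking model, assuming a UDP socket bound to an arbitrary example port (9000): the recvfrom call below does not return until the kernel has both received a datagram and copied it into the user buffer.

#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);          // arbitrary example port
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    char buf[1024];
    // Blocks through both stages: waiting for a datagram, then copying it into buf
    ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
    printf("received %zd bytes\n", n);
    return 0;
}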

# Non-blocking IO

With non-blocking IO, the recvfrom call returns a result immediately instead of blocking the user process.

 

  1. In the waiting-for-data stage, if the data is not ready the call returns EWOULDBLOCK immediately. The user process is not blocked, but it keeps reissuing the request in a busy-polling loop until the kernel has the data and the polling can stop.
  2. Once the data is ready, it is copied from the kernel into user space; the user process is blocked during this stage.

It can be seen that in the non-blocking IO model, the user process is non-blocking in the first stage and blocked in the second. Although it is called non-blocking, performance is not improved: the busy-waiting makes the CPU spin uselessly, and CPU usage rises sharply.
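
A minimal sketch of busy polling, assuming fd is a bound UDP socket like the one above: setting O_NONBLOCK makes recvfrom return EWOULDBLOCK/EAGAIN instead of blocking, so the loop spins until data finally arrives.

#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// fd is assumed to be a bound UDP socket, as in the previous sketch
void poll_until_ready(int fd, char *buf, size_t len) {
    // Switch the fd into non-blocking mode
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);

    for (;;) {
        ssize_t n = recvfrom(fd, buf, len, 0, NULL, NULL);
        if (n >= 0)
            break;                        // stage 2 (the copy) happened inside this call
        if (errno == EWOULDBLOCK || errno == EAGAIN)
            continue;                     // not ready yet: busy-poll, burning CPU
        break;                            // a real error
    }
}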

# IO Multiplexing

Whether it is blocking IO or non-blocking IO, the user application has to call recvfrom to obtain the data; the difference lies in how the first stage behaves when no data is available:

  • If no data is available when recvfrom is called, blocking IO blocks the process, while non-blocking IO makes the CPU spin uselessly, so neither makes good use of the CPU
  • If data is available when recvfrom is called, the user process goes straight to the second stage and reads and processes the data

For example, when a server handles client Socket requests with a single thread, it can only process the Sockets one at a time. If the Socket being processed is not ready (its data cannot be read or written), the thread blocks on it and every other client Socket has to wait, so performance is naturally poor.

File Descriptor (FD): an unsigned integer, allocated incrementally from 0, that refers to a file in Linux. In Linux everything is a file: regular files, videos, hardware devices, and of course network Sockets.

IO multiplexing: a single thread monitors multiple FDs at the same time and is notified as soon as some FD becomes readable or writable, which avoids pointless waiting and makes full use of the CPU.

 

There are three ways to implement IO multiplexing technology:

  • select
  • poll
  • epoll

Differences:

  • select and poll only notify the user process that some FD is ready, without telling it which one, so the user process must traverse the whole FD set to find it
  • epoll notifies the user process that an FD is ready and at the same time writes the ready FDs into user space, so they can be located directly

# select

select is the earliest implementation of I/O multiplexing in Linux:

// Type alias __fd_mask; essentially a long int
typedef long int __fd_mask;

/* fd_set records the set of fds to monitor and their states */
typedef struct {
    // fds_bits is an array of long; with 32-bit longs its length is 1024/32 = 32
    // 1024 bits in total, one bit per fd: 0 = not ready, 1 = ready
    __fd_mask fds_bits[__FD_SETSIZE / __NFDBITS];
    // ...
} fd_set;


// select: monitors multiple sets of fds
int select(
    int nfds,           // the highest fd in any of the sets + 1
    fd_set *readfds,    // fds to monitor for read events
    fd_set *writefds,   // fds to monitor for write events
    fd_set *exceptfds,  // fds to monitor for exceptional events
    // timeout: NULL = block forever; 0 = do not block; > 0 = wait this long
    struct timeval *timeout
);

 

The specific process is as follows (a usage sketch follows the list):

  1. Create an fd_set rfds in user space
  2. Suppose we want to monitor fd = 1, 2, 5
  3. Call select(5 + 1, &rfds, NULL, NULL, &timeout) from user space, e.g. with a 3-second timeout
  4. The fd_set rfds created in user space is copied into kernel space
  5. The kernel traverses the copied fd_set rfds
  6. Fds that are not ready have their bit set to 0; once some fd becomes ready (or the timeout expires), the fd_set is copied back to user space and the user process traverses it to find the ready fds
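
A minimal usage sketch, assuming fd is an already-open socket (the setup is omitted): FD_SET registers the fd in the bitmap, select blocks until it becomes readable or the timeout expires, and FD_ISSET checks whether it is among the ready fds.

#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

// fd is assumed to be an already-open socket or file descriptor
void wait_readable_with_select(int fd) {
    fd_set rfds;
    FD_ZERO(&rfds);                 // clear the whole bitmap
    FD_SET(fd, &rfds);              // set the bit for the fd we care about

    struct timeval tv = { .tv_sec = 3, .tv_usec = 0 };   // 3-second timeout

    // The whole fd_set is copied to the kernel; on return it holds only the ready fds
    int n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n > 0 && FD_ISSET(fd, &rfds)) {
        char buf[1024];
        ssize_t len = read(fd, buf, sizeof(buf));   // stage 2: copy data to user space
        printf("read %zd bytes\n", len);
    } else if (n == 0) {
        printf("timeout\n");
    }
}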

Problems with select mode:

  • The entire fd_set has to be copied from user space into kernel space, and copied back to user space again after select returns
  • select cannot tell which FD is ready; the user process has to traverse the whole fd_set
  • The number of FDs that fd_set can monitor cannot exceed 1024

# poll

The poll mode makes a modest improvement over select, but the performance gain is not significant. The key code is as follows:

// Event types used in pollfd
#define POLLIN      // readable event
#define POLLOUT     // writable event
#define POLLERR     // error event
#define POLLNVAL    // fd not open

// pollfd structure
struct pollfd {
    int fd;             // the fd to monitor
    short int events;   // event types to monitor: read, write, error
    short int revents;  // event types that actually occurred
};

// poll function
int poll(
    struct pollfd *fds, // array of pollfd; its size is chosen by the caller
    nfds_t nfds,        // number of elements in the array
    int timeout         // timeout in milliseconds
);

The I/O process (a usage sketch follows the list):

  1. Create a pollfd array of whatever size you need and add the fds you care about to it
  2. Call poll; the pollfd array is copied into kernel space, where it is stored in a linked list with no upper limit on size
  3. The kernel traverses the fds to determine whether any are ready
  4. Once data is ready or the timeout expires, the pollfd array is copied back to user space and the number of ready fds, n, is returned
  5. The user process checks whether n is greater than 0
  6. If it is, it traverses the pollfd array to find the ready fds
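
A minimal usage sketch, again assuming fd is an already-open socket: the caller fills in events, poll copies the array into the kernel, and on return the caller inspects revents to see what actually happened.

#include <poll.h>
#include <stdio.h>
#include <unistd.h>

// fd is assumed to be an already-open socket or file descriptor
void wait_readable_with_poll(int fd) {
    struct pollfd fds[1];
    fds[0].fd = fd;
    fds[0].events = POLLIN;        // we only care about readability

    // The array is copied to the kernel; 3000 ms timeout
    int n = poll(fds, 1, 3000);
    if (n > 0 && (fds[0].revents & POLLIN)) {
        char buf[1024];
        ssize_t len = read(fd, buf, sizeof(buf));   // stage 2: copy data to user space
        printf("read %zd bytes\n", len);
    } else if (n == 0) {
        printf("timeout\n");
    }
}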

Compared with select:

  • select's fd_set is fixed at 1024 entries, while pollfd is stored in a linked list in the kernel, so in theory there is no upper limit
  • However, the more FDs are monitored, the longer each traversal takes, so performance actually degrades as the count grows

# epoll

The epoll mode improves on both select and poll. It is built around the eventpoll structure in the kernel and provides three functions:

struct eventpoll {
    //...
    struct rb_root rbr;       // a red-black tree recording the FDs to monitor
    struct list_head rdlist;  // a linked list recording the ready FDs
    //...
};

// 1. Creates an eventpoll struct in the kernel and returns its handle epfd
int epoll_create(int size);

// 2. Adds an FD to the epoll red-black tree and registers ep_poll_callback;
//    when the callback fires, the FD is added to the rdlist ready list
int epoll_ctl(
    int epfd,   // handle of the epoll instance
    int op,     // operation to perform: ADD, MOD or DEL
    int fd,     // the FD to monitor
    struct epoll_event *event // event types to monitor: read, write, error, etc.
);

// 3. Checks whether rdlist is empty; if not, returns the number of ready FDs
int epoll_wait(
    int epfd,       // handle of the eventpoll instance
    struct epoll_event *events, // empty event array that receives the ready FDs
    int maxevents,  // maximum length of the events array
    int timeout     // timeout: -1 = block forever; 0 = do not block; > 0 = wait this long
);
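
A minimal sketch that puts the three calls together, assuming listen_fd is a listening socket created elsewhere: each fd is registered once with epoll_ctl, and every epoll_wait returns only the ready fds in the events array.

#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

// listen_fd is assumed to be a socket that is already listening
void epoll_loop(int listen_fd) {
    int epfd = epoll_create(1);             // size is ignored on modern kernels, must be > 0

    struct epoll_event ev;
    ev.events = EPOLLIN;                    // monitor readability (LT mode by default)
    ev.data.fd = listen_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);   // register once

    struct epoll_event events[16];
    for (;;) {
        // Only ready fds are written into events; no need to traverse everything
        int n = epoll_wait(epfd, events, 16, -1);
        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == listen_fd) {
                int client = accept(listen_fd, NULL, NULL);
                ev.events = EPOLLIN;
                ev.data.fd = client;
                epoll_ctl(epfd, EPOLL_CTL_ADD, client, &ev);  // start monitoring the client
            } else {
                char buf[1024];
                ssize_t len = read(events[i].data.fd, buf, sizeof(buf));
                if (len <= 0) close(events[i].data.fd);       // peer closed or error
            }
        }
    }
}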

 

# Event Notification Mechanism

When an FD has data to read, we can be notified by calling epoll_wait, but there are two event-notification modes:

  • Level Triggered (LT): as long as an FD still has data to read, the notification is repeated until the data has been fully processed. This is epoll's default mode.
  • Edge Triggered (ET): when an FD becomes readable it is notified only once, regardless of whether the data has been processed.

For example:

  1. Suppose the FD of a client Socket has been registered in an epoll instance
  2. The client Socket sends 2kb of data
  3. The server calls epoll_wait and is notified that the FD is ready
  4. The server reads 1kb of data from the FD
  5. Go back to step 3 (call epoll_wait again, forming a loop)

In LT mode the second epoll_wait reports the FD again, because 1kb is still unread; in ET mode it does not, so the remaining data sits in the buffer until new data arrives on that FD.

Conclusion:

  • ET mode avoids the thundering-herd problem that can occur in LT mode
  • ET mode works best combined with non-blocking IO that keeps reading the FD until it is drained, which makes it more complex to use than LT (see the sketch below)
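
A minimal sketch of the usual ET pattern, assuming client_fd is non-blocking and was registered with EPOLLIN | EPOLLET: since ET notifies only once per arrival, the handler must keep reading until the kernel buffer is drained (read returns EAGAIN).

#include <errno.h>
#include <unistd.h>

// client_fd is assumed to be non-blocking and registered with EPOLLIN | EPOLLET
void handle_et_readable(int client_fd) {
    char buf[1024];
    for (;;) {
        ssize_t n = read(client_fd, buf, sizeof(buf));
        if (n > 0) {
            // process the n bytes in buf ...
            continue;                      // keep reading: ET will not notify us again
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;                         // kernel buffer drained, wait for the next event
        close(client_fd);                  // n == 0 (peer closed) or a real error
        break;
    }
}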

# Web Service Process

Basic flowchart of a web service based on the epoll model (see the figure in the original article).

# Summary

There are three problems with the select mode:

  • The maximum number of FDs that can be monitored does not exceed 1024
  • Every select call has to copy all the FDs to be monitored into kernel space
  • Every call has to traverse all the FDs to determine which are ready

Problems with poll mode:

  • poll uses a linked list to remove select's upper limit on monitored FDs, but it still has to traverse all the FDs, so performance drops as the number of monitored FDs grows

How epoll solves these problems:

  • The FDs to be monitored are stored in a red-black tree inside the epoll instance, so there is in theory no upper limit; insertion, deletion and lookup are all fast, and performance does not degrade noticeably as the number of monitored FDs grows
  • Each FD is added to the red-black tree with a single epoll_ctl call; later epoll_wait calls do not need to pass the FD set again, so the FDs are not repeatedly copied into kernel space
  • The kernel copies only the ready FDs into the user-supplied array, so the user process knows exactly which FDs are ready without traversing them all

# Signal-driven IO

Signal-driven IO associates a SIGIO signal with the kernel and registers a callback: when the kernel has an FD ready, it sends a SIGIO signal to notify the user process. In the meantime the user application can do other work instead of blocking and waiting.
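
A minimal sketch of enabling signal-driven IO, assuming sock_fd is an already-open socket: F_SETOWN tells the kernel which process should receive the signal, and O_ASYNC turns on SIGIO delivery when the fd becomes ready.

#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t data_ready = 0;

static void on_sigio(int signo) {
    (void)signo;
    data_ready = 1;        // stage 1 done: some fd is ready; read it outside the handler
}

// sock_fd is assumed to be an already-open socket
void enable_signal_driven_io(int sock_fd) {
    signal(SIGIO, on_sigio);                      // register the callback
    fcntl(sock_fd, F_SETOWN, getpid());           // deliver SIGIO to this process
    int flags = fcntl(sock_fd, F_GETFL, 0);
    fcntl(sock_fd, F_SETFL, flags | O_ASYNC);     // enable signal-driven notification
    // The process is now free to do other work until SIGIO arrives,
    // after which it calls recvfrom (stage 2 still blocks for the copy).
}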

 

When there are a large number of IO operations, many signals are generated; if the SIGIO handler cannot keep up, the signal queue may overflow.

Moreover, frequent signal exchanges between kernel space and user space perform poorly.

# Asynchronous IO

Asynchronous IO is non-blocking through the entire process. After the user process calls the asynchronous API it can go and do other things; the kernel waits for the data, copies it into user space, and only then delivers a signal to notify the user process.
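
A minimal sketch using POSIX AIO, one of several asynchronous interfaces on Linux (link with -lrt), assuming fd is an already-open file descriptor: aio_read returns immediately, and the kernel fills the user buffer in the background before completion is reported.

#include <aio.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

// fd is assumed to be an already-open file descriptor
void async_read_example(int fd) {
    static char buf[4096];
    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = buf;                   // the kernel copies data directly into this buffer
    cb.aio_nbytes = sizeof(buf);
    cb.aio_offset = 0;

    aio_read(&cb);                      // returns immediately; both stages happen in the background

    // ... the process is free to do other work here ...

    while (aio_error(&cb) == EINPROGRESS) {
        /* still in progress; real code would use a completion signal or callback */
    }
    ssize_t n = aio_return(&cb);        // number of bytes read, already sitting in buf
    printf("async read completed: %zd bytes\n", n);
}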

 

In the asynchronous IO model, the user process is non-blocking in both phases.

Although the asynchronous IO model is conceptually simple, under highly concurrent access a large number of requests pile up inside the kernel, which can easily bring it down.

# Synchronous and Asynchronous

Whether an IO operation is synchronous or asynchronous is determined by the data copy between kernel space and user space (the actual reading and writing of the data), that is, by whether the second stage blocks. By that standard, blocking IO, non-blocking IO, IO multiplexing and signal-driven IO are all synchronous; only asynchronous IO is truly asynchronous.


Origin blog.csdn.net/qq_33685334/article/details/131324401