Deep analysis of IO multiplexing mechanism

Hello, I'm Weiki and welcome to ape java.

Whether for interviews or for everyday technical growth, IO multiplexing is an important piece of knowledge, and many high-performance frameworks are built on it. So, what is IO multiplexing, and what problem does it solve? Let's take a look together today.

Note: this article is based on the Linux system.

Nowadays, many systems and frameworks use the Socket programming model under the hood to implement network communication. So before explaining IO multiplexing, let's first lay out some Socket groundwork to make IO multiplexing easier to understand. For background on Sockets, you can refer to my earlier post: Ape java: Do you know how a Socket is created? If you already know Sockets, you can skip this step.

With that understanding of sockets in place, let's look at the common IO models:

1. Common IO models

Common network IO models fall into four types: synchronous blocking IO (Blocking IO, BIO), synchronous non-blocking IO (Non-blocking IO, NIO), IO multiplexing, and asynchronous non-blocking IO (Async IO, AIO). Of the four, AIO is asynchronous; the others are synchronous.

1.1 Synchronous blocking IO-BIO

Synchronous blocking IO: when a thread's processing involves an IO operation, the current thread blocks until the IO completes, and only then does it continue with the subsequent steps. As shown in the figure below, the server allocates a new thread for each client socket. Each thread's business processing is divided into 2 steps: when step 1 completes and an IO operation is encountered (such as loading a file), the current thread blocks until the IO operation completes, and only then does the thread go on to process step 2.
[Figure: synchronous blocking IO (BIO) model]
Actual usage scenario: in Java, using a thread pool to connect to a database follows the synchronous blocking IO model.

Model drawback: because each client requires a new thread, frequent thread blocking and context switching are inevitable, which brings real overhead.

1.2 Synchronous non-blocking IO-NIO (New IO)

Synchronous non-blocking IO: when a thread's processing involves an IO operation, the current thread does not block; it goes on to handle other business code and checks back later to see whether the IO interaction has completed. As shown in the figure below: a Buffer caches the data being read and written; a Channel is responsible for carrying the IO data in the background; and the Selector's main job is to query which channels are in the ready state. The Selector reuses one thread to query ready channels, which greatly reduces the thread-switching overhead caused by IO interactions.
[Figure: synchronous non-blocking IO (NIO) model]

Actual usage scenario: Java NIO is based on this IO interaction model; it lets business code handle IO synchronously but without blocking, reducing the frequent thread blocking and switching of the traditional synchronous blocking model. The classic NIO-based project is the Netty framework, and Elasticsearch's underlying network layer is in turn built on Netty.

1.3 IO multiplexing

IO multiplexing is the focus of this article and will be explained in detail below.

1.4 Asynchronous non-blocking IO-AIO

AIO stands for Asynchronous IO. With AIO, the thread is not notified when the IO is ready; it is notified after the IO operation has already completed, so AIO is completely non-blocking. Here, our business logic becomes a callback function that the system triggers automatically once the IO operation finishes. Netty 5 used AIO, but despite considerable effort its performance failed to leap past Netty 4, so Netty 5 was eventually abandoned.

Next, our protagonist makes its entrance: IO multiplexing.

2. What is IO multiplexing?

When we learn a new technology or concept, the biggest hurdle is usually the concept itself, and IO multiplexing is no exception. To figure out what IO multiplexing is, we can start from the "way" (the Chinese 路) in its name.

"Way" (路) originally means a road: the asphalt roads in the city, the dirt roads in the countryside; you know these well.
So: what is a "way" in IO?

Don't worry, let's first look at what IO is.

In a computer, IO means input and output (Input/Output), with information exchanged directly through the underlying IO devices. Depending on the object being operated on, it can be divided into disk I/O, network I/O, memory-mapped I/O, and so on. Any system with inputs and outputs can be considered an I/O system.
Finally, let's look at "way" and "multi-way" together.

In socket programming, the 5-tuple [ClientIp, ClientPort, ServerIp, ServerPort, Protocol] uniquely identifies a socket connection. On that basis, a single port of one server can establish socket connections with n clients, as roughly depicted in the figure below:
[Figure: multiple client sockets connected to one server port]

Each client-server socket connection can therefore be regarded as "one way", and the connections between multiple clients and the server are "multiple ways". "Multi-way IO" is thus the input and output streams on multiple socket connections, and "multiplexing" means that one thread handles the streams on all of those connections. So IO multiplexing can be defined as follows:

IO multiplexing in Linux refers to: one thread processes multiple IO streams.

3. What are the implementation mechanisms of IO multiplexing?

First, look at the basic socket model, for later comparison with the IO multiplexing mechanisms. In pseudocode:

listenSocket = socket(); // create an active socket via the socket() system call
bind(listenSocket);      // bind an address and port to the active socket
listen(listenSocket);    // turn the default active socket into a passive (listening) socket for the server
while (true) {           // loop, monitoring client connection requests
   connSocket = accept(listenSocket); // accept a client connection and obtain the connected socket
   recv(connSocket); // read data from the client; only one client can be handled at a time
   send(connSocket); // return data to the client; only one client can be handled at a time
}

The resulting network communication flow is shown in the figure below:
[Figure: basic socket network communication flow]

The basic socket model enables communication between server and client, but each call to the accept function can handle only one client connection, so with a large number of client connections this model performs poorly. Linux therefore provides high-performance IO multiplexing mechanisms to resolve this dilemma.

In Linux, the operating system provides three IO multiplexing mechanisms: select, poll, and epoll. We mainly analyze the principles of the three multiplexing mechanisms based on the following four aspects:

How many sockets can IO multiplexing monitor?
Which socket events can IO multiplexing monitor?
How does IO multiplexing sense ready file descriptors (fds)?
How does IO multiplexing implement network communication?

3.1 select mechanism

The key function in the select mechanism is select(). It takes 4 parameters and returns an integer. The prototype and parameters of select() are as follows:

/*
 * Parameters:
 *   __nfds - the highest-numbered monitored file descriptor, plus 1
 *   __readfds, __writefds, __exceptfds - the three sets of monitored descriptors
 *   __timeout - how long to block while waiting
 * Return value: the number of ready file descriptors (0 on timeout, -1 on error)
 */
int select(int __nfds, fd_set *__readfds, fd_set *__writefds, fd_set *__exceptfds, struct timeval *__timeout);

How many sockets can select listen to?

Answer: 1024 by default.

Which events of the socket can select listen to?

Answer: the select() function takes three fd_set sets representing the three event types to monitor: read events (the __readfds set), write events (the __writefds set), and exception events (the __exceptfds set). Passing NULL for a set means the corresponding event type does not need to be handled.

How does select perceive the ready fd?

Answer: the fd sets must be traversed to find the ready descriptors.

How does the select mechanism realize network communication?

Sample code:

int sock_fd, conn_fd; // listening socket and connected socket
sock_fd = socket() // create the socket
bind(sock_fd)   // bind the socket
listen(sock_fd) // listen on the socket, turning it into a listening socket

fd_set rset;  // set of monitored descriptors, watching for read events
int max_fd = sock_fd

// initialize the rset array, clearing every bit with the FD_ZERO macro
FD_ZERO(&rset);
// use the FD_SET macro to set the bit for sock_fd in rset, marking it as monitored
FD_SET(sock_fd, &rset);

// set the timeout
struct timeval timeout;
timeout.tv_sec = 3;
timeout.tv_usec = 0;
while(1) {
    // call select to check whether descriptors in rset have ready read events;
    // it returns the number of ready descriptors (note: select overwrites rset,
    // so real code must rebuild the set before each call)
    n = select(max_fd + 1, &rset, NULL, NULL, &timeout);

    // use the FD_ISSET macro to check whether sock_fd is ready in rset
    if (FD_ISSET(sock_fd, &rset)) {
        // sock_fd ready means a client is connecting; call accept to establish it
        conn_fd = accept();
        // set the bit for conn_fd in rset so the new descriptor is monitored too
        FD_SET(conn_fd, &rset);
        if (conn_fd > max_fd) max_fd = conn_fd;
    }

    // check the connected sockets' descriptors one by one
    for (i = 0; i <= max_fd; i++) {
        // use the FD_ISSET macro to check whether descriptor i is ready in rset
        if (FD_ISSET(i, &rset)) {
            // data is readable; read and process it
        }
    }
}

The select-based network communication flow is shown in the figure below:
[Figure: select-based network communication flow]

The shortcomings of the select function

First, the select() function limits how many file descriptors a single process can monitor: the count is determined by __FD_SETSIZE, whose default value is 1024.

Second, after select returns, the descriptor sets must be traversed to find the ready descriptors. This traversal incurs overhead and so reduces the program's performance.

3.2 Poll mechanism

The main function of the poll mechanism is poll(). The prototype of poll() is:

/*
 * Parameter *__fds is an array of pollfd structs; each pollfd holds a descriptor
 * to monitor and the event types to monitor on that descriptor
 * Parameter __nfds is the number of elements in the *__fds array
 * __timeout is the timeout for which poll blocks
 */
int poll(struct pollfd *__fds, nfds_t __nfds, int __timeout);

The pollfd struct is defined as:

struct pollfd {
    int fd;            // the file descriptor to monitor
    short int events;  // the event types to monitor
    short int revents; // the event types that actually occurred
};

The pollfd structure contains three member variables fd, events and revents, respectively representing the file descriptor to be monitored, the type of event to be monitored, and the type of event that actually occurred.

How many sockets can poll listen to?

Answer: user-defined, as long as the system can bear it.

Which events in the socket can poll monitor?

The event types to monitor and the events that actually occurred in the pollfd struct are expressed with macro definitions such as the following three: POLLRDNORM, POLLWRNORM, and POLLERR, representing readable, writable, and error events respectively.

#define POLLRDNORM 0x040 // readable event
#define POLLWRNORM 0x100 // writable event
#define POLLERR    0x008 // error event

How does poll get ready fd?

Answer: like select, the fd array must be traversed to find the ready descriptors.

How does the poll mechanism realize network communication?

poll implementation code

int sock_fd, conn_fd; // listening socket and connected socket
sock_fd = socket() // create the socket
bind(sock_fd)   // bind the socket
listen(sock_fd) // listen on the socket, turning it into a listening socket

// number of descriptors poll can monitor; may exceed 1024
#define MAX_OPEN 2048

// array of pollfd structs, one per file descriptor
struct pollfd client[MAX_OPEN];

// add the listening socket to the pollfd array and monitor its read events
client[0].fd = sock_fd;
client[0].events = POLLRDNORM;
maxfd = 0;

// initialize the remaining client entries to -1 (unused)
for (i = 1; i < MAX_OPEN; i++)
   client[i].fd = -1;

while(1) {
    // call poll to check whether any descriptor in client is ready;
    // it returns the number of ready descriptors (timeout is in milliseconds)
    n = poll(client, maxfd + 1, timeout);

    // if the listening socket has a ready read event, handle it
    if (client[0].revents & POLLRDNORM) {
        // a client is connecting; call accept to establish the connection
        conn_fd = accept();

        // save the connected socket in the first free slot
        for (i = 1; i < MAX_OPEN; i++) {
            if (client[i].fd < 0) {
                client[i].fd = conn_fd; // store the connected descriptor in the client array
                client[i].events = POLLRDNORM; // monitor read events on it
                break;
            }
        }
        if (i > maxfd) maxfd = i;
    }

    // check the connected sockets' descriptors one by one
    for (i = 1; i <= maxfd; i++) {
        if (client[i].revents & (POLLRDNORM | POLLERR)) {
            // data is readable or an error occurred; read the data or handle the error
        }
    }
}

The poll-based network communication flow is shown in the figure below:
[Figure: poll-based network communication flow]

The poll mechanism removes select's per-process limit of 1024 monitored sockets, but it does not solve the problem of traversing to find ready fds.

3.3 epoll mechanism

epoll was introduced in the 2.6 kernel. It uses the epoll_event struct to record the fds to monitor and the event types to monitor on them.

Definitions of the epoll_event and epoll_data structures:

typedef union epoll_data
{
    ...
    int fd;  // records the file descriptor
    ...
} epoll_data_t;

struct epoll_event
{
    uint32_t events;   // the event types epoll monitors
    epoll_data_t data; // application data
};

epoll's interface is comparatively simple, with three functions in total:

int epoll_create(int size);
Creates an epoll handle; size hints to the kernel how many descriptors will be monitored. The epoll instance internally maintains two structures recording the fds to monitor and the ready fds; the ready file descriptors are returned to the user program for processing.

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
The epoll event registration function. epoll_ctl adds, modifies, or deletes events of interest on the epoll object; it returns 0 on success and -1 otherwise, in which case the errno code identifies the error type. Unlike select(), which names the event types to monitor at wait time, epoll registers the event types to monitor up front here: any event returned by epoll_wait must first have been added to epoll via epoll_ctl.

int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
Waits for events, similar to the select() call. The events parameter receives the set of ready events from the kernel; maxevents is the capacity of that set (no greater than the size passed to epoll_create()); timeout is the timeout in milliseconds (0 returns immediately, -1 blocks indefinitely). The function returns the number of events to handle; 0 means timeout, and -1 means error, with errno identifying the error type.
About the ET and LT working modes of epoll

epoll has two working modes: LT (level trigger) mode and ET (edge ​​trigger) mode.

By default epoll works in LT mode and can handle both blocking and non-blocking sockets; registering an event with the EPOLLET flag switches that event to ET mode. ET mode is more efficient than LT mode, but it supports only non-blocking sockets.

The difference between ET mode and LT mode:
When a new event arrives, a thread in ET mode gets the event from its epoll_wait call; but if the socket buffer for that event is not fully drained this time, then as long as no new event arrives on that socket, another epoll_wait call in ET mode will not return the event again. In LT mode, by contrast, as long as the socket buffer for an event still holds data, epoll_wait will keep returning that event. Developing epoll-based applications in LT mode is therefore simpler and less error-prone; in ET mode, if the buffered data is not fully processed when an event fires, the remaining requests in the buffer may never be handled or answered.

How many sockets can epoll monitor?

Answer: user-defined, as long as the system can bear it.

How does epoll get ready fd?

Answer: the epoll instance internally maintains two structures, one recording the fds to monitor and one recording the fds that are already ready, so epoll_wait can return just the ready fds without scanning the whole set.

How does epoll implement network communication?

Sample code:

int sock_fd, conn_fd; // listening socket and connected socket
sock_fd = socket() // create an active socket
bind(sock_fd)   // bind the socket
listen(sock_fd) // listen on the socket, turning it into a listening socket

epfd = epoll_create(EPOLL_SIZE); // create the epoll instance
// create an epoll_event array to receive the ready descriptors and their event types
ep_events = (epoll_event *)malloc(sizeof(epoll_event) * EPOLL_SIZE);

// create an epoll_event variable
struct epoll_event ee;
// monitor read events
ee.events = EPOLLIN;
// the descriptor to monitor is the listening socket just created
ee.data.fd = sock_fd;

// add the listening socket to the monitored list
epoll_ctl(epfd, EPOLL_CTL_ADD, sock_fd, &ee);

while (1) {
    // wait for ready descriptors to be returned
    n = epoll_wait(epfd, ep_events, EPOLL_SIZE, -1);
    // iterate over the ready descriptors only
    for (int i = 0; i < n; i++) {
        // if the listening socket is ready, a new client connection has arrived
        if (ep_events[i].data.fd == sock_fd) {
            conn_fd = accept(sock_fd); // call accept() to establish the connection
            ee.events = EPOLLIN;
            ee.data.fd = conn_fd;
            // register the new connected socket to monitor later read events on it
            epoll_ctl(epfd, EPOLL_CTL_ADD, conn_fd, &ee);
        } else { // a connected socket is ready, so data can be read
            ... // read the data and process it
        }
    }
}

The epoll-based network communication flow is shown in the figure below:
[Figure: epoll-based network communication flow]

The differences among the three:

IO multiplexing mechanism | Max monitored fds | How ready fds are found
select                    | 1024              | traverse the fd set
poll                      | user-defined      | traverse the fd set
epoll                     | user-defined      | epoll_wait returns the ready fds

A side-by-side comparison of how the three mechanisms implement network communication, so the differences are easy to see:
[Figure: comparison of the three mechanisms' network communication flows]

4. Technical frameworks that use IO multiplexing

Redis: Redis's ae_select.c and ae_epoll.c files implement IO multiplexing with the select and epoll mechanisms respectively;

Nginx: Nginx supports the IO multiplexing facilities of different operating systems, such as epoll, select, and kqueue; Nginx uses epoll in ET mode.

Reactor frameworks, Netty: whether in C++ or Java, most high-performance network programming frameworks are based on the Reactor pattern, the most typical being Java's Netty framework; and the Reactor pattern itself is built on IO multiplexing.

That concludes the explanation of IO multiplexing. Because the IO multiplexing model is very helpful for understanding high-performance frameworks such as Redis and Nginx, I suggest you study it against their source code. If you have any questions, you can add me on WeChat: MrWeiki; you are welcome to discuss and make progress together.

This article is original; please credit the source when reprinting.

Link to this article: http://www.yuanjava.cn/linux/2022/01/01/iomultiplexing.html

This article comes from the blog of ape java

