I/O Multiplexing: select/poll/epoll

I/O multiplexing

The term multiplexing is mostly used in the field of communications. To make full use of a communication line, we want to transmit multiple signals over a single channel. To do that, the signals must be combined into one, and the device that combines them is called a multiplexer. Naturally, after receiving the combined signal, the receiver must recover the original signals; the device that does this is called a demultiplexer.

I/O multiplexing refers to the following process:

1) We have a set of file descriptors (network sockets, pipes, and so on; almost any kind of descriptor will do, although regular disk files are always reported as ready by select and poll, and epoll refuses them);
2) We call a function that tells the kernel: "Don't return yet. Watch these descriptors for me, and return as soon as any of them is ready for reading or writing";
3) When the function returns, we can tell which file descriptors are ready for I/O.
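The three steps above can be sketched in C with POSIX poll(), assuming Linux and pipes as the descriptors. The helper name wait_readable and its 64-descriptor cap are hypothetical choices for illustration, not part of any API.

```c
#include <poll.h>

/* Hypothetical helper: hand a set of descriptors to the kernel, block until
 * at least one is readable, then report which ones are.
 * Returns the number of ready descriptors written to ready[], or -1 on error. */
int wait_readable(const int *fds, int n, int *ready)
{
    struct pollfd pfds[64];           /* assumption: at most 64 descriptors */
    if (n <= 0 || n > 64)
        return -1;

    for (int i = 0; i < n; i++) {     /* step 1: the descriptors we care about */
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;      /* interested in "readable" */
        pfds[i].revents = 0;
    }

    /* step 2: a timeout of -1 means "don't return until something is ready" */
    if (poll(pfds, (nfds_t)n, -1) < 0)
        return -1;

    int count = 0;                    /* step 3: find out which ones are ready */
    for (int i = 0; i < n; i++)
        if (pfds[i].revents & POLLIN)
            ready[count++] = pfds[i].fd;
    return count;
}
```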

The Three Musketeers of I/O Multiplexing

In essence, select, poll, and epoll on Linux are all blocking I/O, which is what we usually call synchronous I/O.
The reason: when a process calls one of these multiplexing functions and none of the monitored file descriptors is readable or writable, the process is blocked and suspended until at least one of them becomes readable or writable (or until an optional timeout expires).
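A minimal sketch of this blocking behavior, assuming Linux/POSIX: the hypothetical helper read_when_ready sleeps inside select() until the descriptor is readable or the timeout expires, and only then performs an ordinary read() itself. The kernel merely reports readiness; the data is still copied by the caller's own read(), which is why the model counts as synchronous.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: block until fd is readable, then read it ourselves.
 * Returns bytes read, or -1 with errno set (ETIMEDOUT if nothing arrived). */
ssize_t read_when_ready(int fd, char *buf, size_t len, long timeout_ms)
{
    if (fd < 0 || fd >= FD_SETSIZE) {    /* fd_set cannot hold larger numbers */
        errno = EINVAL;
        return -1;
    }
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* The calling thread is suspended here until fd is readable or time runs out. */
    int rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    if (rc == 0) {
        errno = ETIMEDOUT;               /* nothing became readable in time */
        return -1;
    }

    /* Readiness only says "you may read now"; copying the data is still our job. */
    return read(fd, buf, len);
}
```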

select: fledgling

With select, we pass the sets of file descriptors we want to monitor as function arguments, and select copies these sets into the kernel on every call.

We know that copying data has a cost, and to limit that cost the set has a fixed size: fd_set is a bitmap of FD_SETSIZE bits (1024 with glibc on Linux), so the descriptors being monitored must be numbered below 1024. In addition, when select returns, it tells us only how many descriptors are ready, not which ones. It marks them in the sets it has overwritten in place, so the programmer must traverse the sets again, checking each descriptor with FD_ISSET, to find the ones that can be read or written.
To sum up, select has the following characteristics:
1) The number of file descriptors I can look after is limited: no more than 1024;
2) The file descriptors you give me have to be copied into the kernel every time;
3) I can only tell you that some file descriptors are ready, not which ones; you have to find them yourself, one by one.
So in scenarios such as high-concurrency web servers, which routinely handle tens or hundreds of thousands of concurrent connections, these characteristics make select clearly inefficient.

poll: a modest success

poll is very similar to select.
The main improvement of poll over select is removing the 1024 limit: instead of a fixed-size bitmap, poll takes an array of struct pollfd whose length the caller chooses. Both select and poll still copy the whole set into the kernel on every call and scan it linearly, so their performance degrades as the number of monitored descriptors grows, and neither is suitable for high-concurrency scenarios.
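A comparable sketch for poll, under the same assumptions (Linux; poll_readable is a hypothetical helper name). The caller owns an array of struct pollfd of any length, so descriptor numbers at or above FD_SETSIZE are accepted, but each call still passes the whole array and the ready entries are found by a linear scan.

```c
#include <poll.h>

/* Hypothetical helper: returns the number of readable entries copied into
 * ready[], 0 on timeout, or -1 on error. */
int poll_readable(struct pollfd *pfds, nfds_t n, int timeout_ms, int *ready)
{
    /* 'events' is input-only and the kernel writes results into 'revents',
     * so the same array can be reused without rebuilding it, unlike fd_set.
     * It is still copied to the kernel in full on every call. */
    int rc = poll(pfds, n, timeout_ms);
    if (rc <= 0)
        return rc;                     /* 0: timeout, -1: error */

    int count = 0;
    for (nfds_t i = 0; i < n; i++)     /* O(n) scan, just like select */
        if (pfds[i].revents & POLLIN)
            ready[count++] = pfds[i].fd;
    return count;
}
```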

epoll: unrivaled

Of the three problems select faces, poll has already solved the limit on the number of file descriptors. What about the remaining two?
For the copying problem, epoll's strategy is to handle changes one at a time and keep the set inside the kernel.

In fact, the set of file descriptors changes relatively rarely, yet select and poll copy the entire set on every call, which nearly drives the kernel crazy. epoll is much more considerate: it introduces epoll_ctl, which operates only on the descriptors that have changed. epoll also works closely with the kernel: the registered set stays in kernel memory between calls, and the kernel maintains a ready list of descriptors that are already readable or writable, from which epoll_wait copies out only the ready entries. This greatly reduces the copying between the kernel and the program.

For the problem of having to traverse the file descriptors to learn which ones are readable or writable, epoll's strategy is to "become the younger brother".

Under select and poll, the process has to step in personally and wait on every file descriptor. Any descriptor that becomes readable or writable wakes the process up; but once awake, the process is still confused and does not know which descriptor is ready, so it checks them all again from beginning to end.

epoll is much smarter: it takes the initiative, seeks out the process, and offers to be its younger brother, doing the legwork for its big brother.

Under this mechanism, the process no longer has to step in personally; it only waits on epoll. epoll waits on each file descriptor on the process's behalf. When a descriptor becomes readable or writable, it tells epoll, and epoll carefully writes it down in its little notebook (the ready list) and then wakes up its big brother: "Brother Process, wake up! I've written down the file descriptors you need to handle." After waking up, the process no longer has to check everything from beginning to end, because its epoll brother has already noted down the ready ones.
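On the waiting side, epoll_wait() returns only the descriptors on the ready list, so the loop below never visits idle descriptors. A minimal sketch, assuming Linux and descriptors already registered with EPOLLIN and data.fd set; epoll_readable and MAX_EVENTS are hypothetical names.

```c
#include <sys/epoll.h>

#define MAX_EVENTS 64   /* assumption: size of the per-call result buffer */

/* Hypothetical helper: wait on epoll only. The kernel fills events[] with
 * ready descriptors alone, so no scan over every registered fd is needed.
 * ready[] must have room for MAX_EVENTS entries.
 * Returns the number of ready fds written to ready[], or -1 on error. */
int epoll_readable(int epfd, int *ready, int timeout_ms)
{
    struct epoll_event events[MAX_EVENTS];
    int n = epoll_wait(epfd, events, MAX_EVENTS, timeout_ms);
    if (n < 0)
        return -1;
    for (int i = 0; i < n; i++)       /* visits ready fds only */
        ready[i] = events[i].data.fd;
    return n;
}
```

If more than MAX_EVENTS descriptors are ready at once, the rest stay on the ready list and are returned by the next epoll_wait() call.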

So under epoll, the strategy is really "don't call me, I'll call you when needed". The process no longer asks every file descriptor over and over again; instead, it turns the tables and becomes the master: "Whichever of you file descriptors is readable or writable, report it to me."

This mechanism is the well-known event-driven model.
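A minimal sketch of an event-driven loop on top of epoll, assuming Linux: each descriptor is registered together with a pointer to its own handler (stored in epoll_event.data.ptr), and the loop only dispatches what epoll reports. The handler struct, add_handler and run_once are hypothetical names.

```c
#include <sys/epoll.h>

/* Hypothetical handler type: a descriptor plus the callback to run for it. */
typedef struct handler {
    int fd;
    void (*on_readable)(struct handler *self);
} handler;

int add_handler(int epfd, handler *h)
{
    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.ptr = h;              /* epoll hands this pointer back when fd is ready */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, h->fd, &ev);
}

/* Run one round of the loop; returns the number of handlers invoked, or -1. */
int run_once(int epfd, int timeout_ms)
{
    struct epoll_event events[32];
    int n = epoll_wait(epfd, events, 32, timeout_ms);
    for (int i = 0; i < n; i++) {
        handler *h = events[i].data.ptr;
        h->on_readable(h);        /* "I'll call you": dispatch on readiness */
    }
    return n;
}
```

A real server would call run_once(epfd, -1) in an endless loop and would usually put its sockets in non-blocking mode so that a handler can never stall the loop.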


Origin: blog.csdn.net/weixin_44119881/article/details/112252077