[IO multiplexing] The difference between select, poll, and epoll

1. Time complexity analysis of select, poll, epoll

(1) select: time complexity O(n)

        It only knows that an I/O event has occurred, but not on which streams (there may be one, several, or even all of them). The program can only scan all streams indiscriminately to find those that are ready to read or write and then operate on them. So select does indiscriminate polling with O(n) complexity, and the more streams handled at once, the longer each indiscriminate scan takes.

(2) poll: time complexity O(n)

        poll is essentially the same as select: it copies the array passed in by the user into kernel space and then queries the device status of each fd. However, it has no limit on the maximum number of connections, because the descriptors are stored in a linked list.

(3) epoll: time complexity O(1)

        epoll can be understood as "event poll". Unlike busy polling and indiscriminate polling, epoll tells us which stream had an event and what that I/O event was. So epoll is actually event-driven (each event is associated with an fd), and our operations on those streams are all meaningful. (The complexity drops to O(1).)

2. IO multiplexing mechanism

        select, poll, and epoll are all IO multiplexing mechanisms. I/O multiplexing uses a single mechanism to monitor multiple descriptors; once a descriptor is ready (usually read-ready or write-ready), the program is notified so it can perform the corresponding read or write. However, select, poll, and epoll are all essentially synchronous I/O: the program itself must do the reading and writing after the readiness event, and that read or write blocks. Asynchronous I/O, by contrast, does not require the program to do the read or write itself; the asynchronous I/O implementation is responsible for copying the data from the kernel to user space.

        Both epoll and select provide I/O multiplexing and are supported by the current Linux kernel. epoll is specific to Linux, while select is specified by POSIX and implemented in general-purpose operating systems.

3. Analysis of select, poll, and epoll and their shortcomings

3.1 select:

        select essentially proceeds by setting or checking the data structure that stores the fd flags. The disadvantages of this approach are:

3.1.1 The number of fds that a single process can monitor is limited, i.e., the number of descriptors it can listen on is capped.

        Generally speaking, this number is closely related to system memory; the system-wide figure can be checked with cat /proc/sys/fs/file-max. The default is 1024 on 32-bit machines and 2048 on 64-bit machines.
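As a quick illustration (a sketch assuming a glibc/Linux environment), the program below prints the compile-time FD_SETSIZE that bounds a select fd_set and the per-process RLIMIT_NOFILE limit; the /proc/sys/fs/file-max value above is the separate system-wide ceiling.

```c
/* Sketch: print the limits relevant to select(). Assumes glibc/Linux. */
#include <stdio.h>
#include <sys/resource.h>
#include <sys/select.h>

int main(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0)        /* per-process open-file limit */
        printf("RLIMIT_NOFILE: soft=%llu hard=%llu\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);
    printf("FD_SETSIZE: %d\n", FD_SETSIZE);        /* compile-time cap on fd_set */
    return 0;
}
```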

3.1.2 Sockets are scanned linearly, i.e., by polling, which is inefficient:

        When there are many sockets, each select() call has to traverse all FD_SETSIZE sockets to complete scheduling; whichever sockets are active, every one is traversed once. This wastes a lot of CPU time. If a callback could be registered for each socket so that the relevant work happens automatically when it becomes active, the polling would be avoided, which is exactly what epoll and kqueue do.

3.1.3 A data structure holding a large number of fds has to be maintained, and copying that structure between user space and kernel space is expensive.
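To make the pattern concrete, here is a minimal, hedged sketch of select usage; listen_fd is an assumed, already-created listening socket, and error handling is trimmed. Note that the fd_set has to be rebuilt before every call because the kernel overwrites it with the result.

```c
/* Minimal select() sketch: wait until the assumed listen_fd becomes readable. */
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>

int wait_readable_select(int listen_fd) {
    fd_set rfds;
    struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };   /* 5-second timeout */

    FD_ZERO(&rfds);
    FD_SET(listen_fd, &rfds);             /* the set must be rebuilt each call */

    int ready = select(listen_fd + 1, &rfds, NULL, NULL, &tv);
    if (ready > 0 && FD_ISSET(listen_fd, &rfds))
        return 1;                         /* readable: accept() can be called now */
    return ready;                         /* 0 on timeout, -1 on error */
}
```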

3.2 poll:

        poll is essentially the same as select. It copies the array passed in by the user into kernel space and then queries the device status of each fd; if a device is ready, it adds an entry to the device's wait queue and keeps traversing. If no ready device is found after traversing all fds, the current process is suspended until a device becomes ready or the call times out; after being woken up, it traverses the fds again. This process involves many unnecessary traversals.

It has no limit on the maximum number of connections because the descriptors are stored in a linked list, but it does have the following disadvantages (a minimal poll sketch follows this list):

  • The array holding a large number of fds is copied wholesale between user space and the kernel address space, whether or not the copy is meaningful.

  • Another characteristic of poll is level triggering: if an fd is reported but not handled, it is reported again on the next poll.
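For comparison, a minimal poll sketch under the same assumption of an existing listen_fd; the whole pollfd array (here just one entry) is handed to the kernel on every call, which is the copy overhead noted above.

```c
/* Minimal poll() sketch: wait until the assumed listen_fd becomes readable. */
#include <poll.h>

int wait_readable_poll(int listen_fd) {
    struct pollfd pfd = { .fd = listen_fd, .events = POLLIN, .revents = 0 };

    int ready = poll(&pfd, 1, 5000);      /* one descriptor, 5000 ms timeout */
    if (ready > 0 && (pfd.revents & POLLIN))
        return 1;                         /* readable */
    return ready;                         /* 0 on timeout, -1 on error */
}
```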

3.3 epoll:

        epoll has two trigger modes, EPOLLLT and EPOLLET. LT (level trigger) is the default mode and ET (edge trigger) is the "high-speed" mode. In LT mode, as long as an fd still has data to read, every epoll_wait call returns its event to remind the program to act on it. In ET mode, the event is delivered only once, and no further reminder is given until new data arrives, regardless of whether unread data remains on the fd.

        So in ET mode, when reading an fd you must drain its buffer completely, that is, keep reading until read returns less than the amount requested or fails with EAGAIN (see the sketch below). Another characteristic of epoll is its "event"-based readiness notification: fds are registered via epoll_ctl, and once an fd becomes ready the kernel uses a callback mechanism to activate it, so epoll_wait receives the notification.
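A minimal sketch of that ET-mode drain loop, assuming the fd has already been made non-blocking:

```c
/* Edge-triggered drain loop sketch: keep reading until EAGAIN/EWOULDBLOCK,
 * otherwise leftover data will not be reported again until new data arrives.
 * Assumes fd is non-blocking. */
#include <errno.h>
#include <unistd.h>

void drain_fd(int fd) {
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0)
            continue;                     /* process the n bytes in buf here */
        if (n == 0)
            break;                        /* peer closed the connection */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                        /* buffer drained; wait for the next event */
        break;                            /* real error: handle/close the fd */
    }
}
```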

Why does epoll have EPOLLET trigger mode?

        With EPOLLLT, once the system has a large number of ready file descriptors that you do not actually need to read or write, they are returned on every epoll_wait call, which greatly reduces the efficiency with which the handler can retrieve the ready descriptors it does care about. With EPOLLET (edge-triggered) mode, epoll_wait() notifies the handler to read or write only when a readable or writable event occurs on a monitored file descriptor.

        If you do not finish reading or writing all the data this time (for example, because your buffer is too small), you will not be notified again on the next epoll_wait() call; you are notified only once, and you will be told again only when a second readable or writable event occurs on that file descriptor. This mode is more efficient than level triggering, and the system is not flooded with ready file descriptors you do not care about.
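A small sketch of registering a descriptor in edge-triggered mode; epfd and conn_fd are assumed to exist already, and the O_NONBLOCK step matters because ET mode expects non-blocking reads and writes.

```c
/* Sketch: switch an assumed conn_fd to non-blocking and add it with EPOLLET. */
#include <fcntl.h>
#include <sys/epoll.h>

int add_edge_triggered(int epfd, int conn_fd) {
    int flags = fcntl(conn_fd, F_GETFL, 0);
    if (flags < 0 || fcntl(conn_fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;                                   /* ET requires non-blocking I/O */

    struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = conn_fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, conn_fd, &ev);
}
```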

4. Advantages of epoll compared with select and poll:

  • There is no hard limit on the maximum number of concurrent connections; the upper limit on the FDs that can be opened is far greater than 1024 (roughly 100,000 fds can be monitored with 1 GB of memory);

  • Efficiency improves because epoll does not poll, so efficiency does not decline as the number of FDs grows: only active, ready FDs invoke the callback function. In other words, epoll's biggest advantage is that it only cares about your "active" connections, regardless of the total number of connections, so in a real network environment epoll is much more efficient than select and poll.

  • Memory copying: descriptors are registered with the kernel once via epoll_ctl, and epoll_wait returns only the ready events, so the per-call copy overhead is small (this is often described as epoll using mmap to share memory with the kernel, but the mainline implementation does not actually rely on mmap).

5. Summary of the differences between select, poll, and epoll:

1. Maximum number of connections a single process can open

select:

        The maximum number of connections a single process can open is defined by the FD_SETSIZE macro, whose size is 32 integers (on a 32-bit machine that is 32*32 = 1024; on a 64-bit machine FD_SETSIZE is 32*64 = 2048). We can of course modify it and recompile the kernel, but performance may be affected, which would need further testing.

poll:

        poll is essentially the same as select, but it has no limit on the maximum number of connections because the descriptors are stored in a linked list.

epoll:

        Although the number of connections has an upper limit, it is very large. A machine with 1G memory can open about 100,000 connections, and a machine with 2G memory can open about 200,000 connections.

2. The IO efficiency problem caused by the rapid increase of FD

select: Each call traverses the connections linearly, so as the number of FDs grows, traversal becomes slower and performance degrades linearly.

poll: Same as select above.

epoll: The epoll kernel implementation is based on a callback attached to each fd, and only active sockets actually invoke the callback, so when few sockets are active, epoll does not suffer the linear performance decline of the previous two. When all sockets are active, however, there may still be performance issues.

3. Message delivery method

select: The kernel has to pass the results to user space, which requires a kernel-to-user copy.

poll: ditto

epoll: Descriptors are registered with the kernel once via epoll_ctl, and each epoll_wait copies back only the ready events, so the per-call copy is minimal (this is commonly described as sharing a piece of memory between the kernel and user space, though the mainline implementation does not actually use shared memory).

6. Summary:

To sum up, the choice among select, poll, and epoll should be based on the specific use case and on the characteristics of these three mechanisms.

1. On the surface, epoll performs best, but when the number of connections is small and the connections are all very active, select and poll may perform better than epoll, since epoll's notification mechanism involves many function callbacks.

2. select is inefficient because it has to poll on every call. But inefficiency is relative; depending on the situation, it can also be mitigated by good design.

7. A comparative analysis of the three kinds of IO multiplexing, drawing on reference material

Today I compare these three kinds of IO multiplexing, drawing on material from the Internet and from books, organized as follows:

1. Select implementation

The calling process of select is as follows:

  • Use copy_from_user to copy fd_set from user space to kernel space

  • Register callback function __pollwait

  • Traverse all fds and call each one's poll method (for a socket this poll method is sock_poll, and sock_poll calls tcp_poll, udp_poll, or datagram_poll as appropriate). Taking tcp_poll as an example, its core mechanism is __pollwait, the callback registered above.

  • The main job of __pollwait is to hang current (the current process) on the device's wait queue. Different devices have different wait queues; for tcp_poll the wait queue is sk->sk_sleep (note that hanging the process on the wait queue does not put it to sleep). When the device receives data (a network device) or finishes filling in file data (a disk device), it wakes the processes sleeping on its wait queue, and current is woken up.

  • When the poll method returns, it returns a mask describing whether read and write operations are ready, and fd_set is set according to this mask.

  • If no readable or writable mask has been returned after traversing all fds, schedule_timeout is called to put the select-calling process (i.e., current) to sleep. When a device driver's own resources become readable or writable, it wakes the processes sleeping on its wait queue. If nothing wakes the process within the timeout period (specified via schedule_timeout), the select-calling process is woken anyway, regains the CPU, and traverses the fds again to check whether any fd is ready.

  • Copy fd_set from kernel space to user space.

Summary:

Several major disadvantages of select:

  • Every call to select copies the fd set from user space to kernel space, which is very expensive when there are many fds.

  • Likewise, every call to select makes the kernel traverse all the fds passed in, which is also very expensive when there are many fds.

  • The number of file descriptors supported by select is too small, the default is 1024

2. Poll implementation

        The implementation of poll is very similar to that of select; the difference is how the fd set is described. poll uses the pollfd structure instead of select's fd_set, and the rest is similar: multiple descriptors are still managed by polling and handled according to each descriptor's state. But poll has no limit on the maximum number of file descriptors.

        poll shares a disadvantage with select: the array holding a large number of file descriptors is copied wholesale between user space and the kernel address space whether or not those descriptors are ready, and this overhead grows linearly with the number of file descriptors.
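As a sketch of that calling pattern (fds and nfds are assumed to describe already-open descriptors), the pollfd array is set up once and reused across iterations, but the kernel still receives and scans the whole array on every call:

```c
/* poll() loop sketch: the same array is passed in each iteration; the kernel
 * copies and scans all of it every time and rewrites each revents field. */
#include <poll.h>

void poll_loop(struct pollfd *fds, nfds_t nfds) {
    for (;;) {
        int ready = poll(fds, nfds, 1000);          /* 1000 ms timeout */
        if (ready <= 0)
            continue;                               /* timeout or error */
        for (nfds_t i = 0; i < nfds; i++) {
            if (fds[i].revents & POLLIN) {
                /* read from fds[i].fd here */
            }
        }
    }
}
```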

3. epoll

        Since epoll is an improvement over select and poll, it should be able to avoid the three shortcomings above. How does it do so? First, look at how epoll's calling interface differs from that of select and poll. select and poll each provide only a single function, select or poll respectively.

        epoll, by contrast, provides three functions: epoll_create, epoll_ctl, and epoll_wait. epoll_create creates an epoll handle; epoll_ctl registers the event types to monitor; epoll_wait waits for events to occur.
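A minimal sketch of the three calls working together; listen_fd is an assumed non-blocking listening socket, error handling is trimmed, and the accept/read logic is only indicated in comments.

```c
/* epoll sketch: create a handle, register interest once, then wait for events. */
#include <sys/epoll.h>

#define MAX_EVENTS 64

int run_epoll_loop(int listen_fd) {
    int epfd = epoll_create1(0);                     /* 1. create the epoll handle */
    if (epfd < 0)
        return -1;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev) < 0)   /* 2. register once */
        return -1;

    struct epoll_event events[MAX_EVENTS];
    for (;;) {
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);     /* 3. wait for events */
        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == listen_fd) {
                /* accept() new connections and EPOLL_CTL_ADD them here */
            } else {
                /* read/write the ready connection fd here */
            }
        }
    }
}
```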

        For the first shortcoming, epoll's solution lies in epoll_ctl. Each time a new event is registered with the epoll handle (EPOLL_CTL_ADD in epoll_ctl), the fd is copied into the kernel then, rather than being copied repeatedly during epoll_wait. epoll guarantees that each fd is copied only once over the whole process.

        For the second shortcoming, epoll does not, like select or poll, add current to every fd's device wait queue on each call. It hangs current only once, at epoll_ctl time (this step is unavoidable), and specifies a callback function for each fd. When the device becomes ready and wakes up the waiters on its wait queue, this callback adds the ready fd to a ready list. The job of epoll_wait is then simply to check whether that ready list is empty (it uses schedule_timeout() to sleep a while and check again, similar to the sleep-and-recheck step in the select implementation above).

        For the third shortcoming, epoll has no such limit. The upper limit on the FDs it supports is the maximum number of files that can be opened, which is generally far greater than 2048; for example, it is around 100,000 on a machine with 1 GB of memory. The exact number can be checked with cat /proc/sys/fs/file-max and, generally speaking, is closely related to system memory.

8. Summary:

(1) select and poll have to keep polling the whole fd set themselves until a device is ready, possibly alternating between sleeping and waking several times along the way. epoll also calls epoll_wait to keep polling the ready list, and may likewise alternate between sleeping and waking, but when a device becomes ready it invokes the callback function, which puts the ready fd on the ready list and wakes up the process sleeping in epoll_wait.

        Although both alternate between sleeping and waking, select and poll must traverse the entire fd set while "awake", whereas epoll only needs to check whether the ready list is empty while "awake", which saves a great deal of CPU time. This is the performance improvement brought by the callback mechanism.

(2) Every call to select or poll copies the fd set from user space to kernel space once and hangs current on the device wait queues once, whereas epoll copies each fd only once and hangs current on its wait queue only once (at the start of epoll_wait; note that this wait queue is not a device wait queue but a queue defined inside epoll). This also saves a lot of overhead.


Origin blog.csdn.net/gongzi_9/article/details/126987427