Understanding select/poll/epoll, once and for all

Hi, I'm yes.

I've covered the main points of network I/O before, but there are still differences between select, poll, and epoll worth spelling out. This article tackles them, distinguishing the three from the perspective of how they actually work.

Originally I wanted to walk through the source code, but I don't think it's necessary. As an application developer, understanding the principle is enough; I forget source code anyway, and understanding is what matters. So I'll stay away from the kernel source and use plain language to talk through these three things.

Without further ado, let's go.

A small thought exercise

First of all, we know that select/poll/epoll are used for I/O multiplexing, that is, a single thread can use them to manage multiple sockets.

Following that idea, the thread must not be blocked by any one of the sockets it manages, and every socket must be able to notify the select/poll/epoll thread when data arrives.

Think about it: how could this be achieved?

Let's analyze the logic of select

Based on this understanding, the model by which select manages multiple sockets looks roughly as follows.

The thing to pay attention to here is the interaction between kernel mode and user mode: user programs cannot access kernel space.

Therefore, when we call select we pass the fds (file descriptors; everything is a file under Linux) of all the sockets to be managed into the kernel.

At this point the kernel has to traverse all the sockets to see whether any event of interest has occurred. If no socket has an event, the select thread needs to give up the CPU and block. This wait can be indefinite (no timeout) or bounded by a timeout.

Now suppose a client sends some data. The data received by the network card is put into the receive queue of the corresponding socket, so the socket knows data has arrived. But how does it wake up select?

In fact, each socket has its own sleep queue (wait queue), and select plants an "inside man": it inserts an entry into the sleep queue of every socket it manages.

When a socket receives data from the network card, it traverses the entries in its sleep queue and calls the callback set on each entry. That callback is what wakes up select.

So select inserts an entry of its own into the sleep queue of every socket it manages; no matter which socket gets data, select can be woken up and get to work immediately.

However, select's implementation isn't great here: when it is woken up, it only knows that something happened, not which socket the data arrived on. So it can only dumbly traverse all the sockets to find out which ones are ready, and then package the ready ones up as events to return.

In this way, the user program can obtain the events that have occurred, and then perform I/O and business processing.

This is the implementation logic of select, which should not be difficult to understand.
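To make that concrete, here is a minimal sketch of how select is typically used. The array sockfds and the function name wait_and_read are hypothetical, not from the original article, and error handling is trimmed; the point is that the fd set is rebuilt and copied in on every call, and that after waking up we still have to scan every socket ourselves:

```c
#include <sys/select.h>
#include <unistd.h>

/* Wait for data on any of n already-connected sockets (hypothetical
 * sockfds array), then scan the whole set to find which ones are ready. */
int wait_and_read(int *sockfds, int n)
{
    fd_set readfds;
    FD_ZERO(&readfds);

    int maxfd = -1;
    for (int i = 0; i < n; i++) {           /* rebuild the fd set on every call */
        FD_SET(sockfds[i], &readfds);
        if (sockfds[i] > maxfd)
            maxfd = sockfds[i];
    }

    struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };  /* or NULL to wait forever */
    int ready = select(maxfd + 1, &readfds, NULL, NULL, &tv);
    if (ready <= 0)
        return ready;                        /* 0 = timeout, -1 = error */

    for (int i = 0; i < n; i++) {            /* select only says "something is ready",
                                                so we must check every socket */
        if (FD_ISSET(sockfds[i], &readfds)) {
            char buf[4096];
            ssize_t len = read(sockfds[i], buf, sizeof(buf));
            (void)len;                       /* hand off to business logic here */
        }
    }
    return ready;
}
```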

One more mention of select's limitation: because the managed socket fds have to be copied from user space to kernel space, and to keep the size of that copy under control, the fd set that each select call can pass is capped at 1024.

If you want to change it, you can only modify the macro... and then recompile the kernel. Many articles on the internet say this, but (yes, there is a but).

I read an article on this: the macro does exist, and its value is indeed 1024, but the kernel itself does not limit the size of the fd set at all. The author also asked a kernel expert through a contact, and the expert said the kernel imposes no such restriction; it is the glibc layer that does.

So... recompile? That's where that article leaves it.
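If you're curious what your own toolchain says, FD_SETSIZE is the macro in question. A throwaway snippet to check it (this is just my own illustration, not from the article above):

```c
#include <stdio.h>
#include <sys/select.h>

int main(void)
{
    /* On a typical glibc system this prints 1024; as discussed above,
     * the kernel itself imposes no such bound. */
    printf("FD_SETSIZE = %d\n", FD_SETSIZE);
    return 0;
}
```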

poll

Compared with select, poll mainly changes the data structure used for the fds: it is no longer a bitmap but an array of a structure called pollfd, so it isn't subject to the 1024 limit.
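A minimal sketch of the same idea with poll, reusing the hypothetical sockfds array and a made-up function name poll_and_read; the pollfd array can be as long as you like:

```c
#include <poll.h>
#include <unistd.h>

/* Same job as the select sketch, but the fds live in an array of
 * struct pollfd instead of a fixed-size bitmap, so no FD_SETSIZE ceiling. */
int poll_and_read(int *sockfds, int n)
{
    struct pollfd pfds[n];
    for (int i = 0; i < n; i++) {
        pfds[i].fd = sockfds[i];
        pfds[i].events = POLLIN;        /* interested in readable data */
    }

    int ready = poll(pfds, n, 5000);    /* timeout in ms, -1 = wait forever */
    if (ready <= 0)
        return ready;

    for (int i = 0; i < n; i++) {       /* still a full scan to find the ready ones */
        if (pfds[i].revents & POLLIN) {
            char buf[4096];
            read(pfds[i].fd, buf, sizeof(buf));
        }
    }
    return ready;
}
```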

But no one uses poll now, so I won't say more.

epoll

This is the important one.

I believe that after seeing how select is implemented, a little thought reveals a few points that could be optimized.

For example, why does every call to select have to pass the monitored fds into the kernel? Couldn't the kernel just maintain that set itself?

And why can a socket only wake up select, without telling it which socket the data arrived on?

epoll's optimizations are aimed mainly at these two points.

First, a call named epoll_ctl is introduced, which is used to manage and maintain the set of sockets monitored by epoll.

If you want epoll to manage a new socket, you call epoll_ctl; to remove a socket, you also call epoll_ctl. Additions, deletions, and modifications are controlled through different arguments.

In this way, the set of sockets managed by an epoll instance is maintained inside the kernel, so there is no need to copy all the managed fds into the kernel on every call.

By the way, this socket set is implemented with a red-black tree.
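A hedged sketch of that registration step. The function name setup_epoll and the descriptor conn_fd are hypothetical and error handling is trimmed; the point is that the watched set lives inside the kernel's epoll instance and is registered once, not copied in on every wait:

```c
#include <sys/epoll.h>

/* Create an epoll instance and register/deregister a socket with it.
 * The kernel keeps watched fds in its red-black tree. */
int setup_epoll(int conn_fd)
{
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;                       /* interested in readable data */
    ev.data.fd = conn_fd;

    /* add a socket to the watched set */
    epoll_ctl(epfd, EPOLL_CTL_ADD, conn_fd, &ev);

    /* later: change what we care about ... */
    ev.events = EPOLLIN | EPOLLOUT;
    epoll_ctl(epfd, EPOLL_CTL_MOD, conn_fd, &ev);

    /* ... or stop watching it entirely */
    epoll_ctl(epfd, EPOLL_CTL_DEL, conn_fd, NULL);

    return epfd;
}
```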

Then, similar to select, an entry is added to the sleep queue of each socket, and when a socket receives data, the callback attached to that entry is invoked.

The difference from select is that a doubly linked list called the ready_list is introduced: the callback adds the current socket to the ready_list and then wakes up epoll.

In this way, the awakened epoll only needs to traverse the ready_list, and every socket on that list is guaranteed to have readable data. Compared with select, there is no useless traversal.
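A minimal sketch of the wait side, continuing the hypothetical epfd from the previous snippet (the function name epoll_loop and MAX_EVENTS are my own): epoll_wait hands back only the ready events, so the loop touches ready sockets and nothing else.

```c
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 64

/* Block until something watched by the epoll instance is ready,
 * then handle only those fds -- no scan over the whole watched set. */
void epoll_loop(int epfd)
{
    struct epoll_event events[MAX_EVENTS];

    for (;;) {
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);   /* -1 = wait forever */
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            char buf[4096];
            read(fd, buf, sizeof(buf));    /* hand off to business logic */
        }
    }
}
```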

At the same time, the ready fds that are collected still have to be handed back to user space. Many articles describe this as another optimization, using mmap to map user space and kernel space onto the same memory to avoid a copy, though strictly speaking epoll_wait just copies the ready events into an array the caller passes in; the saving is that only the ready events are copied, not the whole fd set.

perfect~

These are the optimizations epoll makes on top of select. There are some differences I haven't detailed, for example epoll blocks and sleeps on its own wait queue (single_epoll_wait_list) rather than on a socket's sleep queue, and so on. I won't go into them; understanding the above is enough.

ET and LT

Since we're talking about epoll, it's inevitable to mention its two modes, ET and LT.

ET, edge-triggered. Following the logic above, when epoll traverses the ready_list it removes the socket from the ready_list and then reads the socket's events.

LT, level-triggered, is a bit different. In this mode, when epoll traverses the ready_list it also removes the socket and reads its events, but if the socket still has an event of interest, it is added back to the ready_list, so the socket will be returned again the next time epoll_wait is called.

This is the most essential difference between the two.

Seeing this, some people will ask: what practical difference do these two modes make?

Suppose a client sends 5 packets in quick succession. Under normal logic, epoll only needs to be woken once and the socket only needs to be added to the ready_list once, not 5 times; the user program can then read all the packets from the socket's receive queue.

But suppose the user program reads one packet, then hits an error and stops reading. What about the remaining four packets?

In ET mode they cannot be read, because nothing triggers adding the socket to the ready_list again. Only if the client sends a new packet will the socket be put back on the ready_list; until a new packet arrives, those four packets won't be read.

LT mode is different: because the socket still has an event of interest after the read, it is added back to the ready_list, so it is guaranteed to be returned next time and the remaining 4 packets will be read, regardless of whether the client sends anything new.
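The practical consequence for code: in ET mode you register with EPOLLET and must drain the socket until read reports EAGAIN, otherwise leftover data (like the 4 packets above) just sits there. A hedged sketch, assuming the fd is non-blocking; the helper names add_et and drain are mine:

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register a (non-blocking) socket in edge-triggered mode. */
void add_et(int epfd, int fd)
{
    struct epoll_event ev = {0};
    ev.events = EPOLLIN | EPOLLET;    /* without EPOLLET, epoll defaults to LT */
    ev.data.fd = fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* In ET mode we won't be notified again for data already queued,
 * so keep reading until the kernel says there is nothing left. */
void drain(int fd)
{
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            /* hand buf[0..n) to business logic, then keep reading */
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;                    /* receive queue fully drained */
        break;                        /* n == 0: peer closed; otherwise a real error */
    }
}
```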

At this point I think you understand what ET and LT are, without getting dizzy over hard-to-grasp phrases like "triggered by a state change".

Finally

Well, that's it for today's analysis. Personally I think this level of understanding of select/poll/epoll is about enough. Of course there are many more details, which you'll need to explore in the source code yourself; what's written here is the conclusion I drew from reading source-code analysis articles online.

I don't recommend digging that deep right away. After all, everyone's energy is limited, right? When you actually run into the related low-level optimizations, it won't be too late to study them.

I'm yes, from a little bit to a billion little bit, see you in the next article.

References:

  • https://blog.csdn.net/dog250/article/details/105896693 (Is select really limited by 1024?)
  • https://blog.csdn.net/dog250/article/details/50528373

Origin: blog.csdn.net/yessimida/article/details/122605917