Once you have read this article, IO multiplexing should no longer seem like a difficult topic.

Let's start with an example from everyday life.
Suppose you live in a university dormitory and are waiting for a friend to visit. The friend only knows that you are in Building A, not which room, so the two of you agree to meet at the entrance of the building.
If you handle this with the blocking IO model, your only option is to stand at the entrance of Building A until your friend arrives. During that time you cannot do anything else. It is not hard to see that this approach is inefficient.

Let's take this further and explain the difference between the select and epoll models.

The select version of the building supervisor works like this: when, say, classmate A's friend arrives, this rather slow-witted supervisor takes the friend through the building, checking room by room to find out which resident is classmate A. The same happens when your friend shows up. In actual code, the select version of the supervisor does the following:

int n = select(maxfd + 1, &readset, NULL, NULL, &timeout);
for (int i = 0; n > 0; ++i) {
    /* walk every descriptor until all n ready ones have been found */
    if (FD_ISSET(fdarray[i], &readset)) {
        do_something(fdarray[i]);
        --n;
    }
}

The epoll version of the supervisor is much smarter. She writes down classmate A's information in advance, for instance his room number, so when classmate A's friend arrives she only needs to tell the friend which room classmate A lives in; there is no need to drag the friend through the crowded building looking for him. What the epoll version of the supervisor does can be represented by the following code:

int n = epoll_wait(epfd, events, 20, 500);
for (int i = 0; i < n; ++i) {
    do_something(events[i]);
}
In epoll, the key data structure epoll_event is defined as follows:

typedef union epoll_data {
    void       *ptr;
    int         fd;
    __uint32_t  u32;
    __uint64_t  u64;
} epoll_data_t;

struct epoll_event {
    __uint32_t   events; /* Epoll events */
    epoll_data_t data;   /* User data variable */
};

As you can see, epoll_data is a union. It is the structure the epoll version of the supervisor uses to record each classmate's information, and it can hold several kinds of data: a file descriptor, a pointer, and so on. With this structure, the epoll supervisor can locate classmate A without any effort at all.
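
As a concrete illustration (my own sketch, not from the original article) of how this union is typically used: when registering a socket with epoll_ctl, a server can stash either the fd itself or a pointer to its own per-connection structure in the data field. The names conn, watch_connection, and client_fd below are invented for the example:

#include <stdlib.h>
#include <sys/epoll.h>

struct conn {
    int fd;             /* the connection's socket */
    /* ... other per-connection state ... */
};

/* Register an accepted socket with epoll, stashing a pointer to its
   per-connection struct in the epoll_data union. */
static int watch_connection(int epfd, int client_fd)
{
    struct conn *c = malloc(sizeof(*c));
    if (c == NULL)
        return -1;
    c->fd = client_fd;

    struct epoll_event ev;
    ev.events = EPOLLIN;    /* notify us when the socket becomes readable */
    ev.data.ptr = c;        /* could also simply use ev.data.fd = client_fd */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, client_fd, &ev);
}

Later, when epoll_wait reports an event, events[i].data.ptr hands the per-connection struct straight back, so no lookup is needed.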

Don't underestimate this efficiency gain. In a server handling massive concurrency, polling IO is one of the most time-consuming operations. Going back to the example: if the building supervisor had to search the whole building every time a friend arrived, processing would be painfully slow and a crowd would soon build up downstairs.

Compared with the earliest blocking IO model, you can see that with multiplexed IO the program is free to do its own work outside of IO. Only when an IO state changes does the multiplexing mechanism notify it, at which point it takes the corresponding action, instead of blocking and waiting for the IO state to change.

It can also be seen from this analysis that epoll's improvement over select is, in essence, a concrete application of the idea of trading space for time.


2. Understanding how epoll is implemented: when it comes to high-performance network programming, Windows developers invariably talk about IOCP, and Linux developers invariably talk about epoll.

Everyone knows that epoll is an IO multiplexing technique that can handle millions of socket handles very efficiently, far more efficiently than the older select and poll.

Using epoll feels effortless, and it really is fast. So why can it handle so many concurrent connections at such high speed?

Let's briefly review how the three epoll system calls exposed by the C library are used.

int epoll_create(int size); 
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event); 
int epoll_wait(int epfd, struct epoll_event *events,int maxevents, int timeout);

Their usage is straightforward. First call epoll_create to create an epoll object. The size parameter is only a hint to the kernel about how many handles are expected to be monitored; it is not a hard upper limit (modern kernels ignore it, though it must be greater than zero).
epoll_ctl operates on the epoll object created above, for example adding a newly created socket to epoll for monitoring, or removing a socket handle that epoll is currently monitoring so that it is no longer watched.

When epoll_wait is called, it returns to the user-mode process within the given timeout as soon as an event occurs on any of the monitored handles.
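
Putting the three calls together, a minimal usage sketch (my own illustration, with error handling omitted) might look like the following. listen_fd is assumed to be a socket that is already bound and listening, and handle_event stands in for application logic:

#include <sys/epoll.h>

#define MAX_EVENTS 64

void handle_event(int epfd, struct epoll_event *ev);   /* application-specific, assumed */

void event_loop(int listen_fd)
{
    int epfd = epoll_create(256);                       /* size is only a hint */

    struct epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);     /* start monitoring */

    struct epoll_event events[MAX_EVENTS];
    for (;;) {
        /* block up to 500 ms waiting for events on any monitored handle */
        int n = epoll_wait(epfd, events, MAX_EVENTS, 500);
        for (int i = 0; i < n; ++i)
            handle_event(epfd, &events[i]);
    }
}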
From this calling pattern alone we can already see an advantage of epoll over select/poll:

With select/poll, every call must pass the entire list of sockets you want to monitor to the kernel, which means copying the user-mode socket list into kernel mode. With tens of thousands of handles, that can mean copying hundreds of KB of memory into the kernel on every call, which is very inefficient.
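To get a rough sense of scale (a back-of-the-envelope figure of my own, not from the original article): a struct pollfd is 8 bytes, so a poll call monitoring 100,000 descriptors copies roughly 800 KB from user space to the kernel on every single call, before a single descriptor has even been checked.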

Calling epoll_wait is the equivalent of calling select/poll in the old model, except that now we do not need to pass the socket handles to the kernel, because the kernel already received the list of handles to monitor through epoll_ctl.

So in fact, once you call epoll_create, the kernel is already prepared to store the handles you want to monitor in kernel mode, and each subsequent epoll_ctl call simply inserts one new socket handle into that kernel data structure.

In the kernel, everything is a file. Accordingly, epoll registers its own file system with the kernel, which it uses to store the monitored sockets mentioned above.

When you call epoll_create, a file node is created in this virtual epoll file system. Of course, this is not an ordinary file; it exists only to serve epoll. When epoll is initialized by the kernel (at operating-system startup), it sets up its own kernel cache area to hold the sockets we want to monitor. These sockets are stored in that kernel cache in a red-black tree, so that lookup, insertion, and deletion are all fast.

This kernel cache area is simply a set of contiguous physical memory pages with a slab layer built on top. Put simply, memory objects of the required size are allocated physically up front, and each subsequent allocation hands out one of these pre-allocated, idle objects.

static int __init eventpoll_init(void)
{
    ......
    /* Allocates slab cache used to allocate "struct epitem" items */
    epi_cache = kmem_cache_create("eventpoll_epi", sizeof(struct epitem),
            0, SLAB_HWCACHE_ALIGN | EPI_SLAB_DEBUG | SLAB_PANIC, NULL, NULL);

    /* Allocates slab cache used to allocate "struct eppoll_entry" */
    pwq_cache = kmem_cache_create("eventpoll_pwq", sizeof(struct eppoll_entry),
            0, EPI_SLAB_DEBUG | SLAB_PANIC, NULL, NULL);
    ......
}

The efficiency of epoll shows precisely here: even after we have stuffed millions of handles into it via epoll_ctl, epoll_wait can still return quickly and hand us, the user, only the handles on which events have actually occurred.
This is because, when we call epoll_create, the kernel not only creates a file node for us in the epoll file system and builds a red-black tree in the kernel cache to hold the sockets later added via epoll_ctl, it also creates a ready list, a linked list used to store ready events. When epoll_wait is called, it only needs to look at this ready list: if there is data in it, return; if not, sleep, and return after the timeout even if the list is still empty. This is why epoll_wait is so efficient.

So how is this ready list maintained? When we execute epoll_ctl, besides placing the socket on the red-black tree inside the epoll file system's file object, epoll also registers a callback with the kernel's interrupt handling path, telling the kernel: when an interrupt arrives for this handle, put it on the ready list.

Therefore, when data arrives on a socket, the kernel copies the data from the network card into kernel memory and then inserts that socket into the ready list.
In this way, one red-black tree, one linked list of ready handles, and a small amount of kernel cache are enough to solve the problem of handling sockets under heavy concurrency.

To summarize: when epoll_create executes, a red-black tree and a ready list are created. When epoll_ctl executes to add a socket handle, it first checks whether the handle already exists in the red-black tree; if it does, it returns immediately, and if not, the handle is added to the tree and a callback is registered with the kernel, which will insert the handle into the ready list when its interrupt event arrives. When epoll_wait executes, it simply returns whatever is in the ready list.
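
A purely conceptual sketch of that flow, in C-style pseudocode (heavily simplified, with invented helper names such as rbtree_insert and list_move_to_user; this is not the real kernel code), might look like this:

/* Conceptual model only: struct rbtree, struct list and the rbtree_*() /
   list_*() helpers are imaginary stand-ins for the kernel's real
   red-black tree and linked-list machinery. */

struct eventpoll {
    struct rbtree monitored;   /* every handle added via epoll_ctl        */
    struct list   ready;       /* handles whose events have already fired */
};

int my_epoll_ctl_add(struct eventpoll *ep, int fd, __uint32_t events)
{
    if (rbtree_find(&ep->monitored, fd))
        return -1;                              /* already being watched */
    rbtree_insert(&ep->monitored, fd, events);
    /* register a callback: when the device interrupt reports activity on
       fd, the kernel appends fd to ep->ready */
    register_interrupt_callback(fd, &ep->ready);
    return 0;
}

int my_epoll_wait(struct eventpoll *ep, struct epoll_event *out, int max)
{
    /* no scan over all monitored handles: just drain the ready list */
    return list_move_to_user(&ep->ready, out, max);
}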

Finally, let's look at epoll's two working modes, LT (level-triggered) and ET (edge-triggered). Both modes operate on top of the mechanism described above.

The difference is that in LT mode, as long as the events on a handle have not been fully processed in one go, later calls to epoll_wait will keep returning that handle, whereas in ET mode the handle is returned only the first time.

How does this happen? When an event occurs on a socket handle, the kernel inserts that handle into the ready list described above. When we then call epoll_wait, it copies the ready sockets to user-mode memory, clears the ready list, and finally, and this is the key step, checks those sockets: if a handle is in LT mode rather than ET mode and it still has unprocessed events, epoll_wait puts it back onto the ready list that was just emptied.

Therefore, for a non-ET handle, as long as events remain on it, every epoll_wait call will return it. An ET-mode handle, by contrast, will not be returned by epoll_wait again unless a new interrupt arrives, even if its events have not been fully processed.
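
In practice this is why, under ET mode, user code must drain a readable socket completely before going back to epoll_wait. A minimal sketch (my own illustration; on_data is an invented placeholder for application logic, and fd is assumed to have been set non-blocking) looks like this:

#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

void on_data(const char *buf, ssize_t len);   /* application logic (assumed) */

/* Register fd in edge-triggered mode. */
void add_et(int epfd, int fd)
{
    struct epoll_event ev;
    ev.events = EPOLLIN | EPOLLET;   /* ET: notified only when new data arrives */
    ev.data.fd = fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* When epoll_wait reports fd readable, read until EAGAIN; otherwise any
   bytes left in the kernel buffer will not trigger another notification.
   Assumes fd was made non-blocking (O_NONBLOCK) when it was accepted. */
void drain(int fd)
{
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            on_data(buf, n);
        } else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            break;                   /* kernel buffer is empty: safe to return */
        } else {
            break;                   /* EOF or real error: handle elsewhere */
        }
    }
}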

3. Extended reading (how epoll compares with the related techniques that came before it):

Linux provides the select, poll, and epoll interfaces for IO multiplexing; their prototypes are shown below. This section compares the three in terms of parameters, implementation, and performance.

int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
int poll(struct pollfd *fds, nfds_t nfds, int timeout);
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
Comparison of the parameters and implementations of select, poll, and epoll_wait

1. The first parameter of select, nfds, is the highest descriptor value in the fd_set plus 1. fd_set is a bit array whose size is limited to __FD_SETSIZE (1024); each bit indicates whether the corresponding descriptor needs to be checked.
The second, third, and fourth parameters of select are the descriptor bit arrays for read, write, and error events respectively. They are both input and output parameters: the kernel may modify them to indicate which descriptors have events of interest.
So every time you call select, you need to re-initialize the fd_set (see the sketch after this item).
The timeout parameter is the timeout; its structure may also be modified by the kernel, and on return it holds the time remaining before the timeout.
In the kernel, select corresponds to sys_select. sys_select first copies the fd_sets pointed to by the second, third, and fourth parameters into the kernel, then polls each descriptor that is SET and records results in a temporary fd_set. If any event has occurred, select writes the temporary result back to user space and returns. If no event has occurred after the first pass and a timeout was specified, select sleeps until the timeout, polls once more after waking, writes the temporary result to user space, and returns. After select returns, you still have to check whether each descriptor of interest is SET (i.e. whether its event occurred).
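
The sketch below (my own illustration; sock1 and sock2 are assumed, already-open sockets) shows the practical consequence of fd_set being an in/out parameter: the sets, and the timeout, have to be rebuilt before every single select call:

#include <sys/select.h>

void select_loop(int sock1, int sock2)
{
    int maxfd = (sock1 > sock2) ? sock1 : sock2;

    for (;;) {
        /* fd_set is overwritten by the kernel, so rebuild it every iteration */
        fd_set readset;
        FD_ZERO(&readset);
        FD_SET(sock1, &readset);
        FD_SET(sock2, &readset);

        /* the kernel may modify the timeout, so reset it as well */
        struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };

        int n = select(maxfd + 1, &readset, NULL, NULL, &tv);
        if (n <= 0)
            continue;                              /* timeout or error */

        if (FD_ISSET(sock1, &readset)) { /* handle sock1 */ }
        if (FD_ISSET(sock2, &readset)) { /* handle sock2 */ }
    }
}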

2. poll differs from select in that the events of interest are passed to the kernel through an array of struct pollfd, so there is no limit on the number of descriptors. The events and revents fields of pollfd mark the events of interest and the events that occurred, respectively, so the pollfd array only needs to be initialized once (see the sketch after this item).
The implementation of poll is similar to that of select; it corresponds to sys_poll in the kernel, except that poll passes the pollfd array to the kernel and then polls each descriptor in it, which is somewhat more efficient than processing fd_sets. After poll returns, you need to check the revents value of each pollfd element to determine whether its event occurred.
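
For comparison with the select sketch above, here is a minimal poll sketch (again my own illustration with assumed sockets sock1 and sock2). Note that the array is set up once, outside the loop, and only revents is examined afterwards:

#include <poll.h>

void poll_loop(int sock1, int sock2)
{
    /* set up once: events stays valid across calls, only revents changes */
    struct pollfd fds[2] = {
        { .fd = sock1, .events = POLLIN },
        { .fd = sock2, .events = POLLIN },
    };

    for (;;) {
        int n = poll(fds, 2, 1000);   /* 1000 ms timeout */
        if (n <= 0)
            continue;                 /* timeout or error */

        for (int i = 0; i < 2; ++i) {
            if (fds[i].revents & POLLIN) {
                /* fds[i].fd is readable: handle it */
            }
        }
    }
}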

3. epoll creates a descriptor for epoll polling via epoll_create, adds/modifies/deletes events via epoll_ctl, and checks for events via epoll_wait, whose second parameter is used to return the results. epoll differs from select and poll in two ways. First, it does not need to copy the event descriptions into the kernel on every call: after the first call, the event information stays associated with the corresponding epoll descriptor. Second, epoll does not work by polling; instead it registers a callback on each waited-on descriptor, and when an event occurs the callback stores the event in the ready event list, which is finally written back to user space.

Author: Jin Fameng audio
Link: https://www.jianshu.com/p/b5bc204da984
Source: Jianshu
Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please indicate the source.
