Why epoll replaces select for event triggering in Linux network programming

In Linux network programming, select was the usual event-triggering mechanism for a long time. Newer Linux kernels provide a mechanism to replace it: epoll.
Compared with select, the biggest advantage of epoll is that its efficiency does not drop as the number of monitored fds grows. The kernel implements select by polling every fd in the set on each call, so the more fds there are, the longer each call takes. In addition, the linux/posix_types.h header file contains this declaration:
#define __FD_SETSIZE 1024
This means that select can monitor at most 1024 fds at the same time. The limit can be raised by modifying the header file and recompiling the kernel, but that treats the symptom rather than the cause.
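To make the limit concrete, here is a minimal sketch of a select-based readiness check. The helper name readable_now is my own invention for illustration. It refuses any fd that does not fit in an fd_set, which is exactly the restriction that FD_SETSIZE imposes.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd is readable right now, 0 if not,
 * and -1 on error or if fd does not fit in an fd_set. */
int readable_now(int fd)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;                  /* FD_SET on such an fd is undefined */

    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    struct timeval tv = { 0, 0 };   /* zero timeout: check and return */
    int rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;

    assert(rc <= 1);                /* only one fd was put in the set */
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}
```

Note that the first argument to select is the highest fd plus 1, and that the kernel has to walk the whole set on every call.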

The epoll interface is very simple. There are three functions in total:
1. int epoll_create(int size);
Creates an epoll handle. The size argument tells the kernel roughly how many fds will be monitored. This differs from the first parameter of select(), which is the value of the highest monitored fd plus 1. Note that once the epoll handle is created it occupies an fd of its own; you can see it under /proc/<process id>/fd/ on Linux. So when you have finished with epoll you must call close() on the handle, otherwise fds may be exhausted.

Note: The size parameter only tells the kernel the approximate number of fds that this epoll object will handle, not a hard maximum. Since Linux 2.6.8 the kernel ignores the value, although it must still be greater than zero.
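A minimal sketch of creating and releasing an epoll handle might look like the following. The helper names open_epoll and close_epoll are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: create an epoll instance and return its fd, or -1. */
int open_epoll(void)
{
    int epfd = epoll_create(1);     /* size is only a hint, but must be > 0 */
    if (epfd < 0)
        perror("epoll_create");
    return epfd;
}

/* Hypothetical helper: release the fd that the epoll instance occupies. */
int close_epoll(int epfd)
{
    assert(epfd >= 0);              /* must be a handle from open_epoll */
    return close(epfd);
}
```

Calling epoll_create(0) fails, because the size argument has to be positive even though its value is otherwise unused.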

2. int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

This is the event registration function of epoll. epoll_ctl adds, modifies or deletes events of interest on the epoll object. It returns 0 on success and -1 on failure, in which case the cause must be determined from errno.

Unlike select(), where you tell the kernel which event types to monitor every time you wait, epoll requires you to register the event types to be monitored in advance, with this function.

Any event returned by epoll_wait must first have been added to epoll through epoll_ctl.

The first parameter is the return value of epoll_create(). The second parameter specifies the action, expressed by one of three macros:
EPOLL_CTL_ADD: register a new fd with epfd;
EPOLL_CTL_MOD: modify the monitored events of an already registered fd;
EPOLL_CTL_DEL: delete an fd from epfd.
The third parameter is the fd to be monitored, and the fourth parameter tells the kernel what to monitor. struct epoll_event is defined as follows:

typedef union epoll_data {
    void *ptr;
    int fd;
    __uint32_t u32;
    __uint64_t u64;
} epoll_data_t;

struct epoll_event {
    __uint32_t events; /* Epoll events */
    epoll_data_t data; /* User data variable */
};

events can be a combination (bitwise OR) of the following macros:
EPOLLIN: the corresponding file descriptor can be read (this includes an orderly shutdown by the peer socket);
EPOLLOUT: the corresponding file descriptor can be written;
EPOLLPRI: the corresponding file descriptor has urgent data to read (typically out-of-band data);
EPOLLERR: an error occurred on the corresponding file descriptor;
EPOLLHUP: the corresponding file descriptor was hung up;
EPOLLET: put the fd in edge-triggered (ET) mode, as opposed to the default level-triggered (LT) mode;
EPOLLONESHOT: report the event only once. After it has been delivered, the fd is disabled; to keep monitoring the socket you must re-arm it with epoll_ctl using EPOLL_CTL_MOD.
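The three epoll_ctl actions can be sketched as follows. The helpers watch_fd and unwatch_fd are names I made up for this example; falling back from ADD to MOD when errno is EEXIST is one common convention, not something the API requires.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: register fd for the given events, or update the
 * registration if fd is already known to epfd. Returns 0 or -1. */
int watch_fd(int epfd, int fd, uint32_t events)
{
    assert(epfd >= 0 && fd >= 0);

    struct epoll_event ev;
    memset(&ev, 0, sizeof ev);
    ev.events = events;             /* e.g. EPOLLIN | EPOLLET */
    ev.data.fd = fd;                /* user data handed back by epoll_wait */

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0)
        return 0;
    if (errno != EEXIST)
        return -1;                  /* a real error */
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

/* Hypothetical helper: stop monitoring fd. The event argument is ignored
 * for EPOLL_CTL_DEL; kernels before 2.6.9 still required it to be non-NULL. */
int unwatch_fd(int epfd, int fd)
{
    return epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
}
```

Deleting an fd that was never registered fails with errno set to ENOENT, which is a handy way to catch bookkeeping mistakes.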

3. int epoll_wait(int epfd, struct epoll_event * events, int maxevents, int timeout);
Waits for events to occur, similar to a select() call. The events parameter receives the set of events from the kernel, and maxevents tells the kernel how many entries the events array can hold; it must be greater than zero. The timeout parameter is given in milliseconds: 0 returns immediately, and -1 blocks indefinitely. The function returns the number of events that need to be processed; a return value of 0 indicates a timeout. If it returns -1, an error has occurred and you need to check errno to determine the cause.

The first parameter epfd is the descriptor of epoll.

The second parameter events is a pre-allocated array of epoll_event structures. epoll copies the events that occurred into this array. events cannot be a null pointer: the kernel only copies data into the array and does not allocate user-space memory for us, which keeps the call efficient.

The third parameter maxevents is the maximum number of events that can be returned by this call. Usually it equals the size of the pre-allocated events array.

The fourth parameter timeout is the maximum waiting time (in milliseconds) when no event is detected. If timeout is 0, epoll_wait returns immediately without waiting, even when the ready list (rdllist) is empty.
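Putting the four parameters together, a minimal wrapper around epoll_wait might look like this. The name poll_ready and the MAX_EVENTS value are assumptions for illustration, and the sketch assumes every fd was registered with data.fd set.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 16               /* size of the events array (arbitrary) */

/* Hypothetical helper: wait up to timeout_ms and copy the ready fds into
 * out_fds (at most cap of them). Returns the count, 0 on timeout, -1 on error. */
int poll_ready(int epfd, int timeout_ms, int *out_fds, int cap)
{
    assert(out_fds != NULL && cap > 0);

    struct epoll_event events[MAX_EVENTS];  /* caller-side memory for results */
    int maxevents = cap < MAX_EVENTS ? cap : MAX_EVENTS;
    int n;

    do {
        /* Retrying after a signal restarts the full timeout in this sketch. */
        n = epoll_wait(epfd, events, maxevents, timeout_ms);
    } while (n < 0 && errno == EINTR);

    for (int i = 0; i < n; ++i)
        out_fds[i] = events[i].data.fd;
    return n;
}
```

With timeout_ms set to 0 the call just reports what is ready right now; with -1 it blocks until something happens.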

4. Regarding the two working modes of ET and LT:

epoll has two working modes: LT (level-triggered) mode and ET (edge-triggered) mode.

By default, epoll works in LT mode, which can handle both blocking and non-blocking sockets. The EPOLLET flag in the list above switches an fd to ET mode. ET mode can be more efficient than LT mode because it generates fewer notifications, but it should only be used with non-blocking sockets.

The difference between ET mode and LT mode is:

When a new event arrives, it can of course be obtained from epoll_wait in ET mode. But if the socket buffer for that event is not fully processed this time, and no new event arrives on the socket, then in ET mode the event cannot be obtained from epoll_wait again. LT mode is the opposite: as long as the socket buffer for an event still has data, the event keeps being returned by epoll_wait. Therefore, developing epoll-based applications in LT mode is simpler and less error-prone. In ET mode, if the buffered data is not completely processed when an event occurs, the user requests left in the buffer get no response. By default, Nginx uses epoll in ET mode.


Conclusion: ET mode notifies only when the state changes, and unprocessed data left in the buffer does not count as a state change. That is to say, if you use ET mode, you must keep reading or writing until the call fails with EAGAIN. Many people ask why ET mode delivers only part of the data and then stops notifying; this is usually the reason. LT mode, in contrast, keeps notifying as long as there is unprocessed data.
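The difference can be observed with a small experiment, sketched below with helper names of my own (set_nonblocking, drain_fd, count_notifications). It counts how often an fd with unread data is reported by repeated zero-timeout epoll_wait calls, then shows the read-until-EAGAIN loop that ET mode requires.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put fd in non-blocking mode, as ET mode expects. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: read until EAGAIN (or EOF). Returns the number of
 * bytes consumed, or -1 on a real error. fd must be non-blocking. */
ssize_t drain_fd(int fd)
{
    char buf[4096];
    ssize_t total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) { total += n; continue; }
        if (n == 0) return total;                   /* peer closed */
        if (errno == EINTR) continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;                           /* buffer is empty now */
        return -1;
    }
}

/* Hypothetical helper: register fd for EPOLLIN | extra_flags, then call
 * epoll_wait `rounds` times WITHOUT reading, counting the notifications. */
int count_notifications(int fd, uint32_t extra_flags, int rounds)
{
    assert(rounds > 0);
    int epfd = epoll_create(1);
    if (epfd < 0)
        return -1;

    struct epoll_event ev;
    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }

    int hits = 0;
    for (int i = 0; i < rounds; ++i) {
        struct epoll_event out;
        if (epoll_wait(epfd, &out, 1, 0) > 0)
            ++hits;
    }
    close(epfd);
    return hits;
}
```

With unread data sitting in a pipe, count_notifications(fd, 0, 3) reports the fd on every round, while count_notifications(fd, EPOLLET, 3) reports it only once. After that single ET notification, drain_fd is what keeps data from being stranded in the buffer.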


So how do you use epoll? It is actually very simple.
By including the header file <sys/epoll.h> and using a few simple APIs, you can greatly increase the number of concurrent connections your network server supports.

First, create an epoll handle with epoll_create(int maxfds), where maxfds is a hint for the maximum number of handles your epoll instance will monitor. This function returns a new epoll handle, and all subsequent operations go through this handle. When you are done, remember to close the epoll handle with close().

Then, in your network main loop, call epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout) on every iteration to find out which sockets can be read and which can be written. The basic usage is:
nfds = epoll_wait(kdpfd, events, maxevents, -1);
Here kdpfd is the handle created with epoll_create, and events is a pointer to an array of struct epoll_event; when epoll_wait succeeds, all ready read and write events are stored in that array. maxevents is the maximum number of events to return in one call, normally the size of the events array. The last argument, timeout, is the timeout of epoll_wait: 0 means return immediately, -1 means wait until an event arrives, and any positive integer means wait that many milliseconds and then return even if there is no event. Generally, if the network main loop runs in a separate thread, you can use -1 to wait, which is efficient. If it shares a thread with the main logic, you can use 0 so that the main loop is never blocked.

After epoll_wait returns, there should be a loop that iterates over all returned events.

Almost all epoll programs use the following framework:

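for ( ; ; )
    {
        nfds = epoll_wait(epfd, events, 20, 500);   // wait up to 500 ms for at most 20 events
        for (i = 0; i < nfds; ++i)
        {
            if (events[i].data.fd == listenfd) // there is a new connection
            {
                connfd = accept(listenfd, (struct sockaddr *)&clientaddr, &clilen); // accept the connection
                ev.data.fd = connfd;
                ev.events = EPOLLIN | EPOLLET;
                epoll_ctl(epfd, EPOLL_CTL_ADD, connfd, &ev); // add the new fd to the epoll interest list
            }
            else if (events[i].events & EPOLLIN) // data received: read the socket
            {
                sockfd = events[i].data.fd;        // fd stored when the socket was registered
                n = read(sockfd, line, MAXLINE);   // read; handling of n <= 0 (error or EOF) is omitted here
                ev.data.ptr = md;     // md is a user-defined struct that carries the fd and the data to send
                ev.events = EPOLLOUT | EPOLLET;
                epoll_ctl(epfd, EPOLL_CTL_MOD, sockfd, &ev); // modify the registration so the data is sent on a later iteration; this is the essence of asynchronous processing
            }
            else if (events[i].events & EPOLLOUT) // data is waiting to be sent: write the socket
            {
                struct myepoll_data *md = (struct myepoll_data *)events[i].data.ptr;    // fetch the user data
                sockfd = md->fd;
                send(sockfd, md->ptr, strlen((char *)md->ptr), 0);        // send the data
                ev.data.fd = sockfd;               // data is a union: switch back from ptr to fd
                ev.events = EPOLLIN | EPOLLET;
                epoll_ctl(epfd, EPOLL_CTL_MOD, sockfd, &ev); // modify the registration so data is received on a later iteration
            }
            else
            {
                // other handling
            }
        }
    }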


 


Origin blog.csdn.net/linuxguitu/article/details/112984510