I/O Multiplexing in Linux

Table of contents

1. Definition and use of the three

 1. select

    1.1 Function definition
    1.2 Related definitions and macros
    1.3 Usage pseudocode

 2. poll

    2.1 Prototype description of the poll function
    2.2 Usage pseudocode

 3. epoll

    3.1 The usage flow (placed first because it is a bit more involved)
    3.2 Related functions
    3.3 Trigger modes provided by epoll
    3.4 Usage pseudocode

2. Comparative summary


        Recently, after learning multi-process and multi-thread programming, I studied the I/O multiplexing mechanisms select, poll, and epoll, and put together this comparison and summary of the three.

1. Definition and use of the three

 1. select

1.1 Function definition

int select(int max_fd, fd_set *readset, fd_set *writeset, fd_set *exceptset, struct timeval *timeout);

Return value and parameter description:

  • Return value : the number of ready descriptors (greater than 0) on success, 0 on timeout, and -1 on error;
  • max_fd : one more than the highest-numbered file descriptor to be tested (file descriptors start from 0, hence the +1). The Linux kernel scans descriptors from 0 to max_fd-1. Note that if the descriptors to be monitored are 8, 17, and 50, the kernel actually scans everything from 0 up to the largest fd, i.e. 0~50, so max_fd is max(8, 17, 50) + 1;
  • readset : the set of fds the kernel should test for readability; may be set to NULL if no such test is needed;
  • writeset : the set of fds the kernel should test for writability; may be set to NULL if no such test is needed;
  • exceptset : the set of fds the kernel should test for exceptional conditions; may be set to NULL if no such test is needed;
  • timeout : the maximum time select may block; if set to NULL, select blocks indefinitely.

PS:

  1. select monitors multiple file descriptors and waits for their status to change. The monitored conditions are readfds (the descriptor has data to read), writefds (the descriptor can be written), and exceptfds (the descriptor has an exceptional condition).
  2. Calling select blocks the program; the function does not return until some descriptor is ready (readable, writable, or in an error/exception state) or the timeout expires (the specified time passes without any event arriving).
  3. select() lets a process instruct the kernel to wait for any one of multiple events (file descriptors) to occur, waking it only when one or more events occur or a specified time has passed. When select() returns, you can find out which descriptors are ready by traversing the fd sets and handle them accordingly.

       In the Linux kernel, the parameter __FD_SETSIZE defines how many descriptors an FD_SET can hold, which means the FD_SET used by select is bounded in size; this is why select() can by default only handle 1024 client connections at the same time:

linux/posix_types.h:

#define __FD_SETSIZE       1024

1.2 Related definitions and macros

//Related header files
#include <sys/select.h>
#include <sys/time.h>

struct timeval
{
    long tv_sec;   /* seconds */
    long tv_usec;  /* microseconds */
};

//Related macros, listed in the order they are typically used
FD_ZERO(fd_set* fds)           //First clear the set
FD_SET(int fd, fd_set* fds)    //Add the given descriptor to the set
FD_ISSET(int fd, fd_set* fds)  //Test whether the given descriptor is in the set
FD_CLR(int fd, fd_set* fds)    //Remove the given descriptor from the set


//After the sets are prepared, call select in the program:
//int select(int max_fd, fd_set *readset, fd_set *writeset, fd_set *exceptset, struct timeval *timeout);

1.3 Usage pseudocode

#include <sys/select.h>
#include <sys/time.h>

fd_set        rdset;
...

FD_ZERO(&rdset);          //first clear the set (select modifies the sets in place,
                          //so rebuild them before every call)
FD_SET(testfd1, &rdset);  //add the given descriptor to the set

/* as needed */
FD_CLR(int fd, fd_set* fds);  //remove the given descriptor from the set

rv = select(maxfd+1, &rdset, NULL, NULL, NULL);  //block in select

if( FD_ISSET(testfd1, &rdset) )  //work out which descriptor is ready and handle it
{
     ...
}
else if( FD_ISSET(testfd2, &rdset) )
{
     ...
}

 2. poll

2.1 Prototype description of the poll function

#include <poll.h>

struct pollfd
{
        int fd;        //file descriptor to monitor
        short events;  //events of interest, set by the caller
        short revents; //events that actually occurred, filled in by the kernel
};

int poll(struct pollfd *fds, nfds_t nfds, int timeout);

Parameter Description:

① fds: Points to an array of struct pollfd type, each pollfd structure specifies a monitored file descriptor, instructing poll() to monitor multiple file descriptors.

  • fd: the monitored file descriptor;
  • events field: monitor the event mask of the file descriptor, which is set by the user;
  • revents field: the result event mask for the file descriptor; the kernel sets this field when the call returns, and any event requested in the events field may be returned in the revents field. Common constant values for the events and revents flags include POLLIN, POLLOUT, POLLERR, and POLLHUP.

If you want to listen to multiple events and multiple fds at the same time, you can set fds to an array (struct pollfd fds_array[1024])

② nfds: Specifies the number of elements to monitor in the array.

③ timeout:

  • timeout > 0 : wait at most timeout milliseconds; poll returns either when a descriptor becomes ready or when the time expires, whichever comes first;
  • timeout < 0 : infinite timeout; poll() blocks until a requested event occurs (or a signal interrupts the call);
  • timeout == 0 : poll() returns immediately, reporting any descriptors that are already ready for I/O without waiting for further events.

PS:

  1. On success, poll() returns the number of structures whose revents field is nonzero;
  2. If no event occurs before the timeout, poll() returns 0;
  3. On failure, poll() returns -1 and sets errno to one of the following values:
  • EBADF : an invalid file descriptor was specified in one or more of the structures;
  • EFAULT : the address pointed to by fds is outside the address space of the process;
  • EINTR : a signal was caught before any requested event occurred; the call can be re-initiated;
  • EINVAL : the nfds parameter exceeds the RLIMIT_NOFILE value;
  • ENOMEM : insufficient memory was available to complete the request.

2.2 Usage pseudocode

#include <poll.h>

#define ARRAY_SIZE(x) (sizeof(x)/sizeof(x[0]))

struct pollfd            fds_array[1024];

for( i=0; i<ARRAY_SIZE(fds_array); i++ )  //habit: first mark every slot as unused
{
    fds_array[i].fd = -1;                 //poll ignores entries with a negative fd
}
fds_array[0].fd = fd1;
fds_array[0].events = POLLIN;  //several events can be combined, e.g. POLLIN|POLLRDNORM

rv = poll(fds_array, max+1, -1);  //max+1 is the number of array entries actually in use

if (fds_array[0].revents & POLLIN)
{
    ...
}
else
{
    ...
}

3. epoll

3.1 The usage flow (placed first because it is a bit more involved)

        epoll is the Linux kernel's improvement on poll for handling large numbers of file descriptors, an enhanced version of the multiplexed I/O interfaces select/poll under Linux. It can significantly improve a program's system CPU utilization when only a small fraction of many concurrent connections is active. Another reason is that when retrieving events it does not need to traverse the whole monitored descriptor set, but only the descriptors that kernel I/O events have asynchronously woken up and added to the ready queue.

The design and implementation of epoll is completely different from select.

epoll divides the original select/poll call into three parts by applying for a simple file system in the Linux kernel:

  1. Call epoll_create() to create an epoll object (allocating resources for this handle object in the epoll file system);
  2. Call epoll_ctl() to register the connected sockets (even a million of them) with the epoll object;
  3. Call epoll_wait() to collect the connections on which events have occurred.

3.2 Related functions

(1) Create an epoll instance: epoll_create()

#include <sys/epoll.h>

int epoll_create(int size);  //Create a new epoll instance whose interest list is initialized to empty

return value:

        Returns the file descriptor if successful, or -1 if an error occurs.

        As a function return value, epoll_create() returns the file descriptor representing the newly created epoll instance. This file descriptor is used in several other epoll system calls to represent the epoll instance. When the file descriptor is no longer needed, it should be closed with close(). When all the file descriptors related to the epoll instance are closed, the instance is destroyed and the related resources are returned to the system.

Parameter Description:

  • size : specifies the number of file descriptors the caller expects to check through the epoll instance. The parameter is not an upper limit, but a hint telling the kernel how much initial space to allocate for its internal data structures. Since Linux 2.6.8 the size parameter is ignored (it must merely be greater than 0).

PS:

        Since kernel version 2.6.27, Linux supports a new system call, epoll_create1(). It performs the same task as epoll_create(), but drops the useless size parameter and adds a flags parameter that can modify the behavior of the call. Currently one flag is supported: EPOLL_CLOEXEC, which makes the kernel set the close-on-exec flag (FD_CLOEXEC) on the new file descriptor.

(2) Modify the interest list of epoll: epoll_ctl()

#include  <sys/epoll.h>

int epoll_ctl(int epfd,   int op,   int fd,   struct epoll_event *ev);

//Modify the interest list in the epoll instance represented by the file descriptor epfd

return value:

        Returns 0 on success, -1 on error.

Parameter Description:

epfd : is the return value of epoll_create();

op : Specify the operation to be performed, it can be the following values:

  • EPOLL_CTL_ADD: add the descriptor fd to the interest list of the epoll instance. The events we are interested in on fd are specified in the structure pointed to by ev. Attempting to add a file descriptor that is already in the interest list makes epoll_ctl() fail with EEXIST;
  • EPOLL_CTL_MOD: modify the event set for the descriptor, using the information in the structure pointed to by ev. Attempting to modify a file descriptor that is not in the interest list makes epoll_ctl() fail with ENOENT;
  • EPOLL_CTL_DEL: remove the file descriptor fd from the interest list of epfd. This operation ignores the parameter ev. Attempting to remove a file descriptor that is not in epfd's interest list makes epoll_ctl() fail with ENOENT. Closing a file descriptor automatically removes it from the interest lists of all epoll instances;

fd : the file descriptor whose entry in the interest list is to be modified. This argument can be a file descriptor for a pipe, FIFO, socket, POSIX message queue, inotify instance, terminal, device, or even another epoll instance. However, fd cannot be a file descriptor for an ordinary file or a directory;

ev : Pointer to the structure epoll_event, the definition of the structure is as follows:

        struct epoll_event {
                uint32_t      events;   /* epoll events (bit mask) */
                epoll_data_t  data;     /* user data */
        };

        typedef union epoll_data {
                void      *ptr;   /* pointer to user-defined data */
                int        fd;    /* file descriptor */
                uint32_t   u32;   /* 32-bit integer */
                uint64_t   u64;   /* 64-bit integer */
        } epoll_data_t;

The settings (epoll_event) made by the parameter ev for the file descriptor fd are as follows:

  • events field: a bit mask specifying the set of events we are interested in for the descriptor fd being checked;
  • data field: a union; when fd later becomes ready, the union's members can be used to pass information back to the calling process.

(3) Event waiting: epoll_wait

#include  <sys/epoll.h>

int epoll_wait(int epfd,   struct epoll_event *evlist,   int maxevents,   int timeout);

//The system call epoll_wait() returns the file descriptor information in the ready state in the epoll instance, and a single epoll_wait() call can return the information of multiple ready state file descriptors.

return value:

        After the call is successful, epoll_wait() returns the number of elements in the array evlist;

        Returns 0 if no file descriptors are ready within the timeout interval;

        When an error occurs, return -1 and set the error code in errno to indicate the cause of the error.

Parameter Description:

①  epfd : return value of epoll_create();

②  evlist : points to an array of structures in which information about ready file descriptors is returned; the caller is responsible for allocating the space for the evlist array;

③  maxevents : Specify the number of elements contained in the evlist array;

④  timeout : Determine the blocking behavior of epoll_wait(), as follows:

  • When timeout is -1, the call will block until an event occurs on a file descriptor in the interest list or until a signal is caught.
  • When timeout is 0, a non-blocking check is performed to see which event occurred on the descriptor in the interest list.
  • timeout>0, the call will block for at most timeout milliseconds until an event occurs on the file descriptor, or until a signal is caught.

        In the array evlist, each element returns information about a single ready file descriptor: the events field returns a mask of the events that have occurred on the descriptor, and the data field returns the value we specified in ev.data when we registered interest in the descriptor with epoll_ctl(). Note that the data field is the only way to know which file descriptor is associated with the event. Therefore, when calling epoll_ctl() to add a file descriptor to the interest list, we should either set ev.data.fd to the file descriptor or set ev.data.ptr to point to a structure containing the file descriptor.

3.3 Trigger modes provided by epoll

        In addition to the level-triggered (Level Triggered) delivery of I/O events that select/poll provide, epoll also offers edge-triggered (Edge Triggered) delivery. This makes it possible for a user-space program to cache the I/O state itself, reducing the number of epoll_wait/epoll_pwait calls and improving application efficiency.

        LT (level-triggered) is the default working mode and supports both blocking and non-blocking sockets. In this mode the kernel tells you whether a file descriptor is ready, and you can then perform I/O operations on the ready fd. If you do nothing, the kernel will keep notifying you. This mode is less prone to programming errors; the traditional select/poll interfaces are representatives of this model.

        ET (edge-triggered) is the high-speed working mode and supports only non-blocking sockets. In this mode the kernel tells you via epoll when a descriptor goes from not ready to ready. It then assumes you know the descriptor is ready, and will not send further readiness notifications for it until you do something that makes it not ready again (for example, a send or receive returns EWOULDBLOCK, or transfers less data than requested). Note that if you never perform I/O on the fd (so it never becomes not ready again), the kernel will not send more notifications (it notifies only once). For the TCP protocol, the speedup from ET mode still needs confirmation from more benchmarks.

        LT events are not discarded: as long as there is data in the read buffer for the user to read, it keeps notifying you. ET, by contrast, notifies only when the event occurs. Simply put, LT is a level trigger and ET an edge trigger: LT fires as long as unprocessed events remain, while ET fires only when the level changes (the state goes from 1 to 0 or from 0 to 1).

3.4 Usage pseudocode

#include <sys/epoll.h>

#define MAX_EVENTS 512

/*
struct epoll_event
{
     uint32_t         events;  // epoll events (bit mask)
     epoll_data_t       data;  // user data
};

typedef union epoll_data
{
     void             *ptr;  // pointer to user-defined data
     int                fd;  // file descriptor
     uint32_t          u32;  // 32-bit integer
     uint64_t          u64;  // 64-bit integer
} epoll_data_t;
*/


int                         epollfd;  //file descriptor of the newly created epoll instance
struct epoll_event          event;    //the events of interest are filled into this structure
struct epoll_event          event_array[MAX_EVENTS];  //epoll_wait() returns information about
                                                      //ready descriptors here (this array is the
                                                      //caller-allocated evlist mentioned above)
int                         events;   //number of elements placed in event_array by epoll_wait()

if( ( epollfd = epoll_create(MAX_EVENTS) ) < 0 )   //create the epoll object; a negative value means failure
{
    ...
}

//event.events  = EPOLLIN|EPOLLET;
  event.events  = EPOLLIN;   //add the events of interest to event
  event.data.fd = fd1;       //the file descriptor to monitor

if( epoll_ctl(epollfd, EPOLL_CTL_ADD, fd1, &event) < 0)  //modify the interest list: add fd1 to the
{                                                        //epoll instance epollfd; negative means failure
    ...
}

events = epoll_wait(epollfd, event_array, MAX_EVENTS, -1);  //MAX_EVENTS is the number of elements in event_array

if( events < 0)  //error
{
    ...
}
else if( events == 0 )  //timeout
{
    ...
}
else  //on success, epoll_wait() returned the number of elements in event_array
{
    for(i=0; i<events; i++)
    {
        if ( (event_array[i].events & EPOLLERR) || (event_array[i].events & EPOLLHUP) )
        //an exceptional condition occurred; remember to deregister and close the descriptor
        {
             printf("epoll_wait error: %s\n", strerror(errno));
             epoll_ctl(epollfd, EPOLL_CTL_DEL, event_array[i].data.fd, NULL);
             close(event_array[i].data.fd);
        }
        if( event_array[i].data.fd == fd1 )  //work out which descriptor the event belongs to
        {
            //act accordingly, e.g. accept a new connection connfd and register it

             event.data.fd = connfd;
             //event.events = EPOLLIN|EPOLLET;
             event.events = EPOLLIN;

             if( epoll_ctl(epollfd, EPOLL_CTL_ADD, connfd, &event) < 0 )
             {
                 printf("epoll add failure: %s\n", strerror(errno));
                 close(event_array[i].data.fd);
                 ...
             }
        }
        else if( event_array[i].data.fd == fd2 )
        {
             ...
        }
    }
}



2. Comparative summary

       Before Linux implemented the epoll event-driven mechanism, we generally chose I/O multiplexing methods such as select or poll to implement concurrent server programs. Since epoll was officially introduced in the Linux 2.6 kernel, it has become an essential technique for building high-performance network servers. In an era when terms like big data, high concurrency, and clusters are everywhere, select and poll have become more and more limited, and the limelight has been taken by epoll.

Disadvantages of select:

  1. There is a hard limit on the number of file descriptors a single process can monitor, usually 1024. The number can be changed, but since select scans file descriptors by polling, performance degrades as the number of descriptors grows;
  2. Kernel/user-space memory copying: select has to copy large descriptor-set data structures on every call, which causes substantial overhead;
  3. select returns the entire set of descriptors, and the application must traverse it to find out which descriptors had events;
  4. select's trigger mode is level-triggered: if the application does not complete the I/O operation on a ready file descriptor, every subsequent select call will report that descriptor again.

        Compared with the select model, poll passes file descriptors in a caller-sized pollfd array rather than a fixed-size fd_set, so there is no hard limit on the number of monitored files, but the other three shortcomings remain. Taking the select model as an example, suppose our server needs to support 1 million concurrent connections; with __FD_SETSIZE at 1024, we would need at least about 1,000 processes to reach 1 million connections. Besides the cost of inter-process context switching, the massive kernel/user-space memory copies and array polling are unbearable for the system. A server program based on the select model therefore finds even 100,000 concurrent connections a difficult goal.


Origin blog.csdn.net/qq_51368339/article/details/127214344