IO multiplexing

Table of contents

1. select

1.1 Introduction to select

1.2 select function

1.3 Socket ready conditions

1.4 select basic workflow

1.5 select server

1.6 Advantages of select

1.7 Disadvantages of select

1.8 Applicable scenarios of select

2. poll

2.1 poll function

2.2 poll server

2.3 Advantages && Disadvantages of poll

3. epoll

3.1 Introduction to epoll

3.2 epoll related system calls

3.3 Working principle of epoll

3.4 epoll server

3.5 Advantages of epoll

3.6 epoll working modes (LT and ET)


1. select

1.1 Introduction to select

select is an I/O multiplexing interface provided by the system

  • The select system call lets a program monitor multiple file descriptors at the same time and learn when events on them become ready
  • The core job of select is waiting: when one or more events on the monitored file descriptors become ready, select returns successfully and tells the caller which descriptors have ready events

1.2 select function

int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

Parameter Description:

  • nfds: the value of the largest file descriptor to be monitored plus 1
  • readfds: an input/output parameter. On input, the caller tells the kernel which file descriptors to monitor for read readiness; on return, the kernel tells the caller which of those descriptors are ready.
  • writefds: an input/output parameter, used the same way for write readiness
  • exceptfds: an input/output parameter, used the same way for exceptional conditions
  • timeout: an input/output parameter. On input, the caller sets how long select may wait; on return, it holds the remaining time.

The value of the parameter timeout:

  • NULL/nullptr: select blocks until an event on one of the monitored file descriptors becomes ready
  • 0: select does not block; it checks the monitored file descriptors once and returns immediately, whether or not anything is ready
  • A specific time value: select blocks for at most that long; if no monitored event becomes ready within that time, select returns on timeout

Return value description:

  • If the function call is successful, it returns the number of file descriptors that are ready for the event
  • If the timeout time is exhausted, return 0
  • If the function call fails, -1 is returned, and the error code is set

When the select call fails, the error code may be set to:

  • EBADF: an invalid file descriptor was passed in one of the sets (for example, one that was already closed)
  • EINTR: the call was interrupted by a signal
  • EINVAL: nfds is negative or the timeout value is invalid
  • ENOMEM: unable to allocate kernel memory

fd_set structure 

The fd_set structure is similar to the sigset_t structure: fd_set is essentially a bitmap, and each bit in the bitmap marks one file descriptor to be monitored.

Before calling select, define an fd_set for the corresponding descriptor set and add the descriptors to be monitored to it. Adding a descriptor is really a bit operation on the bitmap, but the user does not perform the bit operations directly; the system provides a set of interfaces for manipulating an fd_set:

void FD_CLR(int fd, fd_set *set);      // clear the bit for fd in set
int  FD_ISSET(int fd, fd_set *set);    // test whether the bit for fd in set is set
void FD_SET(int fd, fd_set *set);      // set the bit for fd in set
void FD_ZERO(fd_set *set);             // clear all bits in set

timeval structure

The last parameter of select, timeout, is a pointer to a timeval structure, which describes a length of time. It has two members: tv_sec (seconds) and tv_usec (microseconds).
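
For illustration, here is a minimal sketch (not part of the server code later in this post) that waits up to 5 seconds for read readiness on a single descriptor; sockfd stands for any socket obtained elsewhere:

#include <sys/select.h>

// Wait up to 5 seconds for sockfd to become readable.
// Returns 1 if readable, 0 on timeout, -1 on error (errno set).
int WaitReadable(int sockfd)
{
    fd_set readfds;
    FD_ZERO(&readfds);          // clear every bit in the set
    FD_SET(sockfd, &readfds);   // mark sockfd as a descriptor to monitor

    struct timeval timeout;
    timeout.tv_sec = 5;         // seconds
    timeout.tv_usec = 0;        // microseconds

    int n = select(sockfd + 1, &readfds, nullptr, nullptr, &timeout);
    if (n > 0 && FD_ISSET(sockfd, &readfds)) return 1; // read event ready
    return n;
}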

1.3 Socket ready conditions

read ready

  • In the kernel, the number of bytes in the socket's receive buffer is greater than or equal to the low-water mark SO_RCVLOWAT; the descriptor can then be read without blocking and the read returns a value greater than 0
  • For a TCP socket, if the peer has closed the connection, a read on the socket returns 0
  • A new connection request has arrived on a listening socket
  • There is an unhandled error on the socket

write ready

  • In the kernel, the number of free bytes in the socket's send buffer is greater than or equal to the low-water mark SO_SNDLOWAT; the descriptor can then be written without blocking and the write returns a value greater than 0
  • The write side of the socket has been closed (by close or shutdown); writing at this point triggers the SIGPIPE signal
  • A non-blocking connect has completed, whether it succeeded or failed (see the sketch after this list)
  • There is an unread error on the socket
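
For the non-blocking connect case above, write readiness alone does not say whether the connect succeeded, so the pending error code is usually checked afterwards. A minimal sketch (connfd is assumed to be a socket whose non-blocking connect returned -1 with errno set to EINPROGRESS):

#include <sys/select.h>
#include <sys/socket.h>

// Returns true if the non-blocking connect on connfd eventually succeeded.
bool ConnectSucceeded(int connfd)
{
    fd_set writefds;
    FD_ZERO(&writefds);
    FD_SET(connfd, &writefds);
    if (select(connfd + 1, nullptr, &writefds, nullptr, nullptr) <= 0) return false;

    int err = 0;
    socklen_t len = sizeof(err);
    // SO_ERROR holds the result of the connection attempt (0 means success).
    if (getsockopt(connfd, SOL_SOCKET, SO_ERROR, &err, &len) < 0) return false;
    return err == 0;
}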

abnormally ready

  • Received out-of-band data on the socket

Note: out-of-band data is related to TCP urgent mode; the URG flag and the 16-bit urgent pointer in the TCP header are used together to send and receive out-of-band data

1.4 select basic workflow

To implement a simple select server that only reads the data sent by clients and prints it, the server's workflow is as follows:

  • First initialize the server: create the listening socket, bind it, and start listening
  • Define an _fd_array array to save the listening socket and the sockets of connections established with clients; the listening socket can be added to _fd_array during initialization
  • The server then calls select in a loop to detect whether read events are ready, and handles the ones that are
  • Before each call to select, define a read set readfds and set every descriptor stored in _fd_array into it, so that select monitors read readiness on those descriptors
  • When select returns, only the descriptors whose read events are ready remain set in readfds, so the server knows which descriptors it can act on
  • If the listening socket is read-ready, call accept to take an established connection from the underlying full-connection queue and add the connection's socket to _fd_array
  • If a connected client socket is read-ready, call read to read the data sent by the client and print it
  • A read event on a connected socket may also mean the client has closed the connection; in that case the server should call close on the socket and clear it from _fd_array, since its read events no longer need to be monitored

Notice:

  • readfds, writefds and exceptfds are all input/output parameters; when select returns, their contents have been modified, so they must be re-set before every call to select, and the same goes for timeout
  • Because readfds must be rebuilt before each call, a separate _fd_array is needed to remember the listening socket and the established connections; the descriptors stored in _fd_array are exactly the ones whose read events select should monitor
  • This select server only reads data from clients, so select only needs to monitor read events. To monitor both read and write events on a descriptor, readfds and writefds would both be defined, along with two arrays recording which descriptors need read and write monitoring, so that both sets can be rebuilt before every call to select
  • Since select must be passed the largest monitored descriptor value plus 1, the maximum descriptor value must also be tracked each time _fd_array is traversed to rebuild readfds

1.5 select server

Socket class

A Socket class is written to lightly wrap the socket-related interfaces. So that callers can use the wrappers directly, most of the functions are defined as static member functions.

// Wrapper around the network socket interfaces
#pragma once
#include <iostream>
#include <string>
#include <cstring>
#include <cerrno>
#include <cassert>
#include <unistd.h>
#include <memory>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include "Log.hpp"

class Socket
{
    const static int gbacklog = 15;
public: // shared by server and client
    static int SocketCreate() {
        int SocketFd = socket(AF_INET, SOCK_STREAM, 0);
        if(SocketFd < 0) {
            LogMessage(FATAL, "socket create fail, %d:%s", errno, strerror(errno));
            exit(1);
        }
        LogMessage(NORMAL, "socket create success, SocketFd:%d", SocketFd);
        return SocketFd;
    }

public: // server side only
    static void Bind(int listenSocketFd, uint16_t serverPort, std::string serverIp = "0.0.0.0") {
        struct sockaddr_in local;
        memset(&local, '\0', sizeof local);
        local.sin_family = AF_INET;
        local.sin_port = htons(serverPort);
        inet_pton(AF_INET, serverIp.c_str(), &local.sin_addr);
        if(bind(listenSocketFd, (struct sockaddr*)&local, sizeof local) < 0) {
            LogMessage(FATAL, "bind fail, %d:%s", errno, strerror(errno));
            exit(2);
        }
        LogMessage(NORMAL, "bind success, serverPort:%d", serverPort);
    }

    static void Listen(int listenSocketFd) {
        if(listen(listenSocketFd, gbacklog) < 0) {
            LogMessage(FATAL, "listen fail, %d:%s", errno, strerror(errno));
            exit(3);
        }
        LogMessage(NORMAL, "listen success");
    }

    static int Accept(int listenSocketFd, std::string* clientIp, uint16_t* clientPort) {
        struct sockaddr_in client;
        socklen_t length = sizeof client;
        int serviceSocketFd = accept(listenSocketFd, (struct sockaddr*)&client, &length);
        if(serviceSocketFd < 0) {
            LogMessage(ERROR, "accept fail, %d:%s", errno, strerror(errno));
            exit(4);
        }
        if(clientIp != nullptr) *clientIp = inet_ntoa(client.sin_addr);
        if(clientPort != nullptr) *clientPort = ntohs(client.sin_port);
        return serviceSocketFd;
    }

public: // client side only
    bool Connect(int clientSocketFd, std::string& serverIp, uint16_t& serverPort) {
        struct sockaddr_in server;
        server.sin_family = AF_INET;
        server.sin_addr.s_addr = inet_addr(serverIp.c_str());
        server.sin_port = htons(serverPort);
        if(connect(clientSocketFd, (struct sockaddr*)&server, sizeof server) == 0) return true;
        else return false;
    }

public:
    Socket() {}
    ~Socket() {}
};

SelectServer class

#ifndef __SELECT_SVR_H__
#define __SELECT_SVR_H__

#include <iostream>
#include <string>
#include <sys/select.h>
#include "Socket.hpp"
#include "Log.hpp"
#include <unistd.h>
#include <cstring>
#include <cerrno>
#include <sys/time.h>
using namespace std;

#define BITS 8
#define NUM (sizeof(fd_set) * BITS)
#define FD_NONE -1

// Only reads are handled; writes and exceptions are not processed
class SelectServer
{
public:
    SelectServer(const uint16_t &port = 9090) : _port(port)
    {
        _listenSocketFd = Socket::SocketCreate();
        Socket::Bind(_listenSocketFd, _port);
        Socket::Listen(_listenSocketFd);
        LogMessage(DEBUG, "create base socket success");

        _fd_array[0] = _listenSocketFd;
        for (int i = 1; i < NUM; ++i)
            _fd_array[i] = FD_NONE;
    }
    ~SelectServer()
    {
        if (_listenSocketFd > 0)
            close(_listenSocketFd);
    }

public:
    void Start()
    {
        while (true)
        {
            DebugPrint();

            fd_set readfds;
            FD_ZERO(&readfds);
            int maxFd = _listenSocketFd;
            for (int i = 0; i < NUM; ++i)
            {
                if (_fd_array[i] == FD_NONE)
                    continue;
                else
                    FD_SET(_fd_array[i], &readfds);
                if (maxFd < _fd_array[i]) maxFd = _fd_array[i];
            }

            int number = select(maxFd + 1, &readfds, nullptr, nullptr, nullptr);
            switch (number)
            {
            case 0:
                LogMessage(DEBUG, "%s", "Time Out ...");
                break;
            case -1:
                LogMessage(WARNING, "Select Fail: %d : %s", errno, strerror(errno));
                break;
            default:
                LogMessage(DEBUG, "Get a event");
                HandlerEvent(readfds);
                break;
            }
        }
    }

private:
    void Accepter()
    {
        string clientIp;
        uint16_t clientPort = 0;
        int socketfd = Socket::Accept(_listenSocketFd, &clientIp, &clientPort);
        if (socketfd < 0)
        {
            LogMessage(ERROR, "accept error");
            return;
        }
        LogMessage(DEBUG, "Get a link success : [%s : %d] , socketFd : %d", clientIp.c_str(), clientPort, socketfd);

        int pos = 1;
        for (; pos < NUM; ++pos)
            if (_fd_array[pos] == FD_NONE) break;
        if (pos == NUM) { // the array is full
            LogMessage(ERROR, "%s:%d", "SelectServer already full, close:", socketfd);
            close(socketfd);
        }
        else { // found an empty slot
            _fd_array[pos] = socketfd;
        }
    }

    void Recver(int i) 
    {
        LogMessage(DEBUG, "message in , get IO event:%d", _fd_array[i]);
        char buffer[1024];
        int num = recv(_fd_array[i], buffer, sizeof(buffer) - 1, 0);
        if(num > 0) {
            buffer[num] = 0;
            LogMessage(DEBUG, "client[%d]#%s", _fd_array[i], buffer);
        }
        else if(num == 0) {
            LogMessage(DEBUG, "client[%d] link close, me too...", _fd_array[i]);
            close(_fd_array[i]);
            _fd_array[i] = FD_NONE;
        }
        else {
            LogMessage(WARNING, "%d recv error, %d : %s", _fd_array[i], errno, strerror(errno));
            close(_fd_array[i]);
            _fd_array[i] = FD_NONE;
        }
    }

    void HandlerEvent(const fd_set &readfds)
    {
        for (int i = 0; i < NUM; ++i)
        {
            // skip unused slots
            if (_fd_array[i] == FD_NONE) continue;
            // check whether this descriptor is ready
            if (FD_ISSET(_fd_array[i], &readfds))
            {
                if (i == 0 && _fd_array[i] == _listenSocketFd) Accepter(); // connection event
                else Recver(i); // read event
            }
            }
        }
    }

    void DebugPrint()
    {
        cout << "_fd_array[]:";
        for (int i = 0; i < NUM; ++i) {
            if (_fd_array[i] != FD_NONE) cout << _fd_array[i] << " ";
        }
        cout << endl;
    }

private:
    uint16_t _port;
    int _listenSocketFd;
    int _fd_array[NUM];
};

#endif

  • After accept returns a new connection, the server must not immediately call read on it, because the data on that connection may not have arrived yet and read could block; the waiting should be left to select. So after obtaining a connection, simply add its file descriptor to _fd_array and read only once select reports that the connection's read event is ready
  • Adding a file descriptor to _fd_array means scanning the array for an unused slot. It is possible that every slot is already occupied, in which case the add fails; the newly accepted socket can then only be closed, because the server has no capacity left to handle this connection
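
A minimal driver for the class above might look like this (assuming the class is saved as SelectServer.hpp and that Log.hpp provides the LogMessage function used throughout):

#include "SelectServer.hpp"
#include <memory>

int main()
{
    std::unique_ptr<SelectServer> server(new SelectServer(9090)); // listen on port 9090
    server->Start(); // loop forever: select, then handle ready events
    return 0;
}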

select server test

Connect to the server with the telnet tool; data sent to the server through telnet is read by the server and printed.

Although SelectServer is a single-process, single-threaded server, it can serve multiple clients at the same time: each time select returns, it tells the server which clients' connections have ready read events, the server reads the data from those clients, and then it calls select again to wait for the next ready event.

When the server detects that the client exits, it will also close the corresponding connection and clear the corresponding socket from the _fd_array array

Existing problems

  • If the select server wants to send data to a client, it cannot simply call write, because a write is also made up of "waiting" and "copying", and the waiting should again be left to select. Before each call to select, the server would therefore have to set up writefds as well as readfds, and it would need another array recording which descriptors need write-event monitoring; only when a descriptor's write event is ready can write be called to send data to that client
  • There is no application-level protocol. The code does not read according to any rule, which can lead to the sticky-packet problem, and the root cause is exactly the missing protocol. HTTP, for example, treats a blank line as the end of the header when reading the underlying data; the body length can then be obtained from the Content-Length field in the header, so a complete HTTP message can always be assembled. That is how HTTP avoids sticky packets
  • There are no input and output buffers. The code stores what it reads directly in a character array, which is not rigorous: a single read may not yield a complete message, and the server cannot analyse or process it yet. The data read should be appended to an input buffer, and the server should process it only after a complete message has arrived. Likewise, if the server wants to respond to clients, the response data should not be sent directly with write but should first be placed in an output buffer, because a response may be too large to send in one go and may have to be sent in several pieces

To sum up, the SelectServer in this blog is only a demo for understanding how the select function is used

1.6 Advantages of select

  • It can wait on many file descriptors at once and is responsible only for the waiting; the actual I/O is performed by interfaces such as accept, read and write, which therefore do not block when they are called
  • Because select waits on many file descriptors at the same time, the "waiting" time is overlapped, which improves I/O efficiency

These advantages are shared by all multiplexing interfaces

1.7 Disadvantages of select

  • Before every call to select, the fd sets have to be set up again by hand, which is inconvenient from the point of view of using the interface
  • Every call to select copies the fd sets from user space to kernel space; this overhead becomes large when there are many fds
  • Every call to select also makes the kernel traverse all the fds passed in; this overhead is likewise large when there are many fds
  • The number of file descriptors select can monitor is too small

How many file descriptors select can monitor

The readfds, writefds and exceptfds passed to select are all of type fd_set, which is essentially a bitmap in which one bit marks one file descriptor. The number of file descriptors select can monitor therefore depends on the number of bits in fd_set.

#include <iostream>
#include <sys/types.h>
using namespace std;

int main()
{
	cout << sizeof(fd_set) * 8 << endl; // 1 byte = 8 bits
	return 0;
}

After running the code, it can be found that the number of file descriptors that can be monitored by select is 1024 

The number of file descriptors that a process can open

The process control block task_struct contains a files pointer to a struct files_struct, which holds the process's file descriptor table fd_array. The size of this fd_array is NR_OPEN_DEFAULT, whose value is 32

But this does not mean a process can open at most 32 file descriptors; the number of file descriptors a process may open can be expanded. The current upper limit can be seen with the ulimit -a command
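
Besides ulimit, the limit can also be queried, and raised up to the hard limit, from code via getrlimit/setrlimit; a small sketch:

#include <iostream>
#include <sys/resource.h>

int main()
{
    struct rlimit rl;
    getrlimit(RLIMIT_NOFILE, &rl); // limits on the number of open file descriptors
    std::cout << "soft limit: " << rl.rlim_cur << ", hard limit: " << rl.rlim_max << std::endl;

    rl.rlim_cur = rl.rlim_max;     // raise the soft limit up to the hard limit
    if (setrlimit(RLIMIT_NOFILE, &rl) == 0)
        std::cout << "soft limit raised to " << rl.rlim_cur << std::endl;
    return 0;
}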

Since select can monitor at most 1024 file descriptors, at most 1023 clients can be connected once the listening socket is excluded.

1.8 Applicable scenarios of select

The multiplexing interfaces select, poll and epoll are suited only to certain scenarios; used in the wrong scenario they can be counterproductive

  • Multiplexing is generally suited to handling many connections of which only a few are active at any moment. Because only a few connections are active, almost every connection spends most of its I/O time waiting for events to become ready; multiplexing lets those waits overlap and so improves I/O efficiency
  • If most of the connections are active most of the time, multiplexing is not a good fit: events on each connection are almost always ready, so there is little waiting to overlap, while the multiplexing interface itself still costs system time and space

A typical case of many connections with only a few active is a chat tool: after logging in to QQ, most of the time no chatting is going on, and the server certainly cannot afford to block in a read call waiting for one connection's read event.

A typical case of mostly active connections is data backup inside a company: the two servers are constantly exchanging data, the connection is always busy, there is almost no waiting, and so there is no need for a multiplexing interface

2. poll

2.1 poll function

int poll(struct pollfd *fds, nfds_t nfds, int timeout);

Parameter Description:

  • fds: an array of structures to be monitored by poll; each element contains a file descriptor, the set of events to monitor on it, and the set of events that turned out to be ready
  • nfds: Indicates the length of the fds array
  • timeout: Indicates the timeout period of the poll function, in milliseconds (ms)

The value of the parameter timeout:

  • -1: poll blocks until an event on one of the monitored file descriptors becomes ready
  • 0: poll does not block; it checks the monitored file descriptors once and returns immediately, whether or not anything is ready
  • A specific time value: poll blocks for at most that many milliseconds; if no monitored event becomes ready within that time, poll returns on timeout

Return value description:

  • If the function call is successful, it returns the number of file descriptors that are ready for events
  • If the timeout time is exhausted, return 0
  • If the function call fails, -1 is returned, and the error code is set

When the poll call fails, the error code may be set to:

  • EFAULT: The fds array is not contained in the address space of the calling program
  • EINTR: This call was interrupted by a signal
  • EINVAL: nfds value exceeds RLIMIT_NOFILE value
  • ENOMEM: unable to allocate kernel memory

struct pollfd structure

  • fd: specific file descriptor, if set to a negative value, the events field is ignored and the revents field returns 0
  • events: Which events on this file descriptor need to be monitored
  • revents: When the poll function returns, it informs the user which events on the file descriptor are ready

The values of events and revents:

These values are defined as macros; each one has exactly one bit set in its binary representation, and different macros set different bits.

  • Before calling poll, the events to be monitored are combined into the events member with the bitwise OR operator
  • After poll returns, the bitwise AND operator is used to test whether revents contains a particular event, which tells whether that event is ready on the file descriptor (see the sketch after this list)
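
A small sketch of both directions, using struct pollfd as declared in <poll.h> (sockfd stands for a socket obtained elsewhere):

#include <poll.h>

void PollOnce(int sockfd)
{
    struct pollfd pfd;
    pfd.fd = sockfd;
    pfd.events = POLLIN | POLLOUT; // OR together the events to monitor
    pfd.revents = 0;

    if (poll(&pfd, 1, 5000) > 0)   // wait up to 5000 ms
    {
        if (pfd.revents & POLLIN)  { /* read event ready  */ }
        if (pfd.revents & POLLOUT) { /* write event ready */ }
    }
}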

2.2 poll server

The workflow of poll is basically the same as that of select. A simple poll server is implemented below; it likewise only reads the data sent by clients and prints it

PollServer class

#ifndef __POLL_SVR_H__
#define __POLL_SVR_H__

#include <iostream>
#include <string>
#include <poll.h>
#include "Socket.hpp"
#include "Log.hpp"
#include <unistd.h>
#include <cstring>
#include <cerrno>
using namespace std;

#define FD_NONE -1

// Only reads are handled; writes and exceptions are not processed
class PollServer
{
public:
    PollServer(const nfds_t nfds, const uint16_t &port = 9090) : _port(port), _nfds(nfds), _fds(nullptr)
    {
        _listenSocketFd = Socket::SocketCreate();
        Socket::Bind(_listenSocketFd, _port);
        Socket::Listen(_listenSocketFd);
        LogMessage(DEBUG, "create base socket success");

        _fds = new struct pollfd[_nfds];
        _fds[0].fd = _listenSocketFd;
        _fds[0].events = POLLIN;
        for(int i = 1; i < _nfds; ++i) {
            _fds[i].fd = FD_NONE;
            _fds[i].events = _fds[i].revents = 0;
        }
        _timeout = 1000;
    }
    ~PollServer() { 
        if (_listenSocketFd > 0) close(_listenSocketFd); 
        if (_fds != nullptr) delete[] _fds;
    }

public:
    void Start()
    {
        while (true)
        {
            DebugPrint();
            int number = poll(_fds, _nfds, _timeout);
            switch (number)
            {
            case 0:
                LogMessage(DEBUG, "%s", "Time Out ...");
                break;
            case -1:
                LogMessage(WARNING, "Poll Fail: %d : %s", errno, strerror(errno));
                break;
            default:
                HandlerEvent();
                break;
            }
        }
    }

private:
    void Accepter()
    {
        string clientIp;
        uint16_t clientPort = 0;
        int socketfd = Socket::Accept(_listenSocketFd, &clientIp, &clientPort);
        if (socketfd < 0)
        {
            LogMessage(ERROR, "accept error");
            return;
        }
        LogMessage(DEBUG, "Get a link success : [%s : %d] , socketFd : %d", clientIp.c_str(), clientPort, socketfd);

        int pos = 1;
        for (; pos < _nfds; ++pos)
            if (_fds[pos].fd == FD_NONE) break;
        if (pos == _nfds) { // the array is full
            // the array could be grown automatically here
            LogMessage(ERROR, "%s:%d", "PollServer already full, close:", socketfd);
            close(socketfd);
        }
        else { // found an empty slot
            _fds[pos].fd = socketfd;
            _fds[pos].events = POLLIN;
        }
    }

    void Recver(int i) 
    {
        LogMessage(DEBUG, "message in , get IO event:%d", _fds[i].fd);
        char buffer[1024];
        int num = recv(_fds[i].fd, buffer, sizeof(buffer) - 1, 0);
        if(num > 0) {
            buffer[num] = 0;
            LogMessage(DEBUG, "client[%d]#%s", _fds[i].fd, buffer);
        }
        else if(num == 0) {
            LogMessage(DEBUG, "client[%d] link close, me too...", _fds[i].fd);
            close(_fds[i].fd);
            _fds[i].fd = FD_NONE;
            _fds[i].events = _fds[i].revents = 0;
        }
        else {
            LogMessage(WARNING, "%d recv error, %d : %s", _fds[i].fd, errno, strerror(errno));
            close(_fds[i].fd);
            _fds[i].fd = FD_NONE;
            _fds[i].events = _fds[i].revents = 0;
        }
    }

    void HandlerEvent()
    {
        for (int i = 0; i < _nfds; ++i)
        {
            // skip unused slots
            if (_fds[i].fd == FD_NONE) continue;
            // check whether this descriptor is ready
            if (_fds[i].revents & POLLIN)
            {
                if (_fds[i].fd == _listenSocketFd) Accepter(); // connection event
                else Recver(i); // read event
            }
            }
        }
    }

    void DebugPrint()
    {
        cout << "fds[]:";
        for(int i = 0; i < _nfds; ++i) {
            if(_fds[i].fd == FD_NONE) continue;
            cout << _fds[i].fd << " ";
        }
        cout << endl;
    }

private:
    uint16_t _port;
    int _listenSocketFd;
    struct pollfd* _fds; 
    nfds_t _nfds = 100;
    int _timeout;
};

#endif

The size of the _fds array is fixed, so adding the file descriptor of a newly accepted connection may fail because the array is already full; in that case the poll server can only close the newly accepted socket

poll server test

When poll is called, timeout is set to 1000, so after the server starts, whenever no client event arrives within 1000 milliseconds, poll times out and returns

After connecting to the poll server with telnet, poll detects that the read event on the listening socket is ready, the server calls accept to take the established connection, and it prints the client's IP and port. Data then sent by the client is also received by the poll server and printed

The poll server is also a single-process, single-threaded server, which can also serve multiple clients

When the server detects that the client exits, it will also close the corresponding connection and clear the corresponding socket from the _fds array

2.3 Advantages && Disadvantages of poll

Advantages

  • struct pollfd contains both events and revents, which is equivalent to separating select's input and output parameters, so the monitored set does not have to be rebuilt before every call the way it does with select
  • There is no limit on the number of file descriptors poll can monitor
  • Like select, poll waits on many file descriptors at once, which improves I/O efficiency

Explanation:

  • Although the code defines the _fds array with 100 elements, the array can be enlarged; how many file descriptors poll monitors is determined by its second parameter, nfds
  • fd_set has only 1024 bits, so select can monitor at most 1024 file descriptors

Disadvantages

  • Like select, when poll returns, the _fds array must be traversed to find the ready file descriptors
  • Every call to poll copies a large number of struct pollfd structures from user space to kernel space; this overhead grows with the number of descriptors being monitored
  • Every call to poll also makes the kernel traverse all the fds passed in; this overhead is likewise large when there are many fds

3. epoll

3.1 Introduction to epoll

epoll is an I/O multiplexing interface provided by the system

  • Like select and poll, the epoll system calls let a program monitor events on multiple file descriptors at the same time, and the applicable scenarios are the same
  • The extra e in the name can be read as "extend": epoll is an improved poll designed to handle large numbers of file descriptors
  • Introduced in kernel 2.5.44, epoll has almost all the advantages of select and poll and is regarded as the best multiplexed I/O readiness notification mechanism under Linux 2.6

3.2 epoll related system calls

epoll_create function

int epoll_create(int size);

  • Parameter size: since Linux 2.6.8 the size parameter is ignored, but it must still be set to a value greater than 0
  • Return value: on success, the file descriptor of the newly created epoll model is returned; otherwise -1 is returned and the error code is set

Note: when the epoll model is no longer needed, close must be called on its file descriptor. When all file descriptors referring to an epoll instance have been closed, the kernel destroys the instance and releases the related resources.

epoll_ctl function

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

Parameter Description:

  • epfd: the return value of the epoll_create function (epoll handle)
  • op: Indicates a specific action, represented by three macros
  • fd: the file descriptor that needs to be monitored
  • event: which events on this file descriptor need to be monitored

The second parameter op has the following three values:

  • EPOLL_CTL_ADD: Register a new file descriptor to the specified epoll model
  • EPOLL_CTL_MOD: Modify the listening event of the registered file descriptor
  • EPOLL_CTL_DEL: delete the specified file descriptor from the epoll model

Return value: 0 is returned on success; -1 is returned on failure, and the error code is set

The struct epoll_event used as the fourth parameter is declared in <sys/epoll.h> roughly as follows:
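
typedef union epoll_data {
    void     *ptr;
    int       fd;
    uint32_t  u32;
    uint64_t  u64;
} epoll_data_t;

struct epoll_event {
    uint32_t     events;  // bit mask of events to monitor / events that occurred
    epoll_data_t data;    // user data, returned unchanged by epoll_wait
};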

There are two members in the struct epoll_event structure. The first member, events, is the set of events to monitor; the second member, data, is a union, and usually its fd field is used to record the file descriptor being monitored.

Common values of events are as follows:

  • EPOLLIN: the file descriptor is readable (this includes a normal close of the peer socket)
  • EPOLLOUT: the file descriptor is writable
  • EPOLLPRI: the file descriptor has urgent data to read (out-of-band data has arrived)
  • EPOLLERR: an error condition occurred on the file descriptor
  • EPOLLHUP: the file descriptor was hung up, i.e. the peer closed it
  • EPOLLET: put epoll into Edge Triggered (ET) mode for this file descriptor
  • EPOLLONESHOT: deliver only one notification; after the event has been reported, the file descriptor must be re-armed with epoll_ctl if it should continue to be monitored

These values are defined as macros; each one has exactly one bit set in its binary representation, and different macros set different bits.
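
For instance, registering a listening socket for read events might look like the following sketch (epfd and listenfd are assumed to have been created earlier):

#include <sys/epoll.h>

// Register listenfd with the epoll model epfd, monitoring read events.
bool AddReadEvent(int epfd, int listenfd)
{
    struct epoll_event ev;
    ev.events = EPOLLIN;   // OR in EPOLLET here to switch to edge-triggered mode
    ev.data.fd = listenfd; // remember which descriptor this event belongs to
    return epoll_ctl(epfd, EPOLL_CTL_ADD, listenfd, &ev) == 0;
}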

epoll_wait function

int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

Parameter Description:

  • epfd: the return value of the epoll_create function (epoll handle), used to specify the epoll model
  • events: the kernel copies ready events into this array (it cannot be a null pointer; the kernel only copies ready events into it and does not allocate user-space memory)
  • maxevents: the number of elements in the events array; it must be greater than 0
  • timeout: Indicates the timeout period of the epoll_wait function, in milliseconds (ms)

The value of the parameter timeout:

  • -1: epoll_wait blocks until an event on one of the monitored file descriptors becomes ready
  • 0: epoll_wait does not block; it checks the monitored file descriptors once and returns immediately, whether or not anything is ready
  • A specific time value: epoll_wait blocks for at most that many milliseconds; if no monitored event becomes ready within that time, epoll_wait returns on timeout

return value:

  • If the function call is successful, it returns the number of file descriptors that are ready for events
  • If the timeout time is exhausted, return 0
  • If the function call fails, -1 will be returned, and the error code will be set

When the epoll_wait call fails, the error code may be set to:

  • EBADF: epfd is not a valid file descriptor
  • EFAULT: the memory pointed to by events is not accessible with write permissions
  • EINTR: the call was interrupted by a signal
  • EINVAL: epfd is not the file descriptor of an epoll model, or the maxevents value passed in is less than or equal to 0

3.3 Working principle of epoll

Red-black tree && ready queue

When a process calls epoll_create, the Linux kernel creates an eventpoll structure, i.e. the epoll model. Two of its members, rbr and rdlist, are the ones most closely tied to how epoll is used

struct eventpoll {
	...
	// root of the red-black tree that stores every event added to epoll for monitoring
	struct rb_root rbr;
	// the ready queue holds the ready events that epoll_wait will return to the user
	struct list_head rdlist;
	...
};

  • The red-black tree in the epoll model is essentially how the user tells the kernel which events on which file descriptors to monitor; calling epoll_ctl adds nodes to, deletes nodes from, or modifies nodes in this tree
  • The ready queue in the epoll model is essentially how the kernel tells the user which events on which file descriptors are ready; calling epoll_wait takes ready events from this queue

In epoll, each monitored event has a corresponding epitem structure. A node in the red-black tree and a node in the ready queue are the rbn member and the rdllink member of an epitem, respectively. The ffd member records the file descriptor and the event member records the events monitored on it

struct epitem {
	struct rb_node rbn;        // red-black tree node
	struct list_head rdllink;  // doubly linked list node (for the ready queue)
	struct epoll_filefd ffd;   // the file descriptor this item refers to
	struct eventpoll *ep;      // the eventpoll object this item belongs to
	struct epoll_event event;  // the events expected on ffd
};

  • Seen through the rbn member, ffd and event mean: monitor whether event on ffd becomes ready
  • Seen through the rdllink member, ffd and event mean: event on ffd is ready

Notice:

  • The red-black tree is a binary search tree and needs a key; the file descriptor naturally serves as that key
  • When epoll_ctl adds a node with the EPOLLONESHOT option, the descriptor must be re-armed with epoll_ctl after its event has been delivered if it should continue to be monitored; in essence, once an EPOLLONESHOT event has been reported, the kernel stops reporting further events on that descriptor until it is re-armed (a re-arm sketch follows this list)
  • If EPOLLONESHOT is not set when epoll_ctl adds a node, the node stays in the red-black tree after insertion until the user removes it with epoll_ctl
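
A sketch of re-arming a descriptor registered with EPOLLONESHOT after its event has been handled; per the epoll_ctl manual the re-arm uses EPOLL_CTL_MOD (epfd and fd are assumed to exist already):

#include <sys/epoll.h>

// After handling a one-shot event, re-arm fd so epoll will report it again.
bool RearmOneshot(int epfd, int fd)
{
    struct epoll_event ev;
    ev.events = EPOLLIN | EPOLLONESHOT; // monitor reads, one notification at a time
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev) == 0;
}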

callback mechanism

Every event added to the red-black tree has a callback registered with the device (network card) driver; in the kernel this callback is called ep_poll_callback

  • With select and poll, the operating system has to actively poll the monitored file descriptors to discover whether their events are ready, which adds load on the operating system
  • With epoll, the operating system does not need to poll: when a monitored event in the red-black tree becomes ready, the corresponding callback runs automatically and adds the ready event to the ready queue
  • When the user calls epoll_wait to fetch ready events, it only has to check whether the ready queue is empty and, if it is not, copy the ready events in the queue to the user
  • The biggest benefit of the callback mechanism is that the operating system no longer has to actively detect readiness; when an event becomes ready, the callback handles it automatically

Notice:

  • Only events added to the red-black tree have the callback registered with the lower layer, so only when an event in the red-black tree becomes ready is the callback executed and the event added to the ready queue
  • While monitored events keep becoming ready, the callback keeps inserting nodes into the ready queue and the upper layer keeps calling epoll_wait to take nodes out of it: a typical producer-consumer model
  • Because the ready queue may be accessed by several execution flows at the same time, it must be protected with a mutex; the lock and mtx members of eventpoll protect these critical resources, so epoll itself is thread-safe
  • The wq member of eventpoll is a wait queue; when several execution flows want to use the same epoll model at the same time, they wait on this queue

3.4 epoll server

Epoll class

Encapsulate epoll-related system calls for subsequent use

#include <iostream>
#include <sys/epoll.h>
#include <cstdlib>
using namespace std;

class Epoll
{
public:
    static const int gsize = 256; 
public:
    static int EpollCreate() 
    {
        int epollFd = epoll_create(gsize);
        if(epollFd > 0) return epollFd;
        exit(5); // terminate directly if creation fails
    }

    static bool EpollCtl(int epollFd, int op, int socketFd, uint32_t events) 
    {
        struct epoll_event ev;
        ev.events = events;
        ev.data.fd = socketFd;
        int num = epoll_ctl(epollFd, op, socketFd, &ev);
        return num == 0;
    }

    static int EpollWait(int epollFd, struct epoll_event* revs, int num, int timeout) {
        return epoll_wait(epollFd, revs, num, timeout);
    }
};

EpollServer class

#ifndef __EPOLL_SERVER_HPP__
#define __EPOLL_SERVER_HPP__

#include <iostream>
#include <string>
#include <functional>
#include <cassert>
#include <unistd.h>
#include "Epoll.hpp"
#include "Log.hpp"
#include "Socket.hpp"
using namespace std;

namespace ns_epoll
{
    class EpollServer
    {
        using func_t = function<void(string)>;
    public:
        EpollServer(func_t handler, const uint16_t& port = 9090):_port(port), _revsNum(64),_handlerRequest(handler) {
            // allocate the array that will receive ready events
            _revs = new struct epoll_event[_revsNum];
            // create the listening socket
            _listenSocketFd = Socket::SocketCreate();
            Socket::Bind(_listenSocketFd, _port);
            Socket::Listen(_listenSocketFd);
            // create the epoll model
            _epollFd = Epoll::EpollCreate();
            LogMessage(DEBUG, "init success, listenSocketFd : %d, epollFd : %d", _listenSocketFd, _epollFd);
            // add the listening socket to epoll
            if(Epoll::EpollCtl(_epollFd, EPOLL_CTL_ADD, _listenSocketFd, EPOLLIN))
                LogMessage(DEBUG, "Add listenSocketFd to epoll success");
            else exit(6);
        }
        ~EpollServer() {
            if(_listenSocketFd >= 0) close(_listenSocketFd);
            if( _epollFd >= 0) close(_epollFd);
            if(_revs != nullptr) delete[] _revs;
        }
    public:
        void Start()
        {
            int timeout = -1;
            while(true) 
            {
                LoopOnce(timeout);
            }
        }
    public:
        void LoopOnce(int timeout) {
            int num = Epoll::EpollWait(_epollFd, _revs, _revsNum, timeout);
            switch (num)
            {
            case 0:
                LogMessage(DEBUG, "Time Out...");
                break;
            case -1:
                LogMessage(WARNING, "epoll wait error: %s", strerror(errno));
                break;
            default:
                LogMessage(DEBUG, "Get a event");
                HandlerEvents(num);
                break;
            }
        }

        void HandlerEvents(int number)
        {
            assert(number);
            for(int i = 0; i < number; ++i) 
            {
                uint32_t revent = _revs[i].events;
                int socketFd = _revs[i].data.fd;
                if(revent & EPOLLIN) // read event ready
                {
                    if(socketFd == _listenSocketFd) Accetper(_listenSocketFd);
                    else Recver(socketFd);
                }
            }
        }

        void Accetper(int listenSocketFd) 
        {
            string clientIp;
            uint16_t clientPort;
            int socketFd = Socket::Accept(listenSocketFd, &clientIp, &clientPort);
            if(socketFd < 0) {
                LogMessage(WARNING, "Accept error");
                return;
            }
            if(!Epoll::EpollCtl(_epollFd, EPOLL_CTL_ADD, socketFd, EPOLLIN)) return;
            LogMessage(DEBUG, "Add new link : %d to epoll success", socketFd);
        }

        void Recver(int socketFd)
        {
            char buffer[10240];
            ssize_t n = recv(socketFd, buffer, sizeof(buffer) - 1, 0);
            if(n > 0) {
                buffer[n] = 0;
                _handlerRequest(buffer);
            }
            else if(n == 0) {
                LogMessage(NORMAL, "client %d close link, me too...", socketFd);
                bool ret = Epoll::EpollCtl(_epollFd, EPOLL_CTL_DEL, socketFd, 0);
                assert(ret);
                close(socketFd);
            }
            else {
                LogMessage(NORMAL, "client %d recv error, close error socketFd", socketFd);
                bool ret = Epoll::EpollCtl(_epollFd, EPOLL_CTL_DEL, socketFd, 0);
                assert(ret);
                close(socketFd);
            }
        }

    private:
        int _listenSocketFd;
        int _epollFd;
        uint16_t _port;

        struct epoll_event* _revs;
        int _revsNum;

        func_t _handlerRequest;
    };
}

#endif
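
A possible driver for the class, assuming it is saved as EpollServer.hpp; the handler here simply echoes whatever the server read to standard output:

#include "EpollServer.hpp"
#include <memory>

// Callback invoked by EpollServer with the data it has read.
static void HandlerRequest(std::string request)
{
    std::cout << "handler# " << request << std::endl;
}

int main()
{
    std::unique_ptr<ns_epoll::EpollServer> server(new ns_epoll::EpollServer(HandlerRequest, 9090));
    server->Start();
    return 0;
}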

epoll server test

When the epoll server calls epoll_wait, timeout is set to -1, so after the server starts, if no client sends a connection request, the server blocks in epoll_wait

After connecting to the epoll server with telnet, epoll_wait detects that the read event on the listening socket is ready, the server calls accept to take the established connection, and it prints the client's IP and port. Data then sent by the client is also received by the epoll server and printed

The epoll server is also a single-process, single-threaded server, but it can serve multiple clients

The ls /proc/PID/fd command shows the epoll server's file descriptors. Descriptors 0, 1 and 2 are open by default and correspond to standard input, standard output and standard error; descriptor 3 is the listening socket, descriptor 4 is the epoll handle, and descriptors 5 and 6 correspond to the two clients currently connected to the server

When the server detects that the client exits, it will also close the corresponding connection. At this time, the No. 5 and No. 6 file descriptors corresponding to the epoll server are closed.

3.5 Advantages of epoll

  • The interface is easy to use: the functionality is split across three functions, which makes it convenient and efficient to use without redundancy
  • Lightweight data copying: data is copied from user space to the kernel only when epoll_ctl adds a monitored event, whereas select and poll re-copy the whole monitored set from the user to the kernel on every call. And when epoll_wait fetches ready events, only the ready events are copied; no unnecessary copying is done
  • Event callback mechanism: instead of the operating system actively polling for readiness, a callback function adds the ready file descriptor's structure to the ready queue. When epoll_wait is called, it simply reads the ready queue to learn which descriptors are ready; checking whether any descriptor is ready is O(1), since in essence it only requires testing whether the ready queue is empty
  • No limit on the number of descriptors: there is no upper limit on how many file descriptors can be monitored; as long as memory allows, nodes can keep being added to the red-black tree

Notice:

  • Some blogs claim that epoll uses a memory-mapping mechanism: the kernel mmaps the underlying ready queue into user space, so the user can read the kernel's ready queue directly and the extra cost of copying is avoided
  • That claim is wrong. The operating system has no such mapping mechanism here: the OS does not trust anyone and will not let a user process access kernel data directly; the user can only obtain kernel data through system calls
  • So if the user wants the data held in the kernel, that data inevitably has to be copied from the kernel into user space

Differences from select and poll

  • With select and poll, a third-party array must be used to remember the file descriptors and events being monitored; that array is maintained by the user, who has to do all of the adding, deleting and updating
  • With epoll, the user does not need to maintain such an array: the red-black tree inside epoll plays that role, and it is maintained by the kernel; the user only has to call epoll_ctl and the kernel performs the corresponding operation on the tree
  • With a multiplexing interface, information flows in two directions: the user tells the kernel what to monitor, and the kernel tells the user what is ready. select and poll make one function do both jobs, while epoll separates them at the interface level: epoll_ctl is the user informing the kernel, and epoll_wait is the kernel informing the user

3.6 epoll working modes (LT and ET)

Level Triggered (LT) mode

  • As long as an event remains ready at the bottom layer, epoll keeps notifying the user
  • Similar to level triggering in digital circuits: as long as the level stays high, it keeps triggering

epoll works in LT mode by default

  • Because LT mode keeps notifying the user as long as data remains ready at the bottom layer, when epoll reports a read event it does not have to be handled immediately, or only part of it may be handled; as long as some data is left unprocessed, epoll will notify the user again the next time
  • select and poll effectively also work in LT mode
  • Both blocking and non-blocking reads and writes are supported

Edge Triggered (ET) mode

  • epoll notifies the user only when the underlying ready events change: from none to some, or when more arrive on top of what is already there
  • Similar to rising-edge triggering in digital circuits: it triggers only when the level changes from low to high

To switch epoll to ET mode, set the EPOLLET option when adding an event

  • Because ET mode notifies the user only when the underlying ready events change from none to some, when epoll reports a read event the data must be handled immediately and completely, since the bottom layer may never report readiness again; any data left unprocessed would then effectively be lost
  • In ET mode epoll generally notifies the user fewer times than in LT mode, and every notification processes everything in the buffer, which raises network throughput, so ET generally performs better than LT; Nginx uses epoll in ET mode by default
  • Only non-blocking reads and writes are supported

How to read and write in ET working mode

Because ET mode notifies the user only when the underlying ready events change from none to some, the user is forced to read all of the data in one go when a read event is ready, and to fill the send buffer in one go when a write event is ready; otherwise there may be no further chance to read or write

Therefore, data must be read by calling recv in a loop and written by calling send in a loop

  • When a read event is ready, recv is called in a loop until one call reads fewer bytes than requested, which means the data currently available has all been read
  • It is possible, however, that the last recv reads exactly as many bytes as requested and the data happens to be exhausted at the same time; if recv is then called again on a blocking descriptor, it blocks because there is nothing left at the bottom layer
  • That blocking is serious: the servers in this blog are single-process, so if recv blocks and no more data ever arrives, the whole server effectively hangs. Therefore, when recv is called in a loop in ET mode, the corresponding file descriptor must be set to non-blocking
  • The same applies to send: it must be called in a loop, and the corresponding file descriptor must be set to non-blocking

Emphasis: in ET mode, the file descriptors used with recv and send must be set to non-blocking; this is required, not optional
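
A hedged sketch of the read side under these rules: the descriptor is first put into non-blocking mode with fcntl, and recv is then called in a loop until it reports EAGAIN/EWOULDBLOCK, meaning the kernel receive buffer has been drained (the helper names here are illustrative, not part of the servers above):

#include <sys/socket.h>
#include <unistd.h>
#include <fcntl.h>
#include <cerrno>
#include <string>

// Put a descriptor into non-blocking mode.
bool SetNonBlock(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0) return false;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

// Drain everything currently readable on sockfd into out.
// Returns false if the peer closed the connection or a real error occurred.
bool ReadAll(int sockfd, std::string &out)
{
    char buffer[1024];
    while (true)
    {
        ssize_t n = recv(sockfd, buffer, sizeof(buffer), 0);
        if (n > 0) { out.append(buffer, n); continue; }           // keep reading
        if (n == 0) return false;                                  // peer closed the connection
        if (errno == EAGAIN || errno == EWOULDBLOCK) return true;  // buffer drained, done for now
        if (errno == EINTR) continue;                              // interrupted by a signal, retry
        return false;                                              // real error
    }
}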

Comparing LT and ET

  • In ET mode the user is not notified repeatedly once a file descriptor is ready, which looks more efficient than LT; but if, in LT mode, every ready descriptor is handled immediately each time, so that the operating system never has to repeat a notification, then LT and ET actually perform about the same
  • Programming in ET mode is more demanding than in LT mode
