[Linux] --- Detailed explanation of select and epoll

The difference between select and epoll (a common interview question)

  • First, select is specified by POSIX, while epoll is a Linux-specific system call, so epoll is less portable than select. However, since select and epoll are generally used in servers, and most servers run Linux, the practical impact of this portability difference is small.

  • Second, the number of file descriptors that select can monitor is limited, with a maximum of 1024, while epoll can monitor up to the system-wide limit on file descriptors for the whole process.

  • Next, consider the performance comparison between epoll and select. In general, epoll performs better; otherwise Linux would not have implemented it. Why is epoll more efficient? There are several reasons. First, whenever an event becomes ready, epoll inserts it into a ready queue, so the result returned by epoll_wait contains only events that are already ready, whereas select returns all of the monitored events and the application must test each one to see whether it is ready. If many events are monitored but few are ready, this has a large impact on performance. Second, every call to select must pass the full list of events to be monitored to the kernel again, while epoll separates registration of monitored descriptors from retrieval of ready events, so the epoll_wait call does not need to retransmit the list of monitored events; this repeated transmission is another reason for select's lower performance. In addition, epoll keeps the interest list inside the kernel and only copies out the ready events, which avoids the overhead of repeatedly copying the whole descriptor set between user space and kernel space.

  • Finally, epoll provides two working modes. One is level-triggered (LT) mode, which behaves like select: as long as there is data in the file descriptor's buffer, the user keeps being notified that the descriptor is readable. This mode supports both blocking and non-blocking descriptors, and it is relatively easy to program against. The other, more efficient mode, provided only by epoll, is edge-triggered (ET) mode, which supports only non-blocking file descriptors. The application is notified only when a new event occurs on the descriptor (for example, a new data packet arrives). If no new event occurs, epoll will not notify the application again even if the buffer still holds data left over from a previous notification. With this mode the application must, after receiving a read-ready notification, keep reading until it gets an EWOULDBLOCK / EAGAIN error, so that the buffer is fully drained; otherwise connections can be starved. This matters especially when a listen_fd is monitoring for connections: if several connections arrive at the same time and accept is called only once, the remaining connections stay queued in the kernel, so the correct approach is to call accept in a loop until it returns EAGAIN, as shown in the sketch below. Although this mode is more error-prone, it is more efficient than level-triggered mode, because the kernel only has to detect that an event occurred and add the descriptor to the ready queue once.
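A minimal sketch of the edge-triggered pattern just described, assuming a non-blocking descriptor registered with EPOLLET; the helper names DrainFd and DrainAccept are illustrative, not from the original:

#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

// After a read-ready notification under EPOLLET, keep reading until the
// kernel reports EAGAIN; leftover data would otherwise never trigger a
// second notification.
void DrainFd(int fd) {
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) continue;              // process buf[0..n) here, then keep reading
        if (n == 0) break;                // peer closed the connection
        if (errno == EAGAIN || errno == EWOULDBLOCK) break; // buffer fully drained
        break;                            // a real error occurred
    }
}

// Same idea for a listening socket: accept in a loop until EAGAIN, so that
// simultaneous connections are not left waiting in the kernel queue.
void DrainAccept(int listen_fd) {
    for (;;) {
        int conn = accept(listen_fd, NULL, NULL);
        if (conn < 0) break; // EAGAIN / EWOULDBLOCK means the queue is drained
        // set conn non-blocking and register it with epoll here
    }
}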

select

One, what is select

The system provides the select function to implement a multiplexed input/output model.
The select system call lets our program monitor the status of multiple file descriptors at once:
the program blocks in select until one or more of the monitored file descriptors change state.

1. select function prototype

The function prototype of select is as follows:

#include <sys/select.h>

int select(int nfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);

2. Parameter explanation

  • fd_set: a descriptor set, internally an array used as a bitmap. Adding a descriptor to the set actually sets the bit whose position corresponds to the descriptor's number to 1. How many descriptors this bitmap can hold is determined by a macro, __FD_SETSIZE = 1024, so the maximum number of descriptors the select model can monitor is limited (the macros for manipulating an fd_set are shown in the sketch after this list);
  • The parameter nfds is the largest file descriptor value to be monitored + 1;
  • readfds, writefds, exceptfds correspond respectively to the set of file descriptors to check for readability, the set to check for writability, and the set to check for exceptional conditions;
  • The parameter timeout is a struct timeval used to set the wait time of select()
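For reference, a tiny sketch of the fd_set manipulation macros mentioned in this list (the descriptor number 4 is arbitrary):

#include <sys/select.h>

void FdSetDemo() {
    fd_set set;
    FD_ZERO(&set);           // empty the set
    FD_SET(4, &set);         // add descriptor 4: sets bit 4 of the bitmap
    if (FD_ISSET(4, &set)) { // test whether descriptor 4 is in the set
        FD_CLR(4, &set);     // remove descriptor 4
    }
}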

3. Parameter timeout value

  • NULL: select() has no timeout; it blocks until an event occurs on some file descriptor;
  • 0: only check the current status of the descriptor sets and return immediately, without waiting for external events;
  • A specific time value: if no event occurs within the given period, select returns after the timeout

4. Return value

  • (> 0) indicates how many descriptors in the sets are ready
  • (== 0) indicates that the wait timed out (no descriptor became ready during the blocking period)
  • (< 0) indicates that the select call itself failed (a combined sketch of the parameters, timeout, and return value follows)
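A minimal sketch putting the prototype, parameters, timeout, and return value together, watching stdin (descriptor 0) with a 5-second timeout:

#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

int main() {
    for (;;) {
        // The set must be rebuilt before every call, because select
        // modifies it in place and keeps only the ready descriptors.
        fd_set read_fds;
        FD_ZERO(&read_fds);
        FD_SET(0, &read_fds); // watch stdin (fd 0)
        struct timeval tv;
        tv.tv_sec = 5;  // 5-second timeout; tv_usec gives microsecond precision
        tv.tv_usec = 0;
        int n = select(0 + 1, &read_fds, NULL, NULL, &tv); // nfds = max fd + 1
        if (n < 0) { perror("select"); return 1; }     // the call itself failed
        if (n == 0) { printf("timeout\n"); continue; } // no descriptor became ready
        if (FD_ISSET(0, &read_fds)) { // stdin remained in the set, so it is readable
            char buf[1024];
            ssize_t len = read(0, buf, sizeof(buf));
            if (len <= 0) break;
            fwrite(buf, 1, len, stdout);
        }
    }
    return 0;
}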

5. Monitoring principle

  • 1. The user defines a set for each kind of event he cares about, and adds the relevant descriptors to the corresponding set.

  • 2. Call the select interface, passing in the sets; the data in the sets is copied into the kernel for monitoring.

    • Monitoring principle: the kernel continuously polls and traverses the sets to determine which descriptors are ready

    • Read-ready: the amount of data in the read buffer is greater than the low water mark (usually 1 byte)

    • Write-ready: the remaining space in the write buffer is greater than the low water mark (usually 1 byte)

    If any descriptor in the sets becomes ready, the select call returns after traversing the sets. Before the call returns, every descriptor that is not ready is removed from the sets, so
    the sets returned by select contain only ready descriptors.

  • 3. Although select does not hand the user the ready descriptors directly when it returns, the user can determine which descriptors are ready by testing which ones remain in the sets, and then perform the corresponding operations.

  • 4. Because select modifies the sets before returning, keeping only the ready descriptors, the descriptors of interest must be added back into the sets before each new round of monitoring.

Two, select ready conditions

1. Ready to read

  • In the socket kernel, the number of bytes in the receive buffer is greater than or equal to the low water mark SO_RCVLOWAT; the file descriptor can then be read without blocking,
    and the read returns a value greater than 0;
  • In TCP communication, the peer has closed the connection; reading from the socket then returns 0;
  • There is a new connection request on a listening socket;
  • There is an unhandled error on the socket;

2. Write ready

  • In the socket kernel, the number of available bytes in the send buffer (the free space in the send buffer) is greater than or equal to the low water mark SO_SNDLOWAT;
    a write will then not block and returns a value greater than 0;
  • The write side of the socket has been closed (close or shutdown); writing to such a socket triggers the SIGPIPE signal;
  • A non-blocking connect on the socket has completed, whether it succeeded or failed (see the sketch after this list);
  • There is an unhandled error on the socket
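The non-blocking connect case deserves a sketch: after connect() returns -1 with errno == EINPROGRESS, select reports the socket writable once the handshake finishes, and getsockopt(SO_ERROR) distinguishes success from failure. The helper name WaitConnect is illustrative, not from the original:

#include <sys/select.h>
#include <sys/socket.h>

// Returns 0 if the non-blocking connect on sockfd eventually succeeded.
// Assumes connect() has already been called and returned -1 with EINPROGRESS.
int WaitConnect(int sockfd) {
    fd_set write_fds;
    FD_ZERO(&write_fds);
    FD_SET(sockfd, &write_fds);
    struct timeval tv = {5, 0}; // 5-second timeout
    int n = select(sockfd + 1, NULL, &write_fds, NULL, &tv);
    if (n <= 0) return -1; // timeout or select error
    // Writable only means the connect finished; it may have finished with
    // an error, and SO_ERROR tells success from failure.
    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &err, &len) < 0) return -1;
    return err == 0 ? 0 : -1;
}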

Three, select features

  1. The number of file descriptors that can be monitored depends on the value of sizeof(fd_set). On my server sizeof(fd_set) = 512 bytes; each bit represents a file descriptor, so the maximum number of file descriptors supported on my server is 512 * 8 = 4096.
  2. When adding an fd to the select monitoring set, we must also keep a separate array storing the fds that were placed in the set.
    • ① After select returns, this array is used together with FD_ISSET as the source data for checking which fds are ready.
    • ② After select returns, fds that were added earlier but had no events are cleared from the set, so before each new select call the fds must be fetched from the array one by one to rebuild the set (after FD_ZERO); scanning the array also yields the maximum fd, maxfd, used as select's first parameter.

Four, the advantages and disadvantages of select

1. Disadvantages

  • 1. The maximum number of descriptors select can monitor depends on __FD_SETSIZE, which is 1024 by default
  • 2. Each call to select must copy the set data from user mode into kernel mode for monitoring
  • 3. select monitors by polling and traversing inside the kernel, so monitoring performance degrades as the number of descriptors grows
  • 4. select does not return the ready descriptors to the user directly, only the modified sets; the user must traverse and test the descriptors to find the ready ones, which is inefficient and increases code complexity
  • 5. select modifies the sets every time it returns, so the descriptors must be added back into the sets before each new round of monitoring

2. Advantages

  • 1. It follows the POSIX standard and has good cross-platform portability

  • 2. The monitoring timeout can be specified with microsecond precision

Five, select usage example

Use select to implement the dictionary server
tcp_select_server.hpp

#pragma once
#include <vector>
#include <unordered_map>
#include <functional>
#include <sys/select.h>
#include "tcp_socket.hpp"

// Debug helper
inline void PrintFdSet(fd_set* fds, int max_fd) {
    printf("select fds: ");
    for (int i = 0; i < max_fd + 1; ++i) {
        if (!FD_ISSET(i, fds)) {
            continue;
        }
        printf("%d ", i);
    }
    printf("\n");
}

typedef std::function<void (const std::string& req, std::string* resp)> Handler;

// Wrap select into a class. This class holds many TcpSocket objects,
// but it does not manage their memory.
class Selector {
public:
    Selector() {
        // [Note!] Never forget the initialization!
        max_fd_ = 0;
        FD_ZERO(&read_fds_);
    }
    bool Add(const TcpSocket& sock) {
        int fd = sock.GetFd();
        printf("[Selector::Add] %d\n", fd);
        if (fd_map_.find(fd) != fd_map_.end()) {
            printf("Add failed! fd is already in Selector!\n");
            return false;
        }
        fd_map_[fd] = sock;
        FD_SET(fd, &read_fds_);
        if (fd > max_fd_) {
            max_fd_ = fd;
        }
        return true;
    }
    bool Del(const TcpSocket& sock) {
        int fd = sock.GetFd();
        printf("[Selector::Del] %d\n", fd);
        if (fd_map_.find(fd) == fd_map_.end()) {
            printf("Del failed! fd is not in Selector!\n");
            return false;
        }
        fd_map_.erase(fd);
        FD_CLR(fd, &read_fds_);
        // Find the new maximum file descriptor; scanning from high to low is faster
        for (int i = max_fd_; i >= 0; --i) {
            if (!FD_ISSET(i, &read_fds_)) {
                continue;
            }
            max_fd_ = i;
            break;
        }
        return true;
    }
    // Returns the set of read-ready file descriptors
    bool Wait(std::vector<TcpSocket>* output) {
        output->clear();
        // [Note] A temporary copy must be made here; otherwise select
        // would overwrite the original set
        fd_set tmp = read_fds_;
        // DEBUG
        PrintFdSet(&tmp, max_fd_);
        int nfds = select(max_fd_ + 1, &tmp, NULL, NULL, NULL);
        if (nfds < 0) {
            perror("select");
            return false;
        }
        // [Note!] The loop condition here must be i < max_fd_ + 1
        for (int i = 0; i < max_fd_ + 1; ++i) {
            if (!FD_ISSET(i, &tmp)) {
                continue;
            }
            output->push_back(fd_map_[i]);
        }
        return true;
    }
private:
    fd_set read_fds_;
    int max_fd_;
    // Mapping from file descriptor to socket object
    std::unordered_map<int, TcpSocket> fd_map_;
};

class TcpSelectServer {
public:
    TcpSelectServer(const std::string& ip, uint16_t port) : ip_(ip), port_(port) {
    }
    bool Start(Handler handler) const {
        // 1. Create the socket
        TcpSocket listen_sock;
        bool ret = listen_sock.Socket();
        if (!ret) {
            return false;
        }
        // 2. Bind the port
        ret = listen_sock.Bind(ip_, port_);
        if (!ret) {
            return false;
        }
        // 3. Start listening
        ret = listen_sock.Listen(5);
        if (!ret) {
            return false;
        }
        // 4. Create the Selector object
        Selector selector;
        selector.Add(listen_sock);
        // 5. Enter the event loop
        for (;;) {
            std::vector<TcpSocket> output;
            bool ret = selector.Wait(&output);
            if (!ret) {
                continue;
            }
            // 6. Decide how to handle each ready file descriptor
            for (size_t i = 0; i < output.size(); ++i) {
                if (output[i].GetFd() == listen_sock.GetFd()) {
                    // The ready descriptor is listen_sock: accept the new
                    // connection and add it to the selector
                    TcpSocket new_sock;
                    listen_sock.Accept(&new_sock, NULL, NULL);
                    selector.Add(new_sock);
                } else {
                    // The ready descriptor is a client socket: handle one request
                    std::string req, resp;
                    bool ret = output[i].Recv(&req);
                    if (!ret) {
                        selector.Del(output[i]);
                        // [Note!] The socket must be closed
                        output[i].Close();
                        continue;
                    }
                    // Call the business handler to compute the response
                    handler(req, &resp);
                    // Send the result back to the client
                    output[i].Send(resp);
                }
            } // end for
        } // end for (;;)
        return true;
    }
private:
    std::string ip_;
    uint16_t port_;
};


The code of dict_server.cc is the same as before, except that the server object inside is changed to the TcpSelectServer class.
The client is exactly the same as the previous client; no separate development is required.
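For reference, a sketch of what that dict_server.cc might look like; the dictionary contents, port, and handler name are assumptions, and only the TcpSelectServer interface comes from the header above:

#include <string>
#include <unordered_map>
#include "tcp_select_server.hpp"

// Hypothetical dictionary lookup used as the business handler.
void Translate(const std::string& req, std::string* resp) {
    static std::unordered_map<std::string, std::string> dict = {
        {"hello", "你好"},
        {"world", "世界"},
    };
    auto it = dict.find(req);
    *resp = (it != dict.end()) ? it->second : "unknown";
}

int main() {
    TcpSelectServer server("0.0.0.0", 9090); // ip and port are placeholders
    server.Start(Translate);
    return 0;
}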

poll

int poll(struct pollfd *fds, nfds_t nfds, int timeout);

struct pollfd
{
    int fd;        // File descriptor
    short events;  // Events the user cares about on this descriptor
    short revents; // Events that actually occurred on this descriptor (filled in by the kernel)
};

poll monitors descriptors by organizing the descriptors of interest into an array of event structures and filling in the information: the fd, and in the events field the events the user cares about. During monitoring, if any of those events become ready on a descriptor, the kernel records them in that descriptor's revents field.

  • 1. The user defines a struct pollfd array and fills in the relevant information for each descriptor of interest

  • 2. The poll interface is called, copying the data into the kernel for monitoring

    • Monitoring principle: the kernel polls and traverses the descriptors; when a descriptor becomes ready, the ready event information is recorded in the corresponding member of the corresponding array node, and the call returns
  • 3. The user traverses the array, determines whether each descriptor is ready by examining its node, and then operates directly on the fd in that node (see the sketch below)
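A minimal sketch of this three-step flow, watching stdin so it stays self-contained:

#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main() {
    // 1. Fill in a pollfd array with the descriptors of interest.
    struct pollfd fds[1];
    fds[0].fd = 0;          // stdin
    fds[0].events = POLLIN; // we care about readability
    for (;;) {
        // 2. Hand the array to the kernel for monitoring; 5000 ms timeout.
        int n = poll(fds, 1, 5000);
        if (n < 0) { perror("poll"); return 1; }
        if (n == 0) { printf("timeout\n"); continue; }
        // 3. Traverse the array and check revents on each node.
        if (fds[0].revents & POLLIN) {
            char buf[1024];
            ssize_t len = read(fds[0].fd, buf, sizeof(buf));
            if (len <= 0) break;
            fwrite(buf, 1, len, stdout);
        }
    }
    return 0;
}

Note that, unlike select, the array does not need to be rebuilt before each call: the kernel writes results into revents and leaves events untouched.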

epoll

epoll is an improved version of poll, designed to handle large numbers of descriptors efficiently.

One, epoll_create

int epoll_create(int size);

Creates an epoll handle.
  • Since Linux 2.6.8, the size parameter is ignored (it must still be greater than 0).
  • After use, close() must be called to close it

Two, epoll_wait

int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

1. Collects, from the events monitored by epoll, the events that have already occurred

  1. The parameter events is a pre-allocated array of epoll_event structures.
  2. epoll copies the ready events into the events array
    (events cannot be a null pointer; the kernel is only responsible for copying data into this array, it will not allocate user-mode memory for us).
  3. maxevents tells the kernel how large the events array is; it must be greater than 0.
  4. The parameter timeout is the timeout in milliseconds (0 returns immediately, -1 blocks indefinitely).
  5. On success, the function returns the number of file descriptors ready for the requested I/O; a return of 0 indicates a timeout, and a return of less than 0 indicates that the call failed. A minimal sketch of the whole flow follows.
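The registration step between epoll_create and epoll_wait is epoll_ctl, which this article does not list separately. A minimal level-triggered sketch of the whole flow, watching stdin so it stays self-contained:

#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

int main() {
    // Create the epoll handle; the size argument has been ignored since
    // Linux 2.6.8 but must still be positive. close() it when done.
    int epfd = epoll_create(1);
    if (epfd < 0) { perror("epoll_create"); return 1; }
    // Register stdin once with epoll_ctl; unlike select, no re-registration
    // is needed on later iterations.
    struct epoll_event ev;
    ev.events = EPOLLIN; // level-triggered by default; OR in EPOLLET for edge-triggered
    ev.data.fd = 0;      // stdin
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, 0, &ev) < 0) { perror("epoll_ctl"); return 1; }
    struct epoll_event ready[16];
    for (;;) {
        // Only ready events come back, so there is no need to scan all descriptors.
        int n = epoll_wait(epfd, ready, 16, 5000); // 5000 ms timeout
        if (n < 0) { perror("epoll_wait"); return 1; }
        if (n == 0) { printf("timeout\n"); continue; }
        for (int i = 0; i < n; ++i) {
            char buf[1024];
            ssize_t len = read(ready[i].data.fd, buf, sizeof(buf));
            if (len <= 0) { close(epfd); return 0; }
            fwrite(buf, 1, len, stdout);
        }
    }
}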

Three, the advantages of epoll (corresponding to the disadvantages of select)

  1. The interface is easy to use: although it is split into three functions, this makes usage more convenient and efficient; there is no need to set up the file descriptors of interest on every cycle, and input and output parameters are separated.
  2. Light data copying: the file descriptor structure is copied into the kernel only when EPOLL_CTL_ADD is called,
    and this operation is infrequent (while select / poll must copy on every cycle).
  3. Event callback mechanism: instead of traversal, a callback function adds each ready file descriptor structure to the ready queue; epoll_wait
    returns the ready queue directly, so finding out which file descriptors are ready has O(1) time complexity, and efficiency is unaffected even with many file descriptors.
  4. No number limit: there is no limit on the number of file descriptors

Four, epoll use scenarios

  • epoll's high performance depends on specific scenarios; if the scenario is unsuitable, epoll can be counterproductive.
  • epoll is best suited when there are many connections but only a fraction of them are active at any moment.

For example, a typical server that needs to handle tens of thousands of clients, such as the entry server of various Internet apps, is very suitable for epoll. If, on the other hand, the program runs only inside a system with just a few connections between servers, epoll is not necessarily appropriate; the specific I/O model should be chosen according to the needs and characteristics of the scenario.

Origin: blog.csdn.net/L19002S/article/details/105426779