Why can single-threaded Redis support high concurrency?

Recently, while reading Unix Network Programming, I studied the implementation of Redis. I find that Redis's source code is very well suited to reading and analysis, and the I/O multiplexing part in particular is clean and elegant. This post is a brief write-up of that part.



Several I/O models


Why does Redis use I/O multiplexing? First of all, Redis runs in a single thread, and all operations are executed sequentially.


However, because read and write operations block while waiting for input or output, I/O calls usually cannot return immediately.


If I/O on one file descriptor blocks, the whole process can no longer serve any other client. I/O multiplexing exists to solve exactly this problem.


Blocking I/O


Let's first look at how the traditional blocking I/O model works: we call read or write on a file descriptor (File Descriptor, hereinafter FD).


If the FD is not currently readable or writable, the call blocks, and the entire Redis service stops responding to other operations, making the whole service unavailable.


This is the traditional blocking model we use most often in programming:

[Figure: the blocking I/O model]

Although the blocking model is very common in development and easy to understand, a blocked call on one FD stalls the services bound to every other FD, so the blocking model is rarely used when multiple client tasks must be handled.
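
For concreteness, here is a minimal sketch of the blocking model (not Redis code; connfd stands for a hypothetical connected socket descriptor):

#include <unistd.h>

/* read() does not return until data arrives on connfd, so a
 * single-threaded server stalls here and cannot serve anyone else. */
void handle_client(int connfd) {
    char buf[1024];
    ssize_t n = read(connfd, buf, sizeof(buf)); /* blocks until readable */
    if (n > 0)
        write(connfd, buf, (size_t)n);          /* echo back; may also block */
}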


I/O multiplexing


There are several other I/O models, but they will not be covered in detail here. Since the blocking model cannot meet our needs, we need a more efficient I/O model to support multiple Redis clients (redis-cli) at once.


Here is the I/O multiplexing model:

[Figure: the I/O multiplexing model]

In the I/O multiplexing model, the most important call is select, which can monitor the readable and writable status of multiple file descriptors simultaneously.


When some of those file descriptors become readable or writable, select returns the number of descriptors that are ready.


There is plenty of material online about the detailed usage of select, so I won't cover it here.


There are also other I/O multiplexing functions such as epoll, kqueue and evport, which perform better than select and can support more connections.


Reactor design pattern


Redis implements its file event handler with the Reactor pattern. (Each network connection actually corresponds to a file descriptor.)

[Figure: the file event handler built on the Reactor pattern]

The file event handler uses the I/O multiplexing module to monitor multiple FDs at the same time. When accept, read, write, or close file events occur, the file event handler invokes the callback bound to that FD.
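
As a sketch of what such a binding looks like (modeled on aeCreateFileEvent from Redis's ae.h; server.el, c, and the callback body here are illustrative):

/* callback invoked by the event loop when `fd` becomes readable */
void readQueryFromClient(aeEventLoop *el, int fd, void *privdata, int mask) {
    /* read the client's request from fd and queue up a reply */
}

/* register interest in readability; the event loop will invoke
 * readQueryFromClient once select/epoll reports fd as readable */
aeCreateFileEvent(server.el, fd, AE_READABLE, readQueryFromClient, c);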


Although the entire file event handler runs on a single thread, the I/O multiplexing module makes it possible to monitor reads and writes on many FDs at once, which improves the performance of the network communication model while keeping the Redis implementation simple.


I/O multiplexing module


Under the hood, the I/O multiplexing module wraps select, epoll, evport and kqueue, and exposes a single interface to the upper layer.

[Figure: the structure of the I/O multiplexing module]

Here we briefly look at how Redis wraps select and epoll, to get a feel for what this module does. The module smooths over the differences between the I/O multiplexing functions on different platforms and provides the same interface:

static int  aeApiCreate(aeEventLoop *eventLoop)
static int  aeApiResize(aeEventLoop *eventLoop, int setsize)
static void aeApiFree(aeEventLoop *eventLoop)
static int  aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask)
static void aeApiDelEvent(aeEventLoop *eventLoop, int fd, int mask)
static int  aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp)
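
To see how the upper layer drives this interface, here is a simplified sketch (condensed from Redis's aeProcessEvents, with time events and error handling omitted): aeApiPoll fills eventLoop->fired, and the loop then dispatches the callbacks bound to each fired FD:

int j, numevents = aeApiPoll(eventLoop, tvp);
for (j = 0; j < numevents; j++) {
    int fd   = eventLoop->fired[j].fd;
    int mask = eventLoop->fired[j].mask;
    aeFileEvent *fe = &eventLoop->events[fd];

    if (fe->mask & mask & AE_READABLE)
        fe->rfileProc(eventLoop, fd, fe->clientData, mask);
    if (fe->mask & mask & AE_WRITABLE)
        fe->wfileProc(eventLoop, fd, fe->clientData, mask);
}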


Because each multiplexing function needs different parameters, each submodule stores the context it requires in its own aeApiState structure:

// select
typedef struct aeApiState {
    fd_set rfds, wfds;
    fd_set _rfds, _wfds;
} aeApiState;

// epoll
typedef struct aeApiState {
    int epfd;
    struct epoll_event *events;
} aeApiState;


This context is stored in the eventLoop's void *apidata field; it is never exposed to the upper layer and is used only within the current submodule.


Wrapping the select function


select can monitor an FD for readable, writable, and error conditions. Before looking at how the I/O multiplexing module wraps select, let's review the general flow of using select:

  • Initialize a readable fd_set (rfds) to hold the FDs that should be monitored for readability.

  • Use FD_SET to add the fd to rfds.

  • Call select to monitor whether the FDs in rfds are readable.

  • When select returns, check the status of each FD and perform the corresponding operation.

int fd = /* file descriptor */;
fd_set rfds;

for ( ; ; ) {
    /* select() modifies the set in place, so rebuild it on every iteration */
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    select(fd+1, &rfds, NULL, NULL, NULL);
    if (FD_ISSET(fd, &rfds)) {
        /* file descriptor `fd` is now readable */
    }
}


The code in Redis's ae_select file follows roughly the same order. First, the aeApiCreate function initializes rfds and wfds:

static int aeApiCreate(aeEventLoop *eventLoop) {
    aeApiState *state = zmalloc(sizeof(aeApiState));
    if (!state) return -1;
    FD_ZERO(&state->rfds);
    FD_ZERO(&state->wfds);
    eventLoop->apidata = state;
    return 0;
}


aeApiAddEvent and aeApiDelEvent then use FD_SET and FD_CLR to set or clear the bit for the given FD in the fd_set:

static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask) {
    aeApiState *state = eventLoop->apidata;
    if (mask & AE_READABLE) FD_SET(fd,&state->rfds);
    if (mask & AE_WRITABLE) FD_SET(fd,&state->wfds);
    return 0;
}
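
The matching aeApiDelEvent simply clears the same bits with FD_CLR (sketched here following the same pattern as the Redis source):

static void aeApiDelEvent(aeEventLoop *eventLoop, int fd, int mask) {
    aeApiState *state = eventLoop->apidata;
    if (mask & AE_READABLE) FD_CLR(fd, &state->rfds);
    if (mask & AE_WRITABLE) FD_CLR(fd, &state->wfds);
}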


The most important function in the entire ae_select submodule is aeApiPoll; it is the part that actually calls select. Its job, once the I/O multiplexing function returns, is to add the corresponding FDs to the eventLoop's fired array and return the number of events:

static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
    aeApiState *state = eventLoop->apidata;
    int retval, j, numevents = 0;

    memcpy(&state->_rfds,&state->rfds,sizeof(fd_set));
    memcpy(&state->_wfds,&state->wfds,sizeof(fd_set));

    retval = select(eventLoop->maxfd+1,
                &state->_rfds,&state->_wfds,NULL,tvp);
    if (retval > 0) {
        for (j = 0; j <= eventLoop->maxfd; j++) {
            int mask = 0;
            aeFileEvent *fe = &eventLoop->events[j];

            if (fe->mask == AE_NONE) continue;
            if (fe->mask & AE_READABLE && FD_ISSET(j,&state->_rfds))
                mask |= AE_READABLE;
            if (fe->mask & AE_WRITABLE && FD_ISSET(j,&state->_wfds))
                mask |= AE_WRITABLE;
            eventLoop->fired[numevents].fd = j;
            eventLoop->fired[numevents].mask = mask;
            numevents++;
        }
    }
    return numevents;
}


Wrapping the epoll function


Redis wraps epoll in a similar fashion, using epoll_create to create the epfd that epoll works with:

static int aeApiCreate(aeEventLoop *eventLoop) {
    aeApiState *state = zmalloc(sizeof(aeApiState));

    if (!state) return -1;
    state->events = zmalloc(sizeof(struct epoll_event)*eventLoop->setsize);
    if (!state->events) {
        zfree(state);
        return -1;
    }
    state->epfd = epoll_create(1024); /* 1024 is just a hint for the kernel */
    if (state->epfd == -1) {
        zfree(state->events);
        zfree(state);
        return -1;
    }
    eventLoop->apidata = state;
    return 0;
}


aeApiAddEvent uses epoll_ctl to add the FD to be monitored, together with the events to listen for, to the epfd:

static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask) {
    aeApiState *state = eventLoop->apidata;
    struct epoll_event ee = {0}; /* avoid valgrind warning */
    /* If the fd was already monitored for some event, we need a MOD
     * operation. Otherwise we need an ADD operation. */

    int op = eventLoop->events[fd].mask == AE_NONE ?
            EPOLL_CTL_ADD : EPOLL_CTL_MOD;

    ee.events = 0;
    mask |= eventLoop->events[fd].mask; /* Merge old events */
    if (mask & AE_READABLE) ee.events |= EPOLLIN;
    if (mask & AE_WRITABLE) ee.events |= EPOLLOUT;
    ee.data.fd = fd;
    if (epoll_ctl(state->epfd,op,fd,&ee) == -1) return -1;
    return 0;
}
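
For completeness, a sketch of the epoll-side aeApiDelEvent, following the Redis source: deleting an event recomputes the remaining mask and issues either a MOD (if some events remain) or a DEL:

static void aeApiDelEvent(aeEventLoop *eventLoop, int fd, int delmask) {
    aeApiState *state = eventLoop->apidata;
    struct epoll_event ee = {0};
    int mask = eventLoop->events[fd].mask & (~delmask);

    ee.events = 0;
    if (mask & AE_READABLE) ee.events |= EPOLLIN;
    if (mask & AE_WRITABLE) ee.events |= EPOLLOUT;
    ee.data.fd = fd;
    if (mask != AE_NONE) {
        /* other events are still monitored on this FD: modify */
        epoll_ctl(state->epfd, EPOLL_CTL_MOD, fd, &ee);
    } else {
        /* nothing left to monitor: remove the FD from epfd */
        epoll_ctl(state->epfd, EPOLL_CTL_DEL, fd, &ee);
    }
}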


epoll works somewhat differently from select: when epoll_wait returns, there is no need to iterate over every FD to check its read/write status.


Instead, epoll_wait returns an array of epoll_event structures:

typedef union epoll_data {
    void    *ptr;
    int      fd; /* file descriptor */
    uint32_t u32;
    uint64_t u64;
} epoll_data_t;

struct epoll_event {
    uint32_t     events; /* epoll events */
    epoll_data_t data;
};


Each entry records the epoll events that occurred (EPOLLIN, EPOLLOUT, EPOLLERR and EPOLLHUP), together with the FD on which they occurred.


The aeApiPoll function only needs to copy the information from the epoll_event array into the eventLoop's fired array to pass it up to the higher-level module:

static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
    aeApiState *state = eventLoop->apidata;
    int retval, numevents = 0;

    retval = epoll_wait(state->epfd,state->events,eventLoop->setsize,
            tvp ? (tvp->tv_sec*1000 + tvp->tv_usec/1000) : -1);
    if (retval > 0) {
        int j;

        numevents = retval;
        for (j = 0; j < numevents; j++) {
            int mask = 0;
            struct epoll_event *e = state->events+j;

            if (e->events & EPOLLIN) mask |= AE_READABLE;
            if (e->events & EPOLLOUT) mask |= AE_WRITABLE;
            if (e->events & EPOLLERR) mask |= AE_WRITABLE;
            if (e->events & EPOLLHUP) mask |= AE_WRITABLE;
            eventLoop->fired[j].fd = e->data.fd;
            eventLoop->fired[j].mask = mask;
        }
    }
    return numevents;
}


Choosing a submodule


Because Redis needs to run on multiple platforms, and to maximize efficiency and performance, it selects a different I/O multiplexing function as its submodule depending on the compilation platform, while providing a unified interface to the upper layer.


Redis uses macro definitions to pick the appropriate submodule:

#ifdef HAVE_EVPORT
#include "ae_evport.c"
#else
    #ifdef HAVE_EPOLL
    #include "ae_epoll.c"
    #else
        #ifdef HAVE_KQUEUE
        #include "ae_kqueue.c"
        #else
        #include "ae_select.c"
        #endif
    #endif
#endif


Because select is a system call in the POSIX standard and is implemented on virtually every operating system, it serves as the fallback:

[Figure: how the I/O multiplexing submodule is chosen, with select as the fallback]

Redis gives priority to I/O multiplexing functions with O(1) time complexity as the underlying implementation: evport on Solaris 10, epoll on Linux, and kqueue on macOS and FreeBSD.


These functions all rely on internal kernel data structures and can serve hundreds of thousands of file descriptors.


However, if none of these functions is available in the current build environment, select is chosen as the fallback. Because it scans all monitored descriptors on every call, its time complexity is a poorer O(n).


Moreover, it can typically monitor only 1024 file descriptors at once, so select is generally not the first choice.
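
As a quick sanity check (a standalone program, not Redis code), you can print the fd_set capacity on your platform; on typical Linux systems this reports 1024:

#include <stdio.h>
#include <sys/select.h>

int main(void) {
    /* FD_SETSIZE caps how many descriptors one fd_set can track */
    printf("FD_SETSIZE = %d\n", (int)FD_SETSIZE);
    return 0;
}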


Summary


Redis's I/O multiplexing module has a very concise design. It uses macros to ensure the module performs well across platforms, wrapping the different I/O multiplexing functions behind a single API exposed to the upper layer.


This module lets Redis serve thousands of file descriptors from a single process, avoiding the extra implementation complexity that a multi-process design would introduce and reducing the chance of errors.


