Redis source code analysis: eventLoop (the scheduling core of Redis)

You have probably read in many articles that Redis is single-process, so it does not have to deal with the complications of coordinating many processes. That is not entirely accurate. If you have read the Redis source code, you know that Redis can have a main process and several child processes at the same time. But a child process is normally forked to handle a single temporary, time-consuming task (RDB persistence, AOF rewrite, full synchronization between master and replica, and so on) and is destroyed as soon as that task is finished.

Today we will discuss the scheduling core of that main process: the eventLoop.

Anyone who has written C knows that the entry point of a C program is the main function; once main returns, the program exits.

Service-style programs, however, need to stay up for a long time and keep responding to users: redis-server, or embedded software on routers, firewalls, base stations, and so on. Once started, such a program must not exit right away.

So how do we keep main from returning? The simplest way is an infinite loop:

int main(void)
{
    // initialization

    // server loop
    while (1)
    {
        // wait for something
        // do something
        // wait for the next round
    }
    return 0;
}

This certainly keeps the program from exiting, but it has a big performance cost: it is the classic "busy wait", an endless loop that occupies the CPU the whole time.

To solve this problem, we first have to analyze what "inputs" the service needs to respond to.

As I understand it, there are two types of "input":

1. Input generated inside the system, mainly triggered by timers. For example, Redis keys support expiration, so the system must periodically check which keys have expired; that check is an input event triggered by a timer.

2. Input generated outside the system, mainly IO events. For example, a command typed into redis-cli is eventually delivered to the server over a socket; to the server, that is a socket IO event. Likewise, RDB persistence consists of read and write IO on disk files. On Linux, all of these IO events ultimately become read and write events on an fd (file descriptor). Both kinds of event are sketched as structs below.
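For reference, the two event kinds correspond to two structs in ae.h. Roughly (abridged, with some fields omitted for brevity):

/* Abridged from ae.h: a registered fd IO event. */
typedef struct aeFileEvent {
    int mask;               /* AE_(READABLE|WRITABLE|BARRIER) */
    aeFileProc *rfileProc;  /* callback for the readable event */
    aeFileProc *wfileProc;  /* callback for the writable event */
    void *clientData;
} aeFileEvent;

/* Abridged from ae.h: a registered timer event. */
typedef struct aeTimeEvent {
    long long id;           /* time event identifier */
    long when_sec;          /* seconds of expiry */
    long when_ms;           /* milliseconds of expiry */
    aeTimeProc *timeProc;   /* callback when the timer fires */
    void *clientData;
    struct aeTimeEvent *prev, *next; /* linked list of timers */
} aeTimeEvent;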

In Redis's main function, these two types of "input" are processed in one big loop. We call that big loop the eventLoop.

Let's take a look at how the eventLoop handles them.

In the main function of redis-server, aeMain is called last. This is our main loop, and it has to come at the end of main: once execution reaches it, the program stays inside this loop until the server shuts down.

Inside aeMain is a while loop whose condition checks eventLoop->stop; in normal operation that flag stays false, so the loop keeps running.
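aeMain itself is tiny; this is essentially all of it (from ae.c, matching the flags used in the code below):

void aeMain(aeEventLoop *eventLoop) {
    eventLoop->stop = 0;
    while (!eventLoop->stop) {
        // Each iteration handles one round of IO and timer events.
        aeProcessEvents(eventLoop, AE_ALL_EVENTS|
                                   AE_CALL_BEFORE_SLEEP|
                                   AE_CALL_AFTER_SLEEP);
    }
}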

 

Each iteration calls aeProcessEvents, which is the real processing entry of the loop. Let's focus on this function below.

int aeProcessEvents(aeEventLoop *eventLoop, int flags)
{
    int processed = 0, numevents;
 
    /* Nothing to do? return ASAP */
    if (!(flags & AE_TIME_EVENTS) && !(flags & AE_FILE_EVENTS)) return 0;
 
    /* Note that we want to call select() even if there are no
     * file events to process as long as we want to process time
     * events, in order to sleep until the next time event is ready
     * to fire. */

    // ae has two kinds of events: fd IO events and timer events.
    // If neither exists, none of the code below needs to run.
    if (eventLoop->maxfd != -1 ||
        ((flags & AE_TIME_EVENTS) && !(flags & AE_DONT_WAIT))) {
        int j;
        aeTimeEvent *shortest = NULL;
        struct timeval tv, *tvp;
 
        // The big block below is really doing just one thing: finding the
        // timer event that will fire soonest, and computing the interval
        // from now until it fires. What is that value for?
        // To avoid busy-waiting, we block while checking fd readiness
        // (select or epoll): if no fd is readable or writable, we simply
        // stay blocked. But timer events still need handling; if no IO
        // event ever arrived, the timers would never get a chance to run.
        // So we pass select/epoll a blocking timeout and make it return
        // once that timeout expires. The value computed below is exactly
        // that timeout. This avoids a non-blocking busy wait while still
        // guaranteeing that timer events are handled on time.
        // The pattern is very common; many C service programs work this way.
 
        if (flags & AE_TIME_EVENTS && !(flags & AE_DONT_WAIT))
            // Find the timer event that expires soonest.
            shortest = aeSearchNearestTimer(eventLoop); 
        if (shortest) { 
            long now_sec, now_ms;
 
            aeGetTime(&now_sec, &now_ms);
            tvp = &tv;
 
            /* How many milliseconds we need to wait for the next
             * time event to fire? */
            long long ms =
                (shortest->when_sec - now_sec)*1000 +
                shortest->when_ms - now_ms;
            // Compute the timeout.
            if (ms > 0) {
                tvp->tv_sec = ms/1000;
                tvp->tv_usec = (ms % 1000)*1000;
            } else {
                // A timer has already expired: poll non-blocking
                // (tvp set to zero) so we return immediately.
                tvp->tv_sec = 0;
                tvp->tv_usec = 0;
            }
        } else {
            /* If we have to check for events but need to return
             * ASAP because of AE_DONT_WAIT we need to set the timeout
             * to zero */
            if (flags & AE_DONT_WAIT) {
                tv.tv_sec = tv.tv_usec = 0;
                tvp = &tv;
            } else {
                /* Otherwise we can block */
                // No timer events? Then just block indefinitely.
                tvp = NULL; /* wait forever */
            }
        }
        
        // AE_DONT_WAIT means blocking is forcibly disallowed.
        // This is useful in TLS scenarios.
        if (eventLoop->flags & AE_DONT_WAIT) {
            tv.tv_sec = tv.tv_usec = 0;
            tvp = &tv;
        }
 
        // A callback hook invoked before blocking in select/epoll.
        if (eventLoop->beforesleep != NULL && flags & AE_CALL_BEFORE_SLEEP)
            eventLoop->beforesleep(eventLoop);
 
        /* Call the multiplexing API, will return only on timeout or when
         * some event fires. */
        // This is where select or epoll actually gets called: the "IO
        // multiplexing" so often cited as a key reason for Redis's high
        // performance. But isn't IO multiplexing completely ordinary?
        // Is there any socket-reading server today that doesn't use it?
        numevents = aeApiPoll(eventLoop, tvp);
 
        /* After sleep callback. */
        // A callback hook invoked after select/epoll returns.
        if (eventLoop->aftersleep != NULL && flags & AE_CALL_AFTER_SLEEP)
            eventLoop->aftersleep(eventLoop);
 
        // Invoke the registered handler for each fired event.
        for (j = 0; j < numevents; j++) {
            aeFileEvent *fe = &eventLoop->events[eventLoop->fired[j].fd];
            int mask = eventLoop->fired[j].mask;
            int fd = eventLoop->fired[j].fd;
            int fired = 0; /* Number of events fired for current fd. */
 
            /* Normally we execute the readable event first, and the writable
             * event later. This is useful as sometimes we may be able
             * to serve the reply of a query immediately after processing the
             * query.
             *
             * However if AE_BARRIER is set in the mask, our application is
             * asking us to do the reverse: never fire the writable event
             * after the readable. In such a case, we invert the calls.
             * This is useful when, for instance, we want to do things
             * in the beforeSleep() hook, like fsynching a file to disk,
             * before replying to a client. */
            int invert = fe->mask & AE_BARRIER;
 
            /* Note the "fe->mask & mask & ..." code: maybe an already
             * processed event removed an element that fired and we still
             * didn't processed, so we check if the event is still valid.
             *
             * Fire the readable event if the call sequence is not
             * inverted. */
            // Callback handling for the readable event.
            if (!invert && fe->mask & mask & AE_READABLE) {
                fe->rfileProc(eventLoop,fd,fe->clientData,mask);
                fired++;
                fe = &eventLoop->events[fd]; /* Refresh in case of resize. */
            }
            
            // Callback handling for the writable event.
            /* Fire the writable event. */
            if (fe->mask & mask & AE_WRITABLE) {
                if (!fired || fe->wfileProc != fe->rfileProc) {
                    fe->wfileProc(eventLoop,fd,fe->clientData,mask);
                    fired++;
                }
            }
 
            /* If we have to invert the call, fire the readable event now
             * after the writable one. */
            if (invert) {
                fe = &eventLoop->events[fd]; /* Refresh in case of resize. */
                if ((fe->mask & mask & AE_READABLE) &&
                    (!fired || fe->wfileProc != fe->rfileProc))
                {
                    fe->rfileProc(eventLoop,fd,fe->clientData,mask);
                    fired++;
                }
            }
 
            processed++;
        }
    }
    /* Check time events */
    // Process timer events.
    if (flags & AE_TIME_EVENTS)
        processed += processTimeEvents(eventLoop);
 
    return processed; /* return the number of processed file/time events */
}

Please pay attention to the comments I added in it.
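One note on aeApiPoll before moving on: ae.c chooses the multiplexing backend at compile time, which is how a single call site covers epoll, kqueue, select and evport. From the top of ae.c:

/* Include the best multiplexing layer supported by this system. */
#ifdef HAVE_EVPORT
#include "ae_evport.c"
#else
    #ifdef HAVE_EPOLL
    #include "ae_epoll.c"
    #else
        #ifdef HAVE_KQUEUE
        #include "ae_kqueue.c"
        #else
        #include "ae_select.c"
        #endif
    #endif
#endif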

The overall scheduling logic is actually very simple. eventLoop events fall into two kinds: fd read/write events and timer events. Let's look at a simple example of registering an fd read/write event, taking the readable event of a socket. Its interface is:
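A rough sketch of that interface, abridged from connection.c:

static int connSocketSetReadHandler(connection *conn, ConnectionCallbackFunc func) {
    if (func == conn->read_handler) return C_OK;

    conn->read_handler = func;
    if (!conn->read_handler)
        // No handler anymore: deregister the readable event.
        aeDeleteFileEvent(server.el, conn->fd, AE_READABLE);
    else
        // Register AE_READABLE for this fd, with ae_handler as the callback.
        if (aeCreateFileEvent(server.el, conn->fd,
                              AE_READABLE, conn->type->ae_handler, conn) == AE_ERR)
            return C_ERR;
    return C_OK;
}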

This is the interface of the conn module; Redis wraps its sockets in the conn module's connection objects. The function calls aeCreateFileEvent to register the AE_READABLE event for the fd, with ae_handler as the callback.

Within a process, fds are allocated by the kernel and unique, so they can be used directly as array indexes. For lookup efficiency, Redis stores these events in the events[] array, allocated during server initialization (initServer); a classic trade of space for time.
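aeCreateFileEvent shows the fd-as-index idea directly; abridged from ae.c:

int aeCreateFileEvent(aeEventLoop *eventLoop, int fd, int mask,
        aeFileProc *proc, void *clientData)
{
    if (fd >= eventLoop->setsize) {
        errno = ERANGE;
        return AE_ERR;
    }
    // The fd itself indexes the pre-allocated events[] array: O(1) lookup.
    aeFileEvent *fe = &eventLoop->events[fd];

    if (aeApiAddEvent(eventLoop, fd, mask) == -1)
        return AE_ERR;
    fe->mask |= mask;
    if (mask & AE_READABLE) fe->rfileProc = proc;
    if (mask & AE_WRITABLE) fe->wfileProc = proc;
    fe->clientData = clientData;
    if (fd > eventLoop->maxfd)
        eventLoop->maxfd = fd;
    return AE_OK;
}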

 

Now let's take a look at how timers are handled.

At first I assumed Redis would register many timers, but it turns out there is only one true timer: serverCron. During server initialization (initServer), aeCreateTimeEvent is called once to register this single global timer.

All other periodic tasks are driven from this serverCron. That is why Redis has the notion of a clock speed, server.hz: the number of times serverCron is called per second. The default is server.hz = 10, i.e. once every 100 ms.
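Both ends of this mechanism are short; abridged from server.c:

/* In initServer(): register the single time event, firing 1 ms from now. */
if (aeCreateTimeEvent(server.el, 1, serverCron, NULL, NULL) == AE_ERR) {
    serverPanic("Can't create event loop timers.");
    exit(1);
}

/* At the end of serverCron(): the return value is the number of
 * milliseconds until the event fires again, hence server.hz calls/sec. */
server.cronloops++;
return 1000/server.hz;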

Since almost every task that Redis needs to process periodically is driven from serverCron, in theory you can make those tasks more responsive by raising this frequency, at the cost of more CPU spent in the cron itself.

Let's look at an example of how serverCron drives the other periodic tasks. Redis logs client status every 5 seconds; this is effectively a small 5-second timer, implemented with the run_with_period macro shown below.
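The macro comes from server.h; the usage sketch below abridges the actual log statement inside serverCron:

/* From server.h: run the body once every _ms_ milliseconds,
 * clocked by serverCron. */
#define run_with_period(_ms_) \
    if ((_ms_ <= 1000/server.hz) || !(server.cronloops%((_ms_)/(1000/server.hz))))

/* Usage inside serverCron(); the real log line is abridged here. */
run_with_period(5000) {
    serverLog(LL_DEBUG, "%lu clients connected", listLength(server.clients));
}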

The first condition, (_ms_ <= 1000/server.hz), is simple: if the desired period is no longer than the cron interval itself, every serverCron call already exceeds one period, so the body runs on every call.

The second condition, !(server.cronloops%((_ms_)/(1000/server.hz))), takes a little more unpacking:

server.cronloops is a running counter, incremented by 1 on every serverCron call.

((_ms_)/(1000/server.hz)) is the number of serverCron calls that make up one desired period.

Taking the counter modulo that number, the remainder is 0 exactly once per period. For example, with server.hz = 10 and _ms_ = 5000, 5000/(1000/10) = 50, so the body runs on every 50th call, i.e. every 5 seconds.

 

So all of Redis's periodic tasks hang off serverCron, and the eventLoop itself directly drives only serverCron.

 

That is the implementation logic of the eventLoop, the scheduling core of the Redis main process. There are more details, of course; dig into the code yourself if you are interested.

 

 

Origin: blog.csdn.net/Crystalqy/article/details/110685293