Multithreading is officially supported! Redis 6.0 and the old version performance comparison evaluation


Introduction: Redis 6.0 will be released at the end of this year, and its most significant change is multi-threaded IO. The author of this article read and analyzed the key code in depth and ran benchmarks to measure how much performance multi-threaded IO actually brings to Redis.
Lin Tianyi, Technical Manager at Meitu, is mainly responsible for developing basic services such as NoSQL, message queues, and middleware. Before joining Meitu, he worked on basic service development for the Sina Weibo architecture platform.

The night before yesterday I happened to see Redis author Salvatore's talk from RedisConf 2019. One of the slides showed that the multi-threaded IO feature introduced in Redis 6 at least doubles performance. I was very excited and couldn't wait to look at the code.

At present, the performance bottleneck of single-threaded Redis is mainly network IO. There are two main optimization directions:
1. Improve network IO performance; a typical approach is to replace the kernel network stack with something like DPDK.
2. Use multiple threads to take full advantage of multiple cores; the typical example is Memcached.

Protocol-stack optimization has little to do with Redis itself, while multi-threading has been raised repeatedly in the community for a long time; Redis 6 finally adds it, and Salvatore gives a brief explanation in his blog post "An update about Redis developments in 2019". It differs, however, from Memcached's model, where everything from IO handling to data access is multi-threaded: Redis's multi-threading is used only for reading and writing network data and parsing the protocol, while command execution remains single-threaded. The reason for this design is to avoid the complexity multithreading would bring, namely the concurrency control needed for keys, Lua scripts, transactions, LPUSH/LPOP, and so on. The overall design is roughly as follows:
(Diagram: overall design of Redis 6 multi-threaded IO.)

Code


Multi-threaded IO handles reads (requests) and writes (responses) through the same flow; only the read/write operation itself differs. Moreover, at any given moment all IO threads are either reading or all writing, never a mix of the two, so the read path is taken as the example below. The code in this analysis is only an aid to understanding: it covers the core logic, not every detail. If you want to understand the details fully, it is recommended to read the source after finishing this article.

After adding multi-threaded IO, the overall reading process is as follows:

  1. The main thread accepts connection requests and puts read events (incoming requests) into a global pending-read queue.
  2. When processing read events, the main thread distributes these connections to the IO threads via RR (Round Robin), then busy-waits (the effect of a spinlock).
  3. The IO threads read and parse the request data (only reading and parsing here; nothing is executed).
  4. The main thread executes all the commands and empties the pending-read queue (this part executes serially).

The whole flow above is lock-free, because the main thread waits until all IO threads have finished before continuing, so no data-race scenario can arise.

Note: If you are not interested in the code, you can skip the rest of this section directly; it will not hurt your understanding of the performance improvement.

The following code analysis corresponds to the flow above. When the main thread receives a request, the readQueryFromClient callback in networking.c is invoked:

void readQueryFromClient(aeEventLoop *el, int fd, void *privdata, int mask) {
    client *c = (client*) privdata;
    /* Check if we want to read from the client later when exiting from
     * the event loop. This is the case if threaded I/O is enabled. */
    if (postponeClientRead(c)) return;
    ...
}

The previous implementation of readQueryFromClient read and parsed the request and executed the command. Multi-threaded IO adds the check shown above. postponeClientRead is implemented as follows:

int postponeClientRead(client *c) {
    if (io_threads_active &&   // is multi-threaded IO active? It is stopped when few requests are pending
        server.io_threads_do_reads && // is multi-threaded IO enabled for reads?
        !(c->flags & (CLIENT_MASTER|CLIENT_SLAVE|CLIENT_PENDING_READ)))  // replication connections do not use multi-threaded IO
    {
        // Mark the connection CLIENT_PENDING_READ so it is not queued repeatedly;
        // this flag matters again later
        c->flags |= CLIENT_PENDING_READ;
        // Add the connection to the pending-read queue
        listAddNodeHead(server.clients_pending_read,c);
        return 1;
    } else {
        return 0;
    }
}

postponeClientRead checks whether multi-threaded IO is enabled and the connection is not a master-slave replication connection; if so, the connection is queued and 1 is returned, so readQueryFromClient returns immediately without parsing or executing the command. The main thread, after processing the read events (note: processing the events, not reading the data), then distributes these connections to the IO threads via RR:

int handleClientsWithPendingReadsUsingThreads(void) {
    ...
    // Distribute the connections in the pending queue to the IO threads round-robin
    listRewind(server.clients_pending_read,&li);
    int item_id = 0;
    while((ln = listNext(&li))) {
        client *c = listNodeValue(ln);
        int target_id = item_id % server.io_threads_num;
        listAddNodeTail(io_threads_list[target_id],c);
        item_id++;
    }
    ...

    // Busy-wait until the IO threads have processed all pending connections
    while(1) {
        unsigned long pending = 0;
        for (int j = 0; j < server.io_threads_num; j++)
            pending += io_threads_pending[j];
        if (pending == 0) break;
    }

io_threads_list stores the connections each IO thread must handle; the main thread distributes the connections across it via RR and then enters a busy-wait (effectively, the main thread blocks). The IO thread entry point is the IOThreadMain function:

void *IOThreadMain(void *myid) {
    while(1) {
        // Walk the pending-connection list belonging to this thread id
        listRewind(io_threads_list[id],&li);
        while((ln = listNext(&li))) {
            client *c = listNodeValue(ln);
            // io_threads_op controls whether the thread handles reads or writes
            if (io_threads_op == IO_THREADS_OP_WRITE) {
                writeToClient(c->fd,c,0);
            } else if (io_threads_op == IO_THREADS_OP_READ) {
                readQueryFromClient(NULL,c->fd,c,0);
            } else {
                serverPanic("io_threads_op value is unknown");
            }
        }
        listEmpty(io_threads_list[id]);
        io_threads_pending[id] = 0;
    }
}

An IO thread decides whether to process read or write events according to the global io_threads_op state; this is why, as mentioned above, all IO threads perform either reads or writes at any one time, never both. Careful readers may also notice that the IO thread calls readQueryFromClient, the very callback that queued the connection in the first place. Isn't that an endless loop? The answer lies in postponeClientRead: a connection already in the pending queue carries the CLIENT_PENDING_READ flag, so postponeClientRead will not queue it again, and readQueryFromClient proceeds to read and parse the request. readQueryFromClient reads the request data and calls processInputBuffer to parse the command; processInputBuffer checks whether the current call comes from an IO thread, and if so it only parses the command without executing it (that code is not shown here).

If you look at the implementation of IOThreadMain, you will find these IO threads have no sleep mechanism, so when idle each thread pins a CPU at 100%; but simply sleeping would delay read and write processing and make performance worse. Redis's current solution is to shut the IO threads down when there are few connections waiting to be processed. Why not use condition variables to control this? I haven't figured that out; I may ask in the community later.

Performance comparison


Benchmark configuration:

Redis Server: Alibaba Cloud Ubuntu 18.04, 8-core 2.5 GHz CPU, 8 GB memory, instance type ecs.ic5.2xlarge

Redis Benchmark Client: Alibaba Cloud Ubuntu 18.04, 8-core 2.5 GHz CPU, 8 GB memory, instance type ecs.ic5.2xlarge

The multi-threaded IO version was only recently merged into the unstable branch, so the unstable branch is the only way to test multi-threaded IO. The single-threaded version tested is Redis 5.0.5. The multi-threaded IO version needs the following configuration:

io-threads 4 # enable 4 IO threads
io-threads-do-reads yes # use IO threads for request parsing as well

Benchmark command:

redis-benchmark -h 192.168.0.49 -a foobared -t set,get -n 1000000 -r 100000000 --threads 4 -d ${datasize} -c 256

(Benchmark result charts: GET/SET throughput, single-threaded Redis 5.0.5 vs Redis 6 with 4 IO threads.)
The charts show that GET/SET throughput with 4 IO threads is nearly double that of the single-threaded version. Note that these numbers only verify that multi-threaded IO really brings a performance gain; this was not a rigorous benchmark with latency control or varied concurrency scenarios, so the data is for reference only and cannot be used as a production indicator. It also reflects only the current unstable branch, and the final release may perform even better.

Note: redis-benchmark is single-threaded except on the unstable branch, so when testing the multi-threaded IO version, the benchmark client itself becomes the bottleneck. You must compile redis-benchmark from the unstable branch and pass --threads to enable multi-threaded load generation. Also, don't panic if compilation fails: Redis uses the _Atomic feature, which is only supported by newer toolchains such as GCC 5.0 and above.

Summary


Redis 6.0 is expected to be released at the end of 2019, with big improvements in performance, protocol, and access control. Salvatore has devoted himself this year to optimizing Redis and its cluster functionality, which is well worth looking forward to. In addition, the community will release the first version of redis cluster proxy at the end of the year to solve the problem of multi-language SDK compatibility; once the cluster has proxy support, it is expected to see much wider adoption in China.



Origin blog.51cto.com/14977574/2546530