Pipeline: solving Redis performance problems on a single node

Foreword

Let's first look at the interaction model between a Redis client and the server.

It can be concluded that:

1. Redis is a synchronous request/response service: the client sends a request and then waits for the response.

2. The client sends a data packet to the server, and the server sends the response back to the client. This takes a certain amount of time, called the round-trip time (RTT).

When a client needs to send many requests in a row, it is easy to see that the round-trip time limits system performance.

For example, if the RTT is 250 ms, then even if the Redis server can handle 1,000 requests per second, a client that waits for each reply before sending its next request can complete at most four requests per second.
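As a worked version of this arithmetic, assume the server's own processing time is negligible next to the RTT and write C_server for the server's capacity of 1,000 requests per second. The bound for one client that waits for every reply is then:

```latex
\text{throughput}_{\max} = \min\left(\frac{1}{\mathrm{RTT}},\ C_{\text{server}}\right)
  = \min\left(\frac{1}{0.25\ \mathrm{s}},\ 1000\ \mathrm{s}^{-1}\right)
  = 4\ \text{requests/s}
```

In this case the network, not the server, is the bottleneck: the server sits idle for most of each 250 ms round trip.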

Redis supports pipelining, which can improve performance in cases like the one above.

Redis Pipeline interaction model

As the interaction model shows, the client first writes the commands to be executed into a buffer in memory and then sends them to Redis all at once.

A pipeline packages a batch of commands and sends them to the server together. The server executes them in order and returns all the replies together, which avoids a separate round trip per command and improves performance.
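To make "write the commands into a buffer and send them at once" concrete, here is a minimal sketch of what a pipelined batch looks like on the wire. It assumes the RESP request format, in which each command is an array of bulk strings. RespBatch is a hypothetical helper, not part of Jedis or any other client; real clients such as Jedis are assumed to do something similar internally.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch: encode several commands into ONE client-side buffer using
// the RESP request format. A pipelining client sends the whole buffer with a single
// write instead of doing one write-then-read cycle per command.
final class RespBatch {
    // Encode one command, e.g. GET k -> "*2\r\n$3\r\nGET\r\n$1\r\nk\r\n"
    static String encodeCommand(String... args) {
        StringBuilder sb = new StringBuilder();
        sb.append('*').append(args.length).append("\r\n");          // array header: argument count
        for (String arg : args) {
            int len = arg.getBytes(StandardCharsets.UTF_8).length;  // bulk-string length is in bytes
            sb.append('$').append(len).append("\r\n").append(arg).append("\r\n");
        }
        return sb.toString();
    }

    // Concatenate all commands; the result is what a pipeline would flush in one go.
    static byte[] encodeBatch(List<String[]> commands) {
        StringBuilder buffer = new StringBuilder();
        for (String[] cmd : commands) {
            buffer.append(encodeCommand(cmd));
        }
        return buffer.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

Writing the bytes from encodeBatch with one write and then reading the replies in order is the essence of pipelining; the server simply processes the commands in the order they arrive.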

Basic use

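import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;

// Assumes a Redis server on localhost:6379
try (Jedis jedis = new Jedis("localhost", 6379)) {
    Pipeline pipeline = jedis.pipelined();
    // Queue 1,000 RPUSH commands in the client-side buffer
    for (int i = 0; i < 1000; i++) {
        pipeline.rpush("rediskey", String.valueOf(i));
    }
    // Send the buffered commands and read all replies
    pipeline.sync();
}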

As you can see, it is simple to use.

The essence of Pipeline

If we analyze a single request/response interaction in depth, the real process turns out to be quite complex.

The figure above shows the complete request interaction flow:

  1. The client calls write to copy the message into the send buffer that the operating system kernel allocated for the socket.
  2. The client kernel sends the contents of the buffer to the network card, and the data is routed over the network to the server's network card.
  3. The server kernel puts the data from the network card into the receive buffer it allocated for the socket.
  4. The server calls read to take the message out of the receive buffer and process it.
  5. The server calls write to copy the response into its send buffer.
  6. The server kernel sends the contents of the buffer over the network to the client's network card.
  7. The client kernel puts the data from the network card into the socket's receive buffer.
  8. The client calls read to read the data from that buffer.

Summary

At first we might think that write does not return until the other side has received the message, but that is not the case.

Where write really spends its time

The write call is only responsible for copying the data into the send buffer of the local operating system kernel and then returning; the kernel sends the data to the target machine asynchronously. Only if the send buffer is full does write have to wait for space to be freed, and that wait is where write really spends its time.
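The following sketch uses plain java.net sockets over loopback rather than Redis (WriteDemo is a hypothetical name) to show that write returns once the bytes are in the local send buffer: the client writes before the listening side has even called accept(). It assumes the usual operating-system behavior of completing the TCP handshake in the kernel's backlog before accept() is called. If write waited for the receiver, this single-threaded program would hang.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: write() returns after copying into the kernel send buffer,
// long before the peer reads (here, even before the peer accepts the connection).
final class WriteDemo {
    static String writeThenRead(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback);   // port 0 = any free port
             Socket client = new Socket(loopback, server.getLocalPort())) {
            byte[] payload = message.getBytes(StandardCharsets.UTF_8);
            // The kernel has already completed the handshake, so this write only
            // copies into the send buffer and returns; nobody is reading yet.
            client.getOutputStream().write(payload);
            client.getOutputStream().flush();
            // Only now does the "server" accept the connection and read the buffered bytes.
            try (Socket peer = server.accept()) {
                InputStream in = peer.getInputStream();
                byte[] received = in.readNBytes(payload.length);
                return new String(received, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```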

Where read really spends its time

We might also think that read pulls data from the target machine, but that is not the case either. The read call only takes data out of the receive buffer of the local operating system kernel. If that buffer is empty, read has to wait for data to arrive, and that wait is where read really spends its time.
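A companion sketch, again with plain loopback sockets instead of Redis (ReadDemo is a hypothetical name): the first read finds an empty receive buffer and has to wait. A 200 ms read timeout, set through Socket.setSoTimeout, turns that wait into a visible SocketTimeoutException; after the peer writes a byte, the next read returns it straight from the buffer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo: read() on an empty receive buffer waits for data to arrive.
final class ReadDemo {
    // Returns the byte read once the peer has written it; fails if the first read did not wait.
    static int readWaitsForData() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket peer = server.accept()) {
            client.setSoTimeout(200);                 // give up waiting after 200 ms
            try {
                client.getInputStream().read();       // nothing sent yet: this read has to wait
                throw new IllegalStateException("read returned although no data was sent");
            } catch (SocketTimeoutException expected) {
                // The receive buffer stayed empty, so read waited until the timeout.
                // The socket remains usable after a timeout.
            }
            peer.getOutputStream().write('+');        // the "server" finally responds
            peer.getOutputStream().flush();
            return client.getInputStream().read();    // the byte is now in the receive buffer
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```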

The cost of value = redis.get(key)

For a simple request such as value = redis.get(key), write takes almost no time: it copies the request into the send buffer and returns. read takes much longer, because it must wait until the request has traveled over the network, been processed by the target machine, and the response has been sent back into the local kernel's receive buffer.

Where a pipeline spends its time

For a pipeline, the consecutive write operations take almost no time. The first read then waits for one network round trip; by then the remaining responses have been sent back into the kernel's receive buffer, so subsequent reads take their results straight from the buffer and return immediately.
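A back-of-the-envelope model of the two patterns, under loudly simplifying assumptions: a fixed RTT, a fixed 1 ms of server work per command (matching the 1,000 requests per second from the foreword), and transfer time and client work ignored. PipelineCostModel is a hypothetical illustration, not a benchmark.

```java
// Hypothetical cost model, not a measurement.
// Assumptions: every round trip costs rttMs, the server spends serverMsPerCmd per
// command, and payload transfer time and client-side work are ignored.
final class PipelineCostModel {
    // One command per round trip: every command pays a full RTT plus its processing.
    static double sequentialMs(int commands, double rttMs, double serverMsPerCmd) {
        return commands * (rttMs + serverMsPerCmd);
    }

    // One pipeline: the writes return immediately, the first read waits one RTT,
    // and the remaining replies are then read from the receive buffer.
    static double pipelinedMs(int commands, double rttMs, double serverMsPerCmd) {
        return rttMs + commands * serverMsPerCmd;
    }

    public static void main(String[] args) {
        // 1,000 commands, the 250 ms RTT from the foreword, 1 ms of server work each
        System.out.println(sequentialMs(1000, 250, 1)); // 251000.0
        System.out.println(pipelinedMs(1000, 250, 1));  // 1250.0
    }
}
```

Under these assumptions, the pipeline pays the RTT once instead of 1,000 times, which is why the improvement grows with network latency.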

Advantages and disadvantages

Advantages of Pipeline:

By packaging commands and sending them at once, a pipeline saves the round-trip time of the send command -> return result cycle for each individual command. It also reduces the number of I/O system calls, and therefore the number of switches between user mode and kernel mode.

Disadvantages of Pipeline:

  • Each batch should not contain too many commands. Because the commands are sent as one package, Redis has to queue the replies to all of them until they are sent back to the client, which consumes memory.
  • A pipeline does not guarantee atomicity. If one command fails during execution, the remaining commands still run, so pipelining is not recommended when atomicity is required.
  • The pipeline can only act on one Redis node at a time (the reason will be explained below)
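To put a number on the memory point in the first bullet, here is a rough sketch (ReplyBufferEstimate is hypothetical) that counts the reply bytes for the basic-use example: 1,000 RPUSH commands on a list that starts empty, so the replies are the integers 1 to 1000, each encoded as :<n>\r\n in RESP2. It assumes the worst case, where the client reads nothing until all commands have run; in practice the server sends replies as the socket allows, and commands with large replies (such as GETs of big values) need far more.

```java
// Hypothetical estimate, assuming RESP2 integer replies and a client that reads
// nothing until every command has been executed (worst case).
final class ReplyBufferEstimate {
    // Bytes of ":<n>\r\n" replies for `commands` RPUSHes on an initially empty list.
    static long rpushReplyBytes(int commands) {
        long total = 0;
        for (int listLength = 1; listLength <= commands; listLength++) {
            // RPUSH replies with the new list length: ':' + digits + "\r\n"
            total += 1 + Integer.toString(listLength).length() + 2;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(rpushReplyBytes(1000)); // 5893
    }
}
```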

Applicable scenarios

Some systems have high reliability requirements: every operation must know immediately whether it succeeded and whether the data has been written to Redis. Pipelining is not suitable for such scenarios.

Other systems write data to Redis in batches and can tolerate a certain percentage of failed writes, and pipelining suits them. For example, if 2 out of 10,000 entries written in one go fail, that is acceptable because a compensation mechanism handles them later.

Bulk text-message sending is a typical example. If 10,000 messages are sent at once with one request per round trip, the client waits a long time for its responses, and with a 5-second client request timeout a timeout exception is very likely. Since bulk text messages do not have strict real-time requirements, pipelining is the better choice here.

Recommendations

Although pipelines are easy to use, the number of commands assembled into each pipeline should be kept under control. Otherwise a single pipeline carries too much data, which increases the client's waiting time on the one hand and can cause some network congestion on the other. Split a pipeline with a large number of commands into several smaller pipelines.
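A minimal sketch of the splitting advice in plain Java: Batches.split is a hypothetical helper that cuts a workload into fixed-size batches. Each batch would then go into its own pipeline, queued and followed by sync() as in the basic-use example, before the next batch starts. The batch size is a tuning parameter, not a fixed rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split a large workload into batches of at most batchSize items.
//
// Hypothetical usage with Jedis (not run here):
// for (List<String> batch : Batches.split(values, 1000)) {
//     Pipeline p = jedis.pipelined();
//     for (String v : batch) p.rpush("rediskey", v);
//     p.sync();   // wait for this batch's replies before sending the next batch
// }
final class Batches {
    static <T> List<List<T>> split(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));  // copy so batches are independent
        }
        return batches;
    }
}
```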

Pipeline Stress Test

Redis ships with the stress-testing tool redis-benchmark, which can be used to test pipelining.

Tip: official redis-benchmark documentation: redis.io/topics/benc…

First, we benchmark the plain SET command; throughput is about 50,000 requests per second.

> redis-benchmark -t set -q
SET: 51975.05 requests per second

Next we add the -P option, which sets the number of requests sent in each pipeline. With P=2, throughput reaches about 90,000 requests per second.

> redis-benchmark -t set -P 2 -q
SET: 91240.88 requests per second

With P=3, throughput reaches about 100,000 requests per second.

> redis-benchmark -t set -P 3 -q
SET: 102354.15 requests per second

Other questions

Why does a pipeline only work against one Redis node, that is, why can't a pipeline be used in cluster mode?

The key space of a Redis cluster is divided into 16,384 hash slots, and each master node is responsible for a subset of them.

For each command, a slot is computed from its key, and the command is executed on the node that owns that slot. For example:

master1 (slave1): 0~5460
master2 (slave2): 5461~10922
master3 (slave3): 10923~16383

The cluster consists of three master nodes: master1 owns slots 0~5460, master2 owns slots 5461~10922, and master3 owns slots 10923~16383.

A pipeline executes multiple commands in one batch, but each command must first compute its slot from its key (CRC16.getSlot(key)) and then run on the node that owns that slot. A single pipeline could therefore need connections to several Redis nodes, which a pipeline bound to one connection does not support.
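The sketch below shows why one batch can span nodes. The slot mapping follows the Redis Cluster specification: HASH_SLOT = CRC16(key) mod 16384 with the CRC16-XMODEM variant, and if the key contains a non-empty {...} hash tag, only the tag is hashed. ClusterSlots is a hypothetical class (not the CRC16.getSlot helper mentioned above), and the node names and ranges are taken from the three-master example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: key -> slot -> owning master, then group keys per master.
final class ClusterSlots {
    // CRC16-XMODEM (polynomial 0x1021, initial value 0), as used for cluster slots.
    static int crc16(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int bit = 0; bit < 8; bit++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    static int keySlot(String key) {
        int open = key.indexOf('{');
        if (open != -1) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {                  // non-empty {tag}: hash only the tag
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    // Slot ranges from the example layout; a real client would read them from the
    // cluster (e.g. with CLUSTER SLOTS) instead of hard-coding them.
    static String ownerOf(int slot) {
        if (slot <= 5460) return "master1";
        if (slot <= 10922) return "master2";
        return "master3";
    }

    // Each group could be sent as its own single-node pipeline.
    static Map<String, List<String>> groupByNode(List<String> keys) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String key : keys) {
            groups.computeIfAbsent(ownerOf(keySlot(key)), node -> new ArrayList<>()).add(key);
        }
        return groups;
    }
}
```

Sending each group over a connection to its own master is roughly what a cluster-aware client has to do to pipeline in cluster mode. Whether a particular client library does this for you is worth checking in its documentation.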

Tip: if you are not familiar with Redis Cluster, see: redis.io/topics/clus…

What is the difference between pipeline and batch operations such as mget and mset?

MGET and MSET resemble pipelines in that they handle many keys in a single request, which saves network round trips.

The comparison is as follows:

  • MSET and MGET are atomic operations in Redis; a pipeline is not atomic.
  • MSET and MGET are a single command operating on multiple key-value pairs, whereas a pipeline carries multiple commands.
  • MSET and MGET are implemented on the server side, whereas pipelining requires cooperation between the server and the client.

What is the difference between pipeline and transaction?

Pipelines focus on reducing RTT, while transactions focus on consistency.

  • A pipeline is sent as one request, executed sequentially by the server, and answered once. A transaction takes multiple requests (MULTI, then n commands, then EXEC, so at least 2 requests); the server also executes the commands sequentially and returns the results once.
  • In cluster mode, every command in a pipeline must target a slot owned by the node it is sent to; otherwise the server replies with a MOVED redirection error for that slot. Transactions are also not recommended in cluster mode: if the commands of one transaction had to run on master A and master B, A might succeed while B fails for some reason, leaving the data inconsistent. This is similar to a distributed transaction, which cannot guarantee absolute consistency.

Does Pipeline have a limit on the number of commands?

There is no hard limit, but the more commands you pack into one pipeline, the more memory it consumes.

How many commands are appropriate for Pipeline to package and execute?

According to the official Redis pipelining documentation, when you need to send a lot of commands, it is better to send them in batches of a reasonable size, for instance 10k commands, read the replies, and then send the next batch (note: this is a reference value; adjust it to your actual workload).

While a pipeline batch is executing, can other applications still read and write?

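Redis uses I/O multiplexing with non-blocking I/O, so a batch written through a pipeline does not lock other clients out: the server keeps serving other connections, and their reads are affected only within certain limits (a very large batch still competes for the same command-processing thread).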


References:

Official website: redis.io/topics/pipe…

Book: Deep Adventures in Redis: Core Principles and Applied Practices


Author: Programmer Duan Fei
Link: https://juejin.cn/post/7089081484958679077
Source: Juejin
The copyright belongs to the author. For commercial reprints, please contact the author for authorization, and for non-commercial reprints, please indicate the source.


Origin: blog.csdn.net/wdjnb/article/details/124459618