Redis advanced knowledge you need to know: the principle of master-slave replication

Article content

  • Master the principles of Redis RDB and AOF persistence and how to choose between them
  • Understand the principle of Redis master-slave replication
  • Be able to configure Redis master-slave replication

One, Redis persistence

Redis is an in-memory database. In order to ensure the durability of data, it provides two persistence solutions:

RDB mode (default)

RDB persistence works by snapshotting. When certain conditions are met, Redis automatically takes a snapshot of the data in memory and persists it to the hard disk.

When a snapshot is triggered

  1. The snapshot rules configured in redis.conf are met
  2. The save or bgsave command is executed
  3. The flushall command is executed
  4. Master-slave replication is performed for the first time

Schematic diagram

Set snapshot saving rules

save <seconds> <changes>: take a snapshot if at least <changes> keys have been modified within <seconds> seconds.

save "": disables RDB persistence.

save 900 1: take a snapshot if at least 1 key has been modified within 900 seconds.

save 300 10: take a snapshot if at least 10 keys have been modified within 300 seconds (5 minutes).

save 60 10000: take a snapshot if at least 10000 keys have been modified within 60 seconds (1 minute).
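The rules above can be read as "snapshot when any one rule is satisfied". The following is a minimal Python sketch of that check, not Redis source code. The names SAVE_RULES and should_snapshot are hypothetical, and the exact boundary comparison Redis uses is an assumption here.

```python
# Hypothetical model of the `save <seconds> <changes>` rules from redis.conf.
SAVE_RULES = [(900, 1), (300, 10), (60, 10000)]


def should_snapshot(seconds_since_last_save, changes_since_last_save, rules=SAVE_RULES):
    """Return True if at least one (seconds, changes) rule is satisfied."""
    return any(
        seconds_since_last_save >= seconds and changes_since_last_save >= changes
        for seconds, changes in rules
    )


print(should_snapshot(120, 5))      # False: no rule is satisfied
print(should_snapshot(301, 10))     # True: "save 300 10" is satisfied
print(should_snapshot(61, 10000))   # True: "save 60 10000" is satisfied
print(should_snapshot(1000, 1))     # True: "save 900 1" is satisfied
```

Passing an empty rule list models save "": no rule can ever be satisfied, so no automatic snapshot is taken.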

Precautions:

  1. Redis does not modify the RDB file during the snapshot process. It replaces the old snapshot file with the new one only after the snapshot ends, which means that the RDB file is complete at any time.
  2. This allows us to back up the Redis database by regularly backing up RDB files. RDB files are compressed binary files that occupy less space than the data in memory, which makes them easier to transfer.

RDB advantages and disadvantages

Disadvantages: Once you understand how snapshots work, it is easy to see that if Redis crashes or restarts abnormally, all data modifications made after the last snapshot are lost. We therefore need to keep the possible data loss within an acceptable range by combining automatic snapshot conditions to suit the specific application scenario. If the data is relatively important and you want to minimize the loss, you can use AOF for persistence.

Advantages: RDB maximizes the performance of Redis. The only thing the parent process has to do when generating the RDB file is to fork a child process. The child process handles all subsequent file saving work, and the parent process does not need to perform any disk I/O. This is also a disadvantage: if the data set is relatively large, fork may be time-consuming, causing the server to stop processing client requests for a period of time.

AOF method

AOF (append only file) persistence is not enabled by default.

After AOF persistence is enabled, every time a command that changes the data in Redis is executed, Redis writes the command to an AOF file on the hard disk. This obviously reduces the performance of Redis, but in most cases the impact is acceptable. In addition, using a faster hard disk can improve AOF performance.

Configure redis.conf

appendonly yes

# The name of the append only file (default: "appendonly.aof")

appendfilename "appendonly.aof"

# The working directory.
#
# The DB will be written inside this directory, with the filename specified
# above using the 'dbfilename' configuration directive.
#
# The Append Only File will also be created inside this directory.
#
# Note that you must specify a directory here, not a file name.
dir ./

The three parameters above enable AOF persistence and set the name of the persistence file and the directory where it is located.

Principle

Before learning the principles of AOF, we must first understand RESP (Redis serialization protocol).

The AOF file stores Redis commands.
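To make the RESP format concrete, here is a minimal Python sketch that encodes a command as a RESP array of bulk strings, which is the form in which commands are sent and appended to the AOF. The function name encode_resp is hypothetical, and only the array and bulk-string types are covered.

```python
def encode_resp(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]            # array header: number of arguments
    for arg in args:
        data = str(arg).encode("utf-8")
        parts.append(f"${len(data)}\r\n".encode())    # bulk string header: byte length
        parts.append(data + b"\r\n")                  # bulk string payload
    return b"".join(parts)


print(encode_resp("set", "s1", "11"))
# b'*3\r\n$3\r\nset\r\n$2\r\ns1\r\n$2\r\n11\r\n'
```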

AOF rewriting is similar to RDB snapshotting in that both are handled by a forked child process.

Principle of AOF rewriting (optimizing the AOF file)

set s1 11
set s1 22

For the operations above, if the AOF file has not been optimized, both commands are serialized according to RESP and stored. After optimization, only the last command, set s1 22, is stored. When the value of the same key is overwritten, only the final result is kept.
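The effect of rewriting can be sketched in Python as "replay the log, then emit one command per surviving key". This is a simplified illustration rather than how Redis implements it, and it assumes a log that contains only SET and DEL commands. The name rewrite_aof is hypothetical.

```python
def rewrite_aof(commands):
    """Replay SET/DEL commands and return the minimal commands for the final state."""
    state = {}
    for cmd in commands:
        name = cmd[0].lower()
        if name == "set":
            state[cmd[1]] = cmd[2]          # later values overwrite earlier ones
        elif name == "del":
            for key in cmd[1:]:
                state.pop(key, None)        # deleted keys need no command at all
        else:
            raise ValueError(f"unsupported command in this sketch: {cmd[0]}")
    return [("set", key, value) for key, value in state.items()]


old_log = [("set", "s1", "11"), ("set", "s1", "22")]
print(rewrite_aof(old_log))  # [('set', 's1', '22')]
```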

Analysis of rewriting process

While a new AOF file is being created, Redis continues to append commands to the existing AOF file. Even if a shutdown occurs during rewriting, the existing AOF file is not lost. Once the new AOF file has been created, Redis switches from the old AOF file to the new one and starts appending to the new file.

Rewrite trigger conditions:

# Rewrite when the current AOF file size exceeds the AOF file size after the last rewrite by this percentage.
# If no rewrite has happened yet, the AOF file size at startup is used as the base.
auto-aof-rewrite-percentage 100
# Minimum AOF file size that allows a rewrite: a file smaller than 64mb is not rewritten.
auto-aof-rewrite-min-size 64mb
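The two settings combine as "the file is large enough, and it has grown enough since the last rewrite". A minimal Python sketch of that decision follows. The function name should_rewrite_aof is hypothetical, and the exact comparison operators are an assumption rather than a quote from the Redis source.

```python
MB = 1024 * 1024


def should_rewrite_aof(current_size, base_size, percentage=100, min_size=64 * MB):
    """Decide whether an automatic AOF rewrite would be triggered.

    current_size: current AOF size in bytes
    base_size:    AOF size after the last rewrite (or at startup) in bytes
    """
    if percentage <= 0 or current_size <= min_size:
        return False                                  # disabled, or file too small
    base = base_size if base_size > 0 else 1          # avoid division by zero
    growth = current_size * 100 // base - 100         # growth over base, in percent
    return growth >= percentage


print(should_rewrite_aof(100 * MB, 60 * MB))  # False: grew by only 66%
print(should_rewrite_aof(130 * MB, 60 * MB))  # True: grew by 116%
print(should_rewrite_aof(50 * MB, 10 * MB))   # False: smaller than 64mb
```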

How to choose RDB and AOF

  • In-memory database where data must not be lost: RDB (Redis database) + AOF
  • Cache server: RDB
  • Using only AOF is not recommended (poor performance)
  • When recovering: if an AOF file exists, it is used for recovery first; otherwise the RDB file is used

Two, Redis master-slave replication

What is master-slave replication?

In short:

  • The master faces outward and the slaves face inward: the master is writable, the slaves are not
  • If the master goes down, a slave cannot take over as master

See the figure below to deepen your understanding:

Master-slave configuration

Next, let's take a look at the master-slave architecture configuration of redis:

  • The master Redis does not require any configuration
  • The slave needs the following configuration items changed in its redis.conf file:

# If master and slave run on the same machine, the slave's port must differ from the master's
port 6378
# slaveof <masterip> <masterport>
# The master server for this slave server is at IP 192.168.137.6, port 6379
slaveof 192.168.137.6 6379

Implementation principle

Starting from version 2.8, Redis uses the PSYNC command instead of the SYNC command to perform synchronization during replication. This article therefore only explains the PSYNC synchronization principle.

The PSYNC command has two modes, full resynchronization and partial resynchronization:

  • Full resynchronization handles the initial replication: the master server creates and sends an RDB file, and then sends the write commands stored in its buffer to the slave server;

  • Partial resynchronization handles re-replication after a disconnection: when the slave server reconnects to the master server, if conditions permit, the master server sends the slave server only the write commands executed while the connection was down. Once the slave server has received and executed these write commands, its database is in the same state as the master server's.

The following figure shows the communication process of the master-slave server during partial resynchronization:

In fact, when I read this, I still had a question: when the slave server has been offline for a long time, is it really faster to transmit the missed commands one by one than to simply send an RDB file, as SYNC does? So in my opinion, when using PSYNC, deciding when to resynchronize partially and when to resynchronize fully is a policy issue. Of course, Redis solves this problem, so read on 0_0.

Partial synchronization

The partial resynchronization function consists of the following three parts:

  • The replication offset of the master server and the replication offset of the slave server;
  • The replication backlog of the master server;
  • The server's run ID.

Replication offset

Both parties in a replication, the master server and the slave server, maintain a replication offset:

  • Each time the master server transmits N bytes of data to the slave server, it adds N to its own replication offset;
  • Each time the slave server receives N bytes of data from the master server, it adds N to its own replication offset;

By comparing the replication offsets of the master and slave servers, the program can easily tell whether they are in a consistent state:

  • If the master and slave servers are in a consistent state, their offsets are always the same;
  • Conversely, if their offsets are not the same, the master and slave servers are not in a consistent state.
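The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration: the class ReplicationState and the functions propagate and in_sync are hypothetical names, and a disconnected slave is modelled simply by not delivering the payload.

```python
class ReplicationState:
    """Tracks only the replication offset of one server."""

    def __init__(self):
        self.offset = 0


def propagate(master, slave, payload, delivered=True):
    """The master sends len(payload) bytes; the slave counts them only if they arrive."""
    master.offset += len(payload)
    if delivered:
        slave.offset += len(payload)


def in_sync(master, slave):
    """Equal offsets mean the two servers are in a consistent state."""
    return master.offset == slave.offset


master, slave = ReplicationState(), ReplicationState()
propagate(master, slave, b"x" * 33)
print(in_sync(master, slave))                                # True
propagate(master, slave, b"y" * 33, delivered=False)         # slave is disconnected
print(master.offset, slave.offset, in_sync(master, slave))   # 66 33 False
```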

Consider the following situation:

Assume that slave server A reconnects to the master server immediately after a disconnection and succeeds. The slave server then sends the PSYNC command to the master server, reporting that its current replication offset is 10107. Should the master server perform a full or a partial resynchronization for the slave server? If it performs a partial resynchronization, how does the master server make up for the data that slave server A missed during the disconnection? The answers to these questions all involve the replication backlog buffer.

Replication backlog buffer

The replication backlog buffer is a fixed-size first-in-first-out (FIFO) queue maintained by the master server. The default size is 1MB.

Unlike an ordinary FIFO queue, whose length grows and shrinks as elements are added and removed, a fixed-length FIFO queue has a fixed length. When the number of elements enqueued exceeds the length of the queue, the oldest element at the head of the queue is popped and the new element is put into the queue.

When the master server propagates commands, it not only sends each write command to all slave servers, but also enqueues the write command in the replication backlog buffer, as shown in the figure.

Therefore, the replication backlog buffer of the master server stores the most recently propagated write commands, and it records the corresponding replication offset for each byte in the queue, as shown in the following table:

When the slave server reconnects to the master server, it sends its replication offset (offset) to the master server through the PSYNC command, and the master server uses this offset to decide which synchronization operation to perform:

  • If the data after offset (that is, the data starting at offset+1) still exists in the replication backlog buffer, the master server performs a partial resynchronization for the slave server;
  • Conversely, if the data after offset no longer exists in the replication backlog buffer, the master server performs a full resynchronization for the slave server.
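The decision described above can be sketched with a small fixed-size buffer in Python. This is a simplified model under stated assumptions: the class ReplicationBacklog and its methods are hypothetical, the buffer is a plain bytearray rather than a circular buffer, and the run ID check is left out.

```python
class ReplicationBacklog:
    """Fixed-size FIFO of the most recently propagated bytes (simplified model)."""

    def __init__(self, size):
        self.size = size
        self.buffer = bytearray()
        self.master_offset = 0              # offset of the last byte written

    def feed(self, data):
        """Record propagated bytes, dropping the oldest ones when the buffer is full."""
        self.buffer += data
        self.master_offset += len(data)
        overflow = len(self.buffer) - self.size
        if overflow > 0:
            del self.buffer[:overflow]

    def first_offset(self):
        """Offset of the oldest byte still held in the buffer."""
        return self.master_offset - len(self.buffer) + 1

    def psync(self, slave_offset):
        """Return ('partial', missing_bytes) or ('full', None)."""
        if self.first_offset() <= slave_offset + 1 <= self.master_offset + 1:
            start = slave_offset + 1 - self.first_offset()
            return ("partial", bytes(self.buffer[start:]))
        return ("full", None)


backlog = ReplicationBacklog(size=10)
backlog.feed(b"abcdefgh")       # offsets 1..8 are in the buffer
print(backlog.psync(5))         # ('partial', b'fgh')
backlog.feed(b"ijklmnop")       # offsets 7..16 remain, 1..6 were dropped
print(backlog.psync(5))         # ('full', None)
```

In the second call the slave needs the byte at offset 6, which has already been pushed out of the buffer, so only a full resynchronization is possible.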

Adjust the size of the replication backlog buffer as needed

The default size of the replication backlog buffer is 1MB. If the master server executes a large number of write commands, or if it takes a long time to reconnect after the master and slave servers are disconnected, this size may not be appropriate. If the replication backlog buffer is sized improperly, the partial resynchronization mode of the PSYNC command cannot work properly. It is therefore important to estimate and set the size of the replication backlog buffer correctly.

The minimum size of the replication backlog buffer can be estimated with the formula second * write_size_per_second:

  • second is the average time (in seconds) the slave server needs to reconnect to the master server after a disconnection;
  • write_size_per_second is the average amount of write command data the master server generates per second (the total length of the write commands in RESP protocol format);

For example, if the master server generates 1 MB of write data per second on average, and the slave server takes an average of 5 seconds to reconnect after a disconnection, the replication backlog buffer must not be smaller than 5 MB.

To be safe, the size of the replication backlog buffer can be set to 2 * second * write_size_per_second, so that most disconnections can be handled by partial resynchronization.
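The estimate is simple arithmetic, shown here as a short Python sketch. The function name backlog_size and the safety_factor parameter are hypothetical; the numbers are the ones from the example in the text.

```python
MB = 1024 * 1024


def backlog_size(second, write_size_per_second, safety_factor=2):
    """Estimate the replication backlog size in bytes."""
    return safety_factor * second * write_size_per_second


# Minimum size: 5 seconds to reconnect at 1 MB of writes per second.
print(backlog_size(5, 1 * MB, safety_factor=1) // MB)   # 5
# Recommended size with the factor of 2.
print(backlog_size(5, 1 * MB) // MB)                    # 10
```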

To change the size of the replication backlog buffer, see the description of the repl-backlog-size option in the configuration file.

Server run ID

In addition to the replication offset and the replication backlog buffer, partial resynchronization also requires the server run ID:

  • Each Redis server, whether master or slave, has its own run ID;
  • The run ID is generated automatically when the server starts and consists of 40 random hexadecimal characters, such as 53b9b28df8042fdc9ab5e3fcbbbabff1d5dce2b3;
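To illustrate the format only, the following Python sketch produces an identifier of the same shape: 40 random hexadecimal characters. It is not how Redis generates its run ID, and the function name generate_run_id is hypothetical.

```python
import secrets


def generate_run_id():
    """Return 40 random hexadecimal characters (20 random bytes, hex-encoded)."""
    return secrets.token_hex(20)


run_id = generate_run_id()
print(len(run_id))  # 40
```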

When the slave server replicates the master server for the first time, the master server transmits its run ID to the slave server, and the slave server saves it (note that the slave server saves the ID of the master server).

When the slave server disconnects and reconnects to a master server, it sends the previously saved run ID to the master server it is now connected to:

  • If the run ID saved by the slave server is the same as the run ID of the currently connected master server, the slave server was replicating this same master server before the disconnection, and the master server can go on to attempt a partial resynchronization;
  • Conversely, if the run ID saved by the slave server differs from the run ID of the currently connected master server, the master server replicated before the disconnection is not the one currently connected, and the master server performs a full resynchronization for the slave server.

Implementation of the PSYNC command

There are two ways to call the PSYNC command:

  • If the slave server has not replicated any master server before, or has previously executed the SLAVEOF no one command, it sends PSYNC ? -1 to the master server when starting a new replication, actively requesting a full resynchronization (because a partial resynchronization is impossible at this point);
  • Conversely, if the slave server has already replicated a master server, it sends PSYNC <runid> <offset> to the master server when starting a new replication: runid is the run ID of the master server replicated last time, and offset is the slave server's current replication offset. The master server that receives this command uses these two parameters to decide which synchronization operation to perform for the slave server.

Depending on the situation, the master server that receives the PSYNC command returns one of the following three replies to the slave server:

  • If the master server returns +FULLRESYNC <runid> <offset>, it will perform a full resynchronization with the slave server: runid is the run ID of the master server, which the slave server saves and uses the next time it sends PSYNC; offset is the master server's current replication offset, which the slave server uses as its own initial offset;
  • If the master server returns +CONTINUE, it will perform a partial resynchronization with the slave server, and the slave server only needs to wait for the master server to send the missing data;
  • If the master server returns -ERR, its version is lower than Redis 2.8 and it does not recognize the PSYNC command. The slave server then sends the SYNC command to the master server and performs a full synchronization with it.
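Putting the run ID check and the offset check together, the master's choice of reply can be sketched in Python as follows. This is a simplified model, not the Redis implementation: the function handle_psync and its parameters are hypothetical, and backlog_first_offset stands for the offset of the oldest byte still held in the replication backlog buffer.

```python
def handle_psync(master_runid, master_offset, backlog_first_offset, req_runid, req_offset):
    """Return the reply line the master would send for PSYNC <runid> <offset>."""
    can_continue = (
        req_runid == master_runid                                        # same master as before
        and backlog_first_offset <= req_offset + 1 <= master_offset + 1  # data still in backlog
    )
    if can_continue:
        return "+CONTINUE"
    return f"+FULLRESYNC {master_runid} {master_offset}"


RUN_ID = "53b9b28df8042fdc9ab5e3fcbbbabff1d5dce2b3"

# First replication: the slave sends PSYNC ? -1.
print(handle_psync(RUN_ID, 10200, 10000, "?", -1))
# +FULLRESYNC 53b9b28df8042fdc9ab5e3fcbbbabff1d5dce2b3 10200

# Reconnect with offset 10107 while bytes 10000..10200 are still in the backlog.
print(handle_psync(RUN_ID, 10200, 10000, RUN_ID, 10107))
# +CONTINUE
```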

Above we explained in detail how Redis decides between full and partial synchronization during master-slave synchronization. Now let's look at the overall full synchronization and incremental synchronization process:

Redis's full synchronization process is mainly divided into three phases:

  • Snapshot synchronization phase: the Master creates a snapshot and sends it to the Slave, and the Slave loads and parses the snapshot. The Master also stores the new write commands generated during this phase in a buffer.
  • Write buffer synchronization phase: the Master sends the write commands stored in the buffer to the Slave.
  • Incremental synchronization phase: the Master sends write commands to the Slave as they are executed.
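The three phases can be walked through with plain dictionaries standing in for the two databases. The Python sketch below is only an illustration: the functions apply_writes and full_sync are hypothetical, and writes are modelled as (key, value) pairs instead of real commands.

```python
def apply_writes(db, writes):
    """Apply a list of (key, value) writes to a database."""
    for key, value in writes:
        db[key] = value


def full_sync(master, writes_during_snapshot, later_writes):
    """Simulate the three phases and return the slave's resulting database."""
    # Phase 1: the master takes a snapshot while new writes go into a buffer.
    snapshot = dict(master)
    buffer = list(writes_during_snapshot)
    apply_writes(master, buffer)
    slave = dict(snapshot)                  # the slave loads the snapshot

    # Phase 2: the master sends the buffered writes to the slave.
    apply_writes(slave, buffer)

    # Phase 3: every later write is executed on the master and forwarded.
    for write in later_writes:
        apply_writes(master, [write])
        apply_writes(slave, [write])
    return slave


master = {"s1": "11"}
slave = full_sync(master, [("s1", "22")], [("s2", "33")])
print(slave == master)  # True
print(slave)            # {'s1': '22', 's2': '33'}
```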

Incremental synchronization

  • Redis incremental synchronization mainly refers to the process in which, once the Slave has finished initializing and started working normally, write operations that occur on the Master are synchronized to the Slave.
  • Normally, every time the Master executes a write command, it sends the same write command to the Slave, which receives and executes it.

Reference materials:

https://www.cnblogs.com/lukexwang/p/4711977.html

Origin blog.csdn.net/taurus_7c/article/details/104034580