Redis persistence and master-slave replication

Why do we need persistence

Redis is an in-memory NoSQL database, so reads and writes are naturally fast. Memory is volatile, however: once the redis service is shut down or restarted, the data redis holds in memory is lost. To solve this problem, redis provides two persistence methods that allow the data to be recovered after a failure.

Persistence options

Redis provides two different persistence methods for storing data on the hard disk. One is the snapshot method (also called the RDB method), which writes all the data that exists in redis at a given moment to the hard disk. The other is the append-only file (AOF) method, which appends every write command redis executes to a file on the hard disk. Each method has its own merits. They can be used together or independently, and in some cases neither is needed.

RDB method

The RDB method, also called the snapshot method, saves a copy of the data as it exists at a certain point in time to a file (.rdb) on the hard disk. After the server restarts, redis loads the RDB file to restore the data. Let's take a look at the RDB persistence configuration.
Open the redis configuration file with vi redis.conf, find the SNAPSHOTTING section, and look for the following:

save 900 1
save 300 10
save 60 10000
……
dbfilename dump.rdb
dir ./

Description

  1. save <seconds> <changes>: Once <seconds> seconds have passed since the last snapshot and at least <changes> keys have changed in that time, a snapshot is saved. For example, save 60 10000 saves a snapshot when at least 10000 keys have changed within 60 seconds. As you can see, rdb persistence is enabled by default with three save options configured. To turn off rdb persistence, just comment out all the save lines.
  2. dbfilename: The name of the rdb file.
  3. dir: The directory where the rdb file is stored.
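
To make the save rule concrete, here is a minimal Python sketch of how the three save options could be evaluated. It is only a model of the rule described above, not Redis source code, and the function name should_snapshot is made up for illustration.

```python
# Illustrative model of the "save <seconds> <changes>" rule; not Redis source code.
SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]  # the values from the config above


def should_snapshot(seconds_since_last_save, changes_since_last_save,
                    save_points=SAVE_POINTS):
    """Return True if the conditions of any save option are met."""
    return any(
        seconds_since_last_save > seconds and changes_since_last_save >= changes
        for seconds, changes in save_points
    )
```

With these values, 10000 changes after 61 seconds trigger a snapshot, while 9999 changes after 61 seconds do not, because the other two options need more time to pass.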

Create a snapshot

BGSAVE:
The BGSAVE command creates a snapshot. When redis receives a BGSAVE command, it forks a child process. The child process writes the snapshot to the hard disk while the parent process continues to handle command requests. Note that redis blocks the parent process while the child is being created, and the length of the pause is proportional to the amount of memory redis occupies.
Besides being invoked manually, a BGSAVE is triggered under the following two conditions:

  1. The user has configured save options. Counting from the last snapshot redis created, a BGSAVE is triggered as soon as the conditions of any save option are met.
  2. During master-slave replication, a newly connected slave server sends a SYNC command to the master server to request data synchronization. When the master server receives the SYNC command, it executes BGSAVE once and then sends the generated rdb file to the slave server.
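
The fork-and-write behaviour of BGSAVE can be imitated in a few lines of Python. This is a rough, POSIX-only sketch: the function bgsave_sketch and the JSON file are stand-ins chosen for illustration (the real rdb format is binary), but it shows why the child's snapshot is not affected by writes the parent accepts after the fork.

```python
import json
import os
import tempfile


def bgsave_sketch(data, path):
    """Fork a child that writes `data` as it looked at fork time; return the child's pid."""
    pid = os.fork()
    if pid == 0:
        # Child process: its copy of `data` is frozen at the moment of the fork.
        exit_code = 1
        try:
            tmp_path = path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(data, f)
            os.replace(tmp_path, path)  # publish the finished snapshot in one step
            exit_code = 0
        finally:
            os._exit(exit_code)  # never fall through into the parent's code
    return pid  # parent process: returns immediately and keeps serving


store = {"counter": 1}
snapshot_path = os.path.join(tempfile.mkdtemp(), "dump.json")
child_pid = bgsave_sketch(store, snapshot_path)
store["counter"] = 2        # the parent keeps accepting writes
os.waitpid(child_pid, 0)    # wait only so the demo can read the file
with open(snapshot_path) as f:
    snapshot = json.load(f)  # holds the value from fork time, not the later write
```

Writing to a temporary file and renaming it afterwards means a reader never sees a half-written snapshot.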

SAVE:
The SAVE command also creates a snapshot, but unlike BGSAVE it does not create a child process, so a redis server that receives SAVE will not respond to any other command until the snapshot is finished. Since no other process competes for resources while the snapshot is being created, SAVE creates a snapshot faster than BGSAVE. Even so, SAVE is not commonly used. It is usually reserved for cases where there is not enough memory to run BGSAVE, or where waiting for the snapshot to finish is acceptable.
For example, when redis receives a SHUTDOWN command, it executes SAVE once, blocking all clients, and shuts down after the SAVE has finished.

The advantages and disadvantages of the RDB method

Advantages:

  1. Use only one file to back up data, easy to restore after a disaster
  2. Compared with aof, the rdb file is smaller, and loading the rdb file to restore data is also faster

Disadvantages:

  1. If the redis service is shut down or restarted due to a failure, the data written after the most recent snapshot was created will be lost
  2. When the amount of data is large, creating a child process will cause redis to pause for a long time

AOF method

Simply put, AOF persistence appends every write command redis executes to the end of the aof file, recording each change to the data. To restore the data, redis only needs to re-execute all the write commands in the aof file from beginning to end.
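
The idea of logging writes and replaying them can be shown with a toy key-value store. This is only a sketch: TinyStore and replay are invented names, the log is a Python list rather than a file, and only SET and DEL are supported.

```python
class TinyStore:
    """Toy key-value store that logs every write, in the spirit of AOF."""

    def __init__(self):
        self.data = {}
        self.log = []  # stands in for the append-only file

    def execute(self, command, *args, record=True):
        if command == "SET":
            key, value = args
            self.data[key] = value
        elif command == "DEL":
            (key,) = args
            self.data.pop(key, None)
        else:
            raise ValueError(f"unsupported command: {command}")
        if record:
            self.log.append((command, *args))  # append only, never edit in place


def replay(log):
    """Rebuild a store by re-running every logged write from start to end."""
    restored = TinyStore()
    for command, *args in log:
        restored.execute(command, *args, record=False)
    return restored


store = TinyStore()
for cmd in [("SET", "a", "1"), ("SET", "a", "2"), ("SET", "b", "3"), ("DEL", "b")]:
    store.execute(*cmd)
restored = replay(store.log)
```

After the replay, restored.data matches store.data, even though the log still holds all four commands.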

Open the redis configuration file to see:

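# Whether to enable aof persistence; disabled (no) by default
appendonly yes
# How often the aof file is synced to disk; keep exactly one appendfsync line active
# Sync after every write command: the strongest data guarantee, but a severe performance cost
# appendfsync always
# Sync once per second (recommended)
appendfsync everysec
# Let the operating system decide when to sync
# appendfsync no
# Settings for aof rewriting
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb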

Rewrite/compress AOF files

Since aof persistence continuously records redis write commands, the aof file grows larger and larger as redis runs. It takes up more and more hard disk space and lengthens the time redis needs to restore data. A mechanism is therefore needed to keep the AOF file from growing too large.

Redis provides the BGREWRITEAOF command to rewrite the AOF file. BGREWRITEAOF shrinks the AOF file as much as possible by removing redundant commands from the original file. It works much like BGSAVE: redis creates a child process, and the child process rewrites the aof file.
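
What "removing redundant commands" means can be sketched in Python. The function rewrite_log is hypothetical and handles only SET and DEL. Redis itself builds the new file from the dataset held in memory; this sketch derives the final state from the old log only to stay self-contained.

```python
def rewrite_log(log):
    """Return a shorter log that rebuilds the same final state."""
    state = {}
    for command, *args in log:
        if command == "SET":
            key, value = args
            state[key] = value
        elif command == "DEL":
            state.pop(args[0], None)
    # One SET per surviving key replaces the whole history of that key.
    return [("SET", key, value) for key, value in state.items()]


old_log = [("SET", "a", "1"), ("SET", "a", "2"), ("SET", "b", "3"), ("DEL", "b")]
new_log = rewrite_log(old_log)
```

Four logged commands shrink to a single SET for key a, because the overwritten value and the deleted key b no longer matter.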

BGREWRITEAOF also has an automatic trigger, controlled by the auto-aof-rewrite-percentage and auto-aof-rewrite-min-size options. For example, with auto-aof-rewrite-percentage 100 and auto-aof-rewrite-min-size 64mb configured and aof persistence turned on, redis automatically executes BGREWRITEAOF when the aof file is larger than 64mb and has grown by at least 100% since the last rewrite, that is, to at least double its size after that rewrite.
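
The two thresholds combine as in the following Python sketch. The function name and parameters are illustrative only; the sketch models the rule described above and is not taken from Redis.

```python
MB = 1024 * 1024


def should_auto_rewrite(current_size, size_after_last_rewrite,
                        percentage=100, min_size=64 * MB):
    """Model of the two auto-aof-rewrite-* thresholds."""
    if percentage == 0 or current_size <= min_size:
        return False  # a percentage of 0 disables automatic rewriting
    base = size_after_last_rewrite or 1  # avoid dividing by zero in this sketch
    growth = (current_size - base) * 100 / base
    return growth >= percentage
```

A 128mb file that was 64mb after its last rewrite has grown by exactly 100% and qualifies. A 60mb file never qualifies under these settings, however much it has grown, because it is still below the 64mb minimum.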

The pros and cons of AOF persistence

Advantages

  1. The window for losing data can be reduced to 1 second without much impact on performance.
  2. The aof log is written in append mode, so a crash during writing does not damage the content already in the file. If redis goes down with a command only half written, the redis-check-aof tool can be used to repair the file before the next start.

Disadvantages

  1. The size of the AOF file remains the biggest weakness of AOF persistence, even with the rewrite mechanism.
  2. Loading the aof file to recover data takes longer than loading the rdb file.

Master-slave replication

Although redis performs very well, it can still reach a point where it cannot process requests quickly enough. To cope with performance problems caused by high concurrency, redis supports master-slave replication and read-write separation, much like a relational database. Data is written to the master server, and the slave servers receive the updates in real time. All read requests are then handled by the slave servers instead of being sent to the master server, which would put excessive pressure on it. A read request usually picks a slave server at random, so the load is spread evenly across the slaves. The figure below is a simple redis master-slave architecture.
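
Read-write separation on the client side can be sketched as a small router. The class ReadWriteRouter and its short list of write commands are assumptions made for this example, and the addresses are the ports used for the master and slaves later in this article.

```python
import random


class ReadWriteRouter:
    """Send writes to the master and spread reads across the slaves."""

    WRITE_COMMANDS = {"SET", "DEL", "INCR", "EXPIRE"}  # partial list for the demo

    def __init__(self, master, slaves, rng=None):
        self.master = master
        self.slaves = list(slaves)
        self.rng = rng or random.Random()

    def node_for(self, command):
        if command.upper() in self.WRITE_COMMANDS:
            return self.master
        if not self.slaves:
            return self.master  # fall back when no slave is available
        return self.rng.choice(self.slaves)  # random pick spreads the read load


router = ReadWriteRouter("127.0.0.1:6380", ["127.0.0.1:6382", "127.0.0.1:6384"])
```

Calling router.node_for("SET") always returns the master address, while router.node_for("GET") returns one of the two slave addresses.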

Master-slave replication configuration

First, run vi redis6380.conf in your redis directory to create a redis configuration file in the current directory, and write the following:

include /usr/local/redis-4.0.13/redis.conf
port 6380
pidfile /var/run/redis_6380.pid
logfile 6380.log
dbfilename dump6380.rdb

Description:

  1. include: Pulls the settings of the given configuration file into the current one. Here the default redis configuration file is included, so settings already made there, such as remote access and the password, do not need to be repeated in the new file. Settings that must differ (such as the port number) can be placed below the include line to override the included values.
  2. port: The port number. Our master and slave servers run on the same virtual machine, so each needs a different port.
  3. pidfile: A custom pid file, which stores the pid of the background process.
  4. logfile: The log file.
  5. dbfilename: The name of the rdb file.

With that, the master server is configured. Next, configure the slave server: run vi redis6382.conf to create another configuration file, redis6382.conf, in the current directory:

include /usr/local/redis-4.0.13/redis.conf
port 6382
pidfile /var/run/redis_6382.pid
logfile 6382.log
dbfilename dump6382.rdb
slaveof 127.0.0.1 6380
masterauth <master server password>

The slave server has some additional settings:

  1. slaveof: Declares which master this server is a slave of. You need to specify the ip address and port number of the master server.
  2. masterauth: If the master server is configured with a password, it must be set here; otherwise the slave server cannot connect to the master server.

The other slave servers are configured the same way; just pay attention to the port numbers. I have configured one more on port 6384.
Once the configuration files are in place, run ./redis-server ../redis6380.conf in the src directory to start the master server. Then start the slave servers, each with its own configuration file, and they will connect to the master server automatically.
If ps -ef | grep redis shows output like the following, the master and slave servers have started successfully:

root      2625     1  0 16:15 ?        00:00:00 ./redis-server *:6380
root      2630     1  0 16:15 ?        00:00:00 ./redis-server *:6382
root      2636     1  0 16:15 ?        00:00:00 ./redis-server *:6384

After the master and slave servers are started, connect to the master server with ./redis-cli -p 6380 -a <your password> and execute info replication to view the replication information, as follows:

127.0.0.1:6380> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6382,state=online,offset=336,lag=1
slave1:ip=127.0.0.1,port=6384,state=online,offset=336,lag=1
master_replid:b5c68a979b28d2a9ef53476510758b5d1795418b
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:336
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:336

Similarly, executing the same command from a slave server's client also returns replication information:

127.0.0.1:6384> info replication
# Replication
role:slave
master_host:127.0.0.1
master_port:6380
master_link_status:up
master_last_io_seconds_ago:2
master_sync_in_progress:0
slave_repl_offset:686
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:b5c68a979b28d2a9ef53476510758b5d1795418b
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:686
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:15
repl_backlog_histlen:672

At this point, a redis architecture with one master and two slaves and read-write separation has been configured and successfully started.
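
The info replication output is plain key:value text, so it is easy to check from a script. The sketch below parses an excerpt of the master's output shown earlier and computes how many bytes each slave is behind. The helper names parse_info and slave_details are made up for this example.

```python
def parse_info(text):
    """Turn `key:value` lines from INFO output into a dict, skipping headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def slave_details(value):
    """Split a slaveN value such as `ip=...,port=...,offset=...` into a dict."""
    return dict(item.split("=", 1) for item in value.split(","))


sample = """# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6382,state=online,offset=336,lag=1
slave1:ip=127.0.0.1,port=6384,state=online,offset=336,lag=1
master_repl_offset:336
"""
info = parse_info(sample)
behind = {
    name: int(info["master_repl_offset"]) - int(slave_details(value)["offset"])
    for name, value in info.items()
    if name.startswith("slave") and name[5:].isdigit()
}
```

For this sample both slaves report the same offset as the master, so neither is behind.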

The startup process of master-slave replication

[Figure: startup process of the old version of redis master-slave replication]
The figure above shows the startup process of master-slave replication in old versions of Redis. The points that need special explanation are:

  1. When a slave server makes its initial connection, all the data it previously held is discarded and replaced with the data sent by the master server.
  2. The slave server does not expire keys on its own; it passively applies the commands sent by the master server. When the master expires a key (or evicts it under the LRU algorithm), it synthesizes a DEL command and sends it to all slaves.
  3. SYNC is a very resource-intensive operation. During BGSAVE the total throughput of the master server drops, transferring the rdb file then consumes a large amount of network bandwidth on both servers, and the slave server cannot respond to client requests while it loads the rdb file. The biggest flaw of SYNC, however, shows when a slave server reconnects after a disconnection: it has to request a complete RDB file and load it from scratch. This is unnecessary, because most of the data in the new RDB file was very likely already written to the slave server before the disconnection. The slave server only needs the data that was written while it was disconnected.

Partial resynchronization

To make up for the shortcomings of the old replication, Redis has used the PSYNC command instead of SYNC since version 2.8. PSYNC has two modes: full resynchronization and partial resynchronization. Full resynchronization works like synchronization in earlier versions and requires an RDB file to be sent. Partial resynchronization, by contrast, sends the slave server only the write commands the master server executed during the disconnection, which consumes far fewer resources and is much faster. As shown below.
[Figure: partial resynchronization]
The implementation of partial resynchronization is not complicated and consists of three parts: the replication offset (offset), the replication backlog buffer, and the server run id (runid).

Replication offset
The replication offset is used to confirm the synchronization status of the master and slave servers. The master and slave servers each maintain a replication offset. When the master server sends N bytes of data to a slave server, it adds N to its own replication offset; when the slave server receives N bytes of data, it adds N to its own replication offset. Comparing the two offsets shows at once whether the master and the slave are in sync.
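
The bookkeeping is simple enough to sketch in Python. Node and propagate are hypothetical names, and the payloads are plain strings of bytes rather than the encoded commands a real server sends.

```python
class Node:
    """Minimal stand-in for a server that tracks a replication offset."""

    def __init__(self):
        self.offset = 0


def propagate(master, slave, payload, delivered=True):
    """The master counts the bytes it sends; the slave counts only what it receives."""
    master.offset += len(payload)
    if delivered:
        slave.offset += len(payload)


master, slave = Node(), Node()
propagate(master, slave, b"SET a 1")                   # 7 bytes, delivered
propagate(master, slave, b"SET b 2", delivered=False)  # 7 bytes, lost in transit
in_sync = master.offset == slave.offset
```

After the lost transmission the master is at offset 14 and the slave at offset 7, so the comparison shows that the slave is missing 7 bytes.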

Replication backlog buffer
The replication backlog buffer is a fixed-length first-in-first-out queue maintained by the master server. When the master server propagates a command, it also appends the command to the replication backlog buffer, as follows:
[Figure: commands being queued in the replication backlog buffer]
Because the replication backlog buffer is a fixed-length queue, it holds only the write commands executed most recently, and it records the corresponding replication offset for each byte in the queue. When a slave server sends the PSYNC command, it includes its own replication offset. The master server checks whether the data at offset+1 (the first byte sent after the disconnection) is still in its replication backlog buffer. If it is, partial resynchronization can be performed, and all data from offset+1 to the end of the queue is sent to the slave server. If it is not, the slave server has no choice but to do a full resynchronization.
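
A fixed-length backlog and the offset+1 check can be modelled with a bounded deque. This is a simplified sketch with invented names (Backlog, feed, bytes_after), and it assumes offsets start counting at 1 for the first byte.

```python
from collections import deque


class Backlog:
    """Fixed-size FIFO of the most recently propagated bytes."""

    def __init__(self, size):
        self.buffer = deque(maxlen=size)  # oldest bytes fall out automatically
        self.master_offset = 0            # offset of the last byte written

    def feed(self, payload):
        self.buffer.extend(payload)
        self.master_offset += len(payload)

    def bytes_after(self, slave_offset):
        """Return the bytes after `slave_offset`, or None if they are gone."""
        first_kept = self.master_offset - len(self.buffer) + 1
        if slave_offset + 1 < first_kept or slave_offset > self.master_offset:
            return None  # the caller must fall back to a full resynchronization
        skip = slave_offset + 1 - first_kept
        return bytes(list(self.buffer)[skip:])


backlog = Backlog(size=10)
backlog.feed(b"abcdefgh")               # offsets 1..8 are all still buffered
early = backlog.bytes_after(5)          # the slave has bytes 1..5 and needs 6..8
backlog.feed(b"ijklmnop")               # offsets 1..6 are pushed out of the queue
too_late = backlog.bytes_after(5)       # byte 6 is gone
just_in_time = backlog.bytes_after(6)   # byte 7 is the oldest one still kept
```

Here early is b"fgh", too_late is None, and just_in_time is the full ten bytes still in the queue. A larger backlog lets a slave stay disconnected longer before a full resynchronization becomes necessary.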

Server run id
The server run id is used to check whether the master and slave servers were already paired before the disconnection. Each redis server has its own run id. When a slave connects to a master for the first time, the master server sends its run id to the slave server, which saves it. When the slave server reconnects, it sends the saved run id back to the master server, which compares it with its own. If they match, the slave server really was replicating from this master before the disconnection, and the master goes on to check the offset. If they differ, the slave server was previously a slave of another master server, so a full resynchronization is performed.

You can see the id and the replication offset (the master_replid and master_repl_offset fields) in the output of the info replication command executed earlier.

In summary, the synchronization process of a new version of redis replication is roughly as follows:
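
As a rough sketch of that process, the master's choice between the two modes can be written as one Python function that combines the run id check with the backlog check. The function handle_psync and its parameters are hypothetical, and real servers exchange more information than this.

```python
def handle_psync(master_runid, backlog_first_offset, master_offset,
                 slave_saved_runid, slave_offset):
    """Decide between a full and a partial resynchronization (simplified)."""
    if slave_saved_runid != master_runid:
        return "full"     # the slave was replicating a different master
    next_needed = slave_offset + 1
    if next_needed < backlog_first_offset or slave_offset > master_offset:
        return "full"     # the missing bytes are no longer in the backlog
    return "partial"      # send only bytes next_needed..master_offset
```

For example, with a backlog that starts at offset 15 and a master at offset 686, a slave that reconnects at offset 14 gets a partial resynchronization, while one that reconnects at offset 10 needs a full one.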

Origin blog.csdn.net/qq_52450582/article/details/114779032