[Redis] In-depth understanding of Redis persistence mechanism - RDB and AOF


1. Redis persistence

Although Redis is an in-memory database, data in memory may be lost due to various reasons, such as restart of the Redis server, unexpected crash, etc. In order to ensure the persistence and reliability of data, Redis introduces a persistence mechanism, which allows data to be saved to disk regularly so that the original data can be restored to memory the next time Redis is started.

The persistence mechanism of Redis is an important part of the Redis database, which allows data in memory to be written to disk in different ways to prevent data loss. This is critical for many application scenarios, especially systems that need to preserve data for a long time or have high availability requirements.

In this article, we will delve into the two main persistence mechanisms of Redis, RDB (Redis DataBase) and AOF (Append-Only File): how they work, their advantages and disadvantages, and how to configure them. We will also introduce hybrid persistence, which combines RDB and AOF to take advantage of the strengths of both. I hope this article helps readers better understand the Redis persistence mechanism so that they can configure and manage it correctly according to their actual application requirements.

2. RDB persistence mechanism

2.1 Understanding RDB

RDB concept

RDB (Redis DataBase) is a persistence method of Redis. It saves the current data in memory to a snapshot file on the hard disk. This snapshot file contains all data at a specific moment, including key-value pairs, data structures, etc. RDB works like a database backup: it periodically saves in-memory data to a persistent file that can be used for data recovery when needed.

Advantages and disadvantages of RDB persistence mechanism

Advantages:

  1. High performance: The RDB snapshot is written by a child process created with fork, so the main process does not need to perform heavy IO operations, which allows Redis to maintain high throughput while a snapshot is being generated.

  2. Compact: RDB files use an efficient binary format that is very compact. This saves disk space and makes the files quick to copy and load, which makes backup and migration easier.

  3. Suitable for disaster recovery: RDB files are complete database snapshots that can be used to recover data from disasters, such as hard drive failure or irreversible data corruption.

  4. Backup and migration: Because an RDB file is a single compact file, it is ideal for backing up data or migrating Redis data between different environments.

  5. Save disk space: You can configure the saving frequency of RDB as needed, thereby controlling the disk space usage to a certain extent.

Disadvantages:

  1. Data may be lost: RDB is a snapshot file that is generated regularly. If the Redis server crashes between snapshots, the data written after the last snapshot may be lost.

  2. Costly for large data sets: When the data set is large, the fork call that starts the snapshot can block the main process for a noticeable time, and writing the whole data set to disk takes longer. In some cases, this pause may become unacceptable.

  3. Not suitable for real-time persistence needs: RDB is based on snapshots and therefore cannot provide real-time persistence. If you need every write operation to be persisted to disk immediately, RDB may not be the best choice.

  4. Dependence on a normal shutdown: Apart from the configured save rules, Redis only writes a final snapshot when the server is shut down normally. If the server is stopped abruptly, for example while configuration changes are being applied, the data written since the last snapshot is lost.

In summary, the RDB persistence mechanism performs well in terms of performance, backup, and compactness and is suitable for many usage scenarios, but it should be noted that data may be lost and needs to be carefully considered when handling large-scale data. If you have higher requirements for real-time persistence, you can consider using the AOF persistence mechanism or hybrid persistence to make up for the shortcomings of RDB.

RDB related configuration

Redis's RDB persistence can be configured in the Redis configuration file (usually redis.conf). The following are some common configuration items related to RDB persistence:

  1. How often to save snapshots:

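    save 900 1        # If at least 1 key changes within 900 seconds (15 minutes), an RDB snapshot save is triggered.
    save 300 10       # If at least 10 keys change within 300 seconds (5 minutes), an RDB snapshot save is triggered.
    save 60 10000     # If at least 10000 keys change within 60 seconds, an RDB snapshot save is triggered.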
    

    These configuration items define the triggering conditions for RDB persistence. According to specific application requirements, RDB snapshots can be triggered based on different time intervals and the number of key changes.

  2. RDB file name and path:

    dbfilename dump.rdb    # Name of the RDB snapshot file
    dir /var/lib/redis     # Directory in which the RDB file is saved
    

    These configuration items allow you to specify the name and save path of the RDB snapshot file. Note that you need to ensure that the folder exists and has the appropriate permissions.

  3. Disable RDB automatic persistence:

    save ""              # Disable automatic RDB persistence
    

    If you want to disable automatically triggered RDB persistence, you can set the save configuration item to an empty string.

  4. RDB persistent compression:

    rdbcompression yes  # Enable RDB file compression (enabled by default)
    

    This configuration item controls whether to compress the generated RDB file. Turning it on can reduce the size of the RDB file, but it will occupy additional CPU resources.

  5. Checksum:

    rdbchecksum yes     # Whether to write and verify a checksum for the RDB file (enabled by default)
    

    This configuration item controls whether a checksum is computed when saving RDB files, so that file integrity can be verified.

Please note that changes made in the configuration file only take effect after the Redis server is restarted; many of these options can also be changed at runtime with the CONFIG SET command. Depending on the application needs and data volume, these configuration items can be adjusted to meet performance and durability requirements.

2.2 Trigger timing of RDB

The triggering timing of RDB is divided into two ways: automatic triggering and manual triggering:

Automatic trigger

Automatic triggering is controlled by the snapshot save frequency in the configuration file. In the Redis configuration file (redis.conf), you can set one or more save directives to define when to automatically trigger the saving of RDB snapshots. Each save directive has two parameters: the first is the time interval in seconds, and the second is the number of keys that changed.

For example, here is an example of an auto-triggered configuration for saving a snapshot:

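save 900 1        # If at least 1 key changes within 900 seconds (15 minutes), an RDB snapshot save is triggered.
save 300 10       # If at least 10 keys change within 300 seconds (5 minutes), an RDB snapshot save is triggered.
save 60 10000     # If at least 10000 keys change within 60 seconds, an RDB snapshot save is triggered.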

These configuration items define the conditions under which RDB automatically triggers snapshot saving. Redis will check these conditions regularly, and if any of them is met, it will trigger the generation of RDB snapshot.

Of course, if save "" is configured, automatic triggering is disabled.
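The rule check described above can be sketched in a few lines of Python. This is only an illustration of the "any rule satisfied" logic, not Redis's actual implementation; the names SaveRule and should_snapshot are hypothetical and chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaveRule:
    seconds: int  # length of the time window in seconds
    changes: int  # minimum number of changes within that window


# The three rules from the configuration example above.
RULES = [SaveRule(900, 1), SaveRule(300, 10), SaveRule(60, 10000)]


def should_snapshot(rules, seconds_since_last_save, changes_since_last_save):
    """Return True if at least one save rule is satisfied.

    An empty rule list (the equivalent of save "") never triggers a snapshot.
    """
    return any(
        seconds_since_last_save >= rule.seconds
        and changes_since_last_save >= rule.changes
        for rule in rules
    )
```

With these rules, 50 changes after 100 seconds trigger nothing, while the same 50 changes are enough once 300 seconds have passed since the last save.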

Manual trigger: SAVE and BGSAVE

Manually triggering RDB snapshots is achieved through Redis commands. There are two main commands available:

  1. SAVE command: The SAVE command generates an RDB snapshot immediately and blocks the Redis server until the snapshot is complete. This command is rarely used, because the Redis server stops responding to other requests while the snapshot is being generated.

    Syntax:

    SAVE

  2. BGSAVE command: The BGSAVE command generates RDB snapshots in the background without blocking the normal operation of the Redis server. This is the generally recommended manual triggering method.

    Syntax:

    BGSAVE

The BGSAVE command uses the fork system call provided by the operating system to start a child process that is responsible for generating the RDB snapshot file, while the main process continues to respond to other requests. Once generation is complete, the child process replaces the old RDB file on disk and then notifies the main process.

The BGSAVE command proceeds as follows:

  1. When executing the BGSAVE command, the Redis parent process first checks whether another child process is already running, such as a child process executing RDB or AOF related commands. If there is one, the BGSAVE command returns immediately.
  2. The parent process calls fork to create the child process, and the parent process is blocked during the fork. The latest_fork_usec field in the output of the info stats command shows the time taken by the most recent fork operation, in microseconds.
  3. After the parent process finishes the fork, the BGSAVE command returns the "Background saving started" message. After that, the parent process is no longer blocked and can continue to respond to other commands. At the same time, the child process is responsible for generating the RDB snapshot file.
  4. The child process generates a temporary snapshot file based on the memory of the parent process. After completion, the original dump.rdb file is atomically replaced. The lastsave command returns the time of the last RDB save, which corresponds to the rdb_last_save_time field in the info statistics.
  5. After the child process finishes creating the RDB file, it sends a signal to the parent process to indicate completion, and the parent process updates the statistics.
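The "write a temporary file, then atomically replace the old one" step can be illustrated with a minimal Python sketch. It is not Redis code: the function name write_snapshot_atomically and the temp-file naming are assumptions for this example, and the payload is just stand-in bytes rather than a real RDB file.

```python
import os
import tempfile


def write_snapshot_atomically(directory, filename, payload):
    """Write payload to a temporary file, then rename it over the final file.

    Readers of the final path see either the old file or the complete new
    one, never a half-written snapshot.
    """
    final_path = os.path.join(directory, filename)
    # The temporary file must live in the same directory (same file system)
    # so that the rename can be atomic.
    fd, temp_path = tempfile.mkstemp(prefix="temp-", suffix=".rdb", dir=directory)
    try:
        with os.fdopen(fd, "wb") as temp_file:
            temp_file.write(payload)
            temp_file.flush()
            os.fsync(temp_file.fileno())  # force the bytes to disk before renaming
        os.replace(temp_path, final_path)  # atomic rename on POSIX file systems
    except BaseException:
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
    return final_path
```

The design point is that a crash in the middle of writing leaves the previous dump.rdb untouched, which is why a failed snapshot does not destroy the last good one.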

Manually triggered RDB snapshots are usually used to back up Redis data, manually manage persistence strategies, or create consistent snapshots of data before performing certain operations. In short, the triggering timing of RDB can be achieved in two ways: automatic triggering and manual triggering, depending on the specific application requirements and management strategies. Automatic triggering is a mechanism for periodic saving, while manual triggering allows you to create RDB snapshots immediately when needed.

2.3 Processing of RDB files

The RDB file is a snapshot file of the Redis database, which saves the current data in memory so that it can be restored when needed. The following is important information about the handling and management of RDB files:

Save RDB file

  1. File saving path: RDB files are saved in the directory specified by the dir parameter in the Redis configuration file (for example /var/lib/redis/). The file name is specified by the dbfilename parameter in the configuration file and defaults to "dump.rdb". You can use the CONFIG SET command to dynamically change the saving directory and file name while Redis is running, for example:

    CONFIG SET dir /new/directory/
    CONFIG SET dbfilename newdump.rdb
    

    This will save the RDB file to a new directory and use a new file name.

  2. Manual save: Saving the RDB file can be triggered manually by executing the SAVE or BGSAVE command. The SAVE command blocks other requests while the Redis server generates the RDB file, while the BGSAVE command generates the RDB file in the background without blocking other operations.

    SAVE
    BGSAVE
    

Compress RDB files

Redis uses the LZF algorithm by default to compress the generated RDB files to reduce the file size and save disk space and network bandwidth. Compression is enabled by default, but we can modify it dynamically with the following command:

CONFIG SET rdbcompression yes   # Enable RDB file compression
CONFIG SET rdbcompression no    # Disable RDB file compression

Although compressing RDB files consumes CPU resources, it is generally recommended to turn it on, especially when disk space and network bandwidth are limited, as it can significantly reduce the size of the file.

Verify RDB file

Redis loads the RDB file when it starts. If the file is damaged or has the wrong format, Redis will refuse to start. To check the integrity and validity of RDB files, you can use the redis-check-rdb tool provided by Redis. This tool inspects RDB files and generates corresponding error reports to help us identify and solve problems.

Check the RDB file:

Processing and managing RDB files is an important task to ensure the durability and reliability of Redis data. Understanding how to save, compress and verify RDB files can help us better manage Redis databases.

3. AOF persistence mechanism

3.1 Understanding AOF

The concept of AOF

AOF (Append-Only File) is a persistence mechanism of Redis that implements data persistence on the Redis server by recording write operations in a text file in append mode. Each write operation is appended to the end of the AOF file in the form of a Redis command, so the AOF file is a log file that records write operations in chronological order.

Advantages and Disadvantages of AOF

Advantages:

  1. Data security: AOF records every write operation, so in the event of a server failure or crash, only the writes that have not yet been synchronized to disk are lost (in theory at most about one second of data with the default everysec policy). This provides higher data security than RDB.

  2. Readability : An AOF file is a text file that is easy for humans to read and understand. This makes AOF files very useful when you need to manually recover data or do debugging.

  3. Flexibility : The append mode of writing AOF files makes data persistence very real-time. Different synchronization strategies can be selected according to needs, from fully synchronous to asynchronous.

  4. Automatic rewriting : Redis provides an automatic rewriting mechanism for AOF files, which can prevent AOF files from being too large and improve performance.

Disadvantages:

  1. File size : Compared to RDB persistence, AOF files are usually larger because it contains the command text for each write operation. This may take up more disk space.

  2. Write performance : AOF persistence will cause each write operation to be appended to the AOF file, which may have a certain impact on write performance, especially when using a synchronous write strategy.

AOF related configuration

In Redis, you can configure the parameters related to AOF persistence through the configuration file or with the CONFIG command. The following are some commonly used AOF configuration options:

  • appendonly: Used to enable or disable AOF persistence. Set to "yes" to enable AOF, and to "no" to disable it. By default, AOF persistence is disabled and needs to be enabled manually.

  • appendfilename: Specify the file name of the AOF file, the default is "appendonly.aof". You can change the file name as needed.

  • appendfsync: Specify the strategy for synchronizing AOF files to disk. Options available include "always" (synchronize every write), "everysec" (synchronize once per second), and "no" (fully asynchronous).

  • auto-aof-rewrite-percentage and auto-aof-rewrite-min-size: Used to configure the trigger conditions for automatic AOF rewriting. The former is the percentage by which the AOF file must have grown relative to its size after the last rewrite, and the latter is the minimum size the AOF file must reach before a rewrite is triggered.

These configuration options can be found in Redis's configuration file (usually redis.conf) or can be set dynamically via the CONFIG command. Based on your needs and application scenarios, AOF persistence can be flexibly configured to meet data persistence and performance requirements.

3.2 Use of AOF

Use the demo

After configuring the AOF related options, restart the Redis server with the command service redis-server restart. After restarting, you will find an additional file, appendonly.aof, in the same directory as the dump.rdb file:


Set two pieces of data into Redis:

View the appendonly.aof file:

The content of the file is the command to write data just now.

Abruptly terminate the Redis server and then start it again:

Connect to the Redis database and query it; the data has not been lost:

AOF workflow

The workflow diagram is as follows:

  1. All write commands are appended to the aof_buf buffer.
  2. The AOF buffer performs synchronization operations to the hard disk according to the corresponding policy.
  3. As AOF files become larger and larger, AOF files need to be rewritten regularly to achieve compression.
  4. When the Redis server starts, the commands in the AOF file are loaded for data recovery.

Command writing:
The content written to the AOF directly follows the text protocol format of Redis. For example, the command set hello world is represented in text protocol format in the AOF buffer as follows:

*3\r\n$3\r\nset\r\n$5\r\nhello\r\n$5\r\nworld\r\n

The reasons why Redis chooses the text protocol include good compatibility, simple implementation, and readability.
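To make the format concrete, here is a small Python sketch that encodes a command in this text protocol: an array header (*N) followed by one length-prefixed bulk string ($len) per argument. The function name encode_command is hypothetical and used only for illustration.

```python
def encode_command(*args):
    """Encode a command and its arguments as an array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]  # array header: number of arguments
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        # Each argument is prefixed with its length in bytes.
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)
```

Calling encode_command("set", "hello", "world") yields exactly the byte sequence shown above. Because every argument carries its own length, values containing spaces or line breaks can be stored without any escaping.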

Why use the aof_buf buffer:

The AOF buffer (aof_buf) exists to improve write performance. Because Redis responds to commands in a single thread, if every write is directly synchronized to the hard disk, performance will be seriously affected. By first writing commands to the buffer, Redis can effectively reduce the number of IO operations and improve performance. In addition, Redis also provides a variety of buffer synchronization strategies, allowing users to make trade-offs based on performance and data security requirements.

3.3 AOF synchronization file strategy

Redis provides several strategies for synchronizing the AOF buffer to the file, which are controlled by the appendfsync parameter in the configuration file (redis.conf). The configurable values and their descriptions are shown in the following table:

Configurable value    Description
always                Each command is appended to aof_buf and written with write, and fsync is then called immediately to force the data to disk. This ensures data security, but write performance is poor.
everysec              Each command is appended to aof_buf and written with write, and a synchronization operation is then performed once per second to synchronize the data into the AOF file. This balances performance and data security.
no                    Each command is appended to aof_buf and written with write only; synchronization to disk is controlled by the operating system. This provides the highest write performance but lower data security.

Notes on the write and fsync system calls:

  • The write operation triggers the delayed write mechanism. The Linux kernel provides page buffers to improve hard disk I/O performance, so write returns immediately after copying the data into the system buffer. Synchronization to disk depends on the operating system's scheduling mechanism, for example when the buffer page is full or a certain time interval is reached. If the system crashes or loses power before the file is synchronized, the data in the buffer may be lost.

  • fsync operates on a single file. It forces synchronization to the hard disk and blocks until the data has been written to the hard disk.

Different configuration values:

  • When configured as always, every write forces the AOF file to be synchronized, which results in poor performance. On a general SATA hard disk, it can only support a few hundred TPS of writes. It is generally not recommended to configure always unless the data is very important.

  • When configured as no, the synchronization timing is left to the operating system and cannot be controlled. Although performance improves, the risk of data loss increases greatly. It is generally not recommended to configure no unless the importance of the data is very low.

  • everysec is the default and recommended configuration option, which balances data security and performance. In theory, at most 1 second of data is lost.

These configuration options allow users to choose a suitable AOF synchronization strategy based on performance and data security requirements.
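The difference between the three values can be sketched with a toy append-only log in Python that uses the real os.write and os.fsync calls. This is a simplified, single-threaded illustration (Redis performs the everysec fsync in a background thread); the class AppendLog and its attributes are hypothetical names for this example.

```python
import os
import time


class AppendLog:
    """Toy append-only log supporting always / everysec / no fsync policies."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError("unknown policy: %r" % policy)
        self.policy = policy
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.buffer = bytearray()  # plays the role of aof_buf
        self.last_fsync = time.monotonic()
        self.fsync_calls = 0

    def append(self, data):
        """Buffer a command in memory; nothing touches the file yet."""
        self.buffer += data

    def flush(self):
        """Write the buffer to the file, then fsync according to the policy."""
        if self.buffer:
            os.write(self.fd, bytes(self.buffer))  # lands in the OS page cache
            self.buffer.clear()
        now = time.monotonic()
        due = self.policy == "everysec" and now - self.last_fsync >= 1.0
        if self.policy == "always" or due:
            os.fsync(self.fd)  # blocks until the data is on disk
            self.last_fsync = now
            self.fsync_calls += 1
        # policy "no": never fsync here, the OS decides when to write back

    def close(self):
        os.close(self.fd)
```

With always, every flush pays the cost of an fsync; with everysec, many flushes share one fsync per second; with no, the sketch never calls fsync at all, which is fastest but leaves the most data at risk.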

3.4 AOF rewriting mechanism

Why rewrite AOF files

The AOF file is rewritten to solve the problem of the large number of invalid commands that can accumulate in the AOF file. Let's illustrate with some simple examples:

In Redis, the following series of write commands are executed:

SET key1 123
SET key1 hello
SET key1 world
SET key1 123

Although 4 write commands were executed above, only the data written by the last command exists in Redis, that is, the value of key1 is 123.

Likewise, the following series of write commands were executed:

SET key2 123
SET key2 hello
SET key2 world
SET key2 123
DEL key2

Although 5 write commands were executed above, key2 did not exist in Redis in the end.

  • For example, execute the following command in Redis:

  • Before rewriting:

  • After rewriting:

At this point, garbled characters appear in the file, that is, the content is stored in binary form. The reason is that Redis uses hybrid persistence here, so the rewritten AOF file begins with data in RDB binary format.

  • Use the redis-check-aof tool provided by Redis to check the AOF file, and you can see that the AOF file is valid:

It can be concluded from the above two simple examples:

  • Without the AOF file rewriting mechanism, Redis will continue to store a large number of invalid commands, which will cause the AOF file to become large, occupy a lot of disk space, and reduce the reading performance of the AOF file.
  • The rewrite of the AOF file solves this problem by cleaning up these invalid commands and generating a new AOF file that is more compact and efficient. The new AOF file only contains valid write commands, which helps reduce file size, improve read performance, and ensure that no valid data is lost. This is the reason why AOF files need to be rewritten.
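The effect of a rewrite on the two examples above can be sketched in Python. Note the assumption: a real rewrite does not read the old AOF file but serializes the data currently in memory; replaying the log here only stands in for "the current state". The functions replay and rewrite are hypothetical and support only SET and DEL.

```python
def replay(commands):
    """Apply SET/DEL commands in order and return the final key-value state."""
    state = {}
    for command in commands:
        name, key, *rest = command.split()
        if name.upper() == "SET":
            state[key] = rest[0]
        elif name.upper() == "DEL":
            state.pop(key, None)
        else:
            raise ValueError("unsupported command: " + command)
    return state


def rewrite(commands):
    """Return the minimal command list that reproduces the final state."""
    return ["SET %s %s" % (key, value) for key, value in replay(commands).items()]


LOG = [
    "SET key1 123", "SET key1 hello", "SET key1 world", "SET key1 123",
    "SET key2 123", "SET key2 hello", "SET key2 world", "SET key2 123",
    "DEL key2",
]
```

For this nine-command log, rewrite(LOG) keeps a single command for key1 and nothing for key2, since key2 no longer exists.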

How to trigger AOF rewriting

Rewriting of AOF files can be triggered in the following two ways:

  1. Manual triggering: By calling the BGREWRITEAOF command provided by Redis, the rewriting process of the AOF file can be manually triggered. This command starts rewriting the AOF file in the background without affecting the normal operation of Redis.

  2. Automatic triggering : Redis also supports automatic triggering of rewriting of AOF files. The triggering timing is determined based on two configuration parameters:

    • auto-aof-rewrite-min-size: This parameter defines the minimum size the AOF file must reach before automatic rewriting is considered. The default value is 64MB. Only when the size of the AOF file exceeds this threshold will automatic rewriting be considered.

    • auto-aof-rewrite-percentage: This parameter defines the growth ratio of the AOF file size relative to its size after the last rewrite, expressed as a percentage. The default value is 100. If the AOF file has grown by at least this percentage since the last rewrite, automatic rewrite will be triggered.

    For example, if auto-aof-rewrite-min-size is set to 64MB and auto-aof-rewrite-percentage is set to 100, automatic rewrite will only be triggered if the size of the AOF file exceeds 64MB and has grown by 100% or more since the last rewrite.

Automatically triggering rewriting of AOF files is to ensure that AOF files do not grow without limit and to avoid performing rewriting operations too frequently. This can keep the AOF file at a reasonable size without losing any data, improve performance, and reduce disk space usage.
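The two conditions combine as in the following Python sketch. It is an approximation of the check written for illustration, not a copy of the Redis source; the function name should_rewrite and its parameter names are assumptions, and sizes are given in bytes.

```python
MB = 1024 * 1024


def should_rewrite(current_size, base_size, min_size=64 * MB, percentage=100):
    """Decide whether an automatic AOF rewrite should start.

    current_size: size of the AOF file now
    base_size:    size of the AOF file right after the last rewrite
    """
    if percentage == 0 or current_size <= min_size:
        return False  # feature disabled, or the file is still too small
    base = base_size if base_size > 0 else 1  # avoid division by zero
    growth = current_size * 100 // base - 100  # growth in percent
    return growth >= percentage
```

With the defaults, a file that was 64MB after the last rewrite triggers a new rewrite once it reaches 128MB, while a 60MB file never triggers one no matter how small its base size was.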

AOF rewrite execution flow

The execution flow of AOF rewriting is as follows:

  1. Perform AOF rewrite request.

    • If the current process is performing AOF rewriting, the request is not executed.
    • If the current process is performing a bgsave operation, the rewrite command will be delayed until bgsave completes.
  2. The parent process executes fork to create a child process.

  3. Rewriting process:
    a. The main process continues to respond to other commands after fork. All modification operations will be written to the AOF buffer and synchronized to the hard disk according to the appendfsync policy to ensure the correctness of the old AOF file mechanism.
    b. The child process only has all the memory information before the fork, so the parent process needs to write the modification operations during the period after the fork into the AOF rewrite buffer.

  4. The child process merges the commands into a new AOF file based on the memory snapshot.

  5. The child process completes the rewriting:
    a. After the new file is written, the child process sends a signal to the parent process.
    b. The parent process appends the commands temporarily saved in the AOF rewrite buffer to the new AOF file.
    c. Replace the old AOF file with the new AOF file.
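The interplay between the old AOF file, the memory snapshot and the rewrite buffer can be simulated with a small single-process Python model. There is no real fork here: copying the dictionary stands in for the child's view of memory at fork time, and the class ToyServer with all of its members is hypothetical.

```python
class ToyServer:
    """Single-process model of the AOF rewrite flow described above."""

    def __init__(self):
        self.data = {}
        self.aof = []               # the current (old) AOF file
        self.rewrite_buffer = None  # None while no rewrite is running
        self.snapshot = None

    def set(self, key, value):
        self.data[key] = value
        command = "SET %s %s" % (key, value)
        self.aof.append(command)  # step 3a: the old AOF keeps working
        if self.rewrite_buffer is not None:
            self.rewrite_buffer.append(command)  # step 3b: remember new writes

    def start_rewrite(self):
        if self.rewrite_buffer is not None:
            return False  # step 1: a rewrite is already running
        self.snapshot = dict(self.data)  # step 2: stands in for fork
        self.rewrite_buffer = []
        return True

    def finish_rewrite(self):
        # Step 4: one command per key from the snapshot.
        new_aof = ["SET %s %s" % (k, v) for k, v in self.snapshot.items()]
        new_aof.extend(self.rewrite_buffer)  # step 5b: append buffered writes
        self.aof = new_aof                   # step 5c: replace the old file
        self.rewrite_buffer = None
        self.snapshot = None
```

Writes that arrive between start_rewrite and finish_rewrite end up in both the old file and the new one, so no data is lost whether the rewrite succeeds or is abandoned.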

4. Hybrid persistence

Hybrid persistence means that Redis uses both AOF (Append-Only File) and RDB (Redis Database) persistence methods to ensure data durability. This strategy combines two different persistence methods to account for different needs and scenarios.

Here's how hybrid persistence works and its advantages:

1. AOF persistence:

  • AOF is an append write log file that records the commands for each write operation and the parameters of these commands.
  • AOF persistence is real-time, that is, every write operation is appended to the AOF file, ensuring data integrity.
  • AOF files can be used for data recovery, and Redis can reconstruct the data state by re-executing the commands in the AOF files.

2. RDB persistence:

  • RDB is a snapshot persistence that saves data in Redis memory to disk in binary form.
  • RDB persistence is point-in-time, that is, a snapshot file is generated within a specified time interval, usually after the memory data changes.
  • An RDB file is a compressed binary file used to quickly restore the state of the entire database when needed.

How hybrid persistence works:

  • Hybrid persistence builds on AOF and is controlled by the aof-use-rdb-preamble option, so AOF must be enabled.
  • When the AOF file is rewritten, Redis first writes the current data set to the new file in RDB format and then appends the write commands that follow in AOF format.
  • During data recovery, Redis uses the AOF file when AOF is enabled, because it contains the most recent writes and provides more complete data recovery.
  • When loading such a file, Redis first loads the RDB-format part for a quick start and then replays the AOF commands that follow it to bring the data up to date.
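An AOF file written with an RDB-format preamble starts with the same magic string as an RDB file, the five bytes REDIS, whereas a plain AOF file starts with a protocol array header such as *3. A minimal Python sketch of that check follows; the function name has_rdb_preamble is hypothetical, and the sketch inspects only the first bytes without parsing the rest of the file.

```python
def has_rdb_preamble(path):
    """Return True if the file begins with the RDB magic string b"REDIS"."""
    with open(path, "rb") as f:
        return f.read(5) == b"REDIS"
```

This also explains the "garbled characters" seen when opening a rewritten AOF file in a text editor: the beginning of the file is binary snapshot data, and readable commands only appear after it.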

Advantages of hybrid persistence:

  • Data security: AOF provides real-time data appending and historical records, making data more secure. RDB provides a fast recovery method.
  • Performance: Appending commands to the AOF keeps the cost of each write low, and the RDB-format data provides fast data recovery.
  • Flexibility: Hybrid persistence allows you to choose the appropriate recovery method based on different needs and scenarios.
  • Reduce data loss: Since AOF files contain historical operations, the possibility of data loss can be reduced.

In short, hybrid persistence combines the two persistence methods of AOF and RDB, takes into account real-time performance and recovery, provides a more comprehensive data protection strategy, and makes Redis more powerful and reliable in different application scenarios. However, it should be noted that hybrid persistence may increase storage overhead and disk I/O load, so careful consideration is required during configuration.

Origin blog.csdn.net/qq_61635026/article/details/132892150