Redis Advanced - Redis Persistence

Original article (updated more promptly and easier to read):

Redis Advanced - Redis Persistence | CoderMast Programming Mast https://www.codermast.com/database/redis/redis-advance-persistence.html

# Problems with single-node Redis

- Data loss: Redis stores data in memory, so data may be lost when the service restarts. Solution: implement Redis data persistence.
- Concurrency: Although a single Redis node handles concurrency well, it cannot keep up with high-concurrency scenarios such as the 618 shopping festival. Solution: build a master-replica cluster with read/write separation.
- Fault recovery: If Redis goes down, the service becomes unavailable, so an automatic recovery mechanism is needed. Solution: use Redis Sentinel for health checks and automatic recovery.
- Storage capacity: Redis is memory-based, so a single node cannot hold massive amounts of data. Solution: build a sharded cluster and use the slot mechanism to scale dynamically.

# RDB persistence

RDB stands for Redis Database Backup file, also called a Redis data snapshot. Simply put, it writes all in-memory data to disk. When a Redis instance fails and restarts, it reads the snapshot file from disk to restore the data.

The snapshot file is called an RDB file and is saved in the current working directory by default.

- `save` command: creates an RDB snapshot in the Redis main process and blocks all other commands until it finishes. Writing the RDB file to disk is slow I/O, so the block can be long.
- `bgsave` command: forks a child process to write the RDB file, so the main process is not affected.

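Redis also performs one RDB save when it shuts down normally (for example via the `SHUTDOWN` command), as long as RDB save points are configured.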

By default, this produces a dump.rdb file in the current directory. The next time Redis starts, it loads this file by default to restore its data.

Redis can also trigger RDB automatically. The trigger rules are configured in the redis.conf file in the following format:

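```
# Run bgsave if at least 1 key was modified within 900 seconds
save 900 1
# Run bgsave if at least 10 keys were modified within 300 seconds
save 300 10
# Run bgsave if at least 10000 keys were modified within 60 seconds
save 60 10000
# To disable RDB entirely, replace the save lines above with:
# save ""

# Whether to compress the RDB file. Compression costs CPU and disk space
# is relatively cheap, so the author suggests setting this to no
rdbcompression yes
# Name of the RDB file
dbfilename dump.rdb
# Directory where the file is saved
dir ./
```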
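The rules above are OR-ed together: a bgsave runs as soon as any one of them is satisfied. The Python sketch below models that check under simplifying assumptions: `SaveRule`, `parse_save_rules`, and `should_bgsave` are hypothetical names, only one `save <seconds> <changes>` pair per line is parsed, and the exact boundary comparison Redis uses internally may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaveRule:
    seconds: int  # time elapsed since the last successful save
    changes: int  # minimum number of changes within that time


def parse_save_rules(conf_lines):
    """Collect save rules from redis.conf-style lines (simplified parser)."""
    rules = []
    for line in conf_lines:
        parts = line.split()
        # Skip blank lines, comments, and unrelated directives
        if not parts or parts[0] != "save":
            continue
        if parts[1:] == ['""']:
            rules = []  # save "" clears all rules, which disables RDB
            continue
        rules.append(SaveRule(int(parts[1]), int(parts[2])))
    return rules


def should_bgsave(rules, seconds_since_last_save, changes_since_last_save):
    """True if any single rule is satisfied (rules are OR-ed together)."""
    return any(
        seconds_since_last_save >= rule.seconds
        and changes_since_last_save >= rule.changes
        for rule in rules
    )
```

For example, with the three rules shown above, 301 seconds and 10 changes since the last save satisfies `save 300 10`, while 120 seconds and 50 changes satisfies none of them.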

When bgsave starts, the main process forks a child process that shares the main process's memory. Once the fork completes, the child reads the in-memory data and writes it to the RDB file.

The fork itself blocks: while it runs, Redis cannot respond to client requests. Fork is usually fast, though, because it copies only the page table rather than the actual data, much like copying an index instead of the data it points to.

Fork uses copy-on-write (COW):

- When the main process reads, it accesses the shared memory directly.
- When the main process writes, the affected data is first copied, and the write is applied to the copy.

Extreme case:

If the main process modifies a large amount of data while the child process is writing the new RDB file, that data has to be copied. In the worst case, when the main process modifies all of the data, Redis needs twice its original memory. Therefore, when configuring a Redis server, do not allocate all physical memory to Redis; reserve part of it as buffer space.
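To make copy-on-write and the worst case concrete, here is a toy page-level model in Python. It is not how Redis or the kernel is implemented, and `CowMemory` and its methods are hypothetical: `fork()` copies only the page table, a parent write to a still-shared page copies that page first, and `frames_in_use()` counts physical pages, so modifying every page doubles memory use.

```python
class CowMemory:
    """Toy model of fork + copy-on-write at page granularity (hypothetical)."""

    def __init__(self, pages):
        self.physical = {}   # frame id -> page contents
        self.next_frame = 0
        self.parent = {}     # parent page table: page number -> frame id
        for page_no, data in enumerate(pages):
            self.parent[page_no] = self._alloc(data)
        self.child = None    # child page table, created by fork()

    def _alloc(self, data):
        frame = self.next_frame
        self.next_frame += 1
        self.physical[frame] = data
        return frame

    def fork(self):
        # Only the page table is copied; both tables point at the same frames
        self.child = dict(self.parent)

    def parent_read(self, page_no):
        # Reads go straight to the (possibly shared) frame
        return self.physical[self.parent[page_no]]

    def parent_write(self, page_no, data):
        frame = self.parent[page_no]
        if self.child is not None and self.child.get(page_no) == frame:
            # Page is still shared with the child: copy it before writing
            frame = self._alloc(self.physical[frame])
            self.parent[page_no] = frame
        self.physical[frame] = data

    def child_snapshot(self):
        # What the child (the RDB writer) sees: data as of fork time
        return [self.physical[self.child[p]] for p in sorted(self.child)]

    def frames_in_use(self):
        tables = [self.parent] + ([self.child] if self.child is not None else [])
        return len({frame for table in tables for frame in table.values()})
```

With four pages, memory stays at 4 frames right after the fork, grows to 5 after one parent write, and reaches 8 (twice the original) once the parent has modified every page, while `child_snapshot()` still returns the pre-fork data.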

RDB persistence summary:

How bgsave works in RDB mode:

1. Fork the main process to create a child process that shares its memory space.
2. The child process reads the in-memory data and writes it to a new RDB file.
3. The new RDB file replaces the old one.
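The three steps can be sketched with a real `os.fork` in Python (POSIX only). This is an illustration, not Redis code: the hypothetical `bgsave` and `load_snapshot` helpers write JSON instead of the RDB format, and the snapshot file name is up to the caller. Because the child works on a copy-on-write view of memory taken at fork time, changes the parent makes afterwards do not leak into the snapshot, and `os.replace` swaps the new file in for the old one atomically.

```python
import json
import os


def bgsave(data, path):
    """Fork a child that snapshots `data` to `path`; return the child's pid."""
    pid = os.fork()
    if pid == 0:
        # Child: sees `data` exactly as it was at the moment of the fork
        try:
            tmp_path = path + ".tmp"
            with open(tmp_path, "w", encoding="utf-8") as f:
                json.dump(data, f)
                f.flush()
                os.fsync(f.fileno())  # make sure the new file is on disk
            os.replace(tmp_path, path)  # atomically replace the old snapshot
            os._exit(0)
        except BaseException:
            os._exit(1)
    # Parent: returns immediately and can keep serving writes
    return pid


def load_snapshot(path):
    """Restore data from a snapshot file, as Redis does with dump.rdb on startup."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Usage: call `pid = bgsave(data, path)`, keep modifying `data` in the parent, then `os.waitpid(pid, 0)`; `load_snapshot(path)` returns the data as it was when the fork happened.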

When is RDB executed? What does `save 60 1000` mean?

- By default, RDB runs when the Redis service is stopped. It also runs automatically when a `save` rule in redis.conf is met, or manually via the `save`/`bgsave` commands.
- `save 60 1000` means that RDB is triggered if at least 1000 modifications are made within 60 seconds.

What are the disadvantages of RDB?

- The interval between RDB runs is long, so data written between two snapshots may be lost.
- Forking the child process, compressing the data, and writing out the RDB file all take time.

# AOF persistence

AOF stands for Append Only File. Every write command Redis processes is recorded in the AOF file, so it can be regarded as a command log.

AOF is disabled by default. To enable it, modify the redis.conf configuration file:

```
# Whether to enable AOF (disabled by default)
appendonly yes
# Name of the AOF file
appendfilename "appendonly.aof"
```

How often commands are flushed to the AOF file can also be configured in redis.conf:

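```
# Set exactly one of the following policies.
# always: every write command is immediately recorded to the AOF file
# appendfsync always
# everysec (default): write commands first go to the AOF buffer, and the
# buffer is written to the AOF file once per second
appendfsync everysec
# no: write commands first go to the AOF buffer, and the operating system
# decides when to flush the buffer to disk
# appendfsync no
```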

| Option | When data is flushed to disk | Advantage | Disadvantage |
|---|---|---|---|
| always | Synchronously, after every write | High reliability, almost no data loss | Large performance impact |
| everysec | Once per second | Moderate performance | Up to 1 second of data loss |
| no | Controlled by the operating system | Best performance | Poor reliability; a large amount of data may be lost |
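The three policies differ only in when buffered writes are forced to disk. The Python sketch below illustrates this with a hypothetical `AppendOnlyLog` class and a `replay` function that rebuilds state by re-executing the log, which is how AOF recovery works conceptually. It assumes a toy one-command-per-line text format rather than the real AOF format, and it checks the one-second interval on each append, whereas Redis performs the `everysec` fsync in the background.

```python
import os
import time


class AppendOnlyLog:
    """Toy append-only command log with Redis-like fsync policies (hypothetical)."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.f = open(path, "a", encoding="utf-8")
        self.last_fsync = time.monotonic()
        self.fsync_count = 0

    def append(self, *command):
        # One command per line, fields separated by spaces (no escaping)
        self.f.write(" ".join(command) + "\n")
        if self.policy == "always":
            self._fsync()
        elif self.policy == "everysec" and time.monotonic() - self.last_fsync >= 1.0:
            self._fsync()
        # policy "no": leave data in buffers and let the OS decide

    def _fsync(self):
        self.f.flush()             # Python buffer -> OS page cache
        os.fsync(self.f.fileno())  # OS page cache -> disk
        self.last_fsync = time.monotonic()
        self.fsync_count += 1

    def close(self):
        self._fsync()
        self.f.close()


def replay(path):
    """Rebuild key/value state by re-executing every logged command."""
    state = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            cmd, *args = line.split()
            if cmd == "SET":
                state[args[0]] = args[1]
            elif cmd == "DEL":
                state.pop(args[0], None)
    return state
```

With `policy="always"`, four appends cause four fsync calls; with `"everysec"`, at most one per second; with `"no"`, none until `close()`.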

Because AOF records commands, the AOF file is much larger than the RDB file. AOF also records every write to the same key, even though only the last one matters. Running the `bgrewriteaof` command rewrites the AOF file so that it produces the same result with as few commands as possible.
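The idea behind the rewrite can be shown with a small Python function. `rewrite_aof` is a hypothetical helper that only models `SET` and `DEL` on string keys; like Redis, it derives the new log from the resulting dataset rather than editing old log entries, so every overwritten or deleted value disappears.

```python
def rewrite_aof(commands):
    """Compact a command log into one SET per surviving key (simplified)."""
    state = {}
    # Replay the old log to obtain the current dataset
    for cmd, *args in commands:
        if cmd == "SET":
            state[args[0]] = args[1]
        elif cmd == "DEL":
            state.pop(args[0], None)
    # Emit the minimal commands that recreate that dataset
    return [("SET", key, value) for key, value in state.items()]
```

For instance, the log `SET num 1`, `SET num 2`, `SET num 3`, `SET name jack`, `DEL name` compacts to the single command `SET num 3`.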

Redis also rewrites the AOF file automatically when a threshold is reached. The thresholds are configured in redis.conf:

```
# Trigger a rewrite when the AOF file has grown by this percentage since the last rewrite
auto-aof-rewrite-percentage 100
# Minimum AOF file size before a rewrite can be triggered
auto-aof-rewrite-min-size 64mb
```
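Read together, the two settings say: rewrite when the file is larger than the minimum size and has grown by at least the given percentage since the last rewrite. The Python function below sketches that rule; `should_auto_rewrite` is a hypothetical name, sizes are in bytes, and the exact comparisons in the Redis source may differ slightly. A percentage of 0 disables automatic rewrites, as documented in redis.conf.

```python
def should_auto_rewrite(current_size, base_size, percentage=100,
                        min_size=64 * 1024 * 1024):
    """Decide whether an automatic AOF rewrite would trigger (sketch).

    current_size: current AOF size in bytes
    base_size:    AOF size right after the last rewrite (or at startup)
    percentage:   auto-aof-rewrite-percentage; 0 disables automatic rewrites
    min_size:     auto-aof-rewrite-min-size in bytes
    """
    if percentage == 0 or current_size <= min_size:
        return False
    base = base_size if base_size > 0 else 1  # avoid dividing by zero
    growth = current_size * 100 // base - 100  # growth over base, in percent
    return growth >= percentage
```

With the defaults shown, a 100 MB file whose size after the last rewrite was 50 MB has grown by 100% and triggers a rewrite; a 40 MB file never does, because it is below `auto-aof-rewrite-min-size`.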

# Comparison between RDB and AOF

RDB and AOF each have their own advantages and disadvantages. When data safety requirements are high, the two are often used together in practice.

| | Persistence method | Data integrity | File size | Recovery speed after a crash | Data recovery priority | System resource usage |
|---|---|---|---|---|---|---|
| RDB | Snapshots the entire memory at regular intervals | Incomplete; data written between two backups can be lost | Compressed, so the file is small | Fast | Low, because data integrity is lower than AOF | High; heavy CPU and memory consumption |
| AOF | Logs every write command executed | Relatively complete, depending on the flush policy | Stores commands, so the file is very large | Slow | High, because data integrity is higher | Low, mainly disk I/O; however, an AOF rewrite takes a lot of CPU and memory |