In Depth: A Step-by-Step Tutorial for Thoroughly Understanding Redis Persistence

Why do you need persistence?

Redis is an in-memory database. If the in-memory state is not saved to disk, it is lost as soon as the server process exits. Data loss is a serious production incident, so Redis data needs to be persisted. Redis provides the following persistence methods:

  • RDB snapshot persistence generates point-in-time snapshots of the in-memory dataset at specified intervals.
  • AOF persistence records every write command executed by the server and restores the data by replaying those commands when the server restarts.
  • Hybrid persistence combines the characteristics of RDB and AOF.

RDB snapshot

By default, Redis saves snapshots of the in-memory database in a file named dump.rdb.

By editing redis.conf, you can make Redis save the dataset automatically whenever at least M changes have been made to the dataset within N seconds.

# The following setting makes Redis save the dataset automatically once at least 1000 keys have changed within 60 seconds
save 60 1000

Multiple rules can be set, and satisfying any rule will trigger the save mechanism.
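
The rule check can be pictured with a minimal Python sketch. The names SaveRule and should_save are hypothetical and are not part of Redis, and the 300/10 rule is only an illustrative second rule. The sketch assumes nothing beyond what the text states: a save is triggered as soon as any one rule is satisfied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaveRule:
    seconds: int  # N: length of the time window in seconds
    changes: int  # M: minimum number of changes within that window


def should_save(rules, seconds_since_last_save, changes_since_last_save):
    """Return True if at least one rule is satisfied (hypothetical helper)."""
    return any(
        seconds_since_last_save >= rule.seconds
        and changes_since_last_save >= rule.changes
        for rule in rules
    )


# Two rules; the second mirrors `save 60 1000` from the text.
rules = [SaveRule(seconds=300, changes=10), SaveRule(seconds=60, changes=1000)]

busy_minute = should_save(rules, 61, 1000)    # second rule matches
too_early = should_save(rules, 30, 5000)      # no window has elapsed yet
quiet_period = should_save(rules, 301, 10)    # first rule matches
```

An empty rule list never triggers a save in this sketch, which corresponds to the empty policy used to turn snapshots off.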

You can also generate an RDB snapshot manually: connect to the Redis server with a client and run save or bgsave to produce the dump.rdb file. Each time the command runs, all data in Redis memory is written to a new RDB file, which overwrites the previous snapshot file.

To disable RDB snapshots, comment out all save rules and set an empty rule: save ""

Snapshot execution process

  1. Redis calls fork(), so there is now both a parent process and a child process.
  2. The child process writes the dataset to a temporary RDB file.
  3. When the child process finishes writing to the new RDB file, Redis replaces the original RDB file with the new RDB file and deletes the old RDB file.
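
The three steps above can be sketched in Python on a Unix-like system. This is an illustration under stated assumptions, not Redis code: background_save is a hypothetical helper, JSON stands in for the real RDB format, and the parent simply waits for the child instead of continuing to serve clients.

```python
import json
import os
import tempfile


def background_save(data, path):
    """Fork a child that writes `data` to a temp file, then renames it over `path`."""
    pid = os.fork()
    if pid == 0:
        # Child process: write the dataset to a temporary file first.
        status = 1
        try:
            tmp_path = f"{path}.tmp-{os.getpid()}"
            with open(tmp_path, "w") as f:
                json.dump(data, f)
                f.flush()
                os.fsync(f.fileno())
            # Atomically replace the old snapshot with the new one.
            os.replace(tmp_path, path)
            status = 0
        finally:
            # Exit the child immediately, without running the parent's cleanup.
            os._exit(status)
    # Parent process: a real server would keep handling commands here;
    # this sketch only waits for the child to finish.
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0


with tempfile.TemporaryDirectory() as workdir:
    snapshot_path = os.path.join(workdir, "dump.json")
    saved_ok = background_save({"beifeng": "666"}, snapshot_path)
    with open(snapshot_path) as f:
        restored = json.load(f)
```

Writing to a temporary file and renaming it means a reader never sees a half-written snapshot: the old file stays complete until the new one is ready.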

COW (copy-on-write mechanism) of bgsave

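Redis relies on the copy-on-write (COW) technique provided by the operating system, so it can keep processing write commands normally while a snapshot is being generated.

Put simply, the bgsave child process is forked from the main thread and shares all of the main thread's in-memory data. Once it is running, the bgsave child process reads the main thread's in-memory data and writes it to the RDB file.

If the main thread only reads this data in the meantime, the main thread and the bgsave child process do not affect each other.

If the main thread needs to modify a piece of data, however, the operating system copies the memory page that holds it. The bgsave child process continues to see the data as it was at the moment of the fork and writes that version to the RDB file, while the main thread can go on modifying its own copy directly.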
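
The isolation that fork() provides can be observed with a small Python experiment on a Unix-like system. This is only a sketch: child_view_after_parent_write is a hypothetical function, and the two pipes exist solely to make the child read its data after the parent has already changed its own copy.

```python
import os


def child_view_after_parent_write():
    """Fork, let the parent modify its data, and return (child's view, parent's view)."""
    data = {"count": 1}
    result_r, result_w = os.pipe()  # child -> parent: the value the child sees
    ready_r, ready_w = os.pipe()    # parent -> child: "I have finished writing"
    pid = os.fork()
    if pid == 0:
        # Child process: block until the parent has modified its copy.
        try:
            os.read(ready_r, 1)
            os.write(result_w, str(data["count"]).encode())
        finally:
            os._exit(0)
    # Parent process: modify the data after the fork, then wake the child.
    data["count"] = 2
    os.write(ready_w, b"x")
    seen_by_child = int(os.read(result_r, 16).decode())
    os.waitpid(pid, 0)
    for fd in (result_r, result_w, ready_r, ready_w):
        os.close(fd)
    return seen_by_child, data["count"]


child_value, parent_value = child_view_after_parent_write()
```

The child still reports the value from the moment of the fork even though the parent changed it first, which is the property that lets a snapshot stay consistent while writes continue.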

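Comparison of save and bgsave

Command                 save                     bgsave
I/O type                synchronous              asynchronous
Blocks other commands   yes                      no (only a brief block while fork() is called to create the child process)
Complexity              O(n)                     O(n)
Advantage               uses no extra memory     does not block clients
Disadvantage            blocks client commands   needs to fork a child process, which consumes memory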

AOF (append-only file)

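Snapshots are not very durable: if Redis stops because of a failure, the server loses the data that was written most recently and had not yet been saved to a snapshot.

Since version 1.1, Redis has offered a fully durable persistence method, AOF persistence, which records every modifying command in the file appendonly.aof (the command is first written to the OS cache and then fsynced to disk at intervals).

AOF can be turned on in the configuration file: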

appendonly yes

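From now on, whenever Redis executes a command that changes the data, the command is appended to the end of the AOF file. When Redis restarts, it can rebuild the dataset by executing the commands in the AOF file.

For example, after running set beifeng 666, the AOF file records the following: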

*3
$3
set
$7
beifeng
$3
666

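This data is in RESP protocol format. The number after the asterisk is the number of arguments in the command (the command name included), and the number after $ is the length of the following argument in bytes.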
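
To make the format concrete, here is a minimal Python sketch of an encoder for this layout. The function name encode_resp_command is hypothetical. Each line in the listing above is terminated by CRLF in the actual byte stream, which the sketch writes out explicitly.

```python
def encode_resp_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        raw = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        # The length after `$` counts bytes, not characters.
        parts.append(b"$%d\r\n%s\r\n" % (len(raw), raw))
    return b"".join(parts)


encoded = encode_resp_command("set", "beifeng", "666")
# A multi-byte UTF-8 value: one character, three bytes.
encoded_utf8 = encode_resp_command("k", "中")
```

The second call shows why the length is counted in bytes: a single character that takes three bytes in UTF-8 is written with the prefix $3.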

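If a set command carries an expiration time, the AOF file does not record the original command; it records the absolute timestamp at which the key expires.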

# Command executed
set aixuexi 666 ex 60
# AOF file content
*5
$3
SET
$7
aixuexi
$3
666
$4
PXAT
$13
1647941694816
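
The conversion from a relative TTL to an absolute timestamp can be sketched in Python. Both function names are hypothetical, and the now_ms value in the example is an assumption, chosen so that the result matches the timestamp in the listing above. Recording an absolute time means that replaying the file later does not restart the 60-second countdown.

```python
import time


def to_absolute_expiry_ms(ttl_seconds, now_ms=None):
    """Convert a relative TTL in seconds to an absolute Unix time in milliseconds."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms + ttl_seconds * 1000


def logged_form_of_set_ex(key, value, ttl_seconds, now_ms=None):
    """Return the argument list recorded for `SET key value EX ttl` (sketch)."""
    expiry = to_absolute_expiry_ms(ttl_seconds, now_ms)
    return ["SET", key, value, "PXAT", str(expiry)]


# Assumed execution time: 60 seconds before the timestamp shown in the listing.
logged = logged_form_of_set_ex("aixuexi", "666", 60, now_ms=1647941634816)
```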

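Three options control how often Redis fsyncs data to disk.

appendfsync always # fsync every time a new command is appended to the AOF file; very slow, very safe
appendfsync everysec # the default; fsync once per second; fast enough, and only about 1s of data is lost on failure
appendfsync no # never fsync; leave flushing to the operating system; faster but less safe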
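
The difference between the three policies comes down to when fsync is called after a write. The Python sketch below is a toy model with a hypothetical AppendOnlyLog class. It checks the one-second interval inline on each append, which is a simplification and an assumption of this sketch rather than a description of how Redis schedules the call.

```python
import os
import tempfile
import time


class AppendOnlyLog:
    """Toy append-only log with an appendfsync-like policy (hypothetical class)."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.file = open(path, "ab")
        self.last_fsync = time.monotonic()
        self.fsync_count = 0

    def append(self, payload):
        self.file.write(payload)
        self.file.flush()  # hand the bytes to the OS cache
        now = time.monotonic()
        due = self.policy == "everysec" and now - self.last_fsync >= 1.0
        if self.policy == "always" or due:
            os.fsync(self.file.fileno())  # force the OS cache to disk
            self.last_fsync = now
            self.fsync_count += 1

    def close(self):
        self.file.close()


with tempfile.TemporaryDirectory() as workdir:
    strict = AppendOnlyLog(os.path.join(workdir, "always.aof"), policy="always")
    relaxed = AppendOnlyLog(os.path.join(workdir, "no.aof"), policy="no")
    for _ in range(3):
        strict.append(b"*1\r\n$4\r\nPING\r\n")
        relaxed.append(b"*1\r\n$4\r\nPING\r\n")
    always_fsyncs, no_fsyncs = strict.fsync_count, relaxed.fsync_count
    strict.close()
    relaxed.close()
```

With always, every append pays for a disk flush; with no, the data sits in the OS cache until the operating system decides to write it out.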

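AOF rewrite

Because AOF works by continually appending commands to the end of the file, the file keeps growing as more write commands arrive, and it may contain many commands that are no longer needed. Redis therefore periodically regenerates the AOF file from the latest in-memory data.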

127.0.0.1:6379> INCR count # executed 14 times
127.0.0.1:6379> BGREWRITEAOF
Background append only file rewriting started

The AOF file after the rewrite:

*3
$3
SET
$5
count
$2
14
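
The effect of the rewrite can be reproduced with a small Python model. The functions replay and rewrite_from_memory are hypothetical, and only SET, INCR and DEL are supported. The sketch illustrates the idea of regenerating the log from the current state, not the actual Redis implementation.

```python
def replay(commands):
    """Apply a list of (name, *args) commands to an empty dict and return it."""
    state = {}
    for name, *args in commands:
        name = name.upper()
        if name == "SET":
            key, value = args
            state[key] = value
        elif name == "INCR":
            (key,) = args
            state[key] = str(int(state.get(key, "0")) + 1)
        elif name == "DEL":
            for key in args:
                state.pop(key, None)
        else:
            raise ValueError(f"unsupported command: {name}")
    return state


def rewrite_from_memory(state):
    """Emit one SET per key, which is enough to rebuild the same state."""
    return [("SET", key, value) for key, value in state.items()]


old_log = [("INCR", "count")] * 14      # the 14 INCR commands from the example
memory = replay(old_log)                # what the server holds in memory
new_log = rewrite_from_memory(memory)   # the compacted log
```

Fourteen commands collapse into one, and replaying either log yields the same dataset.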

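The following two parameters control how often the AOF is rewritten:

auto-aof-rewrite-percentage 100 # rewrite again once the AOF file has grown by 100% since the last rewrite
auto-aof-rewrite-min-size 64mb # the AOF file must reach at least 64 MB before it is rewritten automatically; a small file recovers quickly anyway, so rewriting it has little value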
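
How the two parameters combine can be written down as a short Python sketch. The function should_auto_rewrite and its argument names are hypothetical; sizes are in bytes, and base_size stands for the file size right after the previous rewrite.

```python
MB = 1024 * 1024


def should_auto_rewrite(current_size, base_size, percentage=100, min_size=64 * MB):
    """Decide whether an automatic rewrite is due (sketch, sizes in bytes)."""
    if percentage == 0:
        return False  # a percentage of zero turns automatic rewriting off
    if current_size < min_size:
        return False  # the file is still too small to be worth rewriting
    base = base_size if base_size > 0 else 1  # avoid division by zero
    growth = current_size * 100 // base - 100  # growth since last rewrite, in %
    return growth >= percentage


doubled = should_auto_rewrite(128 * MB, 64 * MB)     # grew by 100%
grew_a_bit = should_auto_rewrite(100 * MB, 64 * MB)  # grew by about 56%
too_small = should_auto_rewrite(10 * MB, 1 * MB)     # grew a lot but is under 64 MB
```

Both conditions must hold at once: the file has to be large enough and has to have grown by the configured percentage.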

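For an AOF rewrite, Redis forks a child process to do the work, so normal command processing is not affected much.

AOF rewrite process

Like RDB snapshot creation, AOF rewriting makes clever use of the copy-on-write mechanism.

  1. Redis calls fork(), so there is now both a parent process and a child process.
  2. The child process starts writing the contents of the new AOF file to a temporary file.
  3. For every new write command, the parent process accumulates it in an in-memory buffer and also appends it to the end of the existing AOF file, so the existing AOF file remains safe even if the server stops in the middle of the rewrite.
  4. When the child process finishes the rewrite, it sends a signal to the parent process, which then appends everything in the in-memory buffer to the end of the new AOF file.
  5. Done! Redis atomically replaces the old file with the new one, and all subsequent commands are appended directly to the end of the new AOF file.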
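
The five steps can be traced in a single-process Python simulation. ToyAof is a hypothetical class: there is no real fork and no files, a dict copy stands in for the child's view of memory, and lists stand in for the AOF files. It is meant only to show why the buffer in step 3 is needed.

```python
class ToyAof:
    """Single-process simulation of the rewrite steps (no real fork or files)."""

    def __init__(self):
        self.state = {}             # in-memory dataset
        self.aof = []               # existing AOF, as a list of commands
        self.rewrite_buffer = None  # commands that arrive during a rewrite
        self.snapshot = None        # dataset as seen by the "child"

    def set(self, key, value):
        self.state[key] = value
        command = ("SET", key, value)
        self.aof.append(command)  # step 3: keep appending to the old AOF
        if self.rewrite_buffer is not None:
            self.rewrite_buffer.append(command)  # step 3: also buffer it

    def start_rewrite(self):
        # Steps 1-2: the copy stands in for the child's memory after fork().
        self.snapshot = dict(self.state)
        self.rewrite_buffer = []

    def finish_rewrite(self):
        # Step 2 result: the new file rebuilt from the child's snapshot.
        new_aof = [("SET", key, value) for key, value in self.snapshot.items()]
        # Step 4: append the commands buffered while the child was working.
        new_aof.extend(self.rewrite_buffer)
        # Step 5: switch to the new file.
        self.aof = new_aof
        self.rewrite_buffer = None
        self.snapshot = None


db = ToyAof()
db.set("count", "1")
db.set("count", "2")
db.start_rewrite()
db.set("name", "redis")  # arrives while the rewrite is in progress
old_aof_during_rewrite = list(db.aof)
db.finish_rewrite()
```

Without the buffer, the write to name would be missing from the new file, because the child's snapshot was taken before that command arrived.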

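RDB vs. AOF

Persistence method   RDB                   AOF
Startup priority     low                   high
File size            small                 large
Recovery speed       fast                  slow
Data safety          data is easily lost   depends on the fsync policy

Both can be used in production. If both an RDB file and an AOF file exist when Redis starts, the AOF file is used to restore the data, because AOF is comparatively safer.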

Redis hybrid persistence

RDB is seldom used to restore in-memory data when Redis restarts, because a lot of data can be lost. AOF log replay is usually used instead, but replaying an AOF log is much slower than loading an RDB file, so a large Redis instance takes a long time to start. To solve this problem, Redis 4.0 introduced a new persistence option: hybrid persistence.

Enable hybrid persistence through the following configuration (it is enabled by default).

aof-use-rdb-preamble yes # AOF must be enabled first: appendonly yes

With hybrid persistence turned on, an AOF rewrite no longer simply converts the in-memory data into RESP commands and writes them to the AOF file. Instead, it takes an RDB snapshot of memory as of the moment the rewrite starts, and writes the RDB snapshot content together with the incremental AOF commands that modify memory afterwards into a temporary AOF file. When the rewrite is complete, that file replaces the original AOF file.

In this way, when Redis restarts, the RDB content can be loaded first, and then the AOF log can be replayed incrementally, which greatly improves the efficiency.
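
A loader has to know which kind of file it is reading. One way to tell, sketched below in Python, is to look at the first bytes: RDB content begins with the magic string REDIS followed by a version number, while a plain AOF begins with a RESP array marker. The helper has_rdb_preamble is hypothetical, and the file contents written in the example are fake placeholders, not valid RDB data.

```python
import os
import tempfile


def has_rdb_preamble(path):
    """Return True if the file starts with the RDB magic string `REDIS`."""
    with open(path, "rb") as f:
        return f.read(5) == b"REDIS"


with tempfile.TemporaryDirectory() as workdir:
    hybrid_path = os.path.join(workdir, "hybrid.aof")
    plain_path = os.path.join(workdir, "plain.aof")
    resp_tail = b"*3\r\n$3\r\nSET\r\n$5\r\ncount\r\n$2\r\n14\r\n"
    with open(hybrid_path, "wb") as f:
        # Fake preamble: magic and version only, followed by placeholder bytes.
        f.write(b"REDIS0009" + b"<placeholder for RDB payload>" + resp_tail)
    with open(plain_path, "wb") as f:
        f.write(resp_tail)
    hybrid_detected = has_rdb_preamble(hybrid_path)
    plain_detected = has_rdb_preamble(plain_path)
```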

Hybrid Persistence AOF File Structure

[Figure: hybrid AOF file layout, with the RDB-format snapshot first and the incremental AOF commands after it]

Backing up Redis data

  1. Write a crontab scheduled task script that copies the RDB or AOF file to a backup directory every hour, and keep only the last 48 hours of backups.
  2. Every day, keep one backup of that day's data in a directory; the backups of the last month can be retained.
  3. Each time the latest data is copied, delete the backups that are too old.
  4. Every night, copy the data on the current machine to other machines or to OSS storage to guard against machine damage.
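
The hourly copy-and-prune job from steps 1 and 3 could look like the following Python sketch. The function backup_and_prune, the file names and the directory layout are all hypothetical, and the example uses a temporary directory with a fake dump.rdb rather than a real Redis data directory.

```python
import os
import shutil
import tempfile
import time


def backup_and_prune(source, backup_dir, keep_seconds=48 * 3600, now=None):
    """Copy `source` into `backup_dir` under a timestamped name, then delete old copies."""
    now = time.time() if now is None else now
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d%H%M%S", time.localtime(now))
    target = os.path.join(backup_dir, f"{os.path.basename(source)}.{stamp}")
    shutil.copy2(source, target)
    os.utime(target, (now, now))  # record the backup time, not the source mtime
    removed = []
    for name in sorted(os.listdir(backup_dir)):
        candidate = os.path.join(backup_dir, name)
        too_old = now - os.path.getmtime(candidate) > keep_seconds
        if os.path.isfile(candidate) and too_old:
            os.remove(candidate)
            removed.append(name)
    return target, removed


with tempfile.TemporaryDirectory() as workdir:
    source_path = os.path.join(workdir, "dump.rdb")
    with open(source_path, "wb") as f:
        f.write(b"fake snapshot bytes")
    backups = os.path.join(workdir, "backups")
    start = time.time()
    first_target, removed_first = backup_and_prune(source_path, backups, now=start)
    # Simulate a run 49 hours later: the first backup is now past the 48-hour limit.
    later = start + 49 * 3600
    second_target, removed_second = backup_and_prune(source_path, backups, now=later)
    remaining = sorted(os.listdir(backups))
```

A script like this would be invoked from an hourly crontab entry; the daily and off-machine copies in steps 2 and 4 would follow the same pattern with a different target and retention period.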

Origin juejin.im/post/7084626876412461064