Data write-back mechanism of Redis

Data write-back mechanism of Redis

The data write-back mechanism of Redis is divided into two types: synchronous and asynchronous.

  1. Synchronous write-back is the SAVE command, and the main process writes data directly to the disk. In the case of large data, it will cause the system to freeze for a long time, so it is generally not recommended.
  2. Asynchronous write-back is the BGSAVE command. After the main process forks, it copies itself and writes back to the disk through this new process. After the write-back is over, the new process closes itself. Since the main process does not need to be blocked in this way, the system will not die, and this method is generally adopted by default.

Personally, I feel that method 2 is very clumsy to fork the main process, but it seems to be the only method. Hot data in memory can be modified at any time, and the memory image must be frozen to keep it on disk for a certain period of time. Freezing will cause suspended animation. After forking a new process, it is equivalent to copying a memory image at that time, so that the main process does not need to be frozen, as long as the operation is performed on the child process.

Doing a fork on a process with small memory does not require too many resources, but when the memory space of the process is in units of G, fork becomes a terrifying operation. What's more, what about the process of fork 14G memory on the host with 16G memory? It will definitely report that the memory cannot be allocated. What's even more annoying is that the more frequently changed hosts are, the more frequent forks are, and the cost of the fork operation itself may not be much better than suspended animation.

After finding the reason, directly modify the /etc/sysctl.conf kernel parameter vm.overcommit_memory= 1

sysctl -p

The Linux kernel will decide whether to release or not according to the setting of the parameter vm.overcommit_memory parameter.

  1.  If vm.overcommit_memory = 1, let go directly
  2. vm.overcommit_memory = 0: Compare the virtual memory size allocated by this request with the current free physical memory of the system plus swap to decide whether to release.
  3. vm.overcommit_memory= 2: It will compare all the virtual memory allocated by the process plus the virtual memory allocated by this request and the current free physical memory of the system plus swap to decide whether to release.

 

Redis persistence practice and disaster recovery simulation

 

References:
Redis Persistence http://redis.io/topics/persistence
Google Groups https://groups.google.com/forum/?fromgroups=#!forum/redis-db

1. Discussion and understanding of Redis persistence

There are currently two ways to persist Redis: RDB and AOF

First of all, we should clarify what the persistent data is used for, and the answer is for data recovery after restart.
Redis is an in-memory database , whether it is RDB or AOF, it is only a measure to ensure data recovery.
Therefore, when Redis uses RDB and AOF for recovery, it will read the RDB or AOF file and reload it into memory.

RDB is Snapshot snapshot storage, which is the default persistence method.
It can be understood as a semi-persistent mode, that is, data is periodically saved to disk according to a certain strategy.
The corresponding generated data file is dump.rdb, and the snapshot period is defined by the save parameter in the configuration file.
The following are the default snapshot settings:

save 900 1 #When a Keys data is changed, refresh to Disk once every 900 seconds
save 300 10 #When 10 Keys data are changed, refresh to Disk once every 300 seconds
save 60 10000 #When 10000 Keys data are changed, refresh to Disk once every 60 seconds

The RDB file of Redis will not be corrupted because its write operation is carried out in a new process.
When a new RDB file is generated, the child process generated by Redis will first write the data to a temporary file, and then rename the temporary file to the RDB file through the atomic rename system call.
This way, at any time of failure, the Redis RDB file is always available.

同时,Redis的RDB文件也是Redis主从同步内部实现中的一环。
第一次Slave向Master同步的实现是:
Slave向Master发出同步请求,Master先dump出rdb文件,然后将rdb文件全量传输给slave,然后Master把缓存的命令转发给Slave,初次同步完成。
第二次以及以后的同步实现是:
Master将变量的快照直接实时依次发送给各个Slave。
但不管什么原因导致Slave和Master断开重连都会重复以上两个步骤的过程。
Redis的主从复制是建立在内存快照的持久化基础上的,只要有Slave就一定会有内存快照发生。

可以很明显的看到,RDB有它的不足,就是一旦数据库出现问题,那么我们的RDB文件中保存的数据并不是全新的。
从上次RDB文件生成到Redis停机这段时间的数据全部丢掉了。

AOF(Append-Only File)比RDB方式有更好的持久化性。
由于在使用AOF持久化方式时,Redis会将每一个收到的写命令都通过Write函数追加到文件中,类似于MySQL的binlog。
当Redis重启是会通过重新执行文件中保存的写命令来在内存中重建整个数据库的内容。
对应的设置参数为:
$ vim /opt/redis/etc/redis_6379.conf

appendonly yes       #启用AOF持久化方式
appendfilename appendonly.aof #AOF文件的名称,默认为appendonly.aof
# appendfsync always #每次收到写命令就立即强制写入磁盘,是最有保证的完全的持久化,但速度也是最慢的,一般不推荐使用。
appendfsync everysec #每秒钟强制写入磁盘一次,在性能和持久化方面做了很好的折中,是受推荐的方式。
# appendfsync no     #完全依赖OS的写入,一般为30秒左右一次,性能最好但是持久化最没有保证,不被推荐。

AOF的完全持久化方式同时也带来了另一个问题,持久化文件会变得越来越大。
比如我们调用INCR test命令100次,文件中就必须保存全部的100条命令,但其实99条都是多余的。
因为要恢复数据库的状态其实文件中保存一条SET test 100就够了。
为了压缩AOF的持久化文件,Redis提供了bgrewriteaof命令。
收到此命令后Redis将使用与快照类似的方式将内存中的数据以命令的方式保存到临时文件中,最后替换原来的文件,以此来实现控制AOF文件的增长。
由于是模拟快照的过程,因此在重写AOF文件时并没有读取旧的AOF文件,而是将整个内存中的数据库内容用命令的方式重写了一个新的AOF文件。
对应的设置参数为:
$ vim /opt/redis/etc/redis_6379.conf

no-appendfsync-on-rewrite yes   #在日志重写时,不进行命令追加操作,而只是将其放在缓冲区里,避免与命令的追加造成DISK IO上的冲突。
auto-aof-rewrite-percentage 100 #当前AOF文件大小是上次日志重写得到AOF文件大小的二倍时,自动启动新的日志重写过程。
auto-aof-rewrite-min-size 64mb  #当前AOF文件启动新的日志重写过程的最小值,避免刚刚启动Reids时由于文件尺寸较小导致频繁的重写。

到底选择什么呢?下面是来自官方的建议:
通常,如果你要想提供很高的数据保障性,那么建议你同时使用两种持久化方式。
如果你可以接受灾难带来的几分钟的数据丢失,那么你可以仅使用RDB。
很多用户仅使用了AOF,但是我们建议,既然RDB可以时不时的给数据做个完整的快照,并且提供更快的重启,所以最好还是也使用RDB。
因此,我们希望可以在未来(长远计划)统一AOF和RDB成一种持久化模式。

在数据恢复方面:
RDB的启动时间会更短,原因有两个:
一是RDB文件中每一条数据只有一条记录,不会像AOF日志那样可能有一条数据的多次操作记录。所以每条数据只需要写一次就行了。
另一个原因是RDB文件的存储格式和Redis数据在内存中的编码格式是一致的,不需要再进行数据编码工作,所以在CPU消耗上要远小于AOF日志的加载。

二、灾难恢复模拟
既然持久化的数据的作用是用于重启后的数据恢复,那么我们就非常有必要进行一次这样的灾难恢复模拟了。
据称如果数据要做持久化又想保证稳定性,则建议留空一半的物理内存。因为在进行快照的时候,fork出来进行dump操作的子进程会占用与父进程一样的内存,真正的copy-on-write,对性能的影响和内存的耗用都是比较大的。
目前,通常的设计思路是利用Replication机制来弥补aof、snapshot性能上的不足,达到了数据可持久化。
即Master上Snapshot和AOF都不做,来保证Master的读写性能,而Slave上则同时开启Snapshot和AOF来进行持久化,保证数据的安全性。

首先,修改Master上的如下配置:
$ sudo vim /opt/redis/etc/redis_6379.conf

#save 900 1 #禁用Snapshot
#save 300 10
#save 60 10000

appendonly no #禁用AOF

接着,修改Slave上的如下配置:
$ sudo vim /opt/redis/etc/redis_6379.conf

save 900 1 #启用Snapshot
save 300 10
save 60 10000

appendonly yes #启用AOF
appendfilename appendonly.aof #AOF文件的名称
# appendfsync always
appendfsync everysec #每秒钟强制写入磁盘一次
# appendfsync no  

no-appendfsync-on-rewrite yes   #在日志重写时,不进行命令追加操作
auto-aof-rewrite-percentage 100 #自动启动新的日志重写过程
auto-aof-rewrite-min-size 64mb  #启动新的日志重写过程的最小值

分别启动Master与Slave
$ /etc/init.d/redis start

启动完成后在Master中确认未启动Snapshot参数
redis 127.0.0.1:6379> CONFIG GET save
1) "save"
2) ""

然后通过以下脚本在Master中生成25万条数据:
dongguo@redis:/opt/redis/data/6379$ cat redis-cli-generate.temp.sh

#!/bin/bash

REDISCLI="redis-cli -a slavepass -n 1 SET"
ID=1

while(($ID<50001))
do
  INSTANCE_NAME="i-2-$ID-VM"
  UUID=`cat /proc/sys/kernel/random/uuid`
  PRIVATE_IP_ADDRESS=10.`echo "$RANDOM % 255 + 1" | bc`.`echo "$RANDOM % 255 + 1" | bc`.`echo "$RANDOM % 255 + 1" | bc`\
  CREATED=`date "+%Y-%m-%d %H:%M:%S"`

  $REDISCLI vm_instance:$ID:instance_name "$INSTANCE_NAME"
  $REDISCLI vm_instance:$ID:uuid "$UUID"
  $REDISCLI vm_instance:$ID:private_ip_address "$PRIVATE_IP_ADDRESS"
  $REDISCLI vm_instance:$ID:created "$CREATED"

  $REDISCLI vm_instance:$INSTANCE_NAME:id "$ID"

  ID=$(($ID+1))
done

dongguo@redis:/opt/redis/data/6379$ ./redis-cli-generate.temp.sh

在数据的生成过程中,可以很清楚的看到Master上仅在第一次做Slave同步时创建了dump.rdb文件,之后就通过增量传输命令的方式给Slave了。
dump.rdb文件没有再增大。
dongguo@redis:/opt/redis/data/6379$ ls -lh
total 4.0K
-rw-r--r-- 1 root root 10 Sep 27 00:40 dump.rdb

而Slave上则可以看到dump.rdb文件和AOF文件在不断的增大,并且AOF文件的增长速度明显大于dump.rdb文件。
dongguo@redis-slave:/opt/redis/data/6379$ ls -lh
total 24M
-rw-r--r-- 1 root root 15M Sep 27 12:06 appendonly.aof
-rw-r--r-- 1 root root 9.2M Sep 27 12:06 dump.rdb

等待数据插入完成以后,首先确认当前的数据量。
redis 127.0.0.1:6379> info

redis_version:2.4.17
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
gcc_version:4.4.5
process_id:27623
run_id:e00757f7b2d6885fa9811540df9dfed39430b642
uptime_in_seconds:1541
uptime_in_days:0
lru_clock:650187
used_cpu_sys:69.28
used_cpu_user:7.67
used_cpu_sys_children:0.00
used_cpu_user_children:0.00
connected_clients:1
connected_slaves:1
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
used_memory:33055824
used_memory_human:31.52M
used_memory_rss:34717696
used_memory_peak:33055800
used_memory_peak_human:31.52M
mem_fragmentation_ratio:1.05
mem_allocator:jemalloc-3.0.0
loading:0
aof_enabled:0
changes_since_last_save:250000
bgsave_in_progress:0
last_save_time:1348677645
bgrewriteaof_in_progress:0
total_connections_received:250007
total_commands_processed:750019
expired_keys:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:246
vm_enabled:0
role:master
slave0:10.6.1.144,6379,online
db1:keys=250000,expires=0

当前的数据量为25万条key,占用内存31.52M。

然后我们直接Kill掉Master的Redis进程,模拟灾难。
dongguo@redis:/opt/redis/data/6379$ sudo killall -9 redis-server

我们到Slave中查看状态:
redis 127.0.0.1:6379> info

redis_version:2.4.17
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
gcc_version:4.4.5
process_id:13003
run_id:9b8b398fc63a26d160bf58df90cf437acce1d364
uptime_in_seconds:1627
uptime_in_days:0
lru_clock:654181
used_cpu_sys:29.69
used_cpu_user:1.21
used_cpu_sys_children:1.70
used_cpu_user_children:1.23
connected_clients:1
connected_slaves:0
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
used_memory:33047696
used_memory_human:31.52M
used_memory_rss:34775040
used_memory_peak:33064400
used_memory_peak_human:31.53M
mem_fragmentation_ratio:1.05
mem_allocator:jemalloc-3.0.0
loading:0
aof_enabled:1
changes_since_last_save:3308
bgsave_in_progress:0
last_save_time:1348718951
bgrewriteaof_in_progress:0
total_connections_received:4
total_commands_processed:250308
expired_keys:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:694
vm_enabled:0
role:slave
aof_current_size:17908619
aof_base_size:16787337
aof_pending_rewrite:0
aof_buffer_length:0
aof_pending_bio_fsync:0
master_host:10.6.1.143
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
master_link_down_since_seconds:25
slave_priority:100
db1:keys=250000,expires=0

可以看到master_link_status的状态已经是down了,Master已经不可访问了。
而此时,Slave依然运行良好,并且保留有AOF与RDB文件。

下面我们将通过Slave上保存好的AOF与RDB文件来恢复Master上的数据。

首先,将Slave上的同步状态取消,避免主库在未完成数据恢复前就重启,进而直接覆盖掉从库上的数据,导致所有的数据丢失。
redis 127.0.0.1:6379> SLAVEOF NO ONE
OK

确认一下已经没有了master相关的配置信息:
redis 127.0.0.1:6379> INFO

redis_version:2.4.17
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
gcc_version:4.4.5
process_id:13003
run_id:9b8b398fc63a26d160bf58df90cf437acce1d364
uptime_in_seconds:1961
uptime_in_days:0
lru_clock:654215
used_cpu_sys:29.98
used_cpu_user:1.22
used_cpu_sys_children:1.76
used_cpu_user_children:1.42
connected_clients:1
connected_slaves:0
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
used_memory:33047696
used_memory_human:31.52M
used_memory_rss:34779136
used_memory_peak:33064400
used_memory_peak_human:31.53M
mem_fragmentation_ratio:1.05
mem_allocator:jemalloc-3.0.0
loading:0
aof_enabled:1
changes_since_last_save:0
bgsave_in_progress:0
last_save_time:1348719252
bgrewriteaof_in_progress:0
total_connections_received:4
total_commands_processed:250311
expired_keys:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:1119
vm_enabled:0
role:master
aof_current_size:17908619
aof_base_size:16787337
aof_pending_rewrite:0
aof_buffer_length:0
aof_pending_bio_fsync:0
db1:keys=250000,expires=0

在Slave上复制数据文件:
dongguo@redis-slave:/opt/redis/data/6379$ tar cvf /home/dongguo/data.tar *
appendonly.aof
dump.rdb

将data.tar上传到Master上,尝试恢复数据:
可以看到Master目录下有一个初始化Slave的数据文件,很小,将其删除。
dongguo@redis:/opt/redis/data/6379$ ls -l
total 4
-rw-r--r-- 1 root root 10 Sep 27 00:40 dump.rdb
dongguo@redis:/opt/redis/data/6379$ sudo rm -f dump.rdb

然后解压缩数据文件:
dongguo@redis:/opt/redis/data/6379$ sudo tar xf /home/dongguo/data.tar
dongguo@redis:/opt/redis/data/6379$ ls -lh
total 29M
-rw-r--r-- 1 root root 18M Sep 27 01:22 appendonly.aof
-rw-r--r-- 1 root root 12M Sep 27 01:22 dump.rdb

启动Master上的Redis;
dongguo@redis:/opt/redis/data/6379$ sudo /etc/init.d/redis start
Starting Redis server...

查看数据是否恢复:
redis 127.0.0.1:6379> INFO

redis_version:2.4.17
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
gcc_version:4.4.5
process_id:16959
run_id:6e5ba6c053583414e75353b283597ea404494926
uptime_in_seconds:22
uptime_in_days:0
lru_clock:650292
used_cpu_sys:0.18
used_cpu_user:0.20
used_cpu_sys_children:0.00
used_cpu_user_children:0.00
connected_clients:1
connected_slaves:0
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
used_memory:33047216
used_memory_human:31.52M
used_memory_rss:34623488
used_memory_peak:33047192
used_memory_peak_human:31.52M
mem_fragmentation_ratio:1.05
mem_allocator:jemalloc-3.0.0
loading:0
aof_enabled:0
changes_since_last_save:0
bgsave_in_progress:0
last_save_time:1348680180
bgrewriteaof_in_progress:0
total_connections_received:1
total_commands_processed:1
expired_keys:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
vm_enabled:0
role:master
db1:keys=250000,expires=0

可以看到25万条数据已经完整恢复到了Master上。

此时,可以放心的恢复Slave的同步设置了。
redis 127.0.0.1:6379> SLAVEOF 10.6.1.143 6379
OK

查看同步状态:
redis 127.0.0.1:6379> INFO

redis_version:2.4.17
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
gcc_version:4.4.5
process_id:13003
run_id:9b8b398fc63a26d160bf58df90cf437acce1d364
uptime_in_seconds:2652
uptime_in_days:0
lru_clock:654284
used_cpu_sys:30.01
used_cpu_user:2.12
used_cpu_sys_children:1.76
used_cpu_user_children:1.42
connected_clients:2
connected_slaves:0
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
used_memory:33056288
used_memory_human:31.52M
used_memory_rss:34766848
used_memory_peak:33064400
used_memory_peak_human:31.53M
mem_fragmentation_ratio:1.05
mem_allocator:jemalloc-3.0.0
loading:0
aof_enabled:1
changes_since_last_save:0
bgsave_in_progress:0
last_save_time:1348719252
bgrewriteaof_in_progress:1
total_connections_received:6
total_commands_processed:250313
expired_keys:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:12217
vm_enabled:0
role:slave
aof_current_size:17908619
aof_base_size:16787337
aof_pending_rewrite:0
aof_buffer_length:0
aof_pending_bio_fsync:0
master_host:10.6.1.143
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_priority:100
db1:keys=250000,expires=0

master_link_status显示为up,同步状态正常。

In the process of this recovery, we copied the AOF and RDB files at the same time, so which file has completed the data recovery?
In fact, when the Redis server hangs, the data will be restored to the memory according to the following priorities during restart:
1. If only AOF is configured, the AOF file will be loaded to restore the data when restarting;
2. If both RDB and AOF are configured, the startup will only be performed Load the AOF file to restore the data;
3. If only RDB is configured, the dump file will be loaded to restore the data at startup.

That is to say, AOF has higher priority than RDB, which is also understandable, because AOF itself guarantees data integrity higher than RDB.

In this case, we ensured the data by enabling AOF and RDB on the Slave and restored the Master.

However, in our current online environment, since the data has an expiration time, the AOF method will not be practical. Too frequent write operations will make the AOF file grow to an abnormally large size, which greatly exceeds our actual data volume. , which also results in a huge amount of time during data recovery.
Therefore, you can only enable Snapshot on the Slave for localization. At the same time, you can consider increasing the frequency in save or calling a scheduled task to perform periodic bgsave snapshot storage to ensure the integrity of localized data as much as possible.
Under such an architecture , if only the Master hangs up, the Slave is complete and the data recovery can reach 100%.
If the Master and Slave hang at the same time, the data recovery can also reach an acceptable level.

Reprinted from http://blog.csdn.net/gzh0222/article/details/8482525

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=326349604&siteId=291194637