13. Enterprise-level Redis data backup, data recovery, and disaster recovery drills


So far we have really only been learning at the level of theory: the principles and operation of Redis persistence. But how is persistence actually used in the enterprise?

How do you do enterprise-level data backup, and how do you recover data after the various kinds of disasters?

1. Enterprise-level persistence configuration strategy

In the enterprise, the RDB generation strategy is usually close to the defaults.

save 60 10000: if you want to ensure, as far as possible, that RDB loses at most about 1 minute of data, then generate a snapshot roughly every minute, but only once enough writes have accumulated. During off-peak periods the write volume is small and a snapshot is unnecessary, which is exactly why the threshold exists.

Whether the trigger is 10000 writes or 1000 writes before generating an RDB snapshot is determined by your own application and business data volume.

AOF must be turned on, with appendfsync set to everysec.

auto-aof-rewrite-percentage 100: trigger a rewrite when the current AOF has grown 100% beyond its size after the last rewrite, i.e. has doubled.
auto-aof-rewrite-min-size 64mb: adjust this floor (16mb, 32mb, ...) according to your data volume.
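Putting the settings above together, a minimal redis.conf persistence fragment might look like this; the 10000-write threshold and the 64mb floor are the tunables to adjust for your own data volume:

```conf
# RDB: snapshot if at least 10000 writes happened in the last 60s
save 60 10000

# AOF: always on, fsync once per second
appendonly yes
appendfsync everysec

# rewrite the AOF once it doubles in size, but never below 64mb
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
```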

2. Enterprise-level data backup solution

RDB is very suitable for cold backup: once a snapshot file is generated, it is never modified again.

Data backup solution:

(1) Write crontab scheduled scripts to do the data backups
(2) Every hour, copy the current RDB file into a backup directory, keeping only the backups of the last 48 hours
(3) Every day, keep one copy of that day's RDB in a backup directory, keeping only the backups of the last month
(4) Each time a backup is copied, delete the old backups
(5) Every night, upload all the backup data on the current server to a remote cloud server

/usr/local/redis

Copy a backup every hour, deleting backups older than 48 hours:

crontab -e

0 * * * * sh /usr/local/redis/copy/redis_rdb_copy_hourly.sh

redis_rdb_copy_hourly.sh

#!/bin/sh

# %H: zero-padded hour, so the directory name contains no embedded space
cur_date=`date +%Y%m%d%H`
rm -rf /usr/local/redis/snapshotting/$cur_date
mkdir -p /usr/local/redis/snapshotting/$cur_date
cp /var/redis/6379/dump.rdb /usr/local/redis/snapshotting/$cur_date

# drop the backup taken 48 hours ago
del_date=`date -d -48hour +%Y%m%d%H`
rm -rf /usr/local/redis/snapshotting/$del_date

Copy a backup every day, deleting backups older than one month:

crontab -e

0 0 * * * sh /usr/local/redis/copy/redis_rdb_copy_daily.sh

redis_rdb_copy_daily.sh

#!/bin/sh

cur_date=`date +%Y%m%d`
rm -rf /usr/local/redis/snapshotting/$cur_date
mkdir -p /usr/local/redis/snapshotting/$cur_date
cp /var/redis/6379/dump.rdb /usr/local/redis/snapshotting/$cur_date

# drop the backup taken one month ago
del_date=`date -d -1month +%Y%m%d`
rm -rf /usr/local/redis/snapshotting/$del_date

Every night, upload all the backup data on the current server to a remote cloud server.

3. Data recovery plan

(1) If the Redis process crashes, restart the Redis process; data is restored directly from the AOF log file.

No demonstration here, since this was shown in the AOF data-recovery section. With appendfsync everysec, at most one second of data is lost.

(2) If the machine hosting the Redis process goes down, restart the machine, then restart the Redis process and attempt data recovery directly from the AOF log file.

If the AOF is not damaged, it can be used for recovery as-is.

AOF is append-only and written sequentially, so corruption usually affects only the tail of the file; if the AOF file is damaged, repair it with redis-check-aof --fix.

(3) If the current, latest AOF and RDB files of Redis are both lost or corrupted, try to recover from the most recent local RDB cold backup on the machine.

Losing both the latest AOF and RDB beyond recovery is generally not a machine fault but a human one.

For example, in a friend's small company running a Hadoop big-data system, someone accidentally ran rm -rf on the directory holding a large number of Hadoop data files; the ops staff were not very reliable and the permissions were not locked down.

Suppose the files under /var/redis/6379 are deleted.

Find the most recent RDB backup; the hourly backup will be the freshest. Copy it back to Redis, and you can restore the data to that hour.

Disaster recovery drill

A quick aside on teaching: many lecturers stick to pure PPT, or copy-and-paste material, instead of live explanation and code demonstration, because live demos make it easy to make mistakes, and reading slides avoids that.

To be blunt, though, reading off a PPT teaches very poorly.

With real lesson preparation and live lectures, some problems are inevitable, but I think that is fine. It is real, and the drill below shows exactly that.

We had both appendonly.aof and dump.rdb. Redis restores from appendonly.aof first, but we found that the appendonly.aof automatically generated by Redis contained no data.

Our own dump.rdb had data, but copying it in was obviously useless.

When Redis started, it recovered from the empty AOF, then regenerated a fresh RDB snapshot from the (empty) in-memory data, directly overwriting the dump.rdb we had copied in.

So the next idea was: after stopping Redis, delete appendonly.aof first, then copy in our dump.rdb, then restart Redis.

That also fails, and the reason is simple: even though you deleted appendonly.aof, because AOF persistence is turned on, Redis still recovers from AOF with priority; if the file is missing, it just creates a new, empty AOF file and starts from nothing.

Next attempt: stop Redis, temporarily disable AOF in the configuration, copy in the RDB backup, then restart Redis. Now the data is recovered.

But then, acting in haste: shut Redis down, manually edit the configuration file to re-enable AOF, restart Redis, and the data is gone again; Redis recovered from the empty AOF file and everything was wiped.

So, after the live data has been lost, how do you restore perfectly from an RDB cold backup while ending up with both AOF and RDB enabled?

Stop Redis; disable AOF in the configuration; copy in the RDB backup; restart Redis and confirm the data is recovered. Then enable AOF from the command line with a hot config change: Redis will write the log corresponding to the current in-memory data into a fresh AOF file.

At this point, the AOF and RDB data files are in sync.

Note that config set only hot-modifies the running configuration; the parameter in the configuration file itself has not been persistently changed. So stop Redis one more time, manually set appendonly yes in the configuration file, and restart Redis.
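The drill above condenses into the following runbook. This is a sketch that only prints the steps, since each of them touches a live Redis; the config-file path is an assumption, and the backup path follows the earlier scripts in this section:

```shell
#!/bin/sh
# Runbook sketch: restore from an RDB cold backup while keeping
# AOF + RDB both enabled afterwards. Prints the steps instead of
# executing them, because every step acts on a live Redis instance.
print_restore_runbook() {
    conf=/etc/redis/6379.conf   # assumed config-file path; adjust to yours
    echo "1. stop redis (e.g. redis-cli shutdown nosave)"
    echo "2. edit $conf: set appendonly no"
    echo "3. cp /usr/local/redis/snapshotting/<latest-hour>/dump.rdb /var/redis/6379/"
    echo "4. restart redis: data loads from dump.rdb"
    echo "5. redis-cli config set appendonly yes   # rewrite AOF from current memory"
    echo "6. edit $conf: set appendonly yes, then restart once more"
}
print_restore_runbook
```

Steps 5 and 6 are the crucial ones: the hot config set regenerates the AOF from memory, and the config-file edit makes the change survive the final restart.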

(4) If all RDB files on the current machine are damaged, pull the latest RDB snapshot from the remote cloud service and restore the data

(5) If a major data error is discovered, for example a program released in a certain hour polluted all the data at once and everything is wrong, you can choose an earlier point in time to restore from.

For example, code launched at 12:00 turns out to have a bug, so all the cached data it wrote to Redis is wrong.

Take the 11:00 RDB cold backup and follow the steps above to restore the data to its 11:00 state.
