Redis (7): Redis persistence

 
Redis persistence, as the name implies, means saving the data held in memory to disk, so that it is not lost if Redis stops unexpectedly.
There are currently two schemes: RDB and AOF. RDB periodically persists the in-memory data to disk according to configured rules, while AOF logs every write command after it is executed. The two methods can be used independently, but they are often combined. According to the Redis author, the two schemes are meant to be merged into one in a future version.

1. Snapshot RDB
(1) Introduction
The RDB file that Redis writes is a compressed binary file containing the key-value data held in memory. It is stored on disk so that, if the Redis database fails, the original database state can be restored from it.
In practice, you should set up a schedule that regularly backs up the Redis data and copies it to other, remote servers. If something goes wrong with the Redis database and you want to return it to an earlier point in time, you can restore it from a backed-up RDB file.
If an RDB file itself is damaged, you can check it with redis-check-dump, the tool that ships with Redis (renamed redis-check-rdb in newer versions).
 
(2) How to use
There are two commands that generate an RDB file: SAVE and BGSAVE.
a) SAVE creates the RDB file in the main server process and blocks all other client requests until it finishes.
b) BGSAVE forks a child process from the main process and creates the RDB file in the child, so client requests are not blocked.
c) Both commands create an RDB file only when a client such as redis-cli runs them explicitly. You can also configure save points with the save directive, and Redis will then save the RDB file automatically:
# Save the RDB file if there are at least 100 changes within 60 seconds
save 60 100
# Save the RDB file if there are at least 10 changes within 300 seconds
save 300 10
....
Multiple save lines can be configured, and the RDB file is saved automatically as soon as the condition of any one of them is met.
For save points configured this way, Redis actually runs BGSAVE, creating the RDB file in a child process so that the main Redis process is not blocked and can keep serving client reads and writes.
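The trigger rule above (several save lines, any one of which can start a save) can be modeled in a few lines of Python. This is a simplified sketch under our own assumptions, not Redis source code: SavePoint, should_bgsave and parse_save_lines are hypothetical names, dirty stands for the write counter and last_save for the time of the last successful save.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class SavePoint:
    seconds: int  # e.g. 60 in "save 60 100"
    changes: int  # e.g. 100 in "save 60 100"


def parse_save_lines(text):
    """Collect 'save <seconds> <changes>' lines from a config snippet; comments are skipped."""
    points = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "save":
            points.append(SavePoint(int(parts[1]), int(parts[2])))
    return points


def should_bgsave(save_points, dirty, last_save, now=None):
    """Return True if ANY save point is satisfied.

    dirty:     number of changes since the last successful save
    last_save: Unix time of the last successful save
    """
    if now is None:
        now = time.time()
    elapsed = now - last_save
    return any(dirty >= sp.changes and elapsed >= sp.seconds for sp in save_points)
```

With the two save lines above, 150 changes after 61 seconds trigger a save through the first rule, while 50 changes have to wait until 300 seconds have passed and the second rule applies.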
 
In practice, the save points should be chosen to fit the workload, but they should always be set.
a) If the database is read much more often than it is written, a longer interval is fine; for example, you could create an RDB file once an hour.
b) If the database receives many writes and the data is sensitive, use a shorter interval, such as once every 5 or 2 minutes.
c) If the data is sensitive and master-slave replication is used, the interval should also be short. Replication relies on RDB as well: when a slave needs a full synchronization, the master creates an RDB file and sends it to the slave.
 
(3) Principle
When save points are configured, Redis creates the RDB file automatically; internally this works the same way as running BGSAVE manually from redis-cli.
1. A client sends a write request.
2. Redis increments its counter of changes and keeps the time at which the RDB file was last saved.
3. On each pass of its periodic background cycle, Redis checks the configured save points against the change counter and the time of the last save.
4. When a save condition is met, Redis forks a child process; in effect, it starts the BGSAVE procedure.
5. The child process scans all data in the database and writes it to a temporary RDB file with a unique name.
6. The temporary file is renamed to the regular RDB filename (dump.rdb by default). The rename replaces the old RDB file in a single atomic step, so the old file is never replaced by a half-written one.
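The last steps above (write everything to a temporary file, then give it the regular RDB name) follow a common "write, then rename" pattern. A minimal Python sketch, assuming JSON as a stand-in for the binary RDB format; save_snapshot and load_snapshot are hypothetical helpers, and os.replace plays the role of the final rename.

```python
import json
import os
import tempfile


def save_snapshot(data: dict, path: str) -> None:
    """Write `data` to a temporary file, then atomically rename it to `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the whole snapshot to a uniquely named temporary file first,
    # in the same directory so the rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(prefix="temp-", suffix=".rdb", dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)    # JSON stands in for the binary RDB encoding
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before the rename
        # Give the temporary file the regular name; this replaces the old
        # snapshot in one atomic step, so readers never see a half-written file.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)   # keep the previous snapshot, drop the partial one
        raise


def load_snapshot(path: str) -> dict:
    with open(path) as f:
        return json.load(f)
```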
 
Note: when fork is called, Unix-like operating systems use a copy-on-write strategy. The parent and child processes initially share the same memory; when the parent needs to modify a piece of data (for example, to execute a write command), the operating system first copies that piece of memory so the child is not affected. The RDB file therefore contains the data exactly as it was at the moment of the fork, and changes made after the fork are not written into it. This is why, in theory, the RDB method can lose data.
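The fork-time snapshot behavior can be observed directly with os.fork() on a Unix-like system. In this sketch (snapshot_via_fork is a hypothetical function, and JSON sent over a pipe stands in for writing an RDB file), the child serializes the data while the parent keeps modifying it; the child's copy reflects only the state at the moment of the fork.

```python
import json
import os


def snapshot_via_fork(data: dict):
    """Serialize `data` in a forked child while the parent keeps modifying it.

    Returns (what the child saved, what the parent holds afterwards).
    Unix-only: relies on os.fork().
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child (plays the BGSAVE child): it sees memory as it was at fork() time.
        try:
            os.close(read_fd)
            with os.fdopen(write_fd, "w") as out:
                json.dump(data, out)
        finally:
            os._exit(0)  # leave immediately, skipping the parent's cleanup handlers
    # Parent (plays the main Redis process): keeps executing "write commands".
    os.close(write_fd)
    data["counter"] += 1
    data["new_key"] = "written after fork"
    with os.fdopen(read_fd) as inp:
        saved = json.load(inp)
    os.waitpid(pid, 0)
    return saved, data
```

Whatever the timing between the two processes, the child's output never contains the changes the parent made after the fork, which is exactly the data an RDB file written at that moment would miss.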
 
(4) Advantages
a) An RDB file is a compact, single-file representation of the Redis data at a point in time, which makes it well suited to backups. For example, you might archive an RDB file every hour for the last 24 hours and keep a daily RDB snapshot for 30 days, so you can easily restore different versions of the dataset if disaster strikes.
b) RDB is well suited to disaster recovery: as a single compact file, it can easily be transferred to a remote data center.
c) RDB maximizes Redis performance, because the only work the Redis parent process has to do to persist data is fork a child process; the child does all the remaining work, and the parent never performs the disk I/O itself.
d) RDB restarts instances with large datasets faster than AOF.
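The backup scheme in point a) (hourly archives for the last 24 hours, daily snapshots for 30 days) is a retention policy. A possible sketch in Python, assuming each backup is identified by the time it was taken; snapshots_to_keep is a hypothetical helper, and copying the files off-site is left out.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(snapshots, now: datetime):
    """Choose which RDB backups to keep: one per hour for 24 hours, one per day for 30 days.

    snapshots: iterable of datetimes (when each backup was taken)
    Returns the sorted datetimes to keep; everything else may be deleted.
    """
    keep = set()
    seen_hours = set()
    seen_days = set()
    # Walk newest first, so the most recent backup in each hour/day is the one kept.
    for ts in sorted(snapshots, reverse=True):
        age = now - ts
        hour_slot = ts.replace(minute=0, second=0, microsecond=0)
        day_slot = ts.date()
        if age <= timedelta(hours=24) and hour_slot not in seen_hours:
            seen_hours.add(hour_slot)
            keep.add(ts)
        if age <= timedelta(days=30) and day_slot not in seen_days:
            seen_days.add(day_slot)
            keep.add(ts)
    return sorted(keep)
```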
(5) Disadvantages
a) RDB is not a good fit if you need to minimize data loss when Redis stops working (for example, after a power outage). You can configure several save points (for example, save after at least 5 minutes and 100 writes to the dataset), but you will usually create an RDB snapshot every 5 minutes or less often. So if Redis stops for any reason without a proper shutdown, you must be prepared to lose the last few minutes of data.

b) RDB needs to fork() a child process often in order to persist to disk. With a large dataset fork() can be time-consuming, so when the dataset is very large and the CPU is not powerful enough, Redis may stop serving clients for a few milliseconds or even a full second. AOF also needs fork(), but less often, and you can tune how often the log is rewritten without any trade-off in durability.

 
 
2. Append-only file AOF
(1) Introduction
In addition to RDB persistence, Redis also provides AOF (append only file) persistence. Whereas RDB persistence saves the key-value pairs in the database, AOF persistence records the database state by saving the write commands executed by the Redis server.
When AOF persistence is enabled, the server restores data from the AOF file first at startup; if AOF is not enabled, it restores data from the RDB file. If the AOF file is corrupted, the redis-check-aof tool that ships with Redis can repair it.
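To illustrate recording write commands and replaying them on restart, here is a small Python sketch. AOF files store commands in the same RESP array format that clients send over the network; encode_command, decode_commands and replay are hypothetical helpers, and replay understands only SET, INCR and DEL, whereas a real AOF can contain any write command (and also, for example, SELECT).

```python
def encode_command(*args: str) -> bytes:
    """Encode one command as a RESP array of bulk strings, the format AOF files use."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(parts)


def decode_commands(blob: bytes):
    """Yield each command in a RESP-encoded log as a list of strings."""
    pos = 0
    while pos < len(blob):
        if blob[pos:pos + 1] != b"*":
            raise ValueError(f"expected '*' at offset {pos}")
        end = blob.index(b"\r\n", pos)
        count = int(blob[pos + 1:end])       # number of arguments
        pos = end + 2
        args = []
        for _ in range(count):
            end = blob.index(b"\r\n", pos)
            length = int(blob[pos + 1:end])  # skip the '$' prefix
            pos = end + 2
            args.append(blob[pos:pos + length].decode())
            pos += length + 2                # skip the payload and its trailing CRLF
        yield args


def replay(blob: bytes) -> dict:
    """Rebuild the key space by re-executing every logged write command in order."""
    db = {}
    for name, *rest in decode_commands(blob):
        name = name.upper()
        if name == "SET":
            db[rest[0]] = rest[1]
        elif name == "INCR":
            db[rest[0]] = str(int(db.get(rest[0], "0")) + 1)
        elif name == "DEL":
            for key in rest:
                db.pop(key, None)
        else:
            raise ValueError(f"command not handled by this sketch: {name}")
    return db
```

A log whose last command was only half written will usually make decode_commands raise ValueError, which is roughly the situation redis-check-aof exists to repair.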

(2) How to use

a) First, enable AOF in the configuration file:

appendonly yes
b) Choose an fsync policy for the AOF. There are three:
appendfsync no
When appendfsync is set to no, Redis never calls fsync itself to flush the AOF contents to disk, so this is entirely up to the operating system's scheduling. Most Linux systems write the buffered data to disk about every 30 seconds.
appendfsync everysec
When appendfsync is set to everysec, Redis by default calls fsync once per second to write the buffered data to disk. However, if an fsync call takes longer than one second, Redis delays the next fsync and waits one more second; that is, fsync happens after two seconds, and this time it is performed no matter how long it takes. Because the file descriptor is blocked during fsync, the current write operation blocks as well.
So, in the vast majority of cases Redis fsyncs once per second, and in the worst case once every two seconds.
Most database systems call this technique group commit: the data of several write operations is combined and written to the log on disk in one go.
appendfsync always
When appendfsync is set to always, fsync is called after every write operation. This is the safest setting for the data, but since fsync runs every time, performance suffers.
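The three policies differ only in when fsync is called relative to write. A minimal Python sketch, assuming a hypothetical AofWriter class with an injectable clock. Unlike Redis, which performs the everysec fsync in a background thread, this sketch fsyncs inline whenever a write arrives and at least one second has passed since the previous fsync.

```python
import os
import time


class AofWriter:
    """Append entries to a log file and fsync according to an appendfsync-style policy.

    policy: "always", "everysec" or "no", mirroring the three appendfsync options
    clock:  zero-argument callable returning seconds (injectable for testing)
    """

    def __init__(self, path, policy="everysec", clock=time.monotonic):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.clock = clock
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.last_fsync = clock()
        self.fsync_count = 0

    def append(self, entry: bytes) -> None:
        # write(2): the bytes now sit in the kernel's buffer, not necessarily on disk
        os.write(self.fd, entry)
        if self.policy == "always":
            self._fsync()  # every write reaches the disk before append() returns
        elif self.policy == "everysec" and self.clock() - self.last_fsync >= 1.0:
            self._fsync()  # at most one fsync per second
        # "no": never fsync here; the kernel flushes the buffer on its own schedule

    def _fsync(self) -> None:
        os.fsync(self.fd)
        self.last_fsync = self.clock()
        self.fsync_count += 1

    def close(self) -> None:
        os.close(self.fd)
```

Ten writes spread over 2.25 seconds cost two fsyncs under everysec, ten under always and none under no, which is the safety/performance trade-off described above.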
 
 
(3) Principle
1. A client sends a write request.
2. The Redis server receives the write request and appends the command to the AOF buffer in its memory.
3. On each pass of its periodic cycle, Redis applies the configured write policy and reads the data from the AOF buffer.
4. With appendfsync always, the log is written and fsynced in the main process, which blocks other requests. With appendfsync everysec, the fsync is handed to a background thread. With appendfsync no, flushing is left to the operating system; most Linux systems flush every 30 seconds by default.
5. The server calls the write(2) system call to copy the data into the operating system's buffer. If the Redis process crashes after this step, the log is still saved correctly, because the remaining steps are carried out by the operating system.
6. The operating system transfers the data in the buffer to the disk controller.
7. The disk controller writes the data to the physical disk medium (the data actually lands on disk). Only after this step completes will the log survive a machine failure such as a power outage.
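For the everysec policy, the two-second rule described earlier is applied on each pass of the periodic cycle. The Python model below is loosely based on Redis's AOF flush logic and is a sketch, not Redis code: FlushState and everysec_should_write are hypothetical names, fsync_in_progress stands for "the previous background fsync has not finished", and delayed_writes counts the times the wait was abandoned (Redis reports a similar counter as aof_delayed_fsync in INFO persistence).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlushState:
    postponed_since: Optional[float] = None  # when we started waiting, if we are waiting
    delayed_writes: int = 0                  # times we stopped waiting and wrote anyway


def everysec_should_write(state: FlushState, fsync_in_progress: bool, now: float) -> bool:
    """Decide on one pass of the periodic cycle whether to write the AOF buffer now."""
    if not fsync_in_progress:
        state.postponed_since = None
        return True                    # disk is free: write immediately
    if state.postponed_since is None:
        state.postponed_since = now    # previous fsync still running: start waiting
        return False
    if now - state.postponed_since < 2.0:
        return False                   # keep waiting, up to two seconds
    state.postponed_since = None
    state.delayed_writes += 1          # waited too long: write anyway, even if it blocks
    return True
```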
 
 
There is a problem, though: as more write commands arrive, the AOF file keeps growing. Redis therefore provides a feature called AOF rewrite, which regenerates the AOF file. In the new file each key is typically recreated with a single command, whereas the old file may record many operations on the same value. The rewrite works much like RDB generation: Redis forks a child process, which traverses the data directly and writes a new temporary AOF file. While the new file is being written, all write commands are still appended to the old AOF file and are also recorded in an in-memory buffer. When the rewrite finishes, the contents of that buffer are written to the temporary file in one go, and an atomic rename then replaces the old AOF file with the new one.
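A compact Python sketch of the rewrite procedure, under simplifying assumptions: values are strings, each log line is a space-separated command (a stand-in for the real format), db is the snapshot the forked child sees, and rewrite_buffer holds the commands the parent received during the rewrite. rewrite_commands and rewrite_aof are hypothetical names.

```python
import os
import tempfile


def rewrite_commands(db: dict) -> list:
    """Minimal command list that recreates `db`: one SET per key."""
    return [("SET", key, value) for key, value in sorted(db.items())]


def rewrite_aof(db: dict, path: str, rewrite_buffer: list) -> None:
    """Write a compacted log to a temporary file, then swap it in atomically.

    db:             data as it was when the rewrite started (the child's view)
    rewrite_buffer: commands received while the rewrite was running
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(prefix="temp-rewriteaof-", dir=directory)
    with os.fdopen(fd, "w") as f:
        # 1. Traverse the data and write one command per key.
        for cmd in rewrite_commands(db):
            f.write(" ".join(cmd) + "\n")
        # 2. Append, in one go, everything that arrived during the rewrite.
        for cmd in rewrite_buffer:
            f.write(" ".join(cmd) + "\n")
        f.flush()
        os.fsync(f.fileno())
    # 3. Atomically replace the old AOF file with the new one.
    os.replace(tmp, path)
```

In the test below, an old log of five commands that leaves only a = "3" shrinks to a single SET, followed by the one command that arrived during the rewrite.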

As the process above shows, both RDB and AOF write to disk sequentially, which performs well. Likewise, when the database is recovered from an RDB file or an AOF log, the data is read sequentially and loaded into memory, so recovery does not cause random disk reads.

 
(4) Advantages
1. AOF makes Redis more durable, with a choice of fsync policies: no fsync at all, fsync every second, or fsync on every write. With the default policy of fsync every second, write performance is still good (fsync runs in a background thread, and the main thread keeps serving write requests), and at most one second of writes can be lost.
2. The AOF log is an append-only file, so there are no seeks and no corruption problems after a power cut. Even if the file ends with a half-written command for some reason (a full disk or anything else), the redis-check-aof tool can easily fix it.
When the AOF file grows too large, Redis automatically rewrites it in the background. The rewrite is completely safe: while Redis keeps appending to the old file, it builds a brand-new file with the minimal set of operations needed to recreate the current dataset, and once the new file is ready, Redis switches the two files and starts appending to the new one.
3. The AOF file contains one operation after another, in a format that is easy to understand and parse, and it is easy to export. For example, even if you accidentally flush everything with the FLUSHALL command, you can still save your dataset as long as no rewrite has happened in the meantime: just stop the server, remove the last command from the file, and restart Redis.
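The FLUSHALL recovery trick in point 3 works because the log still holds every earlier write. A Python sketch with a hypothetical list-of-tuples log (not the real file format), where replay_log rebuilds the data and drop_trailing_flushall removes the offending last command:

```python
def replay_log(commands):
    """Rebuild the data by re-executing logged commands, oldest first."""
    db = {}
    for name, *args in commands:
        name = name.upper()
        if name == "SET":
            db[args[0]] = args[1]
        elif name == "DEL":
            for key in args:
                db.pop(key, None)
        elif name == "FLUSHALL":
            db.clear()
        else:
            raise ValueError(f"command not handled by this sketch: {name}")
    return db


def drop_trailing_flushall(commands):
    """Return a copy of the log without its last command if that command is FLUSHALL."""
    if commands and commands[-1][0].upper() == "FLUSHALL":
        return list(commands[:-1])
    return list(commands)
```

After a rewrite the earlier writes would no longer be in the log, which is why the trick only works if no rewrite has happened since the flush.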
 
(5) Disadvantages
1. For the same dataset, the AOF file is usually larger than the equivalent RDB file. AOF can also be slower than RDB, depending on the fsync policy. With fsync once per second, performance is usually still high, and with fsync disabled AOF is as fast as RDB even under high load. Still, under a heavy write load RDB offers better guarantees on maximum latency.
 
3. Summary

In general, both persistence methods should be used together: AOF minimizes data loss, while RDB provides backups and low-latency master-slave replication.
If losing a few minutes of data in a disaster is acceptable, RDB alone is enough.
Using AOF alone is not recommended: taking RDB snapshots from time to time is very convenient for database backups, makes restarts faster, and protects against possible bugs in the AOF engine.
For these reasons, Redis may eventually unify AOF and RDB into a single persistence model (a long-term plan).


Origin http://43.154.161.224:23101/article/api/json?id=326225236&siteId=291194637