Persistence mechanisms provided by Redis (RDB and AOF)

Redis is a key-value oriented NoSQL database system. Its strengths are high performance, persistent storage, and suitability for high-concurrency application scenarios. Although it started late, it has developed very rapidly.

Recently the author of Redis wrote on his blog that, of all the discussions about Redis he had seen, persistence was the most misunderstood topic, so he wrote a long article discussing Redis persistence systematically.

The article covers three topics: how Redis persistence works, whether its performance is reliable, and how it compares with other types of databases. The content of the article follows:

One, how does Redis persistence work?

  What is persistence? Simply put, it means putting data on a device where it is not lost after power-off, which is what we usually understand a hard disk to be.

First, let's look at what a database actually does when it performs a write. There are five steps:

 

  • The client sends a write request to the server (the data is in the client's memory).
  • The database server receives the write request (the data is in the server's memory).
  • The server calls the write system call to write the data to disk (the data is in the kernel buffer).
  • The operating system transfers the data in the buffer to the disk controller (the data is in the disk cache).
  • The disk controller writes the data to the physical medium (the data is actually on disk).

 

Failure Analysis 

Those are roughly the five steps of a write. Let's use them to look at failures at different levels:

 

  • When the database system fails, the operating system kernel is still intact. As long as step 3 has completed, the data is safe, because the operating system will carry out the remaining steps and ensure that the data eventually reaches the disk.
  • When the machine loses power, every cache mentioned in the five steps is lost, and both the database and the operating system stop working. The data is therefore guaranteed to survive a power outage only after step 5 has completed.

 

With these five steps in mind, we may want to answer the following questions:

  • How often does the database call write to put data into the kernel buffer?
  • How often does the kernel write the data in its buffer to the disk controller?
  • When does the disk controller write the data in its cache to the physical medium?

 

  For the first question, the database usually has full control.

  For the second question, the operating system has its own default policy, but we can also use the fsync call provided by the POSIX API to force the operating system to write the data from the kernel buffer to the disk controller.

  The third question seems to be out of the database's reach, but in practice the disk cache is turned off in most cases, or is enabled only as a read cache, so that writes are not cached and go directly to disk.

  The recommended practice is to turn on the write cache only when the disk device has battery backup.
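The difference between step 3 (write) and a forced flush (fsync) can be sketched in Java. This is only an illustration under assumptions: the class name DurableAppend and the sync parameter are made up for this example, and FileChannel.force is used as the JDK counterpart of fsync. It is not how Redis itself is implemented.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class DurableAppend {
    // Appends one record to the file. "sync" is a hypothetical switch for this
    // sketch: when true, the data is also forced out of the kernel buffer.
    static void append(Path file, String record, boolean sync) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);   // step 3: the data reaches the kernel buffer
            }
            if (sync) {
                ch.force(true);  // ask the OS to flush data and metadata to the device
            }
        }
    }
}
```

With sync set to false the call returns once the data is in the kernel buffer, which survives a database crash but not a power failure. With sync set to true the call waits until the operating system reports the flush as complete.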

Data corruption 

  Data corruption means that the data can no longer be recovered. Everything above was about making sure the data is actually written to disk, but being written to disk does not mean the data cannot be damaged. For example, a single write request may involve two separate write operations; if an accident happens, one of them may complete safely while the other has not yet been performed. If the database organizes its data files poorly, the data may end up completely unrecoverable.

There are usually three strategies for organizing data so that a damaged data file does not become unrecoverable:

    • The first approach is the crudest: the data is organized in a form that does not guarantee recoverability. Instead, after a data file is damaged, the data is recovered from a backup that is kept in sync through replication. MongoDB with the operation log turned off and Replica Sets configured works this way.
    • The second approach adds an operation log on top of the first: every operation is recorded, so the data can be recovered by replaying the log. Because the log is written sequentially in append-only fashion, the log itself cannot end up unrecoverable. MongoDB with the operation log turned on works this way.
    • A safer approach is for the database never to modify old data and to complete every write by appending, so that the data itself is a log and can never become unrecoverable. CouchDB is an excellent example of this approach.

 

Two, Redis provides RDB persistence and AOF persistence

How the RDB mechanism works and when it applies

  RDB persistence writes a snapshot of the in-memory data set to disk at specified intervals.

  It is the default persistence mode. It writes the in-memory data to a binary file as a snapshot; the default file name is dump.rdb.

Snapshots can be taken automatically through configuration. For example, Redis can be configured to take a snapshot automatically when at least m keys have been modified within n seconds. The default snapshot configuration is:

   # start a snapshot if at least 1 key was modified within 900 seconds
   save 900 1
   # start a snapshot if at least 10 keys were modified within 300 seconds
   save 300 10
   # start a snapshot if at least 10000 keys were modified within 60 seconds
   save 60 10000

RDB file saving process

  • Redis calls fork, so there are now a child process and a parent process.
  • The parent process continues to handle client requests, while the child process writes the contents of memory to a temporary file. Thanks to the operating system's copy-on-write mechanism, the parent and child share the same physical pages; when the parent handles a write request, the OS creates a copy of the page to be modified for the parent instead of writing to the shared page. The data in the child's address space is therefore a snapshot of the entire database at the moment of the fork.
  • When the child process has finished writing the snapshot to the temporary file, it replaces the original snapshot file with the temporary file and then exits.
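The last step above (write to a temporary file, then replace the original) can be sketched in Java. This is a minimal illustration, not Redis code: the class name SnapshotWriter and the key=value text format are assumptions made for the example, the real RDB format is binary, and the fork and copy-on-write part is not modelled at all.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.TreeMap;

final class SnapshotWriter {
    // Writes the whole data set to a temporary file in the same directory as the
    // target, then renames it over the target, so a reader never sees a
    // half-written snapshot file.
    static void save(Map<String, String> data, Path target) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "temp-", ".rdb");
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(data).entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        Files.write(tmp, sb.toString().getBytes(StandardCharsets.UTF_8));
        // ATOMIC_MOVE corresponds to rename on POSIX systems. The JDK documentation
        // says it is implementation specific whether an existing target is replaced.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

The temporary file is created in the same directory as the target on purpose, since a rename is generally only atomic within one file system.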

A client can also ask Redis to take a snapshot with the save or bgsave command. The save command performs the snapshot in the main thread. Since Redis uses a single main thread to handle all client requests, save blocks every client request, so it is not recommended.

Note also that every snapshot writes the full in-memory data to disk, not just the data that has become dirty since the last snapshot. If the data set is large and writes are frequent, this inevitably causes a lot of disk I/O and can seriously affect performance.

Advantages

  • With this approach, the entire Redis database is contained in a single file, which is very convenient for backup. For example, you may want to archive the data at regular intervals.
  • Backups are easy to move: an RDB file can simply be copied to other storage media.
  • Restoring a large data set from RDB is faster than restoring it from AOF.
  • RDB maximizes Redis performance: to save an RDB file, the parent process only has to fork a child process. The child then does all the saving work, and the parent performs no disk I/O.

Disadvantages

  • If you need to minimize data loss when the server fails, RDB is not for you. Redis lets you set various save points to control how often the RDB file is saved, but because an RDB file has to hold the state of the entire data set, saving it is not a cheap operation. You will therefore probably save the RDB file no more often than once every 5 minutes or so, and if the server goes down you may lose several minutes of data.
  • Every time an RDB file is saved, Redis must fork() a child process to do the actual persistence work. When the data set is large, fork() can be time-consuming and can make the server stop serving clients for some milliseconds. If the data set is very large and CPU time is tight, the pause may even last a full second. AOF rewriting also needs fork(), but no matter how long the interval between AOF rewrites is, data durability is not compromised.

AOF file saving process

Redis appends every write command it receives to a file (appendonly.aof by default) using the write function.

When Redis restarts, it rebuilds the entire database in memory by re-executing the write commands saved in the file. Of course, because the OS caches writes in the kernel, they may not reach the disk immediately, so AOF persistence can still lose some modifications. However, the configuration file lets us tell Redis when to force the OS to write to disk through the fsync function. There are three options (the default is to fsync once per second):

# enable AOF persistence
appendonly yes
# force a write to disk on every write command; slowest, but fully durable; not recommended
# appendfsync always
# force a write to disk once per second; a good compromise between performance and durability; recommended
appendfsync everysec
# rely entirely on the OS; best performance, no durability guarantee
# appendfsync no

AOF brings another problem: the persistence file keeps growing. For example, if we call the incr test command 100 times, the file has to store all 100 commands, although 99 of them are redundant. To restore the database state, a single set test 100 in the file would be enough.

To compact the AOF persistence file, Redis provides the bgrewriteaof command. On receiving this command, Redis saves the in-memory data to a temporary file in the form of commands, in a way similar to taking a snapshot, and finally replaces the original file. The process is as follows:

  • Redis calls fork, so there are now a parent process and a child process.
  • The child process, working from the database snapshot in memory, writes to a temporary file the commands needed to rebuild the database state.
  • The parent process continues to handle client requests. It keeps appending write commands to the original AOF file and also caches the write commands it receives. This ensures that nothing goes wrong if the child's rewrite fails.
  • When the child has written the snapshot contents to the temporary file in command form, it signals the parent. The parent then writes the cached write commands to the temporary file as well.
  • The parent can now replace the old AOF file with the temporary file by renaming it. Write commands received afterwards are appended to the new AOF file.

Note that rewriting the AOF file does not read the old AOF file. Instead, the entire contents of the database in memory are rewritten as commands into a new AOF file, which is somewhat similar to taking a snapshot.
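The idea that a rewrite works from the in-memory state rather than from the old log can be shown with a toy model in Java. Everything here is an assumption made for illustration: the class name AofRewriteSketch, the one-line-per-command text log, and the support for only SET and INCR. The real AOF uses the Redis protocol format and covers every command.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AofRewriteSketch {
    // Replays a tiny command log ("SET key value" or "INCR key") into memory,
    // the way a restart rebuilds the data set from the AOF file.
    static Map<String, String> replay(List<String> log) {
        Map<String, String> state = new LinkedHashMap<>();
        for (String line : log) {
            String[] p = line.trim().split("\\s+");
            if (p[0].equalsIgnoreCase("SET")) {
                state.put(p[1], p[2]);
            } else if (p[0].equalsIgnoreCase("INCR")) {
                long n = Long.parseLong(state.getOrDefault(p[1], "0")) + 1;
                state.put(p[1], Long.toString(n));
            }
        }
        return state;
    }

    // Emits one SET per key from the in-memory state, without reading the old log.
    static List<String> rewrite(Map<String, String> state) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : state.entrySet()) {
            out.add("SET " + e.getKey() + " " + e.getValue());
        }
        return out;
    }
}
```

Replaying a log of 100 "INCR test" lines and then calling rewrite on the resulting state yields the single command "SET test 100", which matches the incr example given earlier.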

Advantages

  • Using AOF persistence makes Redis much more durable: you can choose among different fsync policies, such as no fsync, fsync once per second, or fsync on every write command. The default AOF policy is fsync once per second. With this configuration Redis still performs well, and if the server goes down at most one second of data is lost (fsync runs in a background thread, so the main thread can keep working on command requests).

  • The AOF file is an append-only log, so writing it requires no seek. Even if the log ends with an incompletely written command for some reason (for example the disk filled up or the server went down mid-write), the redis-check-aof tool can easily fix the problem.

  • When the AOF file becomes too large, Redis can automatically rewrite it in the background: the rewritten AOF file contains the minimal set of commands needed to restore the current data set. The whole rewrite is safe, because Redis keeps appending commands to the existing AOF file while it creates the new one, so the existing AOF file is not lost even if the server goes down during the rewrite. Once the new AOF file has been created, Redis switches from the old file to the new one and starts appending to it.

  • The AOF file records, in order, every write operation performed on the database, stored in the Redis protocol format, so its contents are easy for people to read and easy to parse. Exporting an AOF file is also simple. For example, if you execute FLUSHALL by mistake, then as long as the AOF file has not been rewritten, you can stop the server, remove the FLUSHALL command at the end of the AOF file, and restart Redis to restore the data set to its state before FLUSHALL was executed.

Disadvantages

  • For the same data set, the AOF file is generally larger than the RDB file.

  • Depending on the fsync policy in use, AOF may be slower than RDB. In general, performance with fsync once per second is still very high, and with fsync turned off AOF can be as fast as RDB even under heavy load. Under a huge write load, however, RDB gives better guarantees on maximum latency.

  • AOF has had bugs of this kind in the past: because of certain commands, reloading an AOF file could fail to restore the data set exactly as it was when saved. (The blocking command BRPOPLPUSH, for example, once caused such a bug.) Tests were added to the test suite for this situation: they automatically generate random, complex data sets and reload them to make sure everything works. Such bugs are uncommon in AOF files, but by comparison they are almost impossible with RDB.

Choice

In general, if you want data safety comparable to what PostgreSQL provides, you should use both persistence features together.

If you care a lot about your data but can still tolerate losing a few minutes of it, you can use RDB persistence alone.

In the remaining cases, my personal preference is AOF.

 

three

1. Snapshotting:
By default, Redis dumps a snapshot of the data set to the dump.rdb file. We can also change how often snapshots are dumped by editing the Redis server configuration file. After opening 6379.conf and searching for save, you will see the following configuration:
# After 900 seconds (15 minutes), if at least 1 key has changed, dump a memory snapshot.
save 900 1
# After 300 seconds (5 minutes), if at least 10 keys have changed, dump a memory snapshot.
save 300 10
# After 60 seconds (1 minute), if at least 10000 keys have changed, dump a memory snapshot.
save 60 10000

2. Dump snapshot mechanism:
1) Redis forks a child process.
2) The child process writes the snapshot data to a temporary RDB file.
3) When the child process has finished writing the data, the temporary file replaces the old file.

5.4.3 AOF file:
As mentioned several times above, the timed RDB snapshot dump cannot guarantee strong data durability. If our application really cares about this, we can consider using Redis's AOF mechanism. The default mechanism of the Redis server is RDB; to use AOF, change the following entry in the configuration file:
Change appendonly no to appendonly yes.
From then on, every time Redis receives a command that modifies data, it appends the command to the AOF file. The next time Redis restarts, it loads the AOF file to rebuild the latest data in memory.

5.4.5 AOF configuration:
The Redis configuration file offers three synchronization modes:
# Sync to the AOF file every time a data modification occurs.
appendfsync always
# Sync once per second. This is the default AOF policy.
appendfsync everysec
# Never sync explicitly. Efficient, but durability is not guaranteed.
appendfsync no

5.4.6 How to fix a corrupted AOF file:
1) Make an extra copy of the corrupted AOF file.
2) Run the "redis-check-aof --fix <filename>" command to repair the corrupted AOF file.
3) Restart the Redis server with the repaired AOF file.

5.4.7 Redis data backup:
We can back up the Redis data file online by copying it while Redis is running. This works because an RDB file is never modified once it has been generated. Each time, Redis dumps the latest data to a temporary file and then uses the rename function to atomically rename the temporary file to the original data file name. We can therefore say that copying the data file is safe and consistent at any time. Given this, we can back up the Redis data file regularly with a cron job and copy the backup files to safe storage media.
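The copy step of such a backup job could look roughly like the following Java sketch. The class name RdbBackup, the "dump-<label>.rdb" naming scheme and the label parameter are assumptions made for this example; scheduling (for instance through cron) is left outside the code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class RdbBackup {
    // Copies the RDB file into backupDir under a name that carries the given
    // label (for example a timestamp) and returns the path of the copy.
    static Path backup(Path rdbFile, Path backupDir, String label) throws IOException {
        Files.createDirectories(backupDir);
        Path dest = backupDir.resolve("dump-" + label + ".rdb");
        return Files.copy(rdbFile, dest, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Because the data file is only ever replaced by an atomic rename, a copy taken this way reads one complete snapshot rather than a partially written one.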

5.5, write immediately

    // Save immediately (synchronous save)
    // Requires: import redis.clients.jedis.Jedis;
    public static void syncSave() throws Exception {
        Jedis jedis = new Jedis("127.0.0.1", 6379);
        for (int i = 0; i < 1000; i++) {
            jedis.set("Key" + i, "Hello" + i);
            System.out.println("set key" + i + " data to Redis");
            Thread.sleep(2);
        }
        // Run SAVE, which generates the dump.rdb database file on the server
        jedis.save();
        jedis.close();
        System.out.println("write completed");
    }

Result: the save method here is synchronous, so the code after it does not run until the write has completed.

5.6, asynchronous writes

    // Asynchronous save
    // Requires: import redis.clients.jedis.Jedis;
    public static void asyncSave() throws Exception {
        Jedis jedis = new Jedis("127.0.0.1", 6379);
        for (int i = 0; i < 1000; i++) {
            jedis.set("Key" + i, "Hello" + i);
            System.out.println("set key" + i + " data to Redis");
            Thread.sleep(2);
        }
        // Run BGSAVE, which generates the dump.rdb database file on the server in the background
        jedis.bgsave();
        jedis.close();
        System.out.println("write completed");
    }

If the amount of data is very large and there is a lot to save, bgsave is recommended; if there is little content, the save method can be used. The comparison of the approaches below comes from other users' blogs.

 

1, Redis's first persistence strategy: RDB snapshots

Redis supports a persistence mechanism that saves a snapshot of the current data as a data file. How does a database that is continuously being written to generate a snapshot? Redis relies on the copy-on-write mechanism of the fork command. When generating a snapshot, the current process forks a child process, and the child loops over all the data and writes it out as an RDB file.

We can configure when RDB snapshots are generated with the Redis save directive. For example, you can configure a snapshot when there are 100 writes within 10 minutes, or when there are 1000 writes within 1 hour, and several rules can be combined. These rules are defined in the Redis configuration file. They can also be set while Redis is running through the CONFIG SET command, without restarting Redis.
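How several save rules combine can be illustrated with a small Java model. This is a simplified sketch and not the Redis source: the class SavePoints, its Rule type and the exact comparison operators are assumptions made for this example. The rule values used in defaults() are the "save 900 1", "save 300 10" and "save 60 10000" lines quoted earlier in this article.

```java
import java.util.Arrays;
import java.util.List;

final class SavePoints {
    // One "save <seconds> <changes>" rule.
    static final class Rule {
        final long seconds;
        final long changes;
        Rule(long seconds, long changes) { this.seconds = seconds; this.changes = changes; }
    }

    // Returns true when any rule is satisfied: at least `changes` modified keys
    // and at least `seconds` elapsed since the last snapshot.
    static boolean shouldSnapshot(List<Rule> rules, long secondsSinceLastSave, long dirtyKeys) {
        for (Rule r : rules) {
            if (dirtyKeys >= r.changes && secondsSinceLastSave >= r.seconds) {
                return true;
            }
        }
        return false;
    }

    // The three rules from the default configuration shown earlier.
    static List<Rule> defaults() {
        return Arrays.asList(new Rule(900, 1), new Rule(300, 10), new Rule(60, 10000));
    }
}
```

With these rules, 5 modified keys after 100 seconds trigger nothing, while 10 modified keys after 300 seconds satisfy the second rule. The rules are alternatives, so one satisfied rule is enough.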

A Redis RDB file does not end up corrupted, because the write is performed in a separate process. When generating a new RDB file, the child process created by Redis first writes the data to a temporary file and then uses the atomic rename system call to rename the temporary file to the RDB file. So whenever a failure occurs, the Redis RDB file is always usable.

The RDB file is also one link in Redis's internal implementation of master-slave synchronization.

However, RDB has an obvious shortcoming: once something goes wrong with the database, the data saved in the RDB file is not fully up to date, and everything written between the last RDB file generation and the moment Redis stopped is lost. Some businesses can tolerate this, and for them we recommend persisting with RDB, because the cost of enabling RDB is not high. But for applications with very high data safety requirements that cannot tolerate data loss, RDB is of no help, so Redis introduced another important persistence mechanism: the AOF log.

2, Redis's second persistence strategy: the AOF log

AOF stands for Append Only File. As the name suggests, it is a log file that is written by appending. Unlike the logs of typical databases, the AOF file is recognizable plain text, and its content is a sequence of standard Redis commands. For example, in the following experiment with Redis 2.6, we turn on AOF with a startup parameter:

./redis-server --appendonly yes

  We then execute the following commands:

redis 127.0.0.1:6379>  set  key1 Hello
OK
redis 127.0.0.1:6379> append key1  " World!"
(integer) 12
redis 127.0.0.1:6379> del key1
(integer) 1
redis 127.0.0.1:6379> del non_existing_key
(integer) 0

  Then, looking at the AOF log file, we get the following:

$ cat appendonly.aof
*2
$6
SELECT
$1
0
*3
$3
set
$4
key1
$5
Hello
*3
$6
append
$4
key1
$7
  World!
*2
$3
del
$4
key1

  As you can see, each write operation produced a corresponding command in the log. Note that the last del command was not recorded in the AOF log, because Redis determined that it did not change the current data set, so there is no need to record a useless write command. In addition, the AOF log is not generated strictly from the client's requests. For example, the INCRBYFLOAT command is recorded in the AOF log as a SET, because floating point arithmetic may differ between systems. To avoid the same log producing different data sets on different systems, only the result of the operation is recorded, through SET.
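The layout of the log entries shown above (a "*" line with the argument count, then a "$" line with the byte length before each argument) can be reproduced with a few lines of Java. This is a sketch for illustration only; the class name RespEncoder is made up, and it covers just the array-of-bulk-strings form seen in the AOF listing.

```java
import java.nio.charset.StandardCharsets;

final class RespEncoder {
    // Encodes a command as an array of bulk strings:
    // *<argument count>\r\n, then for each argument $<byte length>\r\n<argument>\r\n
    static String encode(String... args) {
        StringBuilder sb = new StringBuilder();
        sb.append('*').append(args.length).append("\r\n");
        for (String a : args) {
            int len = a.getBytes(StandardCharsets.UTF_8).length;
            sb.append('$').append(len).append("\r\n").append(a).append("\r\n");
        }
        return sb.toString();
    }
}
```

For example, RespEncoder.encode("set", "key1", "Hello") produces the lines *3, $3, set, $4, key1, $5, Hello, each terminated by \r\n, which is the second entry in the listing above.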

AOF rewrite 

You may wonder: if every single write command generates a log entry, won't the AOF file become very large? The answer is yes, the AOF file keeps growing, so Redis provides a feature called AOF rewrite. It regenerates the AOF file so that the new file records each piece of data only once, unlike the old file, which may contain many operations on the same value. The process is similar to generating an RDB file: Redis forks a process that walks through the data directly and writes a new temporary AOF file. While the new file is being written, all write operations are still logged to the original AOF file and are also recorded in a memory buffer. When the rewrite is finished, the buffered log entries are written to the temporary file in one go. Redis then calls the atomic rename command to replace the old AOF file with the new one.

Two, is the performance of Redis persistence reliable?

As the process above shows, RDB performs sequential I/O, so its performance is very high. Likewise, when the database is recovered from an RDB file, the data is read sequentially and loaded into memory, so it does not incur random disk reads.

AOF is a file write operation whose purpose is to get the operation log onto disk, so it also goes through the five steps of a write described above. How safe, then, is an AOF write? This can be configured in Redis: after write has been called for the AOF, when fsync is called to force the data to disk is controlled by the appendfsync option. The three appendfsync settings below are listed from weakest to strongest safety.

1, appendfsync no

When appendfsync is set to no, Redis does not actively call fsync to sync the AOF log to disk, so everything depends on the scheduling of the operating system. On most Linux systems, the buffered data is flushed to disk every 30 seconds.

2, appendfsync everysec

When appendfsync is set to everysec, Redis by default calls fsync once per second to write the buffered data to disk. But when an fsync call takes longer than 1 second, Redis adopts a delaying strategy and waits one more second. The fsync is then carried out after 2 seconds, and this time it is performed no matter how long it takes. Because the file descriptor is blocked during the fsync, the current write operation is blocked as well.

So the conclusion is: in most cases Redis performs an fsync once every second, and in the worst case an fsync is performed every two seconds.

Most database systems call this operation group commit: several write operations are combined and their log entries are written to disk in one go.
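The group commit idea, where one flush covers every write accumulated since the previous flush, can be modelled in Java as follows. This is a deliberately simplified sketch: the class EverySecPolicy is hypothetical, time is passed in as a parameter so the behaviour can be followed by hand, the fsync is only counted rather than performed, and the two-second delay case is not modelled.

```java
final class EverySecPolicy {
    private long lastSyncMillis;
    private int pendingWrites;
    private int syncCount;

    EverySecPolicy(long startMillis) {
        this.lastSyncMillis = startMillis;
    }

    // Records one write. Once at least one second has passed since the previous
    // flush, a single simulated fsync covers all pending writes together.
    void onWrite(long nowMillis) {
        pendingWrites++;
        if (nowMillis - lastSyncMillis >= 1000) {
            syncCount++;          // a real implementation would call fsync here
            pendingWrites = 0;
            lastSyncMillis = nowMillis;
        }
    }

    int syncCount() { return syncCount; }

    int pendingWrites() { return pendingWrites; }
}
```

In this model, ten writes issued between 0 ms and 900 ms stay pending, and the write arriving at 1000 ms triggers one flush for all eleven. The pending writes are what would be lost if the machine lost power before the next flush.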

3, appendfsync always

When appendfsync is set to always, fsync is called once for every write operation. The data is then at its safest, but because fsync is executed every time, performance suffers.

What difference does pipelining make?

With pipelining, a client sends N commands in one go and then waits for the results of all N commands to be returned together. Using pipelining means giving up confirmation of each command's return value individually. In this case the N commands are executed within the same execution. When appendfsync is set to everysec, there may be some deviation, because executing the N commands may take more than 1 or 2 seconds. What can be guaranteed is that the maximum time will not exceed the sum of the execution times of the N commands.

Three, comparison with other databases

We have said a lot above about data safety at the operating system level, and different databases in fact implement it in much the same way. In short, the final conclusion is that with AOF enabled, the data safety of a standalone Redis is no weaker than that of mature SQL databases.

Comparison in terms of data loading

What is persisted data for? It is for recovering data after a restart. Redis is an in-memory database, and both RDB and AOF are only measures to make sure its data can be recovered. So when Redis recovers using RDB or AOF, it reads the RDB or AOF file and reloads it into memory. Compared with the startup time of databases such as MySQL, this takes much longer, because MySQL does not need to load its data into memory at startup.

On the other hand, after MySQL starts serving requests, the hot data it accesses is loaded into memory gradually, which we usually call warm-up, and before the warm-up is complete its performance is not very high. The advantage of Redis is that it loads the data into memory in one go and is warmed up in one go. As soon as Redis has finished starting, it serves requests very fast.

Startup time also differs between RDB and AOF. Startup from RDB is shorter, for two reasons. First, the RDB file holds only one record for each piece of data, unlike the AOF log, which may hold several operation records for the same piece of data, so each piece of data only has to be written once. Second, the storage format of the RDB file is the same as the encoding of the data in Redis memory, so no further encoding work is needed. The CPU consumption is therefore far lower than when loading the AOF log.

 

Reproduced from bcombetter: the persistence mechanism provided by Redis (RDB and AOF)


Origin www.cnblogs.com/it-xiaozhi/p/10955849.html