(Reposted) Redis memory optimization and storage methods

 

Redis common data types

 

Redis has five commonly used data types:

  • String

  • Hash

  • List

  • Set

  • Sorted set

 

Before describing these data types in detail, let's first use a figure to see how Redis represents them in its internal memory management:



 

 

First, Redis internally uses a redisObject to represent every key and value. The main fields of redisObject are shown in the figure above. The type field is the data type of the value object, and the encoding field is how that data type is actually stored inside Redis. For example, type=string means the value is an ordinary string, and its encoding can be raw or int. int means Redis actually stores and represents the string internally as an integer, provided the string can be represented as a number, such as "123" or "456".
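To make the type/encoding split concrete, here is a minimal Python model of a redisObject and of how a string value could end up with the int or raw encoding. It is a sketch, not Redis's C code: the names `RedisObject` and `create_string_object` are hypothetical, and the round-trip and 64-bit range checks are assumptions chosen to match the rule described above (only a string that is exactly the text of an integer is stored as a number).

```python
from dataclasses import dataclass
from typing import Any, Optional

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


@dataclass
class RedisObject:
    type: str                 # logical data type: "string", "hash", "list", ...
    encoding: str             # physical storage form: "raw", "int", ...
    ptr: Any                  # the stored payload
    vm: Optional[Any] = None  # placeholder for the VM field; unused unless VM is on


def create_string_object(value: str) -> RedisObject:
    """Use the int encoding only for the canonical text of a 64-bit integer."""
    try:
        number = int(value)
    except ValueError:
        return RedisObject("string", "raw", value)
    # Forms like "007", " 12" or "+5" would not round-trip unchanged, so keep them raw.
    if str(number) == value and INT64_MIN <= number <= INT64_MAX:
        return RedisObject("string", "int", number)
    return RedisObject("string", "raw", value)


print(create_string_object("123"))
print(create_string_object("hello"))
```

With this rule, "123" becomes an int-encoded object while "007" and "hello" stay raw, because converting "007" to a number and back would not reproduce the original text.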

 

The vm field needs a special explanation: memory is actually allocated for it only when Redis's virtual memory feature is turned on. The feature is disabled by default and is described in more detail later.

 

The figure above also shows that using a redisObject to represent every key/value costs extra memory. This overhead exists mainly to give Redis's different data types a unified management interface. The author also provides several ways to reduce memory usage as much as possible, which are discussed in detail later.

 

Let's first analyze the use and internal implementation of these five data types one by one:

 

String

 

Common commands:

set, get, decr, incr, mget, etc.

 

Application scenarios:

String is the most commonly used data type. Ordinary key/value storage falls into this category and needs no further explanation here.

 

Implementation:

By default a String is stored inside Redis as a string referenced by a redisObject. When operations such as incr and decr are applied, it is converted to a numeric type for the calculation, and the encoding field of the redisObject becomes int.
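The incr/decr behavior can be sketched the same way. This Python model uses hypothetical names (`StringValue`, `parse_int`, `incrby`) and omits 64-bit overflow checks. It parses a raw string only when an arithmetic command needs it and stores the result with the int encoding; decr is simply `incrby(db, key, -1)`.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class StringValue:
    encoding: str          # "raw" or "int"
    ptr: Union[str, int]   # text for raw, a Python int for int


def parse_int(text: str) -> int:
    """Accept only the canonical decimal form, e.g. "42" but not " 42" or "042"."""
    number = int(text)  # raises ValueError for non-numeric text
    if str(number) != text:
        raise ValueError("value is not an integer or out of range")
    return number


def incrby(db: Dict[str, StringValue], key: str, delta: int = 1) -> int:
    """Model of incr/decr: read the current value, add delta, store the result as int."""
    current = db.get(key)
    if current is None:
        number = 0                        # a missing key counts as 0
    elif current.encoding == "int":
        number = int(current.ptr)         # already numeric, no parsing needed
    else:
        number = parse_int(str(current.ptr))  # raw string converted for the calculation
    number += delta
    db[key] = StringValue("int", number)  # the result keeps the int encoding
    return number
```

For example, a key holding the raw string "10" becomes the int-encoded value 11 after one `incrby` call.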

 

Hash

 

Common commands:

hget, hset, hgetall, etc.

 

Application scenarios:

A simple example shows where Hash is useful. Suppose we want to store user information objects containing the following data:

 

The user ID is the lookup key, and the stored user object contains fields such as name, age, and birthday. With an ordinary key/value structure, there are two main ways to store it:

 



 

 

The first method uses the user ID as the lookup key and serializes the remaining fields into a single stored object. Its drawbacks are the added serialization/deserialization overhead and that modifying a single field requires fetching the entire object. The modification must also be protected against concurrent updates, which introduces complications such as CAS.
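To illustrate the cost of the first method, here is a single-process Python sketch of a serialized user object guarded by a version number. The `store` dict, the version counter and the function names are hypothetical stand-ins for a real cache. The point is that changing one field means deserializing and re-serializing the whole object, and that a compare-and-set check is needed so a stale writer cannot overwrite a newer version.

```python
import json
from typing import Any, Dict, Tuple

# Hypothetical store: user ID -> (version, serialized user object).
store: Dict[str, Tuple[int, str]] = {}


def put_user(user_id: str, user: Dict[str, Any]) -> None:
    store[user_id] = (0, json.dumps(user))


def compare_and_set(user_id: str, expected_version: int, blob: str) -> bool:
    """Write only if nobody changed the object since it was read."""
    version, _ = store[user_id]
    if version != expected_version:
        return False  # a concurrent writer got there first
    store[user_id] = (version + 1, blob)
    return True


def update_field(user_id: str, field: str, value: Any) -> bool:
    # Changing one field still means reading and deserializing the whole object...
    version, blob = store[user_id]
    user = json.loads(blob)
    user[field] = value
    # ...then serializing all of it again and guarding the write with a CAS check.
    return compare_and_set(user_id, version, json.dumps(user))
```

A writer that read version 0 and tries to write after someone else has already produced version 1 gets `False` and must re-read and retry.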

 



 

 

The second method stores one key-value pair per member of the user object, using the user ID plus the attribute name as the unique key for each attribute value. This eliminates the serialization overhead and the concurrency problem, but the user ID is stored repeatedly, and with a large amount of such data the wasted memory is considerable.
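The memory cost of the second method is easy to estimate. This Python sketch assumes a hypothetical key scheme `<user id>:<field>` and counts only the bytes spent repeating the user ID. In Redis each extra key also carries its own redisObject and hash-table entry, which this arithmetic does not include, so the real overhead is larger.

```python
from typing import Dict


def per_field_keys(user_id: str, user: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical layout: one top-level key per attribute, "<user id>:<field>"."""
    return {f"{user_id}:{field}": value for field, value in user.items()}


def repeated_id_bytes(user_id: str, user: Dict[str, str]) -> int:
    """Bytes spent storing the user ID again beyond its first copy."""
    return len(user_id.encode()) * (len(user) - 1)


user = {"name": "alice", "age": "30", "birthday": "1990-01-01"}
print(sorted(per_field_keys("user:1000", user)))
print(repeated_id_bytes("user:1000", user), "bytes of duplicated ID for one user")
```

For this three-field example that is 18 extra bytes per user; across ten million users it is already about 180 MB of repeated IDs alone.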

 

Redis Hash solves this problem well. A Redis Hash value is internally a HashMap, and Redis provides commands to access the members of this map directly, as shown in the following figure:

 



 

 

In other words, the key is still the user ID, and the value is a map whose keys are the attribute names and whose values are the attribute values. Data can be read and modified directly through the key of this internal map (Redis calls it a field); that is, key (user ID) + field (attribute name) addresses the corresponding attribute. This neither stores data repeatedly nor brings serialization and concurrent-modification problems, so it solves the problem nicely.
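A toy Python model of this layout, assuming a plain dict of dicts stands in for Redis's internal maps, shows how key + field reaches a single attribute without touching the rest, while hgetall has to walk the whole internal map. The class `HashStore` is hypothetical; its method names only mirror the Redis commands.

```python
from typing import Dict, Optional


class HashStore:
    """Toy model: top-level key -> internal map of field -> value."""

    def __init__(self) -> None:
        self.data: Dict[str, Dict[str, str]] = {}

    def hset(self, key: str, field: str, value: str) -> int:
        fields = self.data.setdefault(key, {})
        is_new = field not in fields
        fields[field] = value       # only this one field is touched
        return 1 if is_new else 0   # 1 for a newly created field, 0 for an update

    def hget(self, key: str, field: str) -> Optional[str]:
        return self.data.get(key, {}).get(field)

    def hgetall(self, key: str) -> Dict[str, str]:
        # Copies every field, i.e. walks the whole internal map.
        return dict(self.data.get(key, {}))
```

Updating "age" for user 1000 touches a single entry, and no other user's data or serialized blob is involved.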

 

Note also that Redis provides a command (hgetall) that fetches all attributes at once. If the internal map has many members, this traverses the entire map, and because of Redis's single-threaded model the traversal may take a long time, during which requests from other clients get no response at all. This requires special attention.

 

Implementation:

As mentioned above, the value of a Redis Hash is a HashMap, but there are actually two implementations. When a Hash has few members, Redis saves memory by storing them compactly in a one-dimensional, array-like layout instead of a real HashMap; the encoding of the value's redisObject is then zipmap. When the number of members grows, it is automatically converted into a real HashMap, and the encoding becomes ht.

 

List

 

Common commands:

lpush, rpush, lpop, rpop, lrange, etc.

 

Application scenarios:

Redis list has many application scenarios and is one of Redis's most important data structures. For example, Twitter's following list and follower list can be implemented with Redis lists. This is easy to understand and is not elaborated here.

 

Implementation:

Redis list is implemented as a doubly linked list, so it supports reverse lookup and traversal, which makes it convenient to operate on at the cost of some extra memory overhead. Many parts of Redis's internals, including the send buffer queue, also use this data structure.
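The doubly linked list behind list can be sketched in Python as follows. The class names are hypothetical, and `lrange` follows Redis's inclusive, negative-index convention. Each node carries both a prev and a next pointer, which is exactly the extra memory overhead mentioned above, paid in exchange for cheap operations at both ends and reverse traversal.

```python
from typing import Any, List, Optional


class Node:
    __slots__ = ("value", "prev", "next")  # two pointers per element: the extra memory cost

    def __init__(self, value: Any) -> None:
        self.value = value
        self.prev: Optional["Node"] = None
        self.next: Optional["Node"] = None


class LinkedList:
    def __init__(self) -> None:
        self.head: Optional[Node] = None
        self.tail: Optional[Node] = None
        self.length = 0

    def lpush(self, value: Any) -> None:
        node = Node(value)
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        else:
            self.tail = node
        self.head = node
        self.length += 1

    def rpush(self, value: Any) -> None:
        node = Node(value)
        node.prev = self.tail
        if self.tail is not None:
            self.tail.next = node
        else:
            self.head = node
        self.tail = node
        self.length += 1

    def lpop(self) -> Any:
        node = self.head
        if node is None:
            return None
        self.head = node.next
        if self.head is not None:
            self.head.prev = None
        else:
            self.tail = None
        self.length -= 1
        return node.value

    def rpop(self) -> Any:
        node = self.tail
        if node is None:
            return None
        self.tail = node.prev
        if self.tail is not None:
            self.tail.next = None
        else:
            self.head = None
        self.length -= 1
        return node.value

    def lrange(self, start: int, stop: int) -> List[Any]:
        # Inclusive range; negative indexes count from the tail.
        if start < 0:
            start += self.length
        if stop < 0:
            stop += self.length
        start = max(start, 0)
        out, node, index = [], self.head, 0
        while node is not None and index <= stop:
            if index >= start:
                out.append(node.value)
            node, index = node.next, index + 1
        return out

    def reversed_values(self) -> List[Any]:
        # Reverse traversal through the prev pointers.
        out, node = [], self.tail
        while node is not None:
            out.append(node.value)
            node = node.prev
        return out
```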

 

Set

 

Common commands:

sadd, spop, smembers, sunion, etc.

 

Application scenarios:

Externally, Redis set offers list-like functionality; what sets it apart is that it automatically removes duplicates. When you need to store a list of data without duplicates, set is a good choice. Set also provides an important operation that list lacks: checking whether a member belongs to the set.

 

Implementation:

Internally, set is implemented as a HashMap whose values are always null. Hashing is what lets it remove duplicates quickly, and it is also why set can tell whether a member belongs to it.
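A minimal Python sketch of that idea, with a dict whose values are always `None` playing the role of the HashMap. The class name is hypothetical; `sadd` returns the number of newly added members, as the Redis command does.

```python
from typing import Dict, Hashable, List


class HashSet:
    """Set modeled as a hash map whose values are always None."""

    def __init__(self) -> None:
        self._map: Dict[Hashable, None] = {}

    def sadd(self, *members: Hashable) -> int:
        added = 0
        for member in members:
            if member not in self._map:  # the hash lookup rejects duplicates
                self._map[member] = None
                added += 1
        return added

    def sismember(self, member: Hashable) -> bool:
        return member in self._map       # O(1) on average, unlike scanning a list

    def smembers(self) -> List[Hashable]:
        return list(self._map)
```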

 

Sorted set

 

Common commands:

zadd, zrange, zrem, zcard, etc.

 

Application scenarios:

Redis sorted set is used in scenarios similar to set. The difference is that set is not sorted, whereas sorted set orders its members by an extra priority parameter (score) supplied by the user and keeps them in order on insertion, i.e. it sorts automatically. When you need an ordered list without duplicates, choose sorted set. For example, Twitter's public timeline can be stored with the publication time as the score, so it comes back already sorted by time.

 

Implementation:

Internally, Redis sorted set uses a HashMap and a SkipList to store and order the data. The HashMap maps members to scores, while the skip list stores all members, ordered by the scores held in the HashMap. A skip list gives fairly high search efficiency and is relatively simple to implement.
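The following Python sketch pairs a hash map (member to score) with a skip list ordered by score, breaking ties by member name, to show how the two structures cooperate. It is a simplified model, not Redis's own skip list: the class names, `MAX_LEVEL` and `P` are assumptions, and `zrange` walks the bottom level linearly, whereas Redis additionally stores span counts to locate ranks quickly.

```python
import random
from typing import Dict, List, Optional

MAX_LEVEL = 16  # cap on tower height (assumed value)
P = 0.25        # probability of promoting a node one more level (assumed value)


class SkipNode:
    def __init__(self, score: float, member: str, level: int) -> None:
        self.score = score
        self.member = member
        self.forward: List[Optional["SkipNode"]] = [None] * level


class SortedSet:
    """Hash map (member -> score) plus a skip list ordered by (score, member)."""

    def __init__(self, seed: int = 0) -> None:
        self.scores: Dict[str, float] = {}
        self.header = SkipNode(float("-inf"), "", MAX_LEVEL)
        self.level = 1
        self.rng = random.Random(seed)

    def _random_level(self) -> int:
        level = 1
        while level < MAX_LEVEL and self.rng.random() < P:
            level += 1
        return level

    def _predecessors(self, score: float, member: str) -> List[SkipNode]:
        # For each level, the last node that sorts before (score, member).
        update = [self.header] * MAX_LEVEL
        node = self.header
        for i in range(self.level - 1, -1, -1):
            nxt = node.forward[i]
            while nxt is not None and (nxt.score, nxt.member) < (score, member):
                node, nxt = nxt, nxt.forward[i]
            update[i] = node
        return update

    def _insert(self, score: float, member: str) -> None:
        update = self._predecessors(score, member)
        level = self._random_level()
        self.level = max(self.level, level)  # new top levels start at the header
        node = SkipNode(score, member, level)
        for i in range(level):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def _delete(self, score: float, member: str) -> None:
        update = self._predecessors(score, member)
        target = update[0].forward[0]
        for i in range(self.level):
            if update[i].forward[i] is target:
                update[i].forward[i] = target.forward[i]
        while self.level > 1 and self.header.forward[self.level - 1] is None:
            self.level -= 1

    def zadd(self, score: float, member: str) -> int:
        old = self.scores.get(member)
        if old == score:
            return 0
        if old is not None:
            self._delete(old, member)  # score changed: re-position the node
        self._insert(score, member)
        self.scores[member] = score
        return 0 if old is not None else 1  # count of newly added members

    def zrem(self, member: str) -> int:
        if member not in self.scores:
            return 0
        self._delete(self.scores.pop(member), member)
        return 1

    def zscore(self, member: str) -> Optional[float]:
        return self.scores.get(member)  # O(1) through the hash map

    def zrange(self, start: int, stop: int) -> List[str]:
        members, node = [], self.header.forward[0]
        while node is not None:        # bottom level holds every member in order
            members.append(node.member)
            node = node.forward[0]
        n = len(members)
        start = max(start + n if start < 0 else start, 0)
        stop = stop + n if stop < 0 else stop
        return members[start:stop + 1] if start <= stop else []
```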

 

Common memory optimization methods and parameters

 

The analysis of these implementations shows that Redis's memory management overhead is quite high, i.e. it uses a lot of memory. The author is well aware of this and provides a series of parameters and techniques to control and save memory, which we discuss one by one.

 

First and most important: do not enable Redis's VM option, i.e. the virtual memory feature. It was originally a persistence strategy that let Redis store data exceeding physical memory across memory and disk, but its memory management cost is also very high, and, as analyzed later, this persistence strategy is not mature. To turn off VM, check that vm-enabled is set to no in your redis.conf file.

 

Second, it is best to set the maxmemory option in redis.conf. It tells Redis the amount of physical memory in use at which it starts rejecting further write requests. This protects Redis from using so much physical memory that the system starts swapping, which would seriously hurt performance or even cause a crash.
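A minimal Python model of the maxmemory check described above. It reproduces only the behavior this article describes: writes are rejected once usage exceeds the limit, and reads still succeed. The class name and the `WRITE_COMMANDS` subset are assumptions. The reply text mirrors the OOM error Redis returns, and a limit of 0 means no limit, as in redis.conf.

```python
WRITE_COMMANDS = {"set", "incr", "lpush", "rpush", "hset", "sadd", "zadd"}  # illustrative subset


class MemoryLimitedServer:
    def __init__(self, maxmemory: int) -> None:
        self.maxmemory = maxmemory  # bytes; 0 means "no limit"
        self.used_memory = 0

    def check_command(self, command: str) -> str:
        """Return an error for writes once used memory is over the limit."""
        over_limit = self.maxmemory > 0 and self.used_memory > self.maxmemory
        if over_limit and command.lower() in WRITE_COMMANDS:
            return "OOM command not allowed when used memory > 'maxmemory'"
        return "OK"
```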

 

In addition, Redis provides parameters for each data type to control memory usage. As analyzed earlier, the value of a Redis Hash is internally a HashMap. If the map has few members, Redis stores it in a compact one-dimensional linear format, which saves the memory overhead of many pointers. This behavior is controlled by the following two items in redis.conf:

hash-max-zipmap-entries 64

hash-max-zipmap-value 512

hash-max-zipmap-entries means that as long as the value map has no more than this many members, it is stored in the linear compact format. The default is 64: with 64 or fewer members the compact format is used, and once the count exceeds 64 the map is automatically converted into a real HashMap.

 

hash-max-zipmap-value means that as long as every member value in the map is no longer than this many bytes (512 in the configuration above), the linear compact format is used to save space.

 

If either limit is exceeded, the map is converted into a real HashMap and no longer saves memory. Should these values then be set as large as possible? No. A HashMap offers O(1) lookup and update, whereas giving up the hash and using one-dimensional storage costs O(n). When the number of members is small this does not affect performance, but choosing these values is fundamentally a trade-off between time cost and space cost.
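The trade-off can be seen in a small Python model of a Hash that starts in a compact linear layout and switches to a real hash table once either limit from the redis.conf excerpt above is exceeded. The class name and method bodies are hypothetical. Lookups in the compact form are an O(n) scan and those in the ht form are O(1) on average; this model never converts back.

```python
from typing import Dict, List, Optional, Tuple

HASH_MAX_ZIPMAP_ENTRIES = 64   # values from the redis.conf excerpt above
HASH_MAX_ZIPMAP_VALUE = 512    # bytes


class SmallHash:
    def __init__(self) -> None:
        self.encoding = "zipmap"
        self.pairs: List[Tuple[str, str]] = []  # compact form: one linear array
        self.table: Dict[str, str] = {}         # ht form: a real hash table

    def _convert(self) -> None:
        self.table = dict(self.pairs)
        self.pairs = []
        self.encoding = "ht"

    def hset(self, field: str, value: str) -> None:
        if self.encoding == "zipmap" and len(value.encode()) > HASH_MAX_ZIPMAP_VALUE:
            self._convert()                        # one oversized value forces ht
        if self.encoding == "ht":
            self.table[field] = value              # O(1) on average
            return
        for i, (existing, _) in enumerate(self.pairs):  # O(n) linear scan
            if existing == field:
                self.pairs[i] = (field, value)
                return
        self.pairs.append((field, value))
        if len(self.pairs) > HASH_MAX_ZIPMAP_ENTRIES:
            self._convert()                        # too many members: switch to ht

    def hget(self, field: str) -> Optional[str]:
        if self.encoding == "ht":
            return self.table.get(field)
        for existing, value in self.pairs:         # O(n) linear scan
            if existing == field:
                return value
        return None
```

Raising the two limits keeps more hashes in the compact form and saves memory, but every hget/hset on such a hash pays for a longer linear scan.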

 

There are also similar parameters:

list-max-ziplist-entries 512

 

Description: a list with no more than this many nodes uses a compact storage format without pointers.

list-max-ziplist-value 64

 

Description: a list whose node values are each no larger than this many bytes uses the compact storage format.

set-max-intset-entries 512

 

Description: if all members of a set are integers and their number does not exceed this value, the set is stored in a compact format.
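The list and set parameters can be read as simple eligibility rules. This Python sketch applies the values shown above. The function names are hypothetical, and the labels it returns (ziplist/linkedlist, intset/hashtable) are used here as illustrative names for the compact and regular forms.

```python
from typing import Iterable, List

LIST_MAX_ZIPLIST_ENTRIES = 512
LIST_MAX_ZIPLIST_VALUE = 64     # bytes
SET_MAX_INTSET_ENTRIES = 512


def list_encoding(items: List[str]) -> str:
    """Compact only if there are few nodes and every node value is small."""
    fits = (len(items) <= LIST_MAX_ZIPLIST_ENTRIES
            and all(len(v.encode()) <= LIST_MAX_ZIPLIST_VALUE for v in items))
    return "ziplist" if fits else "linkedlist"


def is_int64(text: str) -> bool:
    """True for the canonical decimal text of a 64-bit signed integer."""
    try:
        number = int(text)
    except ValueError:
        return False
    return str(number) == text and -(2**63) <= number < 2**63


def set_encoding(members: Iterable[str]) -> str:
    """Compact only if every member is an integer and there are few of them."""
    members = list(members)
    fits = len(members) <= SET_MAX_INTSET_ENTRIES and all(is_int64(m) for m in members)
    return "intset" if fits else "hashtable"
```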

 

A final point: Redis's internal implementation does little to optimize memory allocation, so some memory fragmentation occurs, although in most cases it will not become Redis's performance bottleneck. If most of the data stored in Redis is numeric, however, Redis saves allocation overhead by sharing integers internally. At startup it allocates a pool of integer objects for the values 0 to n-1; whenever a stored value falls within that range, the object is taken directly from the pool and shared through reference counting. When the system stores a large number of numeric values, this saves memory and improves performance to a degree. The value of n is set by the macro REDIS_SHARED_INTEGERS in the source code. The default is 10000; you can change it to suit your needs and then recompile.
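The shared-integer idea in a Python sketch: a pool of integer objects is built once at startup, and a value inside the pool's range is served by incrementing a reference count instead of allocating a new object. The class and function names are hypothetical, and the sketch assumes the pool covers 0 to n-1 with n = REDIS_SHARED_INTEGERS = 10000.

```python
REDIS_SHARED_INTEGERS = 10000  # default given in the text; changing it means recompiling


class IntObject:
    __slots__ = ("value", "refcount")

    def __init__(self, value: int) -> None:
        self.value = value
        self.refcount = 1


# Built once "at startup": one shared object per value in [0, n).
SHARED_INTEGERS = [IntObject(i) for i in range(REDIS_SHARED_INTEGERS)]


def make_int_object(value: int) -> IntObject:
    if 0 <= value < REDIS_SHARED_INTEGERS:
        shared = SHARED_INTEGERS[value]
        shared.refcount += 1  # reuse the pooled object instead of allocating
        return shared
    return IntObject(value)   # out of range: allocate a private object
```

Storing the number 42 under a million different keys then costs one object plus a reference count, while values outside the range each get their own allocation.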

 

Persistence mechanism of Redis

 

Because Redis supports a rich set of in-memory data structures, persisting these complex memory layouts to disk is a hard problem, so Redis's persistence differs considerably from that of traditional databases. Redis supports four persistence methods in total:

 

  • Timed snapshot mode (snapshot)

  • Statement-based append mode (aof)

  • Virtual memory (vm)

  • Diskstore method

 

In terms of design, the first two assume all data fits in memory, i.e. they provide on-disk persistence for modest data volumes, while the latter two are the author's attempts at storing data larger than physical memory, i.e. large data volumes. As of this writing, the latter two are still experimental, and the author has essentially abandoned vm, so only the first two can actually be used in production. In other words, Redis is currently suited only to small-scale storage (all data can be loaded into memory); mass data storage is not Redis's strength. These persistence methods are described below:

 

Timed snapshot mode (snapshot):

 

This persistence method is essentially a timer event inside Redis that periodically checks whether the number of changes and the elapsed time meet the configured trigger conditions. If they do, Redis calls fork to create a child process, which by default shares the same address space as the parent. The child process then traverses the whole memory and writes it out, while the main process keeps serving requests. When a write occurs, the operating system performs copy-on-write in units of memory pages, so the parent and child processes do not affect each other.
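The trigger check run by the timer can be sketched as follows, assuming rules in the `save <seconds> <changes>` form used by redis.conf. The three rules listed are only example values, and the function name is hypothetical. A snapshot (fork plus dump) starts as soon as any one rule is satisfied.

```python
from typing import List, Tuple

# Example rules in "save <seconds> <changes>" form; real values come from redis.conf.
SAVE_RULES: List[Tuple[int, int]] = [(900, 1), (300, 10), (60, 10000)]


def should_snapshot(seconds_since_save: int, changes_since_save: int,
                    rules: List[Tuple[int, int]] = SAVE_RULES) -> bool:
    """True if any rule is met: enough time has passed AND enough keys changed."""
    for seconds, changes in rules:
        if seconds_since_save > seconds and changes_since_save >= changes:
            return True
    return False
```

With these rules, a single change triggers a snapshot after about 15 minutes, while a burst of 10000 changes triggers one after about a minute.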

 

The main drawback of this method is that a timed snapshot is only a memory image at one point in time, so a restart loses all data written between the last snapshot and the restart.

 

Statement-based append mode (aof):

 

The aof method is similar to MySQL's statement-based binlog: every command that changes Redis's in-memory data is appended to a log file, so the log file itself is Redis's persistent data.

 

The main drawback of aof is that the ever-growing log file can become very large. When the system restarts and restores data from aof, loading can be very slow; tens of gigabytes of data may take several hours. The time goes not into reading the file from disk, which is not slow, but into re-executing every command it contains in memory. In addition, because every command must be written to the log, using aof also degrades Redis's read and write performance.
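The append-and-replay cycle, and why restoring is slow, can be sketched in Python. Each write is serialized as a RESP array (the same wire format Redis writes into the aof file) and appended to a log. On restart, `restore` rebuilds memory by re-executing every logged command one by one, which is where the loading time goes. The tiny command set and all names are assumptions for illustration.

```python
import io
from typing import Dict, List


def encode_command(args: List[str]) -> bytes:
    """Serialize one command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def read_commands(log: io.BytesIO) -> List[List[str]]:
    """Parse the log back into a list of commands."""
    commands = []
    while True:
        header = log.readline()                 # e.g. b"*3\r\n"
        if not header:
            break
        args = []
        for _ in range(int(header[1:].strip())):
            length = int(log.readline()[1:].strip())  # e.g. b"$5\r\n"
            args.append(log.read(length).decode())
            log.read(2)                         # skip the trailing \r\n
        commands.append(args)
    return commands


def apply(db: Dict[str, str], args: List[str]) -> None:
    """Execute the few write commands this sketch understands."""
    name = args[0].upper()
    if name == "SET":
        db[args[1]] = args[2]
    elif name == "DEL":
        db.pop(args[1], None)
    elif name == "INCR":
        db[args[1]] = str(int(db.get(args[1], "0")) + 1)
    else:
        raise ValueError("unsupported command " + name)


class AofStore:
    def __init__(self) -> None:
        self.db: Dict[str, str] = {}
        self.log = io.BytesIO()  # stands in for the append-only file on disk

    def execute(self, *args: str) -> None:
        apply(self.db, list(args))
        self.log.write(encode_command(list(args)))  # every write is appended


def restore(aof_bytes: bytes) -> Dict[str, str]:
    """Restart path: rebuild memory by re-executing every logged command."""
    db: Dict[str, str] = {}
    for args in read_commands(io.BytesIO(aof_bytes)):
        apply(db, args)
    return db
```

Note that a key set and later deleted still costs two replayed commands, which is why the log keeps growing and replay time grows with it.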

 

Virtual memory mode:

 

The virtual memory method is Redis's own user-space strategy for swapping data in and out. Its implementation did not work well: the main problems were complex code, slow restarts, and slow replication, and the author has abandoned it.

 

Diskstore method:

 

Diskstore is the new approach the author chose after abandoning virtual memory, essentially the traditional B-tree approach. It is still experimental, and whether it will become usable remains to be seen.

 

Redis persistence disk I/O and its problems

 

Anyone with experience operating Redis in production will have noticed that when Redis uses a lot of physical memory, it can become unstable or even crash although it has not exceeded the total physical memory capacity. Some people attribute this to the fork call in snapshot-based persistence doubling memory usage. That view is inaccurate: fork's copy-on-write works in units of operating-system pages, so only dirty pages that are actually written get copied, and a system generally does not write all of its pages within a short period. So what does cause Redis to crash?

 

The answer is that Redis persistence uses buffered I/O. Buffered I/O means that Redis's reads and writes of its persistence files go through the operating system's page cache in physical memory, whereas most database systems use direct I/O to bypass the page cache and maintain their own data cache. When Redis reads or writes a large persistence file (especially the snapshot file), the file's data is loaded into physical memory as the operating system's cache for that file, so the data in this cache duplicates the data Redis already manages in memory. Although the kernel evicts page cache when physical memory is tight, it may well decide that some page cache is more important and start swapping your process out instead, at which point the system becomes unstable or crashes. Our experience is that once Redis's physical memory usage exceeds 3/5 of the total memory capacity, things start to become dangerous.
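The 3/5 rule of thumb is the author's empirical threshold, not a Redis setting. A small Python helper (hypothetical names) shows how to check it with integer arithmetic, which avoids floating-point rounding right at the boundary.

```python
GiB = 1024 ** 3


def exceeds_safe_ratio(used_bytes: int, total_bytes: int, num: int = 3, den: int = 5) -> bool:
    """True once used memory is above num/den of total (exact integer comparison)."""
    return used_bytes * den > total_bytes * num


def safe_limit(total_bytes: int, num: int = 3, den: int = 5) -> int:
    """Largest byte count that stays within num/den of total."""
    return total_bytes * num // den
```

On a machine with 10 GiB of RAM, for example, the threshold is 6 GiB: `exceeds_safe_ratio(6 * GiB, 10 * GiB)` is `False`, and one more byte makes it `True`.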

 

The following figure shows Redis's memory usage after it reads or writes the snapshot file dump.rdb:

 



 

 

Summary

 

  1. Select the appropriate data type according to business needs, and set the corresponding compact storage parameters for different application scenarios.

  2. When the business scenario does not require persistence, turning off all persistence methods gives the best performance and the most usable memory.

  3. If you need persistence, choose between snapshot mode and statement append mode depending on whether you can tolerate losing some data on restart. Do not use virtual memory or diskstore mode.

  4. Do not let the physical memory usage of the machine running Redis exceed 3/5 of its total physical memory.

         

 

 


Origin http://43.154.161.224:23101/article/api/json?id=326258549&siteId=291194637