Detailed explanation of the difference between Redis and Memcached

Salvatore Sanfilippo, the author of Redis, once compared these two memory-based data storage systems:

1. Redis supports server-side data operations: Compared with Memcached, Redis has more data structures and supports richer data operations. In Memcached, you usually need to fetch the data to the client, modify it there, and then set it back, which greatly increases the number of network I/Os and the volume of transferred data. In Redis, these complex operations are usually as efficient as an ordinary GET/SET. Therefore, if the cache needs to support more complex structures and operations, Redis is a good choice.
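A minimal sketch of the contrast described above, using hypothetical in-memory mock stores (not real client libraries): with a Memcached-style store, a counter update needs a GET plus a SET (two round trips of client-side logic), while a Redis-style store does the same work server-side in one round trip.

```python
class MemcachedStyleStore:
    """Server only supports GET/SET; the modify logic runs on the client."""
    def __init__(self):
        self.data = {}
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self.data.get(key)

    def set(self, key, value):
        self.round_trips += 1
        self.data[key] = value

class RedisStyleStore(MemcachedStyleStore):
    """Server additionally supports an atomic INCR executed server-side."""
    def incr(self, key):
        self.round_trips += 1
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]

mc = MemcachedStyleStore()
mc.set("hits", 0)
mc.set("hits", mc.get("hits") + 1)   # client-side read-modify-write: 2 trips

rd = RedisStyleStore()
rd.set("hits", 0)
rd.incr("hits")                      # server-side operation: 1 trip
```

Besides halving the round trips, the server-side `incr` is atomic, whereas the get-then-set pattern can lose updates under concurrency.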

2. Memory usage efficiency: For simple key-value storage, Memcached's memory utilization is higher. However, if Redis uses its hash structure for key-value storage, its compact encoding gives it higher memory utilization than Memcached.

3. Performance comparison: Since Redis uses only a single core while Memcached can use multiple cores, Redis on average has higher per-core performance than Memcached when storing small data. For data above 100 KB, Memcached's performance is higher than Redis's. Although Redis has recently optimized its performance for storing large data, it is still slightly inferior to Memcached there.

The details collected below explain why the conclusions above hold:

1. Different supported data types

Unlike Memcached, which only supports records with a simple key-value structure, Redis supports much richer data types. The five most commonly used are String, Hash, List, Set, and Sorted Set. Internally, Redis uses a redisObject object to represent all keys and values. The main fields of redisObject are shown in the figure:

type indicates the specific data type of a value object, and encoding indicates how that data type is stored inside Redis. For example, type=string means the value holds an ordinary string, and the corresponding encoding can be raw or int. If it is int, Redis internally stores and represents the string as a number, provided the string itself can be represented numerically, such as "123" or "456". The vm field only has memory actually allocated when Redis's virtual memory feature is enabled, which is off by default.
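A toy sketch (not the Redis source) of the encoding decision just described: strings that parse as integers can be stored with encoding "int", everything else falls back to "raw".

```python
def choose_string_encoding(value: str) -> str:
    """Pick a storage encoding for a string value, as described above."""
    try:
        int(value)          # e.g. "123" or "456" can be stored numerically
        return "int"
    except ValueError:
        return "raw"        # arbitrary text must stay a raw string
```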

1)String

Common commands: set/get/decr/incr/mget, etc.;
Application scenarios: String is the most commonly used data type, and ordinary key/value storage falls into this category;
Implementation method: a String is stored in Redis by default as a string referenced by redisObject. When operations such as incr and decr are encountered, it is converted to a numeric type for calculation; at that point the encoding field of the redisObject is int.

2)Hash

Common commands: hget/hset/hgetall, etc.;
Application scenarios: suppose we want to store a user-information object containing user ID, user name, age, and birthday, and to retrieve the user's name, age, or birthday by user ID;
Implementation method: a Redis Hash is internally a HashMap storing the value, and it provides an interface for accessing the members of this Map directly. As shown in the figure, the Key is the user ID and the value is a Map whose keys are the members' attribute names and whose values are the attribute values. In this way, data can be modified and accessed directly through the key of the internal map (the key of the internal map is called a field in Redis); that is, the corresponding attribute can be manipulated through key (user ID) + field (attribute name). There are currently two implementations of this HashMap: when the number of members is relatively small, Redis uses a compact, one-dimensional-array-like layout to save memory instead of a real HashMap structure, and the encoding of the value's redisObject is then zipmap; when the number of members grows, it is automatically converted into a real HashMap, and the encoding becomes ht.
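A simplified sketch of that small-hash optimization. Below a member-count threshold the "hash" is kept as a flat list of (field, value) pairs (the zipmap role); once it grows past the threshold it is converted into a real dict (the ht role). The threshold value here is illustrative, not the real Redis default.

```python
SMALL_HASH_THRESHOLD = 64  # illustrative limit, not Redis's actual setting

class TinyHash:
    def __init__(self):
        self.encoding = "zipmap"
        self.store = []            # compact flat storage: [(field, value)]

    def hset(self, field, value):
        if self.encoding == "zipmap":
            for i, (f, _) in enumerate(self.store):
                if f == field:                 # update in place
                    self.store[i] = (field, value)
                    return
            self.store.append((field, value))
            if len(self.store) > SMALL_HASH_THRESHOLD:
                self.store = dict(self.store)  # promote to a real hash map
                self.encoding = "ht"
        else:
            self.store[field] = value

    def hget(self, field):
        if self.encoding == "zipmap":          # linear scan, fine when small
            return next((v for f, v in self.store if f == field), None)
        return self.store.get(field)
```

The compact form trades O(n) lookups for much lower per-entry memory overhead, which pays off only while n stays small; hence the automatic promotion.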

3)List

Common commands: lpush/rpush/lpop/rpop/lrange, etc.;
Application scenarios: the Redis list has many uses and is also one of the most important Redis data structures. For example, Twitter follow lists and fan lists can be implemented with the Redis list structure;
Implementation method: the Redis list is implemented as a doubly linked list, so it supports reverse lookup and traversal, which is convenient to operate but brings some extra memory overhead. Many internal parts of Redis, including the send buffer queue, also use this data structure.

4)Set

Common commands: sadd/spop/smembers/sunion, etc.;
Application scenarios: externally, a Redis set provides functionality similar to a list; the special feature is that a set deduplicates automatically. When you need to store a list of data without duplicates, set is a good choice. A set also provides an important interface for checking whether a member belongs to the collection, which list does not offer;
Implementation method: internally, a set is a HashMap whose values are always null. It deduplicates quickly by computing hashes, which is also why a set can test membership efficiently.
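A sketch of that idea: a set is just a hash map whose values are all None, so deduplication and O(1) membership tests both fall out of the hashing.

```python
class TinySet:
    def __init__(self):
        self._map = {}            # member -> None; the value is never used

    def sadd(self, member):
        self._map[member] = None  # re-adding an existing member is a no-op

    def sismember(self, member):
        return member in self._map  # O(1) hash lookup

    def smembers(self):
        return list(self._map)
```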

5)Sorted Set

Common commands: zadd/zrange/zrem/zcard, etc.;
Application scenarios: a Redis sorted set is used much like a set. The difference is that a set is not automatically ordered, whereas a sorted set sorts its members by a user-supplied priority parameter (the score) and keeps them ordered on insertion, i.e. it sorts automatically. When you need an ordered, duplicate-free collection, the sorted set is the structure to choose. For example, Twitter's public timeline can be stored with the publication time as the score, so that fetches are automatically sorted by time.
Implementation method: internally, a Redis sorted set uses a HashMap and a skip list (SkipList) to store the data in order. The HashMap holds the mapping from members to scores, while the skip list holds all members sorted by the scores stored in the HashMap. The skip-list structure yields relatively high search efficiency and is relatively simple to implement.
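A sketch of that dual structure: a dict maps member to score (the HashMap role), while a sorted list kept in score order stands in for the skip list, with `bisect` playing the part of the skip-list search.

```python
import bisect

class TinyZSet:
    def __init__(self):
        self.scores = {}   # member -> score (the HashMap role)
        self.ranked = []   # [(score, member)] kept sorted (the skip-list role)

    def zadd(self, score, member):
        if member in self.scores:
            # re-scoring a member: drop its old position first
            self.ranked.remove((self.scores[member], member))
        self.scores[member] = score
        bisect.insort(self.ranked, (score, member))  # insert in score order

    def zscore(self, member):
        return self.scores.get(member)               # O(1) via the dict

    def zrange(self, start, stop):
        # inclusive index range over the score-ordered members
        return [m for _, m in self.ranked[start:stop + 1]]
```

A real skip list gives O(log n) insertion where `list.remove`/`insort` here are O(n); the sketch only mirrors the division of labor between the two structures.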

2. Different memory management mechanisms

In Redis, not all data is always stored in memory. This is the biggest difference from Memcached. When physical memory runs out, Redis can swap values that have not been used for a long time out to disk. Redis always caches all key information; if it finds that memory usage exceeds a certain threshold, it triggers a swap operation, computing which keys' values should be swapped to disk according to "swappability = age * log(size_in_memory)". The values of those keys are then persisted to disk and cleared from memory. This feature allows Redis to hold more data than its machine's memory can. Of course, the machine's memory must still hold all the keys, since keys are never swapped out.

At the same time, when Redis swaps in-memory data to disk, the main thread serving requests and the sub-thread performing the swap share that memory, so if data being swapped is updated, Redis blocks the operation until the sub-thread completes the swap, after which the modification can proceed.

When reading data from Redis, if the value for the requested key is not in memory, Redis must load it from the swap file before returning it to the requester, which raises an I/O thread-pool issue. By default, Redis blocks, i.e. it responds only after the swap-file load completes. This strategy suits a small number of clients doing batch operations, but if Redis is used in a large-scale website application it clearly cannot handle heavy concurrency. Therefore, we can set the size of the I/O thread pool when running Redis and handle read requests that need to load data from the swap file concurrently, reducing the blocking time.

For memory-based database systems like Redis and Memcached, the efficiency of memory management is a key factor affecting system performance. The malloc/free functions of traditional C are the most common way to allocate and release memory, but this approach has major drawbacks: first, mismatched malloc and free calls easily cause memory leaks; second, frequent calls produce large amounts of memory fragmentation that cannot be reclaimed and reused, reducing memory utilization; finally, malloc/free may trigger system calls whose overhead is much greater than that of ordinary function calls. Therefore, to improve memory-management efficiency, efficient memory-management schemes do not use malloc/free calls directly. Both Redis and Memcached use their own memory-management mechanisms, but the implementations differ greatly. The two mechanisms are introduced below.

Memcached uses the Slab Allocation mechanism to manage memory by default. Its main idea is to divide pre-allocated memory into blocks of specific lengths, according to predetermined sizes, to hold key-value records of the corresponding length, which completely avoids memory fragmentation. The Slab Allocation mechanism is only designed to store external data; that is, all key-value data are stored in the Slab Allocation system, while Memcached's other memory requests go through ordinary malloc/free, because their number and frequency mean they will not affect the performance of the whole system. The principle of Slab Allocation is quite simple. As shown in the figure, it first requests a large chunk of memory from the operating system, divides it into chunks of various sizes, and groups chunks of the same size into Slab Classes. A Chunk is the smallest unit used to store key-value data. The chunk size of each Slab Class can be controlled by specifying a Growth Factor when Memcached starts. Assuming the Growth Factor in the figure is 1.25, if the first group's chunks are 88 bytes, the second group's are 112 bytes, and so on.
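A sketch of how chunk sizes could grow under a Growth Factor, matching the 88 to 112 example above. This assumes (as real Memcached does) that chunk sizes are rounded up to a multiple of 8 bytes, which is why 88 * 1.25 = 110 becomes 112; the class count is arbitrary.

```python
def slab_chunk_sizes(first=88, growth_factor=1.25, classes=5):
    """List the chunk size of each Slab Class under a given Growth Factor."""
    sizes = [first]
    for _ in range(classes - 1):
        nxt = sizes[-1] * growth_factor
        sizes.append(int(-(-nxt // 8) * 8))   # round UP to a multiple of 8
    return sizes

def pick_chunk(data_size, sizes):
    """Select the smallest Slab Class whose chunks can hold the data."""
    return next(c for c in sizes if c >= data_size)
```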

When Memcached receives data from a client, it first selects the most suitable Slab Class according to the data's size, then queries the list of free Chunks that Memcached keeps for that Slab Class to find a Chunk that can store the data. When a record expires or is discarded, the Chunk it occupied can be recycled and returned to the free list. From this process we can see that Memcached's memory management is efficient and does not cause fragmentation, but its biggest drawback is wasted space: since each Chunk is allocated a fixed length of memory, variable-length data cannot use that space fully. As shown in the figure, caching 100 bytes of data in a 128-byte Chunk wastes the remaining 28 bytes.

The memory management of Redis is mainly implemented by two source files, zmalloc.h and zmalloc.c. To simplify memory management, after allocating a block of memory Redis stores the block's size at its head. As shown in the figure, real_ptr is the pointer returned by malloc. Redis stores the block's size in the header; the memory occupied by size is known, equal to the length of the size_t type, and Redis then returns ret_ptr. When the memory needs to be freed, ret_ptr is passed to the memory manager. From ret_ptr the program can easily compute real_ptr, which is then passed to free to release the memory.

3. Data persistence support

Although Redis is a memory-based storage system, it supports persistence of its in-memory data and provides two main persistence strategies: RDB snapshots and AOF logs. Memcached does not support data persistence.

1) RDB snapshot

Redis supports a persistence mechanism that saves a snapshot of the current data as a data file, i.e. an RDB snapshot. But how does a continuously written database generate a snapshot? Redis relies on the copy-on-write mechanism of the fork system call: to generate a snapshot, it forks the current process into a child process, then iterates over all the data in the child process and writes it out as an RDB file. We can configure the timing of RDB snapshots through Redis's save directive; for example, snapshots can be taken every 10 minutes, or after every 1000 writes, and multiple rules can be combined. The rules are defined in the Redis configuration file and can also be set at runtime with the Redis CONFIG SET command, without restarting Redis.
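Combined save rules like those described might look like this in redis.conf (the figures below are the long-standing stock defaults, shown for illustration):

```
# Snapshot if at least 1 key changed within 900 s,
# or 10 keys within 300 s, or 10000 keys within 60 s.
save 900 1
save 300 10
save 60 10000
```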
Redis's RDB file will not be corrupted, because its writes happen in a new process: the child process generated by Redis first writes the data to a temporary file, then uses the atomic rename system call to rename the temporary file to the RDB file, so that a valid RDB file is always available even if a failure occurs. The RDB file is also part of the internal implementation of Redis master-slave synchronization. RDB has its shortcoming, though: once the database has a problem, the data saved in the RDB file is not up to date; everything written between the last RDB generation and the moment Redis went down is lost. Under some businesses, this is tolerable.

2) AOF log

The full name of the AOF log is append-only file; it is a log file written by appending. Unlike the binlog of a typical database, the AOF file is recognizable plain text, and its content consists of standard Redis commands, one after another. Only commands that modify data are appended to the AOF file. Since every data-modifying command produces a log entry, the AOF file keeps growing, so Redis provides a feature called AOF rewrite, which regenerates the AOF file. In the new AOF file, each value's state is reproduced with at most one operation, unlike the old file, which may record multiple operations on the same value. The generation process is similar to RDB: Redis forks a process, which traverses the data directly and writes a new temporary AOF file. While the new file is being written, all write-operation logs are still appended to the old AOF file and are also recorded in a memory buffer. When the rewrite is completed, all buffered logs are written to the temporary file at once, and then an atomic rename call replaces the old AOF file with the new one.
AOF is a file-write operation whose purpose is to put the operation log on disk, so it also goes through the write-operation process described above. After Redis calls write for the AOF, the appendfsync option controls when fsync is called to flush the data to disk. The three appendfsync settings below increase progressively in safety.

1. appendfsync no: When appendfsync is set to no, Redis does not actively call fsync to flush the AOF log to disk, so everything depends on the operating system's scheduling. For most Linux systems, an fsync happens about every 30 seconds, writing the buffered data to disk.

2. appendfsync everysec: When appendfsync is set to everysec, Redis by default makes an fsync call every second to flush the buffered data to disk. But when an fsync call takes longer than one second, Redis adopts a delayed-fsync strategy and waits another second; that is, the fsync is performed after two seconds, and that fsync is carried out no matter how long it takes. Because the file descriptor is blocked during fsync, the current write operation blocks as well. So the conclusion is that in the vast majority of cases Redis fsyncs every second; in the worst case, an fsync occurs every two seconds. This behavior is called group commit in most database systems: the data of multiple write operations is combined and the log is written to disk in one pass.

3. appendfsync always: When appendfsync is set to always, fsync is called once for every write operation. The data is then safest, but since fsync runs every time, performance suffers accordingly.
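The three settings above are a one-line choice in redis.conf (the middle option shown uncommented here is the common trade-off):

```
# appendfsync no       # leave flushing to the OS: fastest, least safe
# appendfsync always   # fsync on every write: slowest, safest
appendfsync everysec   # fsync about once per second: the usual middle ground
```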

For general business requirements, RDB persistence is recommended, because the overhead of RDB is much lower than that of AOF logs. For applications that cannot tolerate data loss, AOF logs are recommended.

4. Differences in cluster management

Memcached is a fully in-memory data caching system. Although Redis supports data persistence, full memory is still the essence of its high performance. For a memory-based storage system, the machine's physical memory size caps the amount of data the system can hold. To handle more data than fits in a single machine's physical memory, a distributed cluster must be built to expand storage capacity.

Memcached itself does not support distribution, so distributed Memcached storage can only be implemented on the client side through distributed algorithms such as consistent hashing. The figure below shows Memcached's distributed storage architecture. Before the client sends data to the Memcached cluster, it computes the target node for that data with the built-in distribution algorithm and then sends the data directly to that node for storage. Likewise, when querying, the client computes the node holding the data and sends the query request directly to that node to obtain it.

Compared with Memcached, which can only distribute storage on the client side, Redis prefers to build distributed storage on the server side. The latest Redis versions already support it: Redis Cluster is the version of Redis that implements distributed storage and tolerates single points of failure. It has no central node and scales linearly. The figure below shows the Redis Cluster distributed storage architecture, in which nodes communicate with one another over a binary protocol while nodes and clients communicate over an ASCII protocol. For data placement, Redis Cluster divides the whole key space into 4096 hash slots, and each node can store one or more hash slots, which means the maximum number of nodes Redis Cluster currently supports is 4096. The distribution algorithm Redis Cluster uses is also very simple: crc16(key) % HASH_SLOTS_NUMBER.
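A sketch of that placement rule. The CRC16 variant below (CCITT/XModem: polynomial 0x1021, initial value 0) is the one Redis Cluster is documented to use; the 4096-slot count follows this article's description (note that shipping Redis Cluster releases use 16384 slots).

```python
HASH_SLOTS_NUMBER = 4096   # per the article; released Redis Cluster uses 16384

def crc16(data: bytes) -> int:
    """CRC16-CCITT (XModem): poly 0x1021, init 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF              # stay within 16 bits
    return crc

def key_slot(key: str) -> int:
    """Map a key to its hash slot: crc16(key) % HASH_SLOTS_NUMBER."""
    return crc16(key.encode()) % HASH_SLOTS_NUMBER
```

Because every client computes the same slot for the same key, any node (or client) can tell deterministically which node owns a key without consulting a central directory.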

To keep data available under single points of failure, Redis Cluster introduces Master nodes and Slave nodes. In Redis Cluster, each Master node has two corresponding Slave nodes for redundancy, so within the entire cluster the downtime of any two nodes will not make data unavailable. When a Master node drops out, the cluster automatically elects a Slave node to become the new Master.

 

 
