Several misunderstandings of Redis

Redis can't possibly be faster than Memcached

 

      Many developers assume Redis cannot be faster than Memcached: Memcached is purely memory-based, while Redis also offers persistent storage, so even with asynchronous persistence it should not be able to win. Yet benchmark results consistently show Redis with a clear advantage. I have been thinking about this for a while, and the reasons that come to mind are as follows.

 

  • Libevent. Unlike Memcached, Redis does not use libevent. To stay general-purpose, libevent has grown into a large code base (the entire Redis code base is currently less than a third of libevent's size) and sacrifices a lot of performance on specific platforms. Redis built its own epoll event loop (4) out of just two modified files from libevent. Quite a few developers in the community have suggested that Redis adopt libev, another high-performance event library, but the author insists that Redis stay small and free of dependencies. A telling detail: you never need to run ./configure before compiling Redis.
  • CAS. CAS (check-and-set) is Memcached's convenient mechanism for preventing races when the same resource is modified concurrently. The implementation attaches a hidden cas token to every cached key; the token acts as a version number of the value and must be incremented on every set, which costs extra CPU and memory. Each of these overheads is small, but on a single node holding 10 GB+ of cache at tens of thousands of QPS they add up to a slight performance difference between the two systems (5). Redis does not carry this cost; a minimal sketch of the cas-token idea follows this list.
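      Below is a toy, self-contained sketch of the cas-token idea, written in Python for illustration; the class and method names are made up here and are not Memcached's actual API. It only shows the bookkeeping: every key carries a hidden version token that has to be read and bumped on each write.

```python
# Toy in-process model of check-and-set (CAS); illustrative only.
class CasStore:
    def __init__(self):
        self._data = {}  # key -> (value, cas_token)

    def gets(self, key):
        """Return (value, token) so the caller can attempt a conditional write later."""
        return self._data.get(key, (None, 0))

    def set(self, key, value):
        """Unconditional write: still has to read and bump the hidden token."""
        _, token = self._data.get(key, (None, 0))
        self._data[key] = (value, token + 1)

    def cas(self, key, value, token):
        """Write only if nobody modified the key since our gets()."""
        _, current = self._data.get(key, (None, 0))
        if current != token:
            return False  # lost the race; caller should re-read and retry
        self._data[key] = (value, current + 1)
        return True


if __name__ == "__main__":
    store = CasStore()
    store.set("counter", 1)
    value, token = store.gets("counter")
    assert store.cas("counter", value + 1, token)      # succeeds: token still matches
    assert not store.cas("counter", value + 9, token)  # fails: token was already bumped
```

      The point is that the token must be maintained on every set, whether or not any client ever issues a cas, which is exactly the per-key CPU and memory cost described above.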

The data stored in a single Redis instance must fit in physical memory

 

      All Redis data lives in memory, which brings high performance but also some awkward economics. Take a medium-sized website with 1 million registered users: if their data is to be kept in Redis, memory must be large enough to hold all 1 million of them. In practice, though, only 50,000 of those users are active, and only 150,000 have visited within the past week. Keeping all 1 million users' data in memory is therefore unreasonable: RAM ends up paying for cold data.

 

      This is very similar to how an operating system works. Applications access all of their data through memory, but when physical memory cannot hold new data, the OS intelligently swaps out data that has not been accessed for a long time, making room for new applications. What a modern operating system gives applications is not physical memory but the abstraction of virtual memory.

 

      Based on the same reasoning, Redis 2.0 added a VM feature, letting the data set grow beyond the limits of physical memory and separating hot data from cold data.
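      As a rough illustration, the redis.conf of that era exposed VM directives along the following lines; the directive names are quoted from memory and the values are illustrative, so check the config file shipped with your version before relying on them.

```conf
vm-enabled yes                 # turn the virtual memory feature on
vm-swap-file /tmp/redis.swap   # file that cold values are swapped out to
vm-max-memory 1gb              # keep roughly this much value data in RAM, swap the rest
vm-page-size 32                # swap-file page size, in bytes
vm-pages 134217728             # number of pages in the swap file
vm-max-threads 4               # background I/O threads used for swapping
```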

Redis's VM implementation is reinventing the wheel

 

      Following the same do-it-yourself approach as its epoll event loop, Redis implemented its VM on its own. However, as the operating-system analogy above suggests, the OS can already separate hot and cold data automatically: Redis would only need to allocate one large block of memory, and the OS would keep the hot data in physical RAM while swapping the cold data out to disk. Varnish, another well-known project, takes exactly this "trust the modern operating system" approach (3) and has been very successful with it.

 

      In his explanation of why Redis implements VM itself, the author antirez gives several reasons (6). The main one is that the OS swaps in and out at page granularity. An OS VM page is typically 4 KB; as long as a single element on that page is still being touched, even just 1 byte, the whole page will not be swapped out, and the same applies on the way in: reading a single byte may swap in 4 KB of useless memory. Redis's own implementation can control the swap granularity itself. In addition, accessing an OS-swapped memory region blocks the whole process, which is another reason Redis needed to implement VM on its own.
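      A back-of-the-envelope illustration of the granularity argument (the 64-byte object size is an arbitrary assumption chosen for the example):

```python
# How much unrelated data can OS-level swapping pin or drag in?
os_page = 4096   # bytes in a typical OS VM page
obj_size = 64    # assumed size of a small Redis object, in bytes

print(os_page // obj_size)  # 64: one hot byte can keep ~64 objects' worth of memory resident
print(os_page - 1)          # 4095: bytes of potentially useless data pulled in by one swap-in
```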

Using Redis with get/set

 

      Since Redis is a key-value store, many developers naturally use it through plain set/get. This is in fact not the optimal way to use it, especially when VM is not enabled: all of the data has to fit in memory, so saving memory matters a great deal.

 

      Suppose a key-value entry costs a minimum of 512 bytes of overhead; even if it stores only one byte of payload, it still occupies 512 bytes. There is a design pattern for this: reuse keys by packing a group of logical key-value pairs into a single Redis key whose value is a hash, so the same 512 bytes of overhead covers 10 to 100 times as much data.

 

      In short, to save memory it is recommended to use hashes (hset/hget) rather than plain set/get when working with Redis. For details, see reference (7).
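      Here is a minimal sketch of the bucketing pattern using the redis-py client (pip install redis); the "user:<bucket>" naming scheme and the bucket size of 100 users per hash are illustrative assumptions, not Redis conventions.

```python
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# Naive approach: one top-level key per value, paying the per-key overhead every time.
r.set("user:10001:name", "alice")

# Bucketed approach: group many small values under one hash key, so the
# per-key overhead is paid once per bucket instead of once per value.
user_id = 10001
bucket = "user:%d" % (user_id // 100)        # 100 users share one hash key
r.hset(bucket, "%d:name" % user_id, "alice")
print(r.hget(bucket, "%d:name" % user_id))   # b'alice'
```

      One trade-off to keep in mind: commands such as EXPIRE apply to the whole hash key rather than to individual fields, so the pattern fits best when the grouped data can be expired together.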

Use AOF instead of snapshots

 

      Redis has two persistence methods. The default is snapshotting: a snapshot of the in-memory data set is periodically persisted to disk. The drawback is that if Redis crashes after the last snapshot, the data written since then is lost. So, pushed by perfectionists, the author added the AOF (append-only file) mode, which appends every write command to a log file while the data is written to memory. In a system handling tens of thousands of concurrent changes, this command log becomes a very large amount of data, management and maintenance costs are high, and recovery and rebuilding take a very long time, which defeats AOF's original goal of high availability. More importantly, Redis is an in-memory data-structure server whose advantages all rest on efficient atomic operations over complex in-memory structures, so AOF looks like a rather ill-fitting part of the design.
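      For reference, the two persistence modes are controlled by redis.conf directives along these lines (the values shown are the common defaults and are meant only as an illustration):

```conf
# Snapshot (RDB) persistence: dump the dataset if N keys changed within M seconds.
save 900 1
save 300 10
save 60 10000

# Append-only file (AOF) persistence: log every write command as it happens.
appendonly yes
appendfsync everysec   # fsync the log once per second (alternatives: always, no)
```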

 

      In fact, AOF exists mainly for data reliability and high availability, and Redis offers another way to achieve that: replication. Because Redis is so fast, replication lag is essentially negligible, which protects against a single point of failure and provides high availability.
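      Setting up such a replica is a one-line entry in the replica's redis.conf (the directive was called slaveof in the Redis of that era; the IP and port below are placeholders):

```conf
slaveof 192.168.0.10 6379   # follow the master at this address and replicate its data
```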

References

1. On Designing and Deploying Internet-Scale Services (PDF)

2. Facebook’s New Real-Time Messaging System: HBase To Store 135+ Billion Messages A Month

3. What’s wrong with 1975 programming

4. Linux epoll is now supported (Google Groups)

5. CAS and why I don’t want to add it to Redis (Google Groups)

6. Plans for Virtual Memory (Google Groups)

7. Full of keys (Salvatore "antirez" Sanfilippo)
