Redis Cache Penetration, Cache Breakdown, and Cache Avalanche: Concepts and Solutions

1. Introduction

In day-to-day development we usually rely on a database for storage. For ordinary workloads with little concurrency this works fine, but as soon as traffic spikes (a flash sale, or a sudden surge of visits to the home page) a system that reads and writes the database directly runs into serious performance problems: disks are slow, and thousands of requests arriving at once demand thousands of read/write operations in a very short time. The database often cannot keep up; it is all too easy to overwhelm it, bring the database system down, and ultimately take the whole service offline, which is a serious production incident.

To relieve this pressure, projects usually introduce NoSQL technology: an in-memory store that also offers some degree of persistence.

Redis is one such NoSQL technology, but introducing Redis brings risks of its own: cache penetration, cache breakdown, and cache avalanche. This article analyzes all three in depth.

2. The Three Problems at a Glance

  • Cache penetration: the requested key does not exist in the data source at all. Every request for such a key misses the cache and falls through to the data source, which may overwhelm it. For example, querying user information with a non-existent user id hits neither the cache nor the database; if an attacker exploits this hole, the flood of queries can crush the database.
  • Cache breakdown: the key's data exists, but its entry in Redis has expired. If a burst of concurrent requests arrives at that moment, they all miss the cache, load the data from the backend DB, and write it back; that instant of concurrency can overwhelm the backend DB.
  • Cache avalanche: the cache server restarts, or a large number of keys expire within a short window, so a wave of requests lands on the backend system (such as the DB) all at once.

3. Cache Penetration Solutions

Penetration concerns data that exists in neither the cache nor the database. The cache is written passively on a miss, and for fault tolerance a lookup that finds nothing in the storage layer writes nothing back to the cache. As a result, every request for that non-existent key travels all the way to the storage layer, defeating the purpose of caching.

There are several effective remedies. The most common is a Bloom filter: hash every key that could legitimately exist into a sufficiently large bitmap, and let the bitmap intercept keys that definitely do not exist, shielding the underlying storage from those queries. There is also a simpler, cruder method (the one we use): when a query returns an empty result (whether because the data does not exist or because the system failed), cache the empty result anyway, but with a short expiration time, no more than five minutes.

Pseudocode for the crude approach:

// Pseudocode
public object GetProductListNew() {
    int cacheTime = 30;
    String cacheKey = "product_list";

    String cacheValue = CacheHelper.Get(cacheKey);
    if (cacheValue != null) {
        return cacheValue;
    }

    // Cache miss: query the database (the result may be empty)
    cacheValue = GetProductListFromDB();
    if (cacheValue == null) {
        // Nothing in the DB either: cache a default value as well
        cacheValue = string.Empty;
    }
    CacheHelper.Add(cacheKey, cacheValue, cacheTime);
    return cacheValue;
}
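As a concrete illustration of the Bloom-filter alternative mentioned above, here is a minimal, self-contained sketch in Python (the class name, bit-array size, and hash count are all illustrative choices, not taken from any particular library):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k salted hashes over an m-bit array."""
    def __init__(self, m_bits=1024, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, key):
        # Derive k bit positions from salted SHA-256 digests of the key.
        for salt in range(self.k):
            digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False = definitely absent (safe to skip the DB);
        # True = probably present (rare false positives, no false negatives).
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

# Populate the filter with every id that legitimately exists,
# then consult it before touching the storage layer.
bf = BloomFilter()
for user_id in ("u1", "u2", "u3"):
    bf.add(user_id)

print(bf.might_contain("u2"))   # True: no false negatives for added keys
print(bf.might_contain("no-such-id"))
```

A request for an id the filter rejects can be answered without ever querying the database; only ids the filter accepts go through, and an occasional false positive merely costs one harmless DB lookup.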

4. Cache Breakdown Solutions

A very "hot" key may receive extremely high concurrent access at the exact moment it expires in Redis; the cache is then said to be "broken down", and every request lands on the DB. Common mitigations:

  1. Set hotspot data to never expire.

  2. Rate limiting, circuit breaking, and degradation at the interface level. Important interfaces need a rate-limiting strategy to stop malicious hammering, plus a prepared degradation plan: when some service behind the interface is unavailable, trip the circuit breaker and fail fast.

  3. Bloom filter. A Bloom filter behaves like a hash set and quickly answers whether an element is in a set. Its typical use is to check whether a key can exist in a container at all, returning immediately if it cannot. Its effectiveness hinges on the hash functions and the size of the bit array.

  4. Using a mutex (mutex key)

The common industry practice is a mutex. In short, on a cache miss (the value read back is empty), do not load the DB immediately. First use a cache operation that reports success or failure (such as Redis's SETNX or Memcache's ADD) to set a mutex key. Only the caller whose set succeeds loads the DB and writes the cache back; every other caller retries the whole get-from-cache path.

SETNX ("SET if Not eXists") sets a key only when it does not already exist, which is exactly the behavior needed for a lock.

public String get(String key) {
    String value = redis.get(key);
    if (value == null) {
        // The cached value has expired.
        // Give the mutex a 3-minute TTL so that if the del below fails,
        // the next expiration can still load the DB.
        // (Plain SETNX takes no TTL; SET with NX and EX does both atomically.)
        if (redis.set(key_mutex, "1", "NX", "EX", 3 * 60) != null) {
            // We hold the mutex: load the DB and reset the cache.
            value = db.get(key);
            redis.set(key, value, expire_secs);
            redis.del(key_mutex);
        } else {
            // Another thread has already loaded the DB and is refilling
            // the cache; wait briefly and retry the whole read.
            sleep(50);
            return get(key);
        }
    }
    return value;
}

The Memcache version:

if (memcache.get(key) == null) {
    // Give the mutex a 3-minute timeout (exptime is in seconds)
    // to avoid a deadlock if the mutex holder crashes
    if (memcache.add(key_mutex, "1", 3 * 60) == true) {
        value = db.get(key);
        memcache.set(key, value);
        memcache.delete(key_mutex);
    } else {
        sleep(50);
        retry();
    }
}
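The mutex-key flow can be simulated end to end without a Redis or Memcache server. In the Python sketch below (a dict plus a lock stands in for Redis and SETNX; all names are illustrative), twenty concurrent readers race on an expired key, and the double-check inside the mutex guarantees exactly one database load:

```python
import threading
import time

cache = {}
mutexes = {}                    # stands in for Redis mutex keys
mutex_guard = threading.Lock()
db_loads = []                   # records every trip to the backing store

def setnx(key):
    """Simulate SETNX: only the first caller for a key gets True."""
    with mutex_guard:
        if key in mutexes:
            return False
        mutexes[key] = True
        return True

def del_mutex(key):
    with mutex_guard:
        mutexes.pop(key, None)

def load_from_db(key):
    db_loads.append(key)        # the expensive query we want to run once
    time.sleep(0.05)
    return f"value-of-{key}"

def get(key, retries=100):
    for _ in range(retries):
        value = cache.get(key)
        if value is not None:
            return value
        if setnx(key + ":mutex"):          # miss: try to take the mutex
            try:
                value = cache.get(key)     # double-check after winning it
                if value is None:
                    value = load_from_db(key)
                    cache[key] = value     # rebuild the cache exactly once
            finally:
                del_mutex(key + ":mutex")
            return value
        time.sleep(0.01)                   # someone else is rebuilding; retry
    return None

threads = [threading.Thread(target=get, args=("product_list",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(db_loads))   # 1: only one thread reached the database
```

Without the re-check of the cache after winning the mutex, a thread that missed the cache before the rebuild finished could win the freed mutex and load the DB a second time; the double-check closes that gap.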

5. Cache Avalanche Solutions

The difference from cache breakdown: an avalanche involves many keys, while breakdown concerns one particular key.

Under normal operation, reads are served from Redis (diagram omitted). At the moment the cache is invalidated, every read falls through to the database (diagram omitted).

The avalanche effect on the underlying system when the cache is invalidated can be devastating. Most system designers therefore use locks or queues to guarantee that a large number of threads never read and write the database at the same moment, keeping the concurrent requests off the underlying storage when keys expire. Another simple remedy is to spread out the expiration times: add a random offset to each base expiration, say 1 to 5 minutes, so that expiry times rarely coincide and a collective invalidation is hard to trigger.
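The random-offset idea fits in a few lines. A Python sketch (the constants are illustrative; Redis itself plays no part here):

```python
import random

BASE_TTL = 30 * 60        # base expiration: 30 minutes, in seconds
MAX_JITTER = 5 * 60       # random extra lifetime: between 1 and 5 minutes

def ttl_with_jitter():
    # Each key gets a slightly different lifetime, so keys written in the
    # same batch stop expiring at the same instant.
    return BASE_TTL + random.randint(60, MAX_JITTER)

ttl = ttl_with_jitter()
print(BASE_TTL + 60 <= ttl <= BASE_TTL + MAX_JITTER)   # True
```

The cache write then uses `ttl_with_jitter()` in place of a fixed expiration (with redis-py, for example, something like `redis.set(key, value, ex=ttl_with_jitter())`).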

Pseudocode for locking and queuing:

// Pseudocode
public object GetProductListNew() {
    int cacheTime = 30;
    String cacheKey = "product_list";
    String lockKey = cacheKey;

    String cacheValue = CacheHelper.get(cacheKey);
    if (cacheValue != null) {
        return cacheValue;
    } else {
        synchronized (lockKey) {
            // Double-check inside the lock: another thread may have
            // rebuilt the cache while we were waiting.
            cacheValue = CacheHelper.get(cacheKey);
            if (cacheValue != null) {
                return cacheValue;
            } else {
                // Typically a SQL query
                cacheValue = GetProductListFromDB();
                CacheHelper.Add(cacheKey, cacheValue, cacheTime);
            }
        }
        return cacheValue;
    }
}

Locking and queuing only relieves pressure on the database; it does not improve system throughput. Suppose that under high concurrency the key is locked while the cache is rebuilt: 999 of 1,000 incoming requests are blocked, and users may time out waiting. It is a stopgap, not a cure.

Note: in a distributed environment, locking and queuing turns into a distributed-lock problem, and threads still block while waiting, so the user experience is poor. It is therefore rarely used in truly high-concurrency scenarios.

Pseudocode for the cache-mark (background refresh) approach:

// Pseudocode
public object GetProductListNew() {
    int cacheTime = 30;
    String cacheKey = "product_list";
    // Cache mark ("sign") key
    String cacheSign = cacheKey + "_sign";

    String sign = CacheHelper.Get(cacheSign);
    // Read the cached value
    String cacheValue = CacheHelper.Get(cacheKey);
    if (sign != null) {
        return cacheValue; // Mark not yet expired: return directly
    } else {
        // Mark expired: re-arm it, refresh the real cache in the
        // background, and return the (stale) cached value immediately.
        CacheHelper.Add(cacheSign, "1", cacheTime);
        ThreadPool.QueueUserWorkItem((arg) -> {
            // Typically a SQL query
            cacheValue = GetProductListFromDB();
            // Give the data twice the mark's lifetime,
            // so stale reads remain possible
            CacheHelper.Add(cacheKey, cacheValue, cacheTime * 2);
        });
        return cacheValue;
    }
}

Explanation:

  • Cache mark: records whether the cached data has expired; when the mark expires, another thread is triggered to refresh the real key's cache in the background.
  • Cached data: its expiration time is twice the mark's, for example mark 30 minutes, data 60 minutes. When the mark expires, the real cache can still hand the old data back to callers; only after the background thread completes the refresh is the new value served.
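The mark-plus-double-lifetime scheme can also be sketched in runnable form. Below, a dict with timestamps stands in for Redis, and short TTLs (fractions of a second instead of minutes) make the expiry observable; all names are illustrative:

```python
import threading
import time

store = {}   # key -> (value, expires_at); stands in for Redis

def cache_set(key, value, ttl):
    store[key] = (value, time.time() + ttl)

def cache_get(key):
    item = store.get(key)
    if item is None or item[1] < time.time():
        return None        # missing or expired
    return item[0]

def load_from_db():
    return "fresh-product-list"

MARK_TTL = 0.1             # the mark's lifetime ("30 minutes" in the article)
DATA_TTL = MARK_TTL * 2    # data lives twice as long, so stale reads still hit

def get_product_list():
    sign = cache_get("product_list_sign")
    value = cache_get("product_list")
    if sign is not None:
        return value                       # mark still valid: serve cached data
    # Mark expired: re-arm it, refresh in the background, serve stale data now.
    cache_set("product_list_sign", "1", MARK_TTL)
    threading.Thread(
        target=lambda: cache_set("product_list", load_from_db(), DATA_TTL)
    ).start()
    return value

cache_set("product_list", "stale-product-list", DATA_TTL)
cache_set("product_list_sign", "1", MARK_TTL)
time.sleep(MARK_TTL + 0.02)     # the mark expires; the data is still alive

first = get_product_list()      # old value, returned without blocking
time.sleep(0.05)                # give the background refresh time to finish
second = get_product_list()     # new value, refreshed behind the scenes
print(first, second)
```

The first read after the mark expires still gets the old value instantly, while the refresh runs off the request path; the next read sees the fresh value.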

To summarize, several approaches to cache avalanche have been mentioned: using locks or queues, setting an expiration mark to drive a background refresh of the cache, assigning keys different expiration times, and a solution called "secondary cache" (a two-level cache).

Origin blog.csdn.net/qq_45473439/article/details/123617310