Redis cache penetration, cache breakdown, and cache avalanche: causes and solutions

I. Introduction

In daily development we typically use a database to store data. For ordinary workloads with little concurrency this works fine. But once large volumes of requests are involved, for example a flash sale on popular products or a sudden traffic spike on the home page, a system that relies on the database alone suffers serious performance problems, because it is disk-oriented and disk reads and writes are slow. When thousands of requests arrive, the system must complete thousands of read/write operations in a very short time. A database often cannot withstand this load; it is easily overwhelmed, which can take the database system down and ultimately cause a service outage, a serious production incident.

To overcome these problems, projects usually introduce NoSQL technology: memory-based data stores that also provide some persistence capability.

Redis is one such NoSQL technology, but introducing Redis can give rise to cache penetration, cache breakdown, and cache avalanche. This article analyzes these three problems in depth.

II. A First Look at the Three Problems

● Cache penetration: the data for a key does not exist in the data source, so every request for that key misses the cache and goes on to the data source, which may overwhelm it. For example, querying user information with a non-existent user id finds nothing in either the cache or the database; if an attacker exploits this, the flood of queries can crush the database.

● Cache breakdown: the data for a key exists in the database but has expired in Redis. If a large number of concurrent requests arrive at that moment, they will all miss the cache, load the data from the backend DB, and write it back to the cache. Under high concurrency, these requests can overwhelm the backend DB in an instant.

● Cache avalanche: when the cache server restarts, or a large number of cached keys expire within some window of time, the resulting flood of misses puts enormous pressure on the backend system (such as the DB).

III. Cache Penetration Solutions

Cache penetration concerns data that exists in neither the cache nor the database. Because the cache is written passively on a miss, and, for fault tolerance, nothing is written to the cache when the storage layer returns no result, every request for this non-existent data goes through to the storage layer, defeating the purpose of the cache.

There are several effective ways to address cache penetration. The most common is a Bloom filter: hash all data that could possibly exist into a sufficiently large bitmap, so that a key that definitely does not exist is intercepted by the bitmap, shielding the underlying storage system from the query.
1. Bloom filter
Redisson uses Redis to implement a distributed Java Bloom filter (RBloomFilter). It can hold at most 2^32 bits.

RBloomFilter<SomeObject> bloomFilter = redissonClient.getBloomFilter("sample");
// Initialize the Bloom filter: expected number of elements 55,000,000, desired false-positive rate 0.03
bloomFilter.tryInit(55000000L, 0.03);
bloomFilter.add(new SomeObject("field1Value", "field2Value"));
bloomFilter.add(new SomeObject("field5Value", "field8Value"));
bloomFilter.contains(new SomeObject("field1Value", "field8Value"));
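To make the guard concrete, here is a minimal, self-contained sketch of how the filter sits in front of the read path. The class and functional parameters (`BloomGuardedRead`, `mightContain`, `cacheGet`, `dbGet`) are illustrative stand-ins for the Redisson and Redis calls above, not part of any real API:

```java
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch: the Bloom filter rejects impossible keys before they
// ever reach the cache or the database.
public class BloomGuardedRead {
    public static String get(String key,
                             Predicate<String> mightContain,   // bloomFilter.contains(...)
                             Function<String, String> cacheGet, // Redis GET
                             Function<String, String> dbGet,    // database query
                             Map<String, String> cacheStore) {  // Redis SET target
        // A key the filter has never seen definitely does not exist:
        // reject it without touching the cache or the database.
        if (!mightContain.test(key)) {
            return null;
        }
        // Normal cache-aside read path.
        String value = cacheGet.apply(key);
        if (value != null) {
            return value;
        }
        value = dbGet.apply(key);
        if (value != null) {
            cacheStore.put(key, value); // write back on a DB hit
        }
        return value;
    }
}
```

With this guard, a query for a never-inserted key returns immediately instead of hammering the storage layer.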

Bloom filter data sharding
For Redis running in cluster mode, Redisson's RClusteredBloomFilter interface provides a distributed Bloom filter with data sharding. An optimized algorithm compresses unused bits to free cluster memory, and the filter's state is distributed across the whole cluster. It can hold at most 2^64 bits.

RClusteredBloomFilter<SomeObject> bloomFilter = redissonClient.getClusteredBloomFilter("sample");
// Create the Bloom filter with the following parameters:
// expectedInsertions = 255000000
// falseProbability = 0.03
bloomFilter.tryInit(255000000L, 0.03);
bloomFilter.add(new SomeObject("field1Value", "field2Value"));
bloomFilter.add(new SomeObject("field5Value", "field8Value"));
bloomFilter.contains(new SomeObject("field1Value", "field8Value"));

2. Caching empty results
There is also a simpler, cruder approach (the one we use): if a query returns an empty result, whether because the data does not exist or because of a system fault, we cache the empty result anyway, but with a very short expiration time, no more than five minutes.

Pseudo code:

// pseudo code
public Object getProductListNew() {
    int cacheTime = 30;
    String cacheKey = "product_list";

    String cacheValue = redisTemplate.opsForValue().get(cacheKey);
    if (cacheValue != null) {
        return cacheValue;
    } else {
        // query the database; may come back empty
        cacheValue = baseDao.getProductList();
        if (cacheValue == null) {
            // nothing found: cache an empty default value as well
            cacheValue = "";
        }
        redisTemplate.opsForValue().set(cacheKey, cacheValue, cacheTime, TimeUnit.SECONDS);
        return cacheValue;
    }
}

IV. Cache Breakdown Solutions

Certain keys may see extremely high concurrent access at particular points in time; these are very "hot" data. For such keys we need to consider the cache "breakdown" problem.

A common practice is to use a mutex key. In short, when the cache misses (the value read back is empty), do not load the DB immediately. Instead, first use a caching-tool operation that succeeds only if the key does not yet exist (such as Redis's SETNX or Memcache's ADD) to set a mutex key. If that operation reports success, perform the DB load and reset the cache; otherwise, retry the whole get-from-cache method.

SETNX stands for "SET if Not eXists": the key is set only if it does not already exist, which can be used to achieve a locking effect.

public String get(String key) {
    String value = redis.get(key);
    if (value == null) {
        // cache value has expired
        // give the mutex a 3-minute timeout so that if the DEL fails,
        // the next cache miss can still load from the DB
        if (redis.setnx(key_mutex, 1, 3 * 60) == 1) {
            // we acquired the mutex: load from the DB and reset the cache
            value = db.get(key);
            redis.set(key, value, expire_secs);
            redis.del(key_mutex);
        } else {
            // another thread has loaded the DB and is writing back to the cache;
            // sleep briefly, then retry reading the cache
            sleep(50);
            return get(key);  // retry
        }
    }
    return value;
}
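The same mutex logic can be sketched as compilable Java, with `ConcurrentHashMap.putIfAbsent` standing in for Redis SETNX (on a real Redis, the atomic `SET key value NX EX seconds` form is preferable to a separate SETNX plus EXPIRE). All names here are illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of the mutex-key pattern. putIfAbsent plays the role of SETNX:
// it succeeds (returns null) only for the one caller that sets the key first.
public class MutexCacheLoader {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, String> mutex = new ConcurrentHashMap<>();
    private final Function<String, String> db; // stands in for db.get(key)

    public MutexCacheLoader(Function<String, String> db) {
        this.db = db;
    }

    public String get(String key) {
        String value = cache.get(key);
        while (value == null) {
            if (mutex.putIfAbsent(key + ":mutex", "1") == null) {
                // We hold the mutex: load from the DB and reset the cache.
                try {
                    value = db.apply(key);
                    if (value != null) {
                        cache.put(key, value);
                    }
                } finally {
                    mutex.remove(key + ":mutex"); // DEL key_mutex
                }
                return value;
            }
            // Another caller is rebuilding the entry: back off, then re-check.
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
            value = cache.get(key);
        }
        return value;
    }
}
```

Only one caller per key ever reaches the DB; the others spin briefly on the cache until the winner has written the value back.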

Memcache version:

if (memcache.get(key) == null) {
    // 3-minute timeout in case the mutex holder crashes
    if (memcache.add(key_mutex, 3 * 60 * 1000) == true) {
        value = db.get(key);
        memcache.set(key, value);
        memcache.delete(key_mutex);
    } else {
        sleep(50);
        retry();
    }
}

Other plans: to be added.

V. Cache Avalanche Solutions

The difference from cache breakdown is that an avalanche involves many cached keys, while breakdown concerns a single key.

Normally the cache is read from Redis; at the moment a large portion of the cache expires, the misses all fall through to the underlying system. The impact of this avalanche effect is terrible! Most system designers consider using locks or queues to ensure that large numbers of threads do not read from and write to the database at once, so that a mass expiration does not send a flood of concurrent requests to the underlying storage. Another simple solution is to spread out the cache expiration times: for example, add a random value, say 1 to 5 minutes, to the base expiration time, so that expiration times rarely coincide and a collective failure event becomes much less likely.
Pseudo code for the lock-based approach:

// pseudo code
public Object getProductListNew() {
    int cacheTime = 30;
    String cacheKey = "product_list";
    String lockKey = cacheKey;

    String cacheValue = redisTemplate.opsForValue().get(cacheKey);
    if (cacheValue != null) {
        return cacheValue;
    } else {
        synchronized (lockKey) {
            // double-check: another thread may have rebuilt the cache already
            cacheValue = redisTemplate.opsForValue().get(cacheKey);
            if (cacheValue != null) {
                return cacheValue;
            } else {
                // usually a SQL query
                cacheValue = baseDao.getProductList();
                redisTemplate.opsForValue().set(cacheKey, cacheValue, cacheTime, TimeUnit.SECONDS);
            }
        }
        return cacheValue;
    }
}

Locking and queuing only reduce pressure on the database; they do not improve system throughput. Suppose that under high concurrency the key is locked while the cache is being rebuilt: 999 of 1,000 requests are blocked, and users will hit timeouts anyway. This treats the symptom, not the disease!

Note: the locking-and-queuing solution addresses a concurrency problem in a distributed environment and may itself require distributed locking; threads are blocked and the user experience is poor. It is therefore rarely used in real high-concurrency scenarios!
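The staggered-expiration idea mentioned above can be sketched as a tiny helper that adds random jitter to the base TTL (the class name and the values are illustrative, not from the original article):

```java
import java.util.concurrent.ThreadLocalRandom;

// Add 1..maxJitterSeconds of random jitter to a base TTL, so that keys
// written at the same moment do not all expire at the same moment.
public class TtlJitter {
    public static int ttlWithJitter(int baseSeconds, int maxJitterSeconds) {
        return baseSeconds + ThreadLocalRandom.current().nextInt(1, maxJitterSeconds + 1);
    }
}
```

For example, `redisTemplate.opsForValue().set(cacheKey, cacheValue, TtlJitter.ttlWithJitter(1800, 300), TimeUnit.SECONDS)` would cache for a random 30 to 35 minutes.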

Pseudo code for the cache-mark approach (an expiration flag plus a background refresh):

// pseudo code
public Object getProductListNew() {
    int cacheTime = 30;
    String cacheKey = "product_list";
    // cache mark (expiration flag)
    String cacheSign = cacheKey + "_sign";

    String sign = redisTemplate.opsForValue().get(cacheSign);
    // read the cached value
    String cacheValue = redisTemplate.opsForValue().get(cacheKey);
    if (sign != null) {
        return cacheValue; // mark not expired: return the cached value directly
    } else {
        redisTemplate.opsForValue().set(cacheSign, "1", cacheTime, TimeUnit.SECONDS);
        new Thread(() -> {
            // usually a SQL query
            String newValue = baseDao.getProductList();
            // the data's TTL is twice the mark's TTL, allowing stale reads
            redisTemplate.opsForValue().set(cacheKey, newValue, cacheTime * 2, TimeUnit.SECONDS);
        }).start();
        // return the stale value while the background thread refreshes it
        return cacheValue;
    }
}

Explanation:
● Cache mark: records whether the cached data has expired; when it expires, a background thread is triggered to refresh the actual cached key.

● Cached data: its expiration time is twice that of the cache mark; for example, the mark is cached for 30 minutes and the data for 60. When the mark key expires, the actual cache can still return the old data to the caller, and the new data is served only after the background thread finishes updating it.

For cache avalanche, then, several solutions have been presented here: using locks or queues, setting an expiration mark so the cache is refreshed in the background, and giving keys staggered expiration times. There is also an approach known as a "second-level cache".

VI. Summary

For business systems it always comes down to case-by-case analysis: there is no best solution, only the most suitable one.

Other cache problems, such as the cache filling up and data loss, are left for self-study. Finally, three keywords to take away: LRU, RDB, and AOF. We usually use an LRU policy to handle memory overflow, and Redis's RDB and AOF persistence strategies to keep data safe in most circumstances.
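As a concrete illustration of those three keywords, a redis.conf fragment might combine an LRU eviction policy with RDB and AOF persistence. The values below are examples only, not tuning recommendations:

```conf
# Evict using approximate LRU over all keys once the memory limit is reached
maxmemory 2gb
maxmemory-policy allkeys-lru

# RDB: snapshot to disk if at least 1000 keys changed within 60 seconds
save 60 1000

# AOF: keep an append-only log, fsync at most once per second
appendonly yes
appendfsync everysec
```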

Origin blog.csdn.net/a251628111/article/details/107284886