[Redis] Cache penetration, cache breakdown, and cache avalanche

Preface

In day-to-day development we store data in a database. Most systems never face high concurrency, so this works without trouble. Under heavy load, however, such as a flash sale in an online store or a surge of traffic to the homepage, a system that relies on the database alone runs into serious performance problems, because the database is disk-based and disk reads and writes are slow. Thousands of requests arrive at once, and the system must complete thousands of reads and writes in a very short time. This can easily paralyze the database and bring the service down. Redis handles this situation well: the cache is a protective layer that relieves pressure on the database. When the requested data cannot be found in the cache, the database still has to be queried, and that brings problems of its own: cache penetration, cache breakdown, and cache avalanche.

1. Cache penetration

Cache penetration occurs when the data for a key does not exist in the database. Because the key can never be found in the cache, every request for it falls through to the database, which may overwhelm it. For example, a lookup with a non-existent user id finds nothing in either the cache or the database; an attacker who exploits this can flood the database with such requests.

There are two solutions:

Solution 1: Cache empty results. If a query returns nothing (whether because the data does not exist or because the system failed), cache the empty result anyway by storing a default value such as null under that key. To avoid wasting resources, give the entry a very short expiration time, no more than five minutes. This approach is simple but easy to defeat: an attacker who analyzes the data format can request a different non-existent key each time instead of repeating the same one, so every request misses the cache and Solution 1 is effectively useless.

public object getProductListNew(){
    // Cache lifetime passed to CacheHelper.Add
    int cacheTime = 30;
    String cacheKey = "product_list";

    // Return the cached value if there is one (it may be the cached empty default)
    String cacheValue = CacheHelper.Get(cacheKey);
    if (cacheValue != null) {
        return cacheValue;
    }

    // Cache miss: query the database
    cacheValue = GetProductListFromDB();
    if (cacheValue == null) {
        // Nothing found in the database: use an empty default and cache that too
        cacheValue = string.Empty;
    }
    CacheHelper.Add(cacheKey, cacheValue, cacheTime);
    return cacheValue;
}
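To see the effect of Solution 1 in isolation, here is a minimal Java sketch under stated assumptions. The cache and database are plain in-memory maps, and the names `NullCachingSketch`, `queryDatabase`, and the two TTL values are hypothetical, chosen only for illustration. The current time is passed in explicitly so the behavior is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class NullCachingSketch {
    // Marker cached when the database has no data for a key.
    static final String EMPTY = "";
    // Assumed lifetimes: short for empty results, longer for real values.
    static final long EMPTY_TTL_MILLIS = 60_000;
    static final long VALUE_TTL_MILLIS = 30 * 60_000;

    // Hypothetical in-memory stand-ins for the cache and the database.
    static final Map<String, String> cache = new HashMap<>();
    static final Map<String, Long> expiresAt = new HashMap<>();
    static final Map<String, String> database = new HashMap<>();
    static final AtomicInteger dbQueries = new AtomicInteger();

    static String queryDatabase(String key) {
        dbQueries.incrementAndGet();
        return database.get(key); // null when the data does not exist
    }

    static String get(String key, long nowMillis) {
        Long deadline = expiresAt.get(key);
        if (deadline != null && nowMillis < deadline) {
            return cache.get(key); // cache hit, possibly the EMPTY marker
        }
        String value = queryDatabase(key);
        boolean missing = (value == null);
        // Cache the result even when it is empty, with a shorter lifetime.
        cache.put(key, missing ? EMPTY : value);
        expiresAt.put(key, nowMillis + (missing ? EMPTY_TTL_MILLIS : VALUE_TTL_MILLIS));
        return cache.get(key);
    }
}
```

Repeated requests for the same missing key reach the database once per TTL window. Requests that use a different missing key each time still reach the database every time, which is the weakness described above.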

Solution 2: Apply filtering rules in front of the database; the most common choice is a Bloom filter. The idea is to check each key before the database query: if the filter shows that the data does not exist, the query is skipped, which reduces the load on the database.

A Bloom filter is a probabilistic data structure with efficient insertion and lookup. It tells you that something "definitely does not exist or may exist". Compared with traditional data structures such as List, Set, and Map, a Bloom filter is a bit array, which makes it faster and more compact. The drawback is that its answers are probabilistic rather than exact. All the data that can exist is hashed into a sufficiently large bitmap, so a request for data that definitely does not exist is intercepted by the bitmap and never puts query pressure on the underlying storage system.

To map a value into the Bloom filter, we run it through several different hash functions and set the bit at each resulting position to 1. For example, for the value "zhangsan", three different hash functions produce the hash values 1, 4, and 7, so bits 1, 4, and 7 are set to 1.

Now we store another value, "lisi". If the hash functions return 4, 5, and 8, those bits are set to 1 as well, so bits 1, 4, 5, 7, and 8 are now set.

To check whether the Bloom filter has recorded a given value, we hash the value with the same functions. Suppose the hash functions return 2, 5, and 8 for "wangwu". Bit 2 is 0, which means no stored value maps to it, so we can say with certainty that "wangwu" does not exist.
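The walkthrough above can be replayed with a `java.util.BitSet`. This is only a sketch: the bit positions are taken directly from the text rather than computed by real hash functions, and the class name `BloomWalkthrough` is hypothetical.

```java
import java.util.BitSet;

class BloomWalkthrough {
    static final BitSet bits = new BitSet(10);

    // Record a value by setting the bit at each of its hash positions.
    static void add(int... positions) {
        for (int p : positions) {
            bits.set(p);
        }
    }

    // A value may be present only if every one of its positions is set.
    static boolean mightContain(int... positions) {
        for (int p : positions) {
            if (!bits.get(p)) {
                return false; // one clear bit proves the value was never added
            }
        }
        return true;
    }

    public static void main(String[] args) {
        add(1, 4, 7); // "zhangsan"
        add(4, 5, 8); // "lisi"
        // "wangwu" hashes to 2, 5, 8; bit 2 is still 0, so this prints false.
        System.out.println(mightContain(2, 5, 8));
    }
}
```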

Notice, however, that bit 4 was set twice, because the hash functions return it for both "zhangsan" and "lisi". As more data is stored in the Bloom filter, such overlaps become more likely. So when all three hash positions for a value are already set, it only shows that the value may be in the filter; it is not certain, because those bits may have been set by other data. This is why the Bloom filter can only say that something "definitely does not exist or may exist". As for why three different hash functions are used: a value is reported as possibly present only if all three bits are set, and if any one of them is 0 the value is definitely not in the filter. Using several hash functions therefore reduces the error rate caused by hash collisions (two values producing the same hash value).
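As a rough illustration of how such a filter might sit in front of the database, here is a minimal Java sketch. Everything in it is an assumption made for the example: the class names `SimpleBloomFilter` and `BloomGate`, the FNV-1a-style hash, and the double-hashing scheme used to derive several positions from two base hashes. Production systems would more likely use an existing library implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // FNV-1a-style hash; different seeds give different hash functions.
    private static int hash(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int position(String value, int i) {
        byte[] data = value.getBytes(StandardCharsets.UTF_8);
        int h1 = hash(data, 0x811C9DC5);
        int h2 = hash(data, 0x01000193) | 1; // force an odd step
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String value) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(value, i));
        }
    }

    // false: definitely never added. true: possibly added.
    boolean mightContain(String value) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(value, i))) {
                return false;
            }
        }
        return true;
    }
}

class BloomGate {
    static final AtomicInteger dbQueries = new AtomicInteger();

    // Query the (hypothetical, in-memory) database only if the filter allows it.
    static String lookup(SimpleBloomFilter filter, Map<String, String> db, String key) {
        if (!filter.mightContain(key)) {
            return null; // intercepted: the key cannot exist
        }
        dbQueries.incrementAndGet();
        return db.get(key); // may still be null on a false positive
    }
}
```

Every key written to the database must also be added to the filter, otherwise valid keys would be rejected. A false positive only costs one unnecessary database query, so the filter never returns wrong data.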

2. Cache breakdown

Some keys are extremely "hot": at certain moments they are accessed by a very large number of concurrent requests. If such a key expires at that moment, all of those requests miss the cache and hit the database at once. This is the cache "breakdown" problem.

Use mutex key

A common practice in the industry is to use a mutex. Simply put, when the cache misses (the value read is empty), do not load from the db immediately. Instead, first set a mutex key using a cache operation whose return value indicates whether it succeeded (such as Redis SETNX or Memcache ADD). If the operation succeeds, load from the db and reset the cache; otherwise, retry the whole get-cache method.

SETNX is short for "SET if Not eXists": the key is set only when it does not already exist, which makes it usable as a lock.

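```java
// redis, db, key_mutex, and expire_secs are assumed to be provided by the surrounding class.
public String get(String key) throws InterruptedException {
    String value = redis.get(key);
    if (value != null) {
        return value;
    }
    // The cached value has expired.
    // Give the mutex a 3-minute timeout so that, if the del operation fails,
    // later cache expirations are not blocked from loading the db forever.
    if (redis.setnx(key_mutex, 1, 3 * 60) == 1) {
        // The mutex was set successfully: load from the db and reset the cache.
        value = db.get(key);
        redis.set(key, value, expire_secs);
        redis.del(key_mutex);
        return value;
    }
    // Another thread holds the mutex and is loading the db and resetting the cache;
    // wait briefly, then retry reading the cache.
    Thread.sleep(50);
    return get(key); // retry
}
```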
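The mutex-key idea can be tried without a Redis server. The sketch below is an in-memory simulation, not a Redis client: `ConcurrentHashMap.putIfAbsent` stands in for SETNX, and the names `MutexKeySketch`, `loadFromDb`, and `runConcurrently` are hypothetical. It also adds two things the snippet above does not have: a second cache check after the mutex is acquired, and a loop instead of recursion for the retry. The mutex timeout is omitted because nothing here can fail between set and delete.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class MutexKeySketch {
    static final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    static final ConcurrentHashMap<String, Boolean> mutexes = new ConcurrentHashMap<>();
    static final AtomicInteger dbLoads = new AtomicInteger();

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Simulated slow database query that counts how often it runs.
    static String loadFromDb(String key) {
        dbLoads.incrementAndGet();
        pause(20);
        return "value-of-" + key;
    }

    static String get(String key) {
        while (true) {
            String value = cache.get(key);
            if (value != null) {
                return value;
            }
            // Like SETNX: only the caller that creates the entry gets null back.
            if (mutexes.putIfAbsent(key, Boolean.TRUE) == null) {
                try {
                    // Re-check: another thread may have filled the cache just
                    // before this thread acquired the mutex.
                    value = cache.get(key);
                    if (value == null) {
                        value = loadFromDb(key);
                        cache.put(key, value);
                    }
                    return value;
                } finally {
                    mutexes.remove(key);
                }
            }
            pause(5); // someone else is loading; wait, then retry
        }
    }

    // Starts the given number of threads that all request the same key at once
    // and returns how many database loads happened in total.
    static int runConcurrently(String key, int threads) {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                    get(key);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return dbLoads.get();
    }
}
```

Without the re-check, a thread that saw a cache miss just before the loader finished could acquire the freed mutex and load from the database a second time.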

3. Cache avalanche

Origin blog.csdn.net/weixin_42777004/article/details/108712476