Cache breakdown, cache penetration, cache avalanche, and their solutions

Cache processing flow in an application

For a front-end request, the back end first tries to read the data from the cache and returns it directly on a hit. On a miss, it reads from the database; if the database has the data, it updates the cache and returns the result. If the database has no result either, it returns an empty result.
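
The read path described above can be sketched in Java as follows. This is a minimal illustration, not a production client: the two maps are hypothetical in-memory stand-ins for a real cache and a real database, and the class and method names are invented for the example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Cache-first read path; both maps are in-memory stand-ins, not real clients. */
final class CacheAsideSketch {
    // Hypothetical stand-ins for a cache (such as Redis) and a database.
    static final Map<String, String> cache = new ConcurrentHashMap<>();
    static final Map<String, String> database = new ConcurrentHashMap<>();

    static Optional<String> get(String key) {
        // 1. Try the cache first and return directly on a hit.
        String cached = cache.get(key);
        if (cached != null) {
            return Optional.of(cached);
        }
        // 2. Cache miss: fall back to the database.
        String fromDb = database.get(key);
        if (fromDb == null) {
            // 3. Not in the database either: return an empty result.
            return Optional.empty();
        }
        // 4. Found in the database: update the cache, then return the result.
        cache.put(key, fromDb);
        return Optional.of(fromDb);
    }
}
```

Note that step 3 caches nothing, so a key that exists nowhere reaches the database on every request.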

The flow looks simple. In a distributed application, however, once a cache is introduced, the cache design must account for cache breakdown, cache penetration, and cache avalanche, together with their solutions, to keep the system robust.

Cache breakdown

Cache breakdown: the data is not in the cache but is in the database (typically because the cache entry has just expired). Because many users request the same data concurrently, they all miss the cache and query the database at the same moment. The database load spikes and may even overwhelm the database.

Solutions

  • Set hotspot data not to expire: Set the cache entry to never expire, and have a scheduled task load the data asynchronously and update the cache. This approach suits more extreme scenarios, such as particularly high traffic. When using it, consider how long the business can tolerate inconsistent data, and how failures are handled: if a refresh fails and goes unnoticed, the cache keeps serving dirty data indefinitely.
  • Add a mutex: Among the concurrent requests, only the first thread (or the first few) acquires the lock and queries the database. Threads that fail to acquire the lock block and wait. Once the first thread has written the data to the cache, subsequent requests read it from the cache. The mutex can be a Java API-level Lock, the JVM's synchronized, or a distributed lock; any of them works as long as it greatly reduces the requests reaching the database. Note the lock granularity: lock per key, not globally.
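
As a rough illustration of the first option, the sketch below keeps hot keys in a cache without expiry and reloads them on a schedule. It assumes an in-memory map in place of the real cache and a caller-supplied loader function in place of the database query; HotKeyRefresher and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hot keys are stored without a TTL and reloaded in the background. */
final class HotKeyRefresher {
    private final Map<String, String> cache = new ConcurrentHashMap<>(); // stand-in for a cache with no expiry
    private final Function<String, String> loader;                       // hypothetical database loader
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "hot-key-refresher");
                t.setDaemon(true); // do not keep the JVM alive just for refreshes
                return t;
            });

    HotKeyRefresher(Function<String, String> loader) {
        this.loader = loader;
    }

    /** Loads the key once, then refreshes it at a fixed interval. */
    void register(String key, long period, TimeUnit unit) {
        refresh(key);
        scheduler.scheduleAtFixedRate(() -> refresh(key), period, period, unit);
    }

    /** Reloads one key; on failure the previous value stays in place. */
    void refresh(String key) {
        try {
            String fresh = loader.apply(key);
            if (fresh != null) {
                cache.put(key, fresh);
            }
        } catch (RuntimeException e) {
            // The stale value keeps being served. A real system should alert here,
            // otherwise the entry stays dirty indefinitely.
            System.err.println("refresh failed for " + key + ": " + e);
        }
    }

    String get(String key) {
        return cache.get(key);
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

The refresh period is the window of inconsistency the business has to accept. Catching the exception inside refresh also matters for scheduling: a task that throws would otherwise stop being rescheduled by scheduleAtFixedRate.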

Pseudocode using a distributed lock, for reference only:

public Object getData(String key) throws InterruptedException {
    Object value = redis.get(key);
    // The cached value is missing or has expired
    if (value == null) {
        // lockRedis: a Redis instance dedicated to locking
        // "empty": the lock value is arbitrary
        if (lockRedis.set(key, "empty", "PX", lockExpire, "NX")) {
            try {
                // Query the database and write the result to the cache,
                // so that other threads can read it from the cache
                value = getDataFromDb(key);
                redis.set(key, value, "PX", expire);
            } catch (Exception e) {
                // Exception handling
            } finally {
                // Release the lock
                lockRedis.delete(key);
            }
        } else {
            // Sleep for 50 ms, then retry
            Thread.sleep(50);
            return getData(key);
        }
    }
    return value;
}
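
The pseudocode above uses a distributed lock. Within a single JVM, the same per-key mutex idea might look like the sketch below, which uses synchronized on one lock object per key. The in-memory map stands in for the cache, the loader function stands in for the database query, and PerKeyMutexCache is a hypothetical name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Per-key mutex inside one JVM: only one thread per key reaches the database. */
final class PerKeyMutexCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>(); // stand-in for the cache
    private final Map<String, Object> locks = new ConcurrentHashMap<>(); // one lock object per key
    private final Function<String, String> loader;                       // hypothetical database query

    PerKeyMutexCache(Function<String, String> loader) {
        this.loader = loader;
    }

    String get(String key) {
        String value = cache.get(key);
        if (value != null) {
            return value;
        }
        // Lock on the key, not globally, so misses on other keys are not blocked.
        Object lock = locks.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {
            // Re-check: another thread may have filled the cache while this one waited.
            value = cache.get(key);
            if (value == null) {
                value = loader.apply(key);
                if (value != null) {
                    cache.put(key, value);
                }
            }
        }
        return value;
    }
}
```

The re-check inside the lock is what lets waiting threads pick up the value written by the first thread instead of querying again. This sketch never removes entries from the locks map, so a real implementation would need to bound or clean it up.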

Cache penetration

Cache penetration: a query finds the data neither in the cache nor in the database. If users keep sending such requests, every one of them misses the cache and goes to the database. The database load spikes and may even overwhelm the database.

Solutions

  • Interface verification: In normal business there may be a few requests for a key that does not exist, but not a large number, so this scenario most likely indicates a malicious attack. Add a layer of verification at the outermost layer, such as user authentication and data validity checks; for example, do basic validation on the id and reject requests with id <= 0 outright.
  • Cache null values: When the data is found neither in the cache nor in the database, write the key to the cache with a null value (key-null). Keep the expiration time short, for example 30 seconds; if it is too long, the key cannot be used normally once the data does exist. This prevents an attacker from repeatedly hitting the database with the same id.
  • Bloom filter: Store every key that may be accessed in a Bloom filter. Requests for keys that do not exist are filtered out; keys that may exist go on to query the cache and the database.
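
The first two options can be combined roughly as in the sketch below. It rejects ids that are not positive before any lookup and caches a sentinel for ids the database does not have. The in-memory map with expiry timestamps stands in for a cache with TTLs, the database is a caller-supplied function, and the 10-minute TTL for real values is an assumption; only the 30-second TTL for empty results comes from the text.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

/** Rejects invalid ids up front and caches "not found" for a short time. */
final class NullCachingLookup {
    private static final String NULL_MARKER = "<null>";   // sentinel: the database has no such row
    private static final long NULL_TTL_MILLIS = 30_000;   // short TTL for empty results (30 seconds)
    private static final long VALUE_TTL_MILLIS = 600_000; // assumed TTL for real values

    private static final class Entry {
        final String value;
        final long expiresAt;
        Entry(String value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<Long, Entry> cache = new ConcurrentHashMap<>(); // stand-in for a cache with TTLs
    private final LongFunction<String> database;                      // hypothetical query; null if absent

    NullCachingLookup(LongFunction<String> database) {
        this.database = database;
    }

    Optional<String> find(long id) {
        // Interface verification: ids <= 0 can never exist, so reject them early.
        if (id <= 0) {
            return Optional.empty();
        }
        long now = System.currentTimeMillis();
        Entry entry = cache.get(id);
        if (entry != null && entry.expiresAt > now) {
            // A cached sentinel answers "not found" without touching the database.
            return NULL_MARKER.equals(entry.value) ? Optional.empty() : Optional.of(entry.value);
        }
        String fromDb = database.apply(id);
        if (fromDb == null) {
            cache.put(id, new Entry(NULL_MARKER, now + NULL_TTL_MILLIS));
            return Optional.empty();
        }
        cache.put(id, new Entry(fromDb, now + VALUE_TTL_MILLIS));
        return Optional.of(fromDb);
    }
}
```

Caching the sentinel protects against repeated requests for the same missing id. An attacker who sends a different id each time still gets through, which is the case the Bloom filter addresses.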

Bloom filter

The defining characteristic of a Bloom filter: if it reports that a value does not exist, the value definitely does not exist; if it reports that a value exists, the value very probably exists, but with a small probability it does not. This probability is controllable: it can be made smaller or larger depending on the user's needs.

A Bloom filter consists of a bitSet and a set of hash functions (algorithms). It is a space-efficient probabilistic data structure, mainly used to determine whether an element exists in a collection.

At initialization, every bit of the bitSet is set to 0, and the hash functions are defined, for example 3 hash functions: hash1, hash2, hash3.
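
How the probability is controlled is not spelled out in the text. The usual textbook estimate, for a bitSet of m bits, k hash functions, and n written values, is roughly (1 - e^(-k*n/m))^k. The helper below simply evaluates that estimate; the class and method names are hypothetical.

```java
/** Standard estimate of a Bloom filter's false-positive rate. */
final class BloomFilterMath {
    /**
     * @param m number of bits in the bitSet
     * @param k number of hash functions
     * @param n number of values already written
     * @return approximately (1 - e^(-k*n/m))^k
     */
    static double falsePositiveRate(long m, int k, long n) {
        double fractionOfBitsSet = 1.0 - Math.exp(-(double) k * n / m);
        return Math.pow(fractionOfBitsSet, k);
    }
}
```

For a fixed number of values, a larger bitSet lowers the estimate, so the false-positive rate is traded against memory.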

Writing process

To write a value, proceed as follows, taking jionghui as an example:

  • First, compute the 3 hash functions on jionghui to get its bitSet indexes: 1, 7, 10
  • Set these 3 bitSet indexes to 1

Suppose we have two other values, java and diaosi. Computing the 3 hash functions by the same process gives the following results:

  • java: the hash functions give the bitSet indexes 1, 7, 11
  • diaosi: the hash functions give the bitSet indexes 4, 10, 11

Query process

To query a value, proceed as follows, again taking jionghui as an example:

  • First, compute the 3 hash functions on jionghui to get its bitSet indexes: 1, 7, 10
  • Check whether all 3 of these bitSet indexes are 1. If any of them is not 1, the value definitely does not exist. If all 3 are 1, the value may exist, but that does not prove it does

In fact, the example above already illustrates the problem. When only the values jionghui and diaosi have been written, the bitSet indexes set to 1 are 1, 4, 7, 10, 11. If we also write the value java, the indexes set to 1 are still the same 5. So when the bitSet indexes set to 1 are 1, 4, 7, 10, 11, we cannot tell whether the value java has been written or not.

The root cause is that different values can map to the same indexes under the hash functions, so the bits for one value may have been set by other values. This is why a Bloom filter can only tell that a value may exist, not that it certainly exists. Conversely, if the bits computed for a value by the hash functions are not all 1, the value certainly does not exist.
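
This collision can be replayed with java.util.BitSet. Since the text gives only the hash results and not the hash functions themselves, the sketch hard-codes the three index triples from the example; the class and method names are hypothetical.

```java
import java.util.BitSet;
import java.util.Map;

/** Replays the example above with its hash results hard-coded. */
final class BloomCollisionDemo {
    // Index triples taken from the text; the real hash functions are not given.
    static final Map<String, int[]> INDEXES = Map.of(
            "jionghui", new int[] {1, 7, 10},
            "diaosi", new int[] {4, 10, 11},
            "java", new int[] {1, 7, 11});

    /** Writing process: set every index computed for the value. */
    static void write(BitSet bits, String value) {
        for (int i : INDEXES.get(value)) {
            bits.set(i);
        }
    }

    /** Query process: false means definitely absent, true means possibly present. */
    static boolean mightContain(BitSet bits, String value) {
        for (int i : INDEXES.get(value)) {
            if (!bits.get(i)) {
                return false; // one unset bit proves the value was never written
            }
        }
        return true;
    }
}
```

After writing only jionghui and diaosi, mightContain reports true for java even though java was never written: its indexes 1 and 7 were set by jionghui and index 11 by diaosi.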
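
Putting the writing and query processes together, a minimal Bloom filter might look like the sketch below. The hash scheme is an assumption: it derives each index from String.hashCode() and an FNV-1a-style hash by double hashing, instead of the unspecified hash1, hash2, hash3 from the text, so it will not reproduce the indexes in the example. SimpleBloomFilter is a hypothetical name, and the class is not thread-safe.

```java
import java.util.BitSet;

/** Minimal Bloom filter over strings; a sketch, not thread-safe. */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;      // number of bits in the bitSet
    private final int hashCount; // number of hash functions

    SimpleBloomFilter(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size); // all bits start at 0
        this.size = size;
        this.hashCount = hashCount;
    }

    /** FNV-1a-style hash over the string's chars, used as the second base hash. */
    private static int fnv1a(String s) {
        int h = 0x811c9dc5;
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 16777619;
        }
        return h;
    }

    /** The i-th index for a value, derived from two base hashes (assumed scheme). */
    private int index(String value, int i) {
        int h1 = value.hashCode();
        int h2 = fnv1a(value);
        return Math.floorMod(h1 + i * h2, size);
    }

    /** Writing process: set every computed index to 1. */
    void add(String value) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(value, i));
        }
    }

    /** Query process: false means definitely absent, true means possibly present. */
    boolean mightContain(String value) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(value, i))) {
                return false;
            }
        }
        return true;
    }
}
```

In the cache penetration setting, every valid key would be added when it is created, and a request is allowed through to the cache and database only if mightContain returns true for its key.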

Cache avalanche

Cache avalanche: a large part of the cache fails at the same moment, for example because the cache service goes down. A large number of requests then hit the DB directly, which may bring down the entire system; this is called an avalanche.

Solutions

  • Spread out expiration times: Since the problem is a large number of cache entries expiring at once, the most obvious fix is to keep them from expiring at once. Add a random value to the expiration time when writing each entry, so that key expiration times are spread out and do not fall at the same moment.
  • Hotspot data does not expire: The same approach as for cache breakdown; again, pay particular attention to the refresh interval and to how data exceptions are handled.
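
Spreading out expiration times can be as small as the helper below, which adds a random amount to a base TTL. The name TtlJitter and the choice of a uniform distribution are assumptions made for illustration.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

/** Adds random jitter to a base TTL so keys written together do not expire together. */
final class TtlJitter {
    /** Returns base plus a uniformly random extra between 0 and maxJitter, inclusive. */
    static Duration withJitter(Duration base, Duration maxJitter) {
        long extraMillis = ThreadLocalRandom.current().nextLong(maxJitter.toMillis() + 1);
        return base.plusMillis(extraMillis);
    }
}
```

For example, withJitter(Duration.ofMinutes(10), Duration.ofMinutes(5)) yields a TTL between 10 and 15 minutes, which is then passed as the expiration time when the entry is written.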

Reference: https://blog.csdn.net/v123411739/article/details/115058811

Origin blog.csdn.net/weixin_38192427/article/details/115332207