Redis Cache Breakdown, Penetration, and Avalanche: Causes and Solutions

1. Introduction

We all know that I/O is one of a computer's main bottlenecks. To bridge the speed gap between memory and disk, caches were introduced: hot data is kept in memory and fetched from there on demand, which reduces the number of requests that reach the database and keeps it from being overwhelmed. Note that the breakdown, penetration, and avalanche problems discussed below all arise under high concurrency, for example when a hot key in the cache expires.


2. Causes of the problem

There are two main causes:

1. The key expires;

2. The key is evicted under the memory eviction (page replacement) policy.

The first cause exists because keys in Redis can carry an expiration time. If a hot key expires at a particular moment (say, a mall promotion starts at midnight and the key expires right then), every query for that product after midnight is pushed straight to the database, which can bring the database down.

The second cause is that memory is limited: new data keeps being cached, so old data has to be evicted continually. Under whatever eviction (page replacement) policy is configured, data gets removed, and a product that nobody looked at before the promotion is bound to have been evicted by the time the promotion starts.
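As a minimal sketch of the two causes, assuming the redis-py client and a local Redis instance (the key name, value, and TTL are made up for illustration):

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Cause 1: the key carries a TTL and will expire at some point.
r.set("product:1001", "hot product detail", ex=3600)   # gone after one hour

# Cause 2: once maxmemory is reached, keys are evicted under the configured policy.
print(r.config_get("maxmemory-policy"))                 # e.g. {'maxmemory-policy': 'noeviction'}
r.config_set("maxmemory-policy", "allkeys-lru")         # evict least-recently-used keys first
```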

3. How to handle breakdown

The normal request flow is simple: check the cache first; on a hit, return the cached value; on a miss, query the database and write the result back into the cache.

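A minimal cache-aside sketch of that flow, assuming redis-py and a hypothetical `query_db` function standing in for the real database call:

```python
import json
import redis

r = redis.Redis(decode_responses=True)

def query_db(product_id):
    # hypothetical stand-in for the real database query
    return {"id": product_id, "name": "demo product"}

def get_product(product_id, ttl=3600):
    key = f"product:{product_id}"
    cached = r.get(key)
    if cached is not None:                    # cache hit
        return json.loads(cached)
    data = query_db(product_id)               # cache miss: go to the database
    r.set(key, json.dumps(data), ex=ttl)      # backfill the cache with a TTL
    return data
```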

Since key expiration cannot be avoided, consider what happens when heavy traffic hits Redis. Because Redis executes commands on a single thread, requests can be thought of as being processed one after another from a queue. When a request reaches Redis and finds that the key has expired, it performs one extra operation: it tries to set a lock.

The process is roughly as follows:

  • When a request reaches Redis and finds that the key has expired, it checks whether a lock already exists. If it does, the request goes back to the queue and waits.

  • If there is no lock, set one. Note that this must be setnx(), not set(), because another thread may have set the lock in the meantime.

  • The thread that acquires the lock queries the database, rebuilds the cache, and releases the lock once the request returns (a sketch follows this list).
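A minimal sketch of that setnx-based rebuild, assuming redis-py, a hypothetical `query_db` helper, and a bounded retry loop standing in for the queue:

```python
import json
import time
import redis

r = redis.Redis(decode_responses=True)

def query_db(key):
    # hypothetical stand-in for the real database query
    return {"key": key, "name": "demo product"}

def get_with_lock(key, ttl=3600, lock_ttl=10):
    lock_key = f"lock:{key}"
    for _ in range(100):                          # bounded retry loop in place of the queue
        cached = r.get(key)
        if cached is not None:                    # cache hit: nothing to rebuild
            return json.loads(cached)
        # SET ... NX EX gives setnx semantics: only one caller wins the lock
        if r.set(lock_key, "1", nx=True, ex=lock_ttl):
            try:
                data = query_db(key)                     # hit the database once
                r.set(key, json.dumps(data), ex=ttl)     # rebuild the cache
                return data
            finally:
                r.delete(lock_key)                       # release the lock
        time.sleep(0.05)                          # lock held by someone else: wait and retry
    raise TimeoutError("cache rebuild timed out")
```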


But this raises a new problem: what if the thread that acquired the lock crashes while fetching the data? The lock is never released, and the other threads wait for it forever. The solution is:

Give the lock an expiration time so that, if it is never released explicitly, it is released automatically when the time is up. But then another question follows: what if the lock expires before the database request has finished, i.e. the lock itself times out? A common idea is to keep increasing the lock's expiration value, but that is unreliable: if the first request times out and the following ones do too, after several consecutive timeouts the lock's expiration value becomes enormous. This approach has too many drawbacks.

A better idea is to start a separate watchdog thread for monitoring: as long as the thread fetching the data has not died, the watchdog keeps extending the lock's expiration time.
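A rough watchdog sketch under the same assumptions (redis-py; the extension interval and TTL values are arbitrary):

```python
import threading
import redis

r = redis.Redis(decode_responses=True)

def hold_lock_with_watchdog(lock_key, work, lock_ttl=10):
    if not r.set(lock_key, "1", nx=True, ex=lock_ttl):
        return False                              # someone else already holds the lock
    stop = threading.Event()

    def watchdog():
        # while the worker is still running, keep pushing the lock's expiry forward
        while not stop.wait(lock_ttl / 2):
            r.expire(lock_key, lock_ttl)

    threading.Thread(target=watchdog, daemon=True).start()
    try:
        work()                                    # query the database, rebuild the cache, etc.
    finally:
        stop.set()
        r.delete(lock_key)                        # normal release path
    return True
```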


4. Penetration 

The main cause of penetration is a large number of requests for data that does not exist in the database at all. For example, a mall that only sells books keeps being asked about tea products. Since the Redis cache is meant for hot data, values that do not exist in the database are normally not cached, so this abnormal traffic goes straight to the database only to come back with an empty result.

To deal with such requests, add a filtering layer in front of them, such as a Bloom filter, an enhanced (counting) Bloom filter, or a cuckoo filter.
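A minimal Bloom filter sketch backed by a Redis bitmap (SETBIT/GETBIT), assuming md5-based hashing and an arbitrary bit-array size; a real deployment would usually rely on a purpose-built module or library instead:

```python
import hashlib
import redis

r = redis.Redis()
BITS = 1 << 20            # size of the bit array (arbitrary for this sketch)
HASHES = 3                # number of hash functions

def _positions(value):
    for i in range(HASHES):
        digest = hashlib.md5(f"{i}:{value}".encode()).hexdigest()
        yield int(digest, 16) % BITS

def bloom_add(value):
    for pos in _positions(value):
        r.setbit("bloom:products", pos, 1)

def bloom_might_contain(value):
    # False means "definitely absent"; True means "possibly present"
    return all(r.getbit("bloom:products", pos) for pos in _positions(value))

# Requests for ids that are definitely absent never reach the database.
if not bloom_might_contain("product:9999"):
    print("reject request: product does not exist")
```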


Besides the Bloom filter, simple parameter checks can be added. For example, database IDs are usually positive auto-increment values, so a request with id = -10 can be rejected before it even reaches Redis; checks such as verifying that the user is genuine can be applied in the same way.
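A small illustrative check (the bounds here are assumptions, not a rule):

```python
def is_valid_product_id(product_id):
    # IDs are assumed to be positive auto-increment integers;
    # anything else is rejected before Redis or the database is queried.
    return isinstance(product_id, int) and 0 < product_id <= 10**9

assert not is_valid_product_id(-10)
assert is_valid_product_id(42)
```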

5. Avalanche 

An avalanche is similar to breakdown; the difference is that breakdown is one hot key expiring at a certain moment, while an avalanche is a large number of hot keys expiring almost at the same instant. Many blog posts insist that the cure for avalanches is randomized expiration times, which is imprecise. Suppose a bank runs a promotion: the interest coefficient is 2% before midnight and changes to 3% after midnight. Can the users' corresponding keys simply be given random expiration times? If the old values are still served after the switch, that is dirty data.

Obviously not: with the same deposit, you would earn 3 million in interest at the end of the year while your neighbour gets only 2 million. That is asking for a fight, just kidding~

The correct approach is to first check whether the keys' expiration is tied to a specific point in time. If it is not, randomized expiration times do solve the problem.
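A sketch of TTL jitter for such time-independent keys (the base TTL and jitter range are arbitrary):

```python
import json
import random
import redis

r = redis.Redis(decode_responses=True)

def cache_with_jitter(key, value, base_ttl=3600, jitter=300):
    # spread expirations over [base_ttl, base_ttl + jitter] seconds
    # so that a batch of keys does not all expire at the same instant
    ttl = base_ttl + random.randint(0, jitter)
    r.set(key, json.dumps(value), ex=ttl)
```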

If it is tied to a point in time, for example the bank changing a coefficient on a particular day as above, then the strong-dependency breakdown scheme should be used: refresh all of the affected keys in a background thread first.


While the hot keys are being refreshed in the background, the business layer briefly delays incoming requests, for example sleeping for a few milliseconds (or at worst a few seconds), so that the pressure of the hot-key refresh is spread out.
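A rough sketch of that combination, assuming a hypothetical `load_from_db` loader and arbitrary delays; the key list and timings would come from the business side:

```python
import json
import random
import threading
import time
import redis

r = redis.Redis(decode_responses=True)

def load_from_db(key):
    # hypothetical stand-in for the real database query after the switch
    return {"key": key, "coefficient": 0.03}

def refresh_hot_keys(keys, ttl=3600):
    # background thread: rewrite every affected key at (or just before) the switch time
    def worker():
        for key in keys:
            r.set(key, json.dumps(load_from_db(key)), ex=ttl)
    threading.Thread(target=worker, daemon=True).start()

def read_with_backoff(key):
    cached = r.get(key)
    if cached is None:
        time.sleep(random.uniform(0.005, 0.05))   # brief delay while the refresh catches up
        cached = r.get(key)
    return json.loads(cached) if cached is not None else None
```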


Origin blog.csdn.net/qq_34272760/article/details/121259508