How to solve Redis cache penetration, cache breakdown, and cache avalanche in Spring Boot?

Redis cache penetration, cache breakdown, cache avalanche

I. Overview

① Cache penetration: a large number of requests for keys that do not exist at all

② Cache avalanche: a large number of keys in Redis expire at the same time

③ Cache breakdown: a single hot key in Redis expires

The root cause of all three is the same: the Redis hit rate drops, so requests go straight to the DB. Under normal circumstances most requests are answered by Redis, and only the small number that Redis cannot answer reach the DB; the pressure on the DB is low and it works normally.


If a large number of requests get no response from Redis, they hit the DB directly, and the sudden spike in load can freeze or crash it. The sequence looks like this:

① A large number of highly concurrent requests arrive at Redis

② These requests find that the resources they need are not in Redis, so the Redis hit rate drops

③ The flood of high-concurrency requests therefore turns to the DB (database server) for the resources

④ The pressure on the DB spikes instantly, overwhelming it and triggering a chain of further failures


So why does Redis lack the data being accessed? It roughly comes down to three situations, corresponding to avalanche, penetration, and breakdown (each explained in detail below)


II. Scenario analysis (detailed explanation)

(1) Cache breakdown

Concept:

The cause of cache breakdown: a hot key in Redis expires while a large number of users are accessing it


Scenario:

Cache breakdown usually happens for one of the following reasons:

  1. The hot data was never cached: when a piece of hot data needs to be accessed frequently but was not cached from the start, every request queries the database directly, putting a heavy burden on it.

  2. The cached hot data expired: when hot data expires and needs to be re-cached, any requests in flight at that moment all query the database directly.

    Think of a celebrity scandal trending on social media: a huge number of "fans" access that hot topic, but for some reason the corresponding hot key in Redis has expired. The flood of concurrent requests for that key gets no response from Redis, hits the DB server directly, and brings the whole DB down.

Solution:

1. Set the key to never expire (preload hotspot data)

Applications such as news sites and blogs should preload their hotspot data into Redis in advance

2. Locking and queuing

(Method 1) Double-check lock:

Only one request (thread A) acquires the mutex; the others queue outside. Thread A queries the DB and writes the result back to Redis, after which every request can be answered by Redis. The waiting requests fall into two groups: those that entered the queue re-check Redis after acquiring the lock and find the data there; those that never reached the lock (because the outer `if (obj == null)` check already passed) read the data directly from Redis.

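The double-check pattern described above can be sketched as follows. This is a minimal illustration, not the article's original code: a ConcurrentHashMap stands in for Redis, and loadFromDb() stands in for the real database query.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class HotKeyCache {
    private static final Map<String, String> cache = new ConcurrentHashMap<>();
    private static final Object lock = new Object();

    // Stand-in for the real database query (assumption for the sketch)
    static String loadFromDb(String key) {
        return "db-value-for-" + key;
    }

    static String get(String key) {
        String value = cache.get(key);       // first check, no lock
        if (value == null) {
            synchronized (lock) {            // only one thread proceeds
                value = cache.get(key);      // second check inside the lock
                if (value == null) {
                    value = loadFromDb(key); // query the DB exactly once
                    cache.put(key, value);   // repopulate the cache
                }
            }
        }
        return value;
    }
}
```

After the first caller repopulates the key, every later call is served from the map without touching loadFromDb() again.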

(Method 2) Distributed lock:

In a clustered deployment, a distributed lock plays the same role as the local mutex above.

Disadvantage:

Under high concurrency, performance suffers. In most cases, however, requests are served from the cache at the outer layer; only when a key suddenly expires do the in-flight requests enter the lock mechanism and queue, and the double check further reduces the pressure on the database.

3. Monitor data and adjust in time

Monitor which data is hot, and adjust the expiration times of those keys in real time, falling back on the lock mechanism when needed

(2) Cache Avalanche

Concept:

The cause of a cache avalanche: a large number of keys in Redis expire at the same time


Example:

  When a large number of keys in Redis expire together, most of the cached data is effectively wiped out. If a burst of concurrent requests arrives at that moment, Redis cannot answer them (the hit rate plummets), all of the requests land on the DB, and the DB collapses.

Scenario:

  • A large number of keys collectively expire

    • Solution

      • 1. Locking and queuing + spreading out the invalidation time
      • 2. Use a multi-level cache architecture
      • 3. Set the cache flag
  • Redis service down

    • Solution: redis high availability (cluster, sentinel mode)
  • Data center power failure

    • Solution: prepare disaster recovery in advance with multiple data centers; if one goes down, switch to another immediately

Solution:

1. Locking and queuing + staggered expiration times

Add a random offset to each key's expiration time so that keys do not all expire at the same moment

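A minimal sketch of the jitter idea: a base TTL plus a random extra amount. The method name and the particular base/jitter values are illustrative assumptions.

```java
import java.util.concurrent.ThreadLocalRandom;

class TtlJitter {
    // Returns the base TTL (seconds) plus up to `jitterSeconds` of random extra
    // time, so keys written together do not expire together.
    static long randomTtl(long baseSeconds, long jitterSeconds) {
        return baseSeconds + ThreadLocalRandom.current().nextLong(jitterSeconds + 1);
    }
}
```

For example, `randomTtl(3600, 300)` spreads a one-hour TTL across a five-minute window.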

2. Use a multi-level cache architecture

Combine nginx caching + Redis caching + other caches, with different expiration times at each layer, for stronger reliability
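The lookup order in such an architecture can be sketched as a two-level read path. The maps here are stand-ins for a local cache and Redis; a real system would use something like nginx/Caffeine in front of Redis, with different TTLs per layer.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class TwoLevelCache {
    private static final Map<String, String> local = new ConcurrentHashMap<>(); // L1
    private static final Map<String, String> redis = new ConcurrentHashMap<>(); // L2 stand-in

    // Helper for the sketch: simulates data already present in Redis
    static void putRedis(String key, String value) {
        redis.put(key, value);
    }

    static String get(String key) {
        String v = local.get(key);
        if (v != null) return v;          // L1 hit: cheapest path
        v = redis.get(key);
        if (v != null) local.put(key, v); // backfill L1 so the next read is local
        return v;
    }
}
```

Because each layer expires independently, both layers rarely miss at the same moment.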

3. Set a cache flag

Record whether the cached data has logically expired; if it has, refresh the actual key.

(1) Without a separate thread: store a logical expiration time inside the value (e.g. the real TTL is 1 hour but the logical TTL is 50 minutes). On each read, compare the current time with the logical time; if it has passed, one thread takes the lock and refreshes the value, while all other threads return the old data for now.

(2) Asynchronous refresh: when the current time exceeds the logical time, notify another thread to update the value in the background.
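Variant (1) can be sketched like this. A ConcurrentHashMap stands in for Redis, the refresh writes a placeholder value in place of a real DB reload, and an AtomicBoolean plays the role of the lock; all of these are assumptions for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

class LogicalExpireCache {
    // The value carries its own logical deadline alongside the data
    static class Entry {
        final String data;
        final long logicalExpireAt; // epoch millis
        Entry(String data, long logicalExpireAt) {
            this.data = data;
            this.logicalExpireAt = logicalExpireAt;
        }
    }

    private static final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private static final AtomicBoolean refreshing = new AtomicBoolean(false);

    static void put(String key, String data, long ttlMillis) {
        cache.put(key, new Entry(data, System.currentTimeMillis() + ttlMillis));
    }

    static String get(String key) {
        Entry e = cache.get(key);
        if (e == null) return null;
        // Logically stale: exactly one caller wins the flag and refreshes;
        // everyone else keeps returning the old data
        if (System.currentTimeMillis() > e.logicalExpireAt
                && refreshing.compareAndSet(false, true)) {
            try {
                put(key, "refreshed-" + key, 60_000); // pretend DB reload
            } finally {
                refreshing.set(false);
            }
        }
        return cache.get(key).data;
    }
}
```

The key itself never disappears from Redis, so readers are never forced onto the DB; only the refresh path touches it.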

4. Redis high availability (cluster, sentinel mode)

To survive a Redis outage, set up a Redis cluster in advance and run sentinel mode, so that a failed node is detected and compensated for immediately.

(3) Cache penetration

Concept:

The cause of cache penetration: requests for resources that do not exist at all (neither in the DB nor in Redis)


Example: the client sends a large number of requests that cannot be served


  When many clients send requests like http://localhost:8080/user/19833?id=-3872, cache penetration can occur. The database has no user with id=-3872, so Redis has nothing cached either; the requests get no response from Redis, hit the DB directly, and the DB freezes or crashes under the load. Cache penetration is often the work of an attacker: by flooding the server with high-concurrency requests for resources that do not exist, the DB is easily brought down.

Solution:

1. Cache empty objects (+ locking and queuing + staggered expiration times)


  • As in the example above, even though the database has no user with id=-3872, we cache (key=-3872, value=null) in Redis. When the same request arrives again, Redis returns the null value to the client directly, keeping the flood of unanswerable requests off the DB

    • Notice:

      • When caching a null value, keep the key's expiration time short so it does not tie up too many Redis resources (e.g. under a large-scale malicious attack)
      • The data may not exist now but appear in the database later, which is another reason the expiration time must be short; a short random TTL is recommended
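The null-object approach can be sketched as follows. The two maps stand in for Redis and the DB, the sentinel string is an illustrative choice, and the short random TTL mentioned above is noted but omitted from this minimal version.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NullObjectCache {
    private static final String NULL_SENTINEL = "<null>"; // marks "known missing"
    private static final Map<String, String> redis = new ConcurrentHashMap<>();
    private static final Map<String, String> db = new ConcurrentHashMap<>();
    static int dbHits = 0; // counts how often the "DB" is actually queried

    static String get(String id) {
        String cached = redis.get(id);
        if (cached != null) {
            // Hit: either real data or the cached "does not exist" marker
            return NULL_SENTINEL.equals(cached) ? null : cached;
        }
        dbHits++;
        String fromDb = db.get(id);
        // In a real system this put would carry a short random TTL
        redis.put(id, fromDb == null ? NULL_SENTINEL : fromDb);
        return fromDb;
    }
}
```

Repeated requests for the same nonexistent id cost one DB query in total, not one per request.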

2. Bloom filter

  • Blacklist: put the keys of requests that turned out not to exist into a Bloom-filter blacklist; before serving a later request, check whether its key is in the filter, and deny access if it is.
  • Whitelist: put all keys that do exist in the database into the Bloom filter; let a request through only if its key is (probably) in the filter, and reject it otherwise.

Notice:

  1. Keep the filter synchronized with the data: keys are added, deleted, and modified over time, so neither the blacklist nor the whitelist stays accurate on its own. The need for this synchronization is the main drawback of the approach.
  2. A Bloom filter has a false-positive rate, so it is usually combined with interface rate limiting (capping how often a user can call within a time window), permission checks, blacklists, and so on to solve cache penetration.
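A minimal Bloom filter for the whitelist approach might look like this. The bit-array size, hash count, and the simple seeded hash are all illustrative assumptions; production systems typically use a library implementation (e.g. Guava's or Redisson's).

```java
import java.util.BitSet;

class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Simple seeded polynomial hash (illustrative, not cryptographic)
    private int hash(String key, int seed) {
        int h = seed;
        for (char c : key.toCharArray()) h = h * 31 + c;
        return Math.floorMod(h, size);
    }

    void add(String key) {
        for (int i = 1; i <= hashCount; i++) bits.set(hash(key, i));
    }

    // false => definitely absent (reject the request);
    // true  => probably present (let it proceed to the cache/DB)
    boolean mightContain(String key) {
        for (int i = 1; i <= hashCount; i++) {
            if (!bits.get(hash(key, i))) return false;
        }
        return true;
    }
}
```

At startup, every existing id is `add`ed; a request whose id fails `mightContain` is rejected before touching Redis or the DB. The false-positive side is why point 2 above recommends pairing this with rate limiting.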

3. Real-time monitoring

Monitor Redis in real time; when the hit rate drops, investigate the cause, working with operations staff to analyze who is accessing what. The offending sources can then be blacklisted and rate-limited (blocking the attack).

4. Interface validation

Similar to permission interception for users: an invalid request such as id=-3872 is rejected outright, so it never reaches Redis or the DB.
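A sketch of such a check at the interface layer; the rule that valid ids are positive integers is an assumption matching the id=-3872 example above.

```java
class IdValidator {
    // Reject requests whose id cannot possibly match a real record,
    // before they reach Redis or the DB
    static boolean isValidUserId(String raw) {
        try {
            long id = Long.parseLong(raw);
            return id > 0; // assumption: real user ids are positive
        } catch (NumberFormatException e) {
            return false;  // not even a number
        }
    }
}
```

In Spring Boot this kind of rule would typically live in a filter, an interceptor, or Bean Validation on the controller parameter.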


Origin blog.csdn.net/QRLYLETITBE/article/details/131731587