Detailed explanation of cache penetration, cache avalanche, and cache breakdown

I. Introduction

In real business scenarios, Redis is generally used in conjunction with another database, such as the relational database MySQL, to reduce the pressure on the back end.

Redis caches the data that is frequently queried in MySQL, such as hot data, so that when users access it they do not need to query MySQL but can fetch the cached copy directly from Redis, reducing the read pressure on the back-end database. If Redis does not hold the data a user queries, the request is forwarded to the MySQL database; when MySQL returns the data to the client, it is also written into Redis so that subsequent reads can be served straight from the cache. The flow is as follows:

[Figure: read flow. The client checks Redis first; on a miss the query goes to MySQL and the result is written back to the cache]
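A minimal sketch of this read-through flow, assuming a local redis-py client and a hypothetical `query_mysql` helper standing in for the real data-access layer; the key naming and the one-hour TTL are illustrative:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def query_mysql(user_id):
    # Hypothetical stand-in for the real MySQL data-access layer,
    # e.g. SELECT * FROM users WHERE id = %s.
    return None

def get_user(user_id):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)              # cache hit: serve from Redis
    data = query_mysql(user_id)                # cache miss: fall through to MySQL
    if data is not None:
        r.set(key, json.dumps(data), ex=3600)  # write back with a 1-hour TTL
    return data
```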

When we use caching, there are usually two goals: first, to improve response efficiency and concurrency; second, to reduce the pressure on the database.

The three scenarios discussed in this article (cache penetration, cache avalanche, and cache breakdown) all occur because the cache loses its expected function under certain special conditions.

When the cache is absent or fails to hold back the traffic, requests flow directly into the database. Under high concurrency, the database may be overwhelmed, bringing the entire system down.

This is the common premise to keep in mind; cache penetration, cache avalanche, and cache breakdown are simply different sub-scenarios under it.

II. Cache Penetration

1. Problem description

Cache penetration means that when a user queries a piece of data, it does not exist in Redis (the cache is not hit), so the query is forwarded to the persistence-layer database, MySQL. The data turns out not to exist in MySQL either, so MySQL can only return an empty result, indicating that the query failed. If there are many such requests, or attackers deliberately issue them, the MySQL database comes under heavy pressure and may even crash. This phenomenon is called cache penetration.

Note: although the query returns nothing, the SQL statement is still executed, and every executed statement puts load on MySQL. Complex queries in particular can consume a great deal of CPU, and once the CPU is saturated the database can go down immediately.

There are generally two types of scenarios where cache penetration occurs:

  • The data originally existed, but for some reason (accidental deletion, active cleanup, etc.) it was removed at both the cache and database levels, while the front end or upstream applications still reference it and keep requesting it;
  • Malicious attacks, using non-existent keys or deliberately generating a large number of requests for business data that does not exist.

2. Solutions

Generally speaking, computers solve problems in one of two ways: trading space for time, or trading time for space. The techniques below mostly trade space for time.

  1. Cache empty objects: when MySQL returns an empty object, cache that object in Redis and set an expiration time for it. When the user issues the same request again, the empty object is served from the cache and the request is blocked at the cache layer, protecting the back-end database. The drawback is that although requests no longer reach MySQL, this strategy consumes Redis cache space (a sketch follows below).
  2. Pre-validation of business logic: validate data at the entrance of the business request, checking whether the request parameters are reasonable, whether they contain illegal values, and whether the request looks malicious, so invalid requests are blocked up front. For example, if a query by age carries an age of -10, the parameter is obviously illegal and can be rejected directly during validation.
  3. User blacklisting: when anomalies occur, monitor the accessed objects and data in real time, analyze user behavior, and restrict specific users, crawlers, or attackers who issue deliberate requests;
  4. Real-time monitoring: when the Redis hit rate starts to drop rapidly, inspect which objects and data are being accessed, and work with operations staff to set up a blacklist and restrict service.
  5. Bloom filter: if a Bloom filter determines that a piece of data does not exist, then it definitely does not exist; this property can be used to prevent cache penetration.
    First, store the keys of the hot data users may access in the Bloom filter (part of cache warm-up). When a user request arrives, it passes through the Bloom filter first: if the requested key is not in the filter, the request is rejected immediately; otherwise the query continues. Compared with the first method, the Bloom filter approach is more efficient and practical (a sketch follows below). Its flow is as follows:
    [Figure: Bloom filter flow; requests are checked against the filter before they can reach the cache and the database]
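To make the Bloom filter idea concrete, here is a minimal pure-Python sketch with bit positions derived from md5; a real deployment would more likely use RedisBloom or a tuned library, and the bit-array size, hash count, and warm-up keys below are all illustrative. It reuses `get_user` from the first sketch.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1 << 20, num_hashes=5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from md5 of (seed + item).
        for seed in range(self.num_hashes):
            digest = hashlib.md5(f"{seed}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
for uid in (1001, 1002, 1003):       # cache warm-up with illustrative hot keys
    bf.add(f"user:{uid}")

def guarded_get_user(user_id):
    if not bf.might_contain(f"user:{user_id}"):
        return None                  # definitely not in the database: reject
    return get_user(user_id)         # otherwise, the normal cache/DB path
```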

Cache warm-up (preheating): loading relevant data into the Redis cache system in advance when the system starts, so that it does not have to be loaded on the first user request.
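Strategy 1 above, caching empty objects, can be sketched just as briefly, reusing the client and `query_mysql` helper from the first example; the sentinel value and the short 60-second TTL are illustrative choices:

```python
NULL_SENTINEL = "__NULL__"   # hypothetical marker for "known to be absent"

def get_user_null_caching(user_id):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached == NULL_SENTINEL:
        return None                  # blocked at the cache layer
    if cached is not None:
        return json.loads(cached)
    data = query_mysql(user_id)
    if data is None:
        # Cache the empty object with a short TTL so it does not
        # occupy Redis space for long.
        r.set(key, NULL_SENTINEL, ex=60)
        return None
    r.set(key, json.dumps(data), ex=3600)
    return data
```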

III. Cache Breakdown

1. Problem description

Cache breakdown means that the data a user queries does not exist in the cache but does exist in the back-end database, generally because a cached key has expired. For example, suppose a hot key is constantly receiving a large number of concurrent accesses; if it suddenly expires at some moment, all of those concurrent requests pour into the back-end database at once, and its load spikes instantly. This phenomenon is known as cache breakdown.

2. Solutions

  1. Change the expiration time: Set the hotspot data to never expire.
  2. Distributed lock: redesign cache access around a distributed lock (see the sketch after this list). The process is as follows:
    1. Locking: when we query data by key, we first check the cache. On a miss, we acquire a distributed lock; the first process to obtain the lock queries the back-end database and writes the result back to Redis.
    2. Unlocking: other processes that find the lock held by someone enter a waiting state; after the lock is released, they find the key in the cache and read it from there in turn.
  3. Real-time adjustment: monitor on site which data is becoming hot, and adjust key expiration times in real time.
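A minimal sketch of the lock-based rebuild (option 2 above), using the common Redis SET NX EX pattern and reusing the client and helpers from the earlier examples; the lock TTL and retry interval are illustrative, and a production version would release the lock atomically with a Lua script:

```python
import time
import uuid

def get_user_with_lock(user_id):
    key = f"user:{user_id}"
    lock_key = f"lock:{key}"
    while True:
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)
        token = str(uuid.uuid4())
        # NX: set only if the lock is absent; EX: auto-expire so a
        # crashed holder cannot deadlock everyone else.
        if r.set(lock_key, token, nx=True, ex=10):
            try:
                data = query_mysql(user_id)         # only one process gets here
                r.set(key, json.dumps(data), ex=3600)
                return data
            finally:
                if r.get(lock_key) == token:        # release only our own lock
                    r.delete(lock_key)
        time.sleep(0.05)   # lock held elsewhere: wait, then re-check the cache
```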

IV. Cache Avalanche

1. Problem description

Cache avalanche means that a large number of keys in the cache expire at the same time while the volume of data access is very high, so pressure on the back-end database surges and it may even go down. It differs from cache breakdown: breakdown is a single hot key suddenly expiring under very high concurrency, whereas avalanche is a large number of keys expiring at once, so the two are not of the same magnitude at all.

Therefore, there are usually two scenarios for cache avalanche:

  • A large number of hot keys expire at the same time;
  • The cache service itself fails (for example, Redis goes down).

2. Solutions

Cache avalanche and cache breakdown are similar in nature, so their solutions overlap.

  1. Build a multi-level cache architecture: Nginx cache + Redis cache + other caches (Ehcache, etc.).
  2. Use locks or queues: use locks or queues to ensure that large numbers of threads do not read and write the database at the same moment, so that when keys fail, concurrent requests do not all fall on the underlying storage. Queuing limits throughput, so this is less suitable for very high concurrency.
  3. Set an expiration flag to refresh the cache: record whether the cached data is about to expire (with some lead time); when the flag trips, notify a separate background thread to rebuild the actual key's cache.
  4. Spread out cache expiration times: for example, add a random value on top of the base expiration time, say 1-5 minutes, so that expiration times rarely coincide and a collective invalidation event is hard to trigger (see the sketch after this list).
  5. Set no expiration time: hot data can simply never expire, with the cache updated asynchronously in the background; this suits scenarios that do not strictly require cache consistency.
  6. Dual-key strategy: the primary key gets an expiration time and the backup key gets none; when the primary key has expired, the backup key's value is returned directly.
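A minimal sketch of option 4, staggering expiration with random jitter, reusing the client from the earlier examples; the base TTL and the 1-5 minute window are illustrative:

```python
import random

BASE_TTL = 3600  # illustrative base expiration, in seconds

def set_with_jitter(key, value):
    # Add a random 1-5 minute offset so keys written together
    # do not all expire together.
    ttl = BASE_TTL + random.randint(60, 300)
    r.set(key, json.dumps(value), ex=ttl)
```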

V. Summary

Many people confuse cache penetration with cache breakdown, mainly because the names sound alike. What matters is understanding what each term means.

The difference between cache penetration and breakdown:

  • Penetration: the database does not have the data;
  • Breakdown: the database has the data;

For penetration, picture the Armor-Piercing Star Hammer from Glory of Kings, which ignores defensive equipment: the data is in neither the cache nor the database, so the request strikes straight through, like true damage.


Cache breakdown can be remedied: only the Redis cache entry is invalid, and the database still holds the data, so reloading the data from the database into Redis solves the problem.

Cache penetration, however, means the data is not in the database either, so there is nothing to store in the Redis cache; any query for it is guaranteed to miss the cache and go to the database. If many people deliberately query records that are not in the database, Redis cannot act as a barrier, because it holds nothing for those keys, and every concurrent query hits the database. So to solve cache penetration, we must find a way to recognize which requested data does not exist in the database, and filter out those queries.

A cache avalanche, in turn, is essentially many breakdowns happening at once.

Different cache exception scenarios call for different solutions. Beyond those above, we can also take measures at the service layer, such as rate limiting, degradation, and circuit breaking, and we can consider whether the database layer can scale horizontally, so that when a cache exception does occur, the database can absorb the traffic without the whole system collapsing.

Origin: blog.csdn.net/weixin_43888891/article/details/131397044