Common solutions in Redis for cache penetration, cache breakdown, cache avalanche, and cache warm-up

1. Cache penetration

1. What is cache penetration?

Cache penetration means the data a user requests exists neither in the cache nor in the database, so every request for it falls through to the database. If a malicious attacker keeps requesting data that does not exist in the system, a large number of requests will hit the database in a short period, putting excessive pressure on it and potentially crashing it outright.

The hallmark of cache penetration is that the requested key cannot be found in Redis; what fundamentally distinguishes it from cache breakdown is that the key does not exist in the database either. If a hacker sends a large number of non-existent keys, a flood of requests will hit the database directly, which is a very dangerous situation. In daily development, therefore, we need to validate parameters carefully: for illegal parameters or keys that cannot possibly exist, return an error message directly.

As shown below:
(Figure: cache penetration)
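As a small illustration of the parameter-validation point above, here is a minimal Python sketch assuming user ids must be positive integers; the validation rule and handler name are hypothetical:

```python
def get_user_handler(user_id: str):
    # Reject ids that cannot possibly exist before touching cache or database.
    if not user_id.isdigit() or int(user_id) <= 0:
        return {"error": "invalid user id"}
    return {"user_id": user_id}   # placeholder for the real cache/DB lookup
```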

2. Solution

2.1 Store invalid keys in Redis

When a key can be found in neither Redis nor the database, we can store the invalid key in Redis, set its value to "null", and give it a very short expiration time. Subsequent queries for this key can then return null directly without touching the database. The drawback of this approach is that if the non-existent keys are random on every request, storing them in Redis accomplishes nothing.
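As a rough illustration, here is a minimal Python sketch using the redis-py client; query_db() is a hypothetical stand-in for the real database lookup, and the key format, TTLs, and sentinel value are assumptions for the example.

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

NULL_PLACEHOLDER = "null"    # sentinel meaning "checked the DB, nothing there"
NULL_TTL_SECONDS = 60        # keep invalid keys only very briefly

def query_db(user_id: str):
    """Hypothetical stand-in for the real database lookup; None = not found."""
    return None

def get_user(user_id: str):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        # A cached sentinel means the database was already checked and is empty.
        return None if cached == NULL_PLACEHOLDER else cached
    row = query_db(user_id)
    if row is None:
        # Cache the miss with a very short expiration so it cannot linger.
        r.setex(key, NULL_TTL_SECONDS, NULL_PLACEHOLDER)
        return None
    r.setex(key, 3600, row)  # normal entry with a longer TTL
    return row
```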

2.2 Introduce a Bloom filter

A Bloom filter can be placed in front of the cache to determine whether a key exists. Bloom filters have a certain false-positive rate, but their guarantees are asymmetric: if the filter says a key does not exist, it definitely does not exist; if it says a key exists, the key most likely exists but may not (the false-positive case). We can therefore load every key in the database into the Bloom filter and consult it before querying Redis. If the filter reports that a key does not exist, we return immediately without touching the database, reducing the query pressure on the underlying storage system. This method can effectively improve system performance and query efficiency.
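As a sketch of the idea, the hand-rolled Bloom filter below shows the check-before-query flow; the bit-array size, hash count, and key names are illustrative, and in production a ready-made implementation (for example the RedisBloom module) would normally be used instead.

```python
import hashlib

class BloomFilter:
    """A minimal, illustrative Bloom filter (not tuned for production)."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False is definitive ("not present"); True may rarely be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

# Populate the filter with every key known to exist in the database.
bf = BloomFilter()
for key in ("user:1001", "user:1002"):   # illustrative keys
    bf.add(key)

def lookup(key: str):
    if not bf.might_contain(key):
        return None    # guaranteed absent: skip Redis and the database entirely
    ...                # otherwise fall through to the Redis / database lookup
```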

2.3 How to choose

  • For malicious attacks, the large number of keys involved is typically random. The first solution would cache a large amount of meaningless null data, so it is not suitable; the Bloom filter solution should be used to filter such keys out first.
  • In short: for data with a very large key space and a low request-repetition rate, use the second solution to filter requests directly; for null results with a limited key set and a relatively high repetition rate, the first method of caching them works well.

2. Cache breakdown

1. What is cache breakdown?

Cache breakdown and cache avalanche are similar phenomena. A cache avalanche is a large-scale cache failure at some moment that sends a large number of requests straight to the database, sharply increasing its load. Cache breakdown is the expiration of a single hotspot key: a large number of concurrent requests for that key arrive at the cache, find nothing, and all fall through to the database, which likewise drives database pressure up sharply.

2. Solution

Do not set an expiration time on hotspot keys, so they can never expire and trigger a breakdown.
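A one-line sketch of this with redis-py; the key name and value are illustrative:

```python
import redis

r = redis.Redis(decode_responses=True)

# No ex/px argument: the key has no TTL, so it cannot expire and trigger
# a breakdown. Key name and value are illustrative.
r.set("hot:product:42", "cached-product-json")
```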

3. Cache avalanche

1. What is cache avalanche?

  • If a large number of cached keys become invalid at the same moment, a large number of requests will hit the database directly, placing enormous pressure on it. Under high concurrency this can take the database down; and if operations staff simply restart it, the incoming request traffic will likely overwhelm it again. This situation is called a cache avalanche.
  • The root cause of a cache avalanche is a large number of keys expiring at the same time.
  • This can happen in two ways: first, the Redis server goes down, invalidating the entire cache; second, the cached keys were given the same expiration time, so a large batch expires simultaneously.

2. Solution

2.1 Uniform expiration

Adopt a uniform expiration strategy: give keys different expiration times so that cache invalidation is spread as evenly as possible, avoiding the simultaneous large-scale expiry that floods the database with requests. A common approach is to add a random value to each key's expiration time, ensuring large areas of the cache cannot expire at once.
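A small sketch of TTL jittering with redis-py; the base TTL and jitter range are illustrative:

```python
import random
import redis

r = redis.Redis(decode_responses=True)

BASE_TTL = 3600       # one hour (illustrative)
JITTER_MAX = 300      # up to five extra minutes, spread uniformly

def cache_with_jitter(key: str, value: str):
    # Randomize each key's TTL so a batch written together expires spread out.
    ttl = BASE_TTL + random.randint(0, JITTER_MAX)
    r.setex(key, ttl, value)
```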

2.2 Hotspot data cache never expires

To prevent an avalanche triggered by hotspot data expiring, a common approach is to make the hotspot cache never expire: set the cache lifetime of frequently accessed data to permanent so it always remains in the cache. Hotspot data is then always available and never sends a burst of requests to the database because of expiration, reducing the pressure on the database. Of course, to avoid data inconsistency, we must update the corresponding cache entry promptly whenever the underlying data is updated. This keeps hotspot data serving fast responses while avoiding the avalanche problems caused by cache invalidation.
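A sketch of keeping such a permanent entry consistent, assuming a hypothetical update_db() helper standing in for the real database write; the key format is illustrative:

```python
import redis

r = redis.Redis(decode_responses=True)

def update_db(product_id: int, detail_json: str):
    """Hypothetical stand-in for the real database write."""

def save_product(product_id: int, detail_json: str):
    update_db(product_id, detail_json)
    # Refresh the permanent cache entry in the same code path as the DB write,
    # so the never-expiring key can never serve stale data.
    r.set(f"hot:product:{product_id}", detail_json)
```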

2.3 Adopt a rate-limiting and degradation strategy

To prevent excessive requests from overwhelming the database and crashing the system, a rate-limiting and degradation strategy can be adopted. Once the system's traffic reaches a set threshold, return a message such as "System Congestion" directly and reject further requests. This guarantees that at least some users can use the system normally, while the remaining users can still eventually get results after a few retries.
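One common way to build this on Redis itself is a fixed-window counter using INCR and EXPIRE; the sketch below assumes an illustrative threshold and window, plus a hypothetical serve() handler:

```python
import redis

r = redis.Redis(decode_responses=True)

LIMIT = 1000              # max requests allowed per window (illustrative)
WINDOW_SECONDS = 1        # fixed window length

def serve(resource: str) -> str:
    """Hypothetical stand-in for the normal cache/database lookup."""
    return f"data for {resource}"

def handle_request(resource: str) -> str:
    counter_key = f"ratelimit:{resource}"
    count = r.incr(counter_key)
    if count == 1:
        # First hit in this window: start the expiry clock.
        r.expire(counter_key, WINDOW_SECONDS)
    if count > LIMIT:
        return "System Congestion"   # degrade: reject the overflow request
    return serve(resource)
```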

4. Cache warm-up

1. What is cache warm-up?

  • Cache warm-up means loading the relevant data into the cache system in advance, right after the system goes live, so that user requests hit pre-warmed cache entries directly instead of first querying the database and then populating the cache.
  • Without warm-up, Redis starts out empty; in the early phase after launch, high-concurrency traffic goes straight to the database, putting it under traffic pressure.

2. Solution

2.1 Load the cache when the project starts
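A sketch of a startup preload using a redis-py pipeline; load_all_rows() is a hypothetical stand-in for the real bulk query, and the TTL is illustrative:

```python
import redis

r = redis.Redis(decode_responses=True)

def load_all_rows():
    """Hypothetical stand-in for the real bulk query; yields (key, value)."""
    yield from [("user:1", "alice"), ("user:2", "bob")]

def warm_cache_on_startup():
    pipe = r.pipeline()
    for key, value in load_all_rows():
        pipe.setex(key, 3600, value)   # preload each entry with a normal TTL
    pipe.execute()                      # flush the whole batch in one round trip

warm_cache_on_startup()
```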

2.2 Use scheduled task scripts to refresh the cache
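A sketch of a periodic refresh using threading.Timer; in production a cron job or a scheduler framework would typically play this role, and the interval and data below are illustrative:

```python
import threading
import redis

r = redis.Redis(decode_responses=True)

REFRESH_INTERVAL_SECONDS = 600   # illustrative: refresh every ten minutes

def refresh_hot_keys():
    # Re-query the source data and overwrite the cached copies.
    for key, value in (("user:1", "alice"),):   # illustrative data
        r.setex(key, 3600, value)
    # Re-arm the timer so the refresh repeats on the interval.
    threading.Timer(REFRESH_INTERVAL_SECONDS, refresh_hot_keys).start()

refresh_hot_keys()
```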

2.3 Load hotspot data into cache in advance

2.4 Summary

  1. When the amount of data is small, load the cache when the project starts;
  2. When the amount of data is large, use a scheduled task script to refresh the cache;
  3. When the amount of data is very large, prioritize loading hotspot data into the cache in advance.

Loading the cache at startup reduces frequent database access and improves the system's concurrent processing capability.

Scheduled task scripts refresh the cache periodically, keeping the data up to date.

When the amount of data is very large, loading hotspot data into the cache in advance avoids frequent database queries and thus reduces pressure on the database.

With a reasonable caching strategy and data-loading approach, the system's performance and stability can be optimized.
