Redis cache avalanche, cache penetration, and cache breakdown


Classification of cache exception scenarios

In production environments, you will sometimes run into abnormal cache scenarios such as cache penetration, cache breakdown, and cache avalanche. To avoid the heavy losses these can cause, you need to understand the cause of each problem and its solutions, which helps improve system reliability and availability.

cache penetration

What is cache penetration?

Cache penetration occurs when the data a user requests exists neither in the cache (a cache miss) nor in the database. Every such request therefore has to query the database, which comes back empty each time.

If a malicious attacker keeps requesting data that does not exist in the system, a large number of requests will hit the database in a short period, putting excessive pressure on it and possibly even bringing it down.

Common solutions for cache penetration

(1) Bloom filter (recommended)

The Bloom filter (BF), proposed by Burton Howard Bloom in 1970, is a space-efficient probabilistic data structure.

Bloom filters are designed to test whether a specific element is a member of a set.

To judge whether an element is in a set, we would normally search the set and compare. The lookup efficiency of common data structures is:

  • A linear list (array or linked list): lookup time complexity is O(N)
  • A balanced binary search tree (AVL, red-black tree): lookup time complexity is O(logN)
  • A hash table: O(1) on average; with n elements spread over m buckets and chaining, a lookup degrades toward O(n/m)

When you need to test membership against a massive data set, these structures are not only slow to search but also consume a large amount of storage. Let's look at how a Bloom filter solves this problem.

Bloom filter design idea

A Bloom filter consists of a bit array of m bits and k hash functions. The bit array is initialized to all zeros, and each hash function should spread its inputs as uniformly as possible.

To insert an element, the element is run through the k hash functions to produce k hash values; each value is used as an index into the bit array, and all k corresponding bits are set from 0 to 1.

To query an element, the same k hash values are computed and the k corresponding bits are checked: if any bit is 0, the element is definitely not in the set; if all bits are 1, the element is probably in the set. Why only probably? Different elements can produce the same hash values, so hash collisions can leave a non-existent element mapping only to bits that are already 1. This is the so-called "false positive". By contrast, false negatives can never occur in a BF.

To sum up: anything the Bloom filter says is absent is definitely not in the set; anything it says is present may or may not actually be in the set.

For example, the figure below shows a Bloom filter with 18 bits and 3 hash functions. The three elements x, y, and z are each hashed by the three functions to different bits, and those bits are set to 1. When querying the element w, one of its three hashed bits turns out to be 0, so w is definitely not in the set.

(figure: an 18-bit Bloom filter with 3 hash functions holding x, y, z; the query for w hits a 0 bit)
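The insert/query logic above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation (real systems typically use a library or the RedisBloom module); salted SHA-256 is just one simple way to derive k bit positions:

```python
import hashlib

class BloomFilter:
    """A minimal Bloom filter: an m-bit array plus k hash functions."""

    def __init__(self, m=4096, k=3):
        self.m = m           # number of bits
        self.k = k           # number of hash functions
        self.bits = [0] * m  # bit array, initialized to all zeros

    def _positions(self, item):
        # Derive k bit positions by salting one hash function with the
        # index i; real implementations often use double hashing instead.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # All k bits set -> possibly present (false positives happen);
        # any bit clear  -> definitely absent (no false negatives).
        return all(self.bits[pos] for pos in self._positions(item))
```

With this in front of the cache, a request for a key the filter rejects can be refused immediately, without touching cache or database.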

Advantages and disadvantages of Bloom filter

Advantages:

  • Space-efficient: the data itself is not stored, only the bits its hashes map to
  • Fast: insertion and lookup are both O(k), where k is the number of hash functions

Disadvantages:

  • False positives: when the filter reports an element as present, it may in fact be absent; the false-positive rate depends on the number of hash functions and the size of the bit array
  • No deletion: bits may be shared by several elements, so an element cannot be removed without risking clearing bits that other elements depend on

Applicable scenarios of Bloom filter

  • URL deduplication in crawler systems
  • Spam filtering
  • Blacklists

(2) Return an empty object

When the cache misses and the query to the persistence layer also comes back empty, the empty result can itself be written to the cache. The next request for that key is then answered with the empty object straight from the cache and never reaches the database. To avoid accumulating too many empty objects, an expiration time is usually set on them.

There are two problems with this approach:

  • If a large number of distinct keys are being probed, caching an empty object for each one takes up valuable memory.

  • While an empty object's key has not yet expired, the cache and the persistence layer may be inconsistent (the real data may have been written in the meantime).
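A minimal in-process sketch of the empty-object pattern. A dict stands in for Redis, `db_query` is a hypothetical persistence-layer lookup, and the TTL values are assumed:

```python
import time

CACHE = {}        # stands in for Redis: key -> (value, expires_at)
NULL = object()   # sentinel marking "known to be absent"
NULL_TTL = 60     # short TTL for cached empty results (seconds)
DATA_TTL = 300    # normal TTL for real values (seconds)

DB_CALLS = {"count": 0}

def db_query(key):
    # Hypothetical database lookup; returns None when the row is absent.
    DB_CALLS["count"] += 1
    return None

def get(key, now=None):
    now = time.time() if now is None else now
    entry = CACHE.get(key)
    if entry and entry[1] > now:
        value = entry[0]
        return None if value is NULL else value
    value = db_query(key)
    if value is None:
        # Cache the miss itself, so repeated lookups skip the database
        # until the empty object expires.
        CACHE[key] = (NULL, now + NULL_TTL)
        return None
    CACHE[key] = (value, now + DATA_TTL)
    return value
```

After the first miss, further lookups for the same key are served from the cached empty object and never hit the database until the short TTL elapses.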

cache breakdown

What is cache breakdown?

Cache breakdown occurs when a single very hot key is under sustained heavy concurrency, with all that traffic concentrated on one point. The instant the key expires, the continuous flood of requests punches through the cache and hits the database directly, like cutting a hole in a barrier.

Cache breakdown hazard

Pressure on the database surges instantly, and a large number of requests end up blocked or failing.

How to solve it

Use a mutex (mutex key)

The idea is simple: let a single thread write back the cache while the other threads wait for it to finish, then read from the cache again.


Only one thread at a time reads the database and writes back to the cache; the other threads block. In a high-concurrency scenario, that much blocking inevitably reduces throughput. How would you improve on this?

In a distributed application, you need a distributed lock instead of an in-process mutex.
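A minimal in-process sketch of the mutex approach, using double-checked locking. A plain dict and `threading.Lock` stand in for Redis and a distributed lock (in Redis itself the lock would typically be taken with `SET lock_key token NX PX <ms>`):

```python
import threading

cache = {}
lock = threading.Lock()
db_reads = []  # records every time the database is actually hit

def load_from_db(key):
    db_reads.append(key)  # simulate an expensive database read
    return f"value-for-{key}"

def get(key):
    value = cache.get(key)
    if value is not None:
        return value
    # Only one thread rebuilds the cache; the rest block on the lock,
    # then re-check the cache before loading (double-checked locking).
    with lock:
        value = cache.get(key)
        if value is None:
            value = load_from_db(key)
            cache[key] = value
    return value
```

Even with many concurrent callers racing on the same expired key, the database is read exactly once; everyone else is served the freshly written cache entry.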

Hotspot data never expires

Never expires actually has two meanings:

  • Physically never expires: no expiration time is set on the hotspot key
  • Logically expires: the expiration time is stored inside the value; when the data is found to be near expiry, the cache is rebuilt by a background asynchronous thread

In practice, this approach is very performance-friendly. Its only drawback is that while the cache is being rebuilt, other threads (those not doing the rebuild) may read stale data, which is acceptable for systems that do not demand strict, strong consistency.
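A sketch of logical expiration, assuming the expiry timestamp is stored alongside the value (the names and TTL here are illustrative):

```python
import threading
import time

cache = {}  # key -> {"value": ..., "logical_expire": unix timestamp}

def rebuild(key):
    # Simulate reloading from the database and refreshing the
    # logical expiry; the physical key is never deleted.
    cache[key] = {"value": f"fresh-{key}",
                  "logical_expire": time.time() + 300}

def get(key):
    entry = cache[key]  # key physically never expires, so it always exists
    if entry["logical_expire"] < time.time():
        # Stale: kick off an asynchronous rebuild, but return the old
        # value immediately instead of blocking the caller.
        threading.Thread(target=rebuild, args=(key,)).start()
    return entry["value"]
```

The caller never blocks on a database read; the price is that requests arriving during the rebuild see the old value.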

cache avalanche

What is cache avalanche?

A cache avalanche occurs when a large amount of cached data reaches its expiration time at once while query volume is huge, so requests fall directly on the database, overloading it or even taking it down. The difference from cache breakdown: breakdown is concurrent queries for the same piece of data, while an avalanche is many different keys expiring together, so many lookups miss the cache and go to the database.

Cache Avalanche Solution

Commonly used solutions are:

  • Uniform expiration
  • Add a mutex
  • Cache never expires
  • Double-layer cache strategy

(1) Uniform expiration

Set different expiration times so cache invalidations are spread as evenly as possible, typically by adding a random offset to the TTL or by planning expiry times deliberately.
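A tiny helper illustrating TTL jitter; the 600-second base and 300-second jitter window are assumed values, not recommendations:

```python
import random

def ttl_with_jitter(base=600, jitter=300):
    # Spread expirations across [base, base + jitter] seconds so that
    # keys written in the same batch do not all expire at the same instant.
    return base + random.randint(0, jitter)
```

In Redis terms, this value would be passed as the `EX` argument when setting each key.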

(2) Add mutex

Same as the cache-breakdown solution: only one thread rebuilds the cache at a time, and the other threads block and queue.

(3) The cache never expires

Same as the cache-breakdown solution: keys never expire physically, and an asynchronous thread keeps the cache up to date.

(4) Double-layer cache strategy

Use primary and secondary caches:

Primary cache: its TTL is set from experience, and it is the cache read first. When the primary cache expires, the latest value is reloaded from the database.

Backup cache: a much longer TTL; it is read when acquiring the rebuild lock fails. The backup must be updated synchronously whenever the primary cache is updated.
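A sketch of the two-layer read path. Two dicts stand in for two Redis keyspaces with different TTLs, and `lock_acquired` stands in for the result of a `SETNX`-style lock attempt; all names are illustrative:

```python
primary = {}  # short TTL in real Redis, e.g. SET key v EX 600
backup = {}   # much longer TTL, e.g. SET backup:key v EX 86400

def put(key, value):
    # Every write updates both layers, so the backup never lags behind
    # the primary by more than one in-flight write.
    primary[key] = value
    backup[key] = value

def get(key, lock_acquired):
    if key in primary:
        return primary[key]
    if lock_acquired:
        value = f"db-{key}"   # simulate reloading from the database
        put(key, value)
        return value
    # Lock acquisition failed: serve the (possibly older) backup copy
    # instead of piling onto the database.
    return backup.get(key)
```

When the primary entry has expired, exactly one caller (the lock holder) refreshes from the database while everyone else is served from the backup layer.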

Cache Warming

What is cache warming?

Cache warming means loading the relevant data into the cache system right after the system goes live, so the first user requests do not have to query the database and then write the results back to the cache.

Without warming, Redis starts out empty. In the early phase after launch, high-concurrency traffic would all go to the database, putting it under pressure.

Operation method of cache warming

  • When the data volume is small, load the cache when the application starts;
  • When the data volume is large, refresh the cache from a scheduled task or script;
  • When the data volume is very large, prioritize loading hot data into the cache in advance.
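A sketch of the startup case above. `load_hot_keys_from_db` is a hypothetical query for the most-accessed rows (for example, ordered by a hit counter), and the key names are made up:

```python
cache = {}  # stands in for Redis

def load_hot_keys_from_db(limit):
    # Hypothetical database query returning the hottest rows.
    return {f"hot:{i}": f"value-{i}" for i in range(limit)}

def warm_up(limit=100):
    # Run once at startup (or from a scheduled task) before taking
    # traffic, so the first requests hit a populated cache.
    for key, value in load_hot_keys_from_db(limit).items():
        cache.setdefault(key, value)  # don't clobber fresher entries
```

The `setdefault` guard matters when the same warm-up script is re-run as a scheduled refresh while the service is already writing newer values.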

cache downgrade

Cache downgrade means that when the cache fails or the cache server goes down, the service returns default data or data held in its own memory directly, instead of querying the database.

In real projects, some hot data is usually also cached in the service's own memory, so that when the cache layer misbehaves, the in-memory copy can be used directly, sparing the database a huge load spike.
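A sketch of the fallback path. `redis_get` stands in for whatever function normally reads the cache, and the local hot-data table and default value are illustrative:

```python
# In-process copy of hot data, refreshed out of band.
local_hot = {"config:theme": "default-theme"}

def get_with_fallback(key, redis_get, default=None):
    try:
        return redis_get(key)
    except ConnectionError:
        # Cache layer is unreachable: degrade to the local in-memory
        # copy (or a default) instead of hammering the database.
        return local_hot.get(key, default)
```

The degraded answer may be stale or generic, which is exactly the "lossy" nature of downgrading mentioned above.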

Downgrading is generally a lossy operation, so try to minimize the impact of downgrading on the business.


Source: blog.csdn.net/qq_41286824/article/details/125759174