Redis cache penetration, breakdown and avalanche

1. Redis cache penetration

Cache penetration occurs when a user queries data that exists in neither the cache nor the database. Since the cache cannot answer, every such request falls through to the database; if users (or attackers) keep sending these requests, the database comes under heavy access pressure.

For example: a user queries product information with id = -1. Database ids normally start at 1 and increment, so this record obviously does not exist. Because nothing is returned (and therefore nothing is cached), every such request queries the database again, putting great access pressure on it. Cache penetration is often the result of a deliberate "hacker attack", so it should be monitored; if an attack is confirmed, add the offending sources to a blacklist promptly.


Methods to solve cache penetration:

① Cache empty objects: if the requested record does not exist in the MySQL database, cache an empty object for it in Redis and return that to the user (the code is simple to maintain, but the effect is limited).
② Use a Bloom filter, a solution the Redis ecosystem also provides (the code is more complicated to maintain, but the effect is quite good).

Method 1: Cache empty objects

Caching an empty object means: when a request arrives and the requested record exists in neither the cache nor the database, the database returns an empty result; we associate an empty object with that request's key and store it in the cache. The next time the same request arrives, the cache hits and the empty object is returned directly. This reduces pressure on the database and protects its access performance. The process is shown below.

 

But there is a problem: if a large number of requests for non-existent keys come in, many empty objects accumulate in the cache. Over time they occupy a lot of memory and waste resources. We can clean these objects up after a while: Redis provides expiration-time commands, so we can set an expiration time when storing each empty object, which solves the memory and resource problem.

SETEX key seconds value: set a key-value pair with an expiration time in seconds.

// Pseudocode: cache a null placeholder for this id with a 60-second TTL
redisCache.put(Integer.toString(id), null, 60);
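The pattern can be sketched end to end in a few lines. This is a minimal, self-contained illustration: `SimpleCache`, `fake_db_lookup`, and the `"<null>"` sentinel are my own illustrative names, not a real Redis client API, and the TTLs are arbitrary choices.

```python
import time

# In-memory stand-in for Redis SETEX semantics, used to illustrate
# the "cache empty objects" pattern.
class SimpleCache:
    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def setex(self, key, seconds, value):
        self._store[key] = (value, time.time() + seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self._store[key]
            return None
        return value

NULL_SENTINEL = "<null>"   # marks "known to be missing" in the cache
cache = SimpleCache()
db_calls = 0

def fake_db_lookup(product_id):
    global db_calls
    db_calls += 1
    return {"id": product_id, "name": "widget"} if product_id > 0 else None

def get_product(product_id):
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return None if cached == NULL_SENTINEL else cached
    row = fake_db_lookup(product_id)
    if row is None:
        cache.setex(key, 60, NULL_SENTINEL)   # cache the empty object, short TTL
    else:
        cache.setex(key, 300, row)
    return row

get_product(-1)   # first miss reaches the database
get_product(-1)   # second request is absorbed by the cached empty object
print(db_calls)   # 1
```

Note the short TTL on the empty object: it bounds how much memory junk keys can occupy, exactly as the SETEX discussion above suggests.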

Method 2: Bloom filter

A Bloom filter is a probability-based data structure used for filtering: it determines, very quickly, whether an element is in a set. Because it answers this membership question so fast, it has three classic usage scenarios:

  • Web crawlers deduplicate URLs, avoiding crawling the same URL address twice
  • Anti-spam: deciding whether an address appears in a list of billions of spam mailboxes (the same applies to spam text messages)
  • Cache penetration: put the keys of all data that could legitimately be cached into a Bloom filter; when an attacker asks for a non-existent key, the filter rejects it immediately, protecting both the cache and the database

A Bloom filter can be loosely understood as an imprecise set structure (a set deduplicates its elements).

There is one small catch: its contains method may misjudge whether an element exists. In other words, a Bloom filter is not perfectly accurate, but with reasonable parameters its error rate can be kept low, leaving only a small probability of false positives.

When a Bloom filter says a value exists, it might not actually exist; when it says a value does not exist, it definitely does not exist.

Bloom filter features:

  1. A very large bit array (containing only 0s and 1s)
  2. Several hash functions
  3. Very high space efficiency and query efficiency
  4. No deletion operation, which makes code maintenance harder

In Redis terms, each Bloom filter corresponds to a large bit array plus several different unbiased hash functions, where "unbiased" means each function distributes elements relatively uniformly across the array.

When adding a key to a Bloom filter, each hash function hashes the key to an integer, which is then taken modulo the bit-array length to obtain a position; each hash function yields a different position. Setting all of these positions to 1 completes the add operation. (In other words, each key is mapped through several hash functions onto a huge bit array, and the mapped positions are flipped to 1.) So why does the Bloom filter have a false-positive rate? It misjudges when every position a queried key maps to happens to have been set to 1 by other keys, so the filter reports "exists" for a key that was never added.

Three important factors affect the accuracy of Bloom filters:

  • The quality of the hash functions
  • Storage space size
  • Number of hash functions
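These three factors are tied together by the standard false-positive estimate for Bloom filters (a textbook result, not specific to Redis):

```latex
% m = number of bits, n = number of inserted keys, k = number of hash functions
p \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad k_{\text{opt}} = \frac{m}{n}\ln 2
```

A larger bit array (bigger m) always lowers p; adding hash functions helps only up to k_opt, after which the extra functions fill the array and p rises again.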

How to improve the accuracy of a Bloom filter?

    Hash function design matters a great deal: a good hash function can greatly reduce the false-positive rate.
    The larger the bit array, the sparser and less crowded the positions that keys map to, which helps improve accuracy.
    Mapping each key through more hash functions marks more positions per key, which lowers the false-positive rate when querying; beyond an optimal count, however, the extra functions fill the array and accuracy degrades again.
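The add/contains mechanics described above can be sketched as a toy Bloom filter. This is illustrative only: the `BloomFilter` class and the salted SHA-256 hashing scheme are my own choices, and a production system would use an optimized implementation such as the RedisBloom module rather than this sketch.

```python
import hashlib

class BloomFilter:
    def __init__(self, m_bits=1 << 16, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive k "different" hash functions by salting one hash with i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.m  # integer hash modulo array length

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1  # set every mapped position to 1

    def contains(self, key):
        # If any mapped position is still 0, the key was definitely never
        # added; if all are 1, the key is only *probably* present.
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
for product_id in ["1001", "1002", "1003"]:
    bf.add(product_id)

print(bf.contains("1002"))   # True  -> might exist, go on to check cache/DB
print(bf.contains("9999"))   # almost certainly False -> reject immediately
```

For cache penetration, requests would consult `contains` first and skip both cache and database whenever it returns False, which is exactly the "definitely does not exist" guarantee.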
 

2. Cache breakdown

There are two typical triggers for cache breakdown:

  1. An "unpopular" key suddenly receives a flood of requests.
  2. A "popular" key happens to expire in the cache just as a large number of users access it.

Either way, highly concurrent requests bypass the cache and hit the database directly, instantly spiking its access pressure.

 

Solution:

A common solution is locking. When the key has expired and must be fetched from the database, acquire a lock first: only the first request queries the database and writes the result back to the cache; the remaining requests for the same key then read it directly from the cache.

    In a stand-alone environment, ordinary in-process locks (such as Lock or synchronized) are enough.
    In a distributed environment, use a distributed lock, for example one based on the database, Redis, or ZooKeeper.
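A minimal single-machine sketch of this locking pattern (often called double-checked locking): the `cache` dict and `slow_db_query` below are illustrative stand-ins, not a real Redis client, and in a distributed setup the `threading.Lock` would be replaced by a distributed lock.

```python
import threading
import time

cache = {}                    # stand-in for Redis
lock = threading.Lock()
db_calls = 0

def slow_db_query(key):
    global db_calls
    db_calls += 1
    time.sleep(0.05)          # simulate a slow database read
    return f"value-for-{key}"

def get_with_mutex(key):
    value = cache.get(key)
    if value is not None:      # fast path: cache hit
        return value
    with lock:                 # only one thread rebuilds the entry
        value = cache.get(key) # re-check: another thread may have filled it
        if value is None:
            value = slow_db_query(key)
            cache[key] = value
        return value

# Twenty concurrent requests for the same expired hot key...
threads = [threading.Thread(target=get_with_mutex, args=("hot-key",))
           for _ in range(20)]
for t in threads: t.start()
for t in threads: t.join()
print(db_calls)  # 1 -- a single database query serves them all
```

The re-check inside the lock is what prevents the waiting threads from each querying the database after the first one finishes.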

3. Cache avalanche

A cache avalanche occurs when, within a short period, a large portion of cached entries expire at once. If a large number of requests with heavy queries arrive during that window, they all reach the storage layer; calls to the storage layer surge, putting the database under severe pressure or even taking it down.

Reasons:

  • Redis suddenly crashes
  • A large portion of the cached data expires at the same time

For example: most of us have experienced a shopping carnival. Suppose a merchant runs a flash-sale promotion from 23:00 to 24:00. The programmer loads the merchant's flash-sale products into the cache at 23:00 and sets the expiration time to one hour with Redis' EXPIRE. During that hour many users view these products, make purchases, and so on. But at exactly 24:00 many users are still browsing; from that moment every product request falls on the database, which must absorb enormous pressure and, if things go badly, simply goes down.

While the products have not yet expired, the flow looks like this:

When the cache has expired, it looks like this:

Solution:

    Redis high availability
    Redis itself may go down, so add more Redis instances (one master with multiple replicas, or multiple masters with multiple replicas) so the others keep working if one fails; in effect, a cluster.

    Rate limiting and degradation
    After the cache becomes invalid, use locks or queues to control the number of threads that read the database and write the cache. For a given key, allow only one thread to query the data and write the cache while the other threads wait.

    Data preheating
    Before formal deployment, access the likely-hot data in advance so that data which may be accessed in large volumes is already loaded into the cache; manually trigger the loading of different keys before a large wave of concurrent access arrives.

    Staggered expiration times
    Set different expiration times for different keys so that cache invalidation moments are spread as evenly as possible.
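The "staggered expiration times" advice is often implemented by adding random jitter to a base TTL. A minimal sketch (the base TTL of one hour and the ten-minute jitter window are arbitrary illustrative values, not anything prescribed by Redis):

```python
import random

def ttl_with_jitter(base_ttl_seconds=3600, max_jitter_seconds=600):
    # Every key gets the base TTL plus 0-10 minutes of random jitter,
    # so a batch of keys written together does not all expire together.
    return base_ttl_seconds + random.randint(0, max_jitter_seconds)

# Simulate caching 1000 flash-sale products at 23:00:
ttls = [ttl_with_jitter() for _ in range(1000)]
print(min(ttls) >= 3600)        # True: never below the base TTL
print(max(ttls) <= 3600 + 600)  # True: jitter is bounded
```

With jitter, the 24:00 cliff in the flash-sale example becomes a gentle ten-minute slope of expirations instead of one spike of database traffic.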





 

Reference: "Redis cache penetration, breakdown and avalanche" — ambitious Scorpio's blog, CSDN.

Origin blog.csdn.net/qq_52988841/article/details/132272300