Redis cache penetration, cache breakdown, and cache avalanche

Cache penetration

Cache penetration occurs when a user queries data that exists in neither the cache nor the database. The cache miss sends the request on to the database, which also finds nothing, so nothing ever gets cached. If users keep sending such queries, every one of them reaches the database, putting heavy pressure on it.

For example, suppose a user queries product information with id = -1. Database ids generally start at 1 and increase, so this record obviously does not exist. Since nothing is returned and nothing is cached, every such query goes straight to the database, creating a lot of access pressure.

Solution

Method 1: Cache an empty object. If the data does not exist in the MySQL database, cache an empty object in Redis and return it to the user. (The code is simple to maintain, but the effect is limited.)
Method 2: Use a Bloom filter, for which Redis also provides support. (The code is more complicated to maintain, but the effect is quite good.)

Method 1: Cache empty objects

Caching an empty object means: when a request arrives and the relevant data exists in neither the cache nor the database, the database returns an empty result, and that empty object is stored in the cache under the request's key. The next time the same request arrives, it hits the cache and the empty object is returned directly. This reduces the pressure on the database and improves access performance.



But there is a problem: if a large number of requests for non-existent data arrive, many empty objects will accumulate in the cache. Over time they will occupy a lot of memory and waste a lot of resources!

Is there a solution?

Can these empty objects be cleaned up after a period of time? Yes: Redis provides commands for setting an expiration time, so we can attach a TTL when caching an empty object, which solves the problem.

setex key seconds value  # set the key-value pair and specify its expiration time in seconds
redisCache.put(Integer.toString(id), null, 60) // expiration time: 60 s
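The whole read path can be sketched as follows. This is a minimal simulation: `db` and `cache` are hypothetical in-memory stand-ins for MySQL and Redis, and `cache_set` plays the role of SETEX (store a value with a TTL in seconds).

```python
import time

db = {1: {"id": 1, "name": "widget"}}   # assumed product table
cache = {}                               # key -> (value, expires_at)

EMPTY_TTL = 60  # short TTL so cached empty objects are evicted quickly

def cache_set(key, value, ttl):
    # Stand-in for Redis SETEX: store the value together with a deadline.
    cache[key] = (value, time.time() + ttl)

def cache_get(key):
    # Returns (value, hit); an expired entry counts as a miss.
    entry = cache.get(key)
    if entry is None:
        return None, False
    value, expires_at = entry
    if time.time() > expires_at:
        del cache[key]
        return None, False
    return value, True

def get_product(product_id):
    value, hit = cache_get(product_id)
    if hit:
        return value                      # may be None: a cached empty object
    value = db.get(product_id)            # query the database on a cache miss
    # Cache the result even when it is empty, so repeated lookups of a
    # non-existent id stop hitting the database until the TTL expires.
    cache_set(product_id, value, EMPTY_TTL)
    return value
```

After the first query for a non-existent id, subsequent queries within the TTL are answered from the cache without touching the database.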

Method 2: Bloom filter

A Bloom filter is used to filter things: it is a probabilistic data structure that judges whether an element is in a set, and it runs fast.

The great strength of Bloom filters is that they can quickly determine whether an element is in a set. This gives them three typical usage scenarios:

  • Web crawlers deduplicate URLs to avoid crawling the same address twice
  • Anti-spam: judging whether an address appears in a list of billions of spam mailboxes (the same applies to spam messages)
  • Cache penetration: put all keys that could legitimately exist into a Bloom filter, so requests for non-existent keys (e.g. from attackers) can be rejected quickly, protecting both the cache and the database from being overwhelmed

A Bloom filter can be understood as a slightly inaccurate set structure (a set has the effect of deduplication).

But there is a small caveat: when its contains method is used to determine whether an object exists, it may misjudge. This does not mean the Bloom filter is particularly inaccurate; as long as its parameters are set reasonably, its accuracy can be controlled well, leaving only a small probability of misjudgment (which is acceptable).

When the Bloom filter says a value exists, the value may not exist; when it says a value does not exist, it definitely does not exist.

Bloom filter features:

  1. A very large binary bit array (containing only 0s and 1s)
  2. Several hash functions
  3. Very high space efficiency and query efficiency
  4. No delete operation, which makes the code harder to maintain

As a Redis data structure, each Bloom filter corresponds to a large bit array plus several different unbiased hash functions. "Unbiased" means that each hash function spreads the elements' hash values fairly uniformly.


When adding a key to the Bloom filter, each hash function hashes the key to an integer, which is then taken modulo the length of the bit array to obtain a position; each hash function yields a different position. Setting all of these positions to 1 completes the add operation. (In short, each key is mapped through several hash functions onto a huge bit array, and every mapped position is set to 1.)
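The add and contains operations described above can be sketched as follows. This is a minimal illustration, not a production implementation (real deployments would use Redis's bitmap commands or the RedisBloom module); the salted-SHA-256 scheme is just one assumed way to derive k independent indices.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch: a bit array plus k hash functions."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [0] * size            # the "very large binary bit array"

    def _positions(self, key):
        # Derive k indices by hashing the key with k different salts,
        # then taking each digest modulo the bit-array length.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1            # set every mapped position to 1

    def contains(self, key):
        # If any mapped bit is 0, the key was definitely never added;
        # if all are 1, the key is *probably* present (false positives possible).
        return all(self.bits[pos] for pos in self._positions(key))
```

Note the asymmetry: a key that was added is always reported as present (no false negatives), while a key that was never added is usually, but not always, reported as absent.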

So why does a Bloom filter have a false-positive rate? The misjudgment happens as follows:

Suppose key1 and key2 have set certain positions of the bit array to 1, and a key3 that was never added happens to map to exactly those positions. The Bloom filter will then report that key3 exists, which is a misjudgment (key3 is clearly absent).

Three important factors affecting the accuracy of Bloom filters:

  • The quality of the hash function
  • Storage space size
  • Number of hash functions

How to improve the accuracy of the Bloom filter?

  • Hash function design matters: a good hash function can greatly reduce the false-positive rate.
  • A larger bit array makes the positions that keys map to much sparser and less crowded, which improves the filter's accuracy.
  • Using more hash functions marks more positions per key, so a query must match more bits before the filter reports "present", which lowers the false-positive rate (up to a point: too many hash functions fill the array too quickly, and accuracy drops again).
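These three factors are captured by the standard approximation for the false-positive rate, p ≈ (1 − e^(−kn/m))^k, where m is the bit-array size, n the number of inserted keys, and k the number of hash functions. A quick calculation shows how a larger array helps:

```python
import math

def false_positive_rate(m, n, k):
    """Approximate false-positive rate of a Bloom filter with an m-bit
    array, n inserted keys, and k hash functions."""
    return (1 - math.exp(-k * n / m)) ** k

def optimal_num_hashes(m, n):
    """The k that minimizes the rate for given m and n: k = (m/n) * ln 2."""
    return max(1, round((m / n) * math.log(2)))
```

For example, with 1,000 keys and 3 hash functions, doubling the bit array from 5,000 to 10,000 bits cuts the approximate false-positive rate from about 9% to under 2%.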

Cache breakdown

Cache breakdown has two typical causes:

  1. A "unpopular" key was suddenly requested to be accessed by a large number of users.
  2. A "hot" key just expires in the cache, and a large number of users visit it.

Either way, a large number of concurrent requests penetrates straight through the cache to the database, instantly increasing the pressure on it.


Solution

The common solution is locking. When the key has expired and must be re-queried from the database, acquire a lock first: only the first request actually queries the database, and it then stores the value it fetched into the cache. All subsequent requests for the same key can read it directly from the cache.

  • In a stand-alone environment, ordinary locks (such as Lock or synchronized) are enough;
  • In a distributed environment, use a distributed lock, for example one based on a database, Redis, or ZooKeeper.



Cache avalanche

A cache avalanche occurs when many cached entries expire within the same short period. If a large number of data-heavy requests arrive during that window, they all reach the storage layer; database calls spike sharply, and the database comes under excessive pressure or even goes down.

Causes:

  • Redis goes down suddenly
  • A large portion of the cached data expires at the same time

For example, think of a shopping carnival: suppose a merchant runs a deep-discount promotion from 23:00 to 24:00. The developer loads the discounted products into the cache at 23:00 and sets the expiration time to one hour via Redis's expire. During that hour, many users view and buy these products. But at 24:00, while many users are still accessing them, the cache expires all at once, and all of that traffic falls on the database, which comes under enormous pressure; a little carelessness and the database goes down outright.

Before the cache expires, requests for these products are served from the cache; once the cache is invalidated, they all fall through to the database.

Solution

  1. Redis high availability
    Redis itself may go down, so add more Redis instances (one master with multiple slaves, or multiple masters with multiple slaves) so that when one instance fails the others keep working; this is essentially building a cluster.
  2. Rate limiting and degradation
    After the cache expires, use locks or queues to control the number of threads that read the database and write the cache: for a given key, allow only one thread to query the data and write the cache while the other threads wait.
  3. Data preheating
    Before the formal deployment, access the likely-hot data in advance so that it is already loaded into the cache; before an expected burst of concurrent traffic, manually trigger the loading of the relevant keys into the cache.
  4. Staggered expiration times
    Set different expiration times so that cache invalidation is spread as evenly as possible over time.
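Point 4 is usually implemented by adding random jitter to a base TTL. A minimal sketch, with the one-hour base from the flash-sale example and an assumed ten-minute jitter window:

```python
import random

BASE_TTL = 3600            # one hour, as in the promotion example
MAX_JITTER = 600           # spread expirations over an extra 10 minutes

def ttl_with_jitter():
    # A random offset prevents every key from expiring at the same instant,
    # so cache misses trickle to the database instead of arriving all at once.
    return BASE_TTL + random.randint(0, MAX_JITTER)
```

Each key cached at 23:00 then expires somewhere between 24:00 and 24:10 rather than exactly at midnight.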


Origin blog.csdn.net/QiuHaoqian/article/details/109154315