Redis: cache avalanche, cache penetration, and cache/database double-write consistency

This article is adapted from a collection of Redis interview questions to know before the interview.


  • How to solve the cache avalanche?
  • How to solve cache penetration?
  • How to keep the cache and the database consistent when both are written?

1. Cache avalanche

1.1 What is a cache avalanche?
Recall why we use a cache (Redis) in the first place: it sits in front of the database and serves the bulk of read traffic, so the database only sees the requests the cache cannot answer.

Now there is a problem: if our cache goes down, every request goes straight to the database.
From earlier study we know that Redis cannot cache all data (memory is expensive and limited), so we set an expiration time on cached data, and Redis removes expired keys using two strategies: lazy deletion plus periodic deletion. (See: Redis's strategy for expired keys + persistence.)

If a large batch of cached data is given the same expiration time, it all expires at the same moment and Redis deletes it together. For that window every lookup misses the cache, and all requests are sent to the database.

This is the cache avalanche:

  • Redis goes down, and all requests go to the database.

  • A batch of cached data shares the same expiration time, so the cache becomes invalid all at once and all requests go to the database.

If a cache avalanche occurs, it can easily overwhelm our database and paralyze the entire service!

1.2 How to solve the cache avalanche?
For "setting the same expiration time for cached data, causing the cache to become invalid for a certain period of time, all requests go to the database." This situation is very easy to solve:

  • Solution: add a random value to the expiration time when caching, which greatly reduces the chance of many keys expiring at the same moment (a sketch follows).
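
A minimal sketch of the jittered expiration time, in Python with redis-py; the key names and TTL values here are illustrative assumptions, not from the original article:

```python
import random
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

BASE_TTL = 600   # base expiration: 10 minutes (illustrative)
JITTER = 300     # up to 5 extra minutes, chosen per key

def cache_set(key: str, value: str) -> None:
    """Set a key with a jittered TTL so a batch of keys never expires at once."""
    r.setex(key, BASE_TTL + random.randint(0, JITTER), value)
```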

For the "Redis goes down, all requests go to the database" case, we can plan along three phases:

  • Before the incident: make Redis highly available (master-slave replication + Sentinel, or Redis Cluster) so it is unlikely to go down in the first place.

  • During the incident: if Redis really does go down, a local cache (e.g. Ehcache) plus rate limiting (e.g. Hystrix) can keep the database from being killed, so the service can at least keep working in a degraded mode (see the sketch after this list).

  • After the incident: with Redis persistence enabled, data is automatically reloaded from disk after a restart and the cache is quickly rebuilt.
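
A rough sketch of the "during the incident" idea: try Redis first, and if it is unreachable fall back to a small in-process cache plus a crude rate limit before touching the database. The local dict and the per-second limiter are illustrative stand-ins for Ehcache and Hystrix (which are Java tools), not the article's exact prescription:

```python
import time
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
local_cache: dict = {}      # tiny in-process fallback cache
db_calls: list = []         # timestamps of recent DB queries
DB_QPS_LIMIT = 100          # crude per-process rate limit (assumption)

def query_db(key):
    ...                     # placeholder for the real database query

def degraded_get(key: str):
    try:
        value = r.get(key)                    # normal path: ask Redis
        if value is not None:
            return value
    except redis.exceptions.ConnectionError:  # Redis is down
        if key in local_cache:
            return local_cache[key]           # serve possibly stale local data
    now = time.time()
    db_calls[:] = [t for t in db_calls if now - t < 1.0]
    if len(db_calls) >= DB_QPS_LIMIT:         # shed load instead of killing the DB
        raise RuntimeError("degraded: please retry later")
    db_calls.append(now)
    value = query_db(key)
    local_cache[key] = value
    return value
```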

2. Cache penetration

2.1 What is cache penetration?
For example, suppose we have a database table whose IDs start from 1 (all positive).

But there may be a hacker who wants to wreck my database and requests a negative ID every time. The cache never contains such a key, so my cache is useless: all requests are sent to the database, the database has no such row, and it returns empty every time.

Cache penetration means querying data that definitely does not exist. On a cache miss we go to the database, and for fault-tolerance reasons we write nothing back to the cache when the database finds nothing. So every request for that non-existent data hits the database, and the cache loses its purpose.

This is cache penetration:

  • A large number of requests miss the cache and go straight to the database.

If cache penetration occurs, it may likewise overwhelm our database and paralyze the entire service!

2.2 How to solve cache penetration?
There are also two solutions to cache penetration:

  • Since the requested parameters are invalid (the caller keeps asking for keys that cannot exist), we can intercept them up front with a Bloom filter (or a similar compact filter): if the filter says the key cannot exist, the request never reaches the database layer (see the sketch after this list).

  • When the database also finds nothing, we set this empty object in the cache anyway; the next request for the same key is then answered from the cache.

In this case, we generally give the empty object a short expiration time.
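
A minimal sketch combining both ideas, again in Python with redis-py. The hand-rolled Bloom filter, the `NULL_MARKER` sentinel, and the key scheme are illustrative assumptions, not from the article:

```python
import hashlib
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

class BloomFilter:
    """Tiny Bloom filter: k hash positions over an m-bit array."""
    def __init__(self, m: int = 1 << 20, k: int = 5):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _positions(self, key: str):
        for i in range(self.k):
            h = hashlib.md5(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

bloom = BloomFilter()    # preload with every valid ID at startup
NULL_MARKER = "<null>"   # sentinel meaning "the database found nothing"

def query_db(user_id):
    ...                  # placeholder for the real database query

def get_user(user_id: int):
    key = f"user:{user_id}"
    if not bloom.might_contain(key):   # invalid key: reject before any I/O
        return None
    cached = r.get(key)
    if cached is not None:
        return None if cached == NULL_MARKER else cached
    row = query_db(user_id)
    if row is None:
        r.setex(key, 60, NULL_MARKER)  # cache the empty result with a short TTL
        return None
    r.setex(key, 600, row)
    return row
```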

Reference: Cache series, part 5: the cache penetration problem

3. Cache and database double-write consistency

3.1 The routine for read operations
As mentioned above when discussing cache penetration: in the basic scheme, if the data cannot be found in the database, nothing is written to the cache.

Generally, we follow a fixed routine for read operations:

  • If the data is in the cache, take it directly from the cache.

  • If the data we want is not in the cache, query the database first, then write what the database returned into the cache.

  • Finally, return the data to the caller (a sketch follows).
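
That routine as a short sketch, reusing the `r` client and a `query_db` placeholder like the ones in the earlier snippets (the TTL is again an illustrative assumption):

```python
def read_through(key: str):
    value = r.get(key)            # 1) try the cache first
    if value is not None:
        return value
    value = query_db(key)         # 2) cache miss: go to the database
    if value is not None:
        r.setex(key, 600, value)  # 3) backfill the cache for later reads
    return value                  # 4) return the data to the caller
```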

3.2 What is the cache/database double-write consistency problem?
As long as we only read, the cached data and the database data stay in step. But what happens when we update? Many interleavings can leave the database and the cache holding different data.

The inconsistency here means: the data in the database and the data in the cache differ.
In theory, as long as we set an expiration time on every key, the cache and the database are eventually consistent: once a cached entry expires it is deleted, the next read misses the cache, we query the database, and we write the value found in the database back into the cache.

Beyond setting expiration times, we should take further measures to keep the window of inconsistency between the database and the cache as small as possible.

3.3 For update operations
Generally speaking, when performing an update operation, we have two choices:

  • Operate the database first, then the cache

  • Operate the cache first, then the database

First, be clear: whichever order we choose, we want the two operations either to both succeed or to both fail. Strictly speaking, that turns this into a distributed-transaction problem.

So, if atomicity is broken, the following situations can arise:

  • The database operation succeeds, but the cache operation fails.

  • The cache operation succeeds, but the database operation fails.

If the first step fails, we simply throw an exception and never execute the second step at all.

Let's analyze it in detail below.

3.3.1 How to operate the cache
There are two options for operating the cache:

  • Update the cache

  • Delete the cache

Generally we adopt the delete-the-cache strategy, for the following reasons:

In a high-concurrency environment, whether you touch the database first or last, if the other step is "update the cache", interleavings that leave the database and the cache inconsistent become more likely. (Deleting the cache is straightforward and much simpler.)

If every database update must also update the cache [this refers to frequently updated scenarios, where it costs real work], it is better to just delete the entry. On the next read the cache misses, we look the value up in the database, and we write what we found into the cache (lazy loading).

Based on these two points, the recommendation is to delete the cache on update!

3.3.2 Update the database first, then delete the cache
The normal flow looks like this:

  • Update the database first, and it succeeds;

  • Then delete the cache, which also succeeds;

If atomicity is destroyed:

  • If the first step (updating the database) succeeds and the second step (deleting the cache) fails, the database holds new data while the cache holds old data: inconsistency.

  • If the first step (updating the database) fails, we can directly return an error (exception), and there is no inconsistency.

Under high concurrency, the probability of database/cache inconsistency is particularly low, but it is not zero. It requires this exact interleaving:

  • The cache has just expired

  • Thread A queries the database and reads the old value

  • Thread B writes the new value to the database

  • Thread B deletes the cache

  • Thread A writes the old value it read into the cache

To hit this situation, as I said, the probability is particularly low:

It requires a read to arrive exactly while the cache is invalid, with a concurrent write in flight. In practice a database write is much slower than a read (and takes locks), so the read would have to enter the database before the write does, yet write the cache only after the write has finished and deleted it. The probability of all these conditions holding at once is very small.

This strategy is in fact a well-known design pattern: the Cache Aside Pattern. A sketch of its write path:
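
A hedged sketch of the Cache Aside write path (update the database first, then delete the cache), reusing the earlier redis-py client; `update_db` is a placeholder:

```python
def update_db(user_id, new_value):
    ...                            # placeholder for the real database update

def update_user(user_id: int, new_value: str):
    key = f"user:{user_id}"
    update_db(user_id, new_value)  # 1) update the database first
    r.delete(key)                  # 2) then delete the cached entry;
                                   #    the next read backfills the new value
```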
A remedy for the case where deleting the cache fails:

  • Send the key to be deleted to the message queue

  • Consume the message by yourself and get the key that needs to be deleted

  • Keep retrying the delete operation until it succeeds (sketched below)
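
A minimal in-process sketch of that retry loop, reusing the `r` client from earlier. The article proposes a real message queue (so retries survive a process crash); the local `queue.Queue` here is purely illustrative:

```python
import queue
import threading
import time

retry_q: "queue.Queue[str]" = queue.Queue()

def delete_with_retry(key: str):
    try:
        r.delete(key)
    except redis.exceptions.RedisError:
        retry_q.put(key)            # hand the key to the retry consumer

def retry_worker():
    while True:
        key = retry_q.get()         # wait for a failed delete
        try:
            r.delete(key)           # retry until it succeeds
        except redis.exceptions.RedisError:
            time.sleep(1)           # brief backoff, then re-queue
            retry_q.put(key)

threading.Thread(target=retry_worker, daemon=True).start()
```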

3.3.3 Delete the cache first, then update the database
The normal flow looks like this:

  • Delete the cache first, and it succeeds;

  • Then update the database, which also succeeds;

If atomicity is destroyed:

  • If the first step (deleting the cache) succeeds and the second step (updating the database) fails, the database and the cache are still consistent.

  • If the first step (deleting the cache) fails, we can directly return an error (exception), and the database and the cache are still consistent.

This looks very appealing, but analyse it under concurrency and a problem remains:

  • Thread A deletes the cache

  • Thread B queries and finds the cache empty

  • Thread B reads the old value from the database

  • Thread B writes that old value into the cache

  • Thread A writes the new value to the database

So this order can also leave the database and the cache inconsistent.

One idea for avoiding database/cache inconsistency under concurrency:

  • Put the operations on a given key (delete cache, update database, read and backfill cache) into a queue, so that they execute serially (sketched below).
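
A single-process sketch of per-key serialization via sharded worker queues. This only illustrates the ordering idea; a real system would also need reads to wait on the same queue, which is exactly what makes this approach heavyweight. The names and shard count are assumptions:

```python
import queue
import threading

NUM_SHARDS = 16
shards = [queue.Queue() for _ in range(NUM_SHARDS)]

def submit(key: str, op) -> None:
    """Ops on the same key hash to the same shard, so they run in order."""
    shards[hash(key) % NUM_SHARDS].put(op)

def shard_worker(q):
    while True:
        op = q.get()   # ops for one key execute strictly one after another
        op()

for q in shards:
    threading.Thread(target=shard_worker, args=(q,), daemon=True).start()

# Usage sketch: serialize "update DB then delete cache" for user:42.
# submit("user:42", lambda: (update_db(42, "new"), r.delete("user:42")))
```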

3.4 Comparing the two strategies
Each strategy has its own advantages and disadvantages:

  • Delete the cache first, then update the database

    • Weaker under high concurrency (the read-back race above); behaves well when atomicity is broken (no inconsistency)
  • Update the database first, then delete the cache (the Cache Aside Pattern)

    • Behaves well under high concurrency; weaker when atomicity is broken (stale data can linger in the cache)

3.5 Other approaches to data consistency
The cache can also be kept consistent by subscribing to the database binlog, for example with Databus or Alibaba's Canal, and updating or invalidating cached entries from that change stream.


Reference material:

  • The routine of cache updating
  • How to ensure data consistency when the cache and the database are double-written?
  • Analysis of distributed database and cache double-write consistency schemes
  • Cache Aside Pattern

Origin: blog.csdn.net/qq_33697094/article/details/112983059