What is the Redis cache, and how do you handle cache penetration and cache avalanche?

Today I want to share a few common Redis interview questions:

  • How do you solve a cache avalanche?
  • How do you solve cache penetration?
  • How do you keep the cache consistent with the database when double-writing?

1. Cache avalanche

1.1 What is a cache avalanche?

First, let's answer the question of why we use a cache (Redis) at all:

1. Better performance: a cache query is pure memory access, while database queries touch disk, so reading from the cache is much faster than querying the database.

2. Higher concurrency: the cache absorbs part of the requests, allowing the system to support more concurrent traffic.

Now here is the problem: if the cache goes down, all of our requests hit the database directly.

We all know that Redis cannot cache all of our data (memory is expensive and limited), so Redis entries need an expiration time, and expired key-value pairs get deleted. Redis uses two strategies to delete expired keys: lazy deletion and periodic deletion.
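To make the lazy-deletion idea concrete, here is a toy sketch in Python. This is only an illustration of the concept; Redis's real expiry machinery (lazy deletion on access plus periodic background sampling) is implemented in C inside the server, and `ToyCache` is a name invented for this example.

```python
import time

class ToyCache:
    """A toy key-value store illustrating lazy deletion of expired keys."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = time.monotonic() + ttl if ttl is not None else None
        self._store[key] = (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        # Lazy deletion: the expired key is only removed when someone
        # actually accesses it, not at the moment it expires.
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._store[key]
            return None
        return value
```

The lazy strategy alone would leave never-accessed expired keys in memory forever, which is why Redis also runs periodic deletion in the background.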

If a large batch of cached entries is given the same expiration time, they all become invalid at the same moment. During that window every lookup misses the cache, and all requests fall through to the database.

This is a cache avalanche:

  • Redis goes down, and all requests go to the database.
  • Cached entries share the same expiration time, so at some point they all miss at once and every request goes to the database.

If a cache avalanche happens, it can easily bring down our database and paralyze the entire service!

1.2 How to solve a cache avalanche?

The case "cached entries share the same expiration time, causing mass cache misses and sending all requests to the database" is easy to solve:

  • Solution: add a random offset to each cache entry's expiration time, which greatly reduces the chance of many entries expiring at the same moment.
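A minimal sketch of the jitter idea, assuming an illustrative base TTL of one hour and up to five minutes of random offset (both values are arbitrary examples, not recommendations):

```python
import random

def ttl_with_jitter(base_ttl=3600, max_jitter=300):
    """Return an expiry time with a random offset added.

    Spreading expirations over [base_ttl, base_ttl + max_jitter]
    prevents a batch of keys written together from all expiring
    at the same instant.
    """
    return base_ttl + random.randint(0, max_jitter)

# With a real Redis client this would be used roughly like:
#   redis_client.set(key, value, ex=ttl_with_jitter())
```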

For the case "Redis goes down and all requests go to the database", we can think along these lines:

  • Before the incident: make Redis highly available (master-slave architecture + Sentinel, or Redis Cluster) to reduce the chance of Redis going down in the first place.
  • During the incident: if Redis really is down, fall back to a local cache (e.g. Ehcache) plus rate limiting (e.g. Hystrix) to avoid killing the database, and at least keep the service partially working.
  • After the incident: with Redis persistence enabled, a restart automatically reloads data from disk and quickly restores the cached data.
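The "during the incident" idea can be sketched as follows. This is a hand-rolled illustration, not how Ehcache or Hystrix actually work: `primary` and `load_from_db` are hypothetical stand-ins, the local cache is a plain dict, and the rate limiter is just a semaphore capping concurrent database calls.

```python
import threading

class FallbackCache:
    """Fall back to a local cache when the primary cache is down,
    and cap concurrent database calls so an outage cannot overload it."""

    def __init__(self, primary, load_from_db, max_db_concurrency=10):
        self.primary = primary            # object with get(key); may raise
        self.local = {}                   # crude in-process local cache
        self.load_from_db = load_from_db  # hypothetical database loader
        self._db_gate = threading.BoundedSemaphore(max_db_concurrency)

    def get(self, key):
        try:
            value = self.primary.get(key)
            if value is not None:
                return value
        except ConnectionError:
            # Primary cache is down: serve stale local data if we have it.
            if key in self.local:
                return self.local[key]
        # Rate-limit the database so a cache outage cannot crush it.
        if not self._db_gate.acquire(blocking=False):
            raise RuntimeError("database overloaded, request rejected")
        try:
            value = self.load_from_db(key)
            self.local[key] = value
            return value
        finally:
            self._db_gate.release()
```

A production setup would use a real circuit breaker (Hystrix, Resilience4j) and a proper local cache with eviction rather than an unbounded dict.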

2. Cache penetration

2.1 What is cache penetration?

For example, suppose our database table uses IDs starting from 1 (all positive). A hacker who wants to ruin my database could send requests with negative IDs every time. This makes my cache useless: every request goes to the database, but no such rows exist there, so the queries always come back empty.

Cache penetration means querying for data that does not exist. The cache misses, and because (for fault-tolerance reasons) the empty result from the database is not written back to the cache, every request for the nonexistent data hits the database, and the cache loses its purpose.

This is cache penetration:

  • A large number of requests miss the cache, so they all go to the database.

If cache penetration happens, it can also bring down our database and paralyze the entire service!

2.2 How to solve cache penetration?

There are two options for solving cache penetration:

  • Since the request parameters are not legitimate (the requested data never exists), we can use a Bloom filter (BloomFilter) to intercept such requests early: illegal requests never reach the database layer!

  • Alternatively, when the database lookup comes back empty, write an empty object into the cache. The next time the same request arrives, it is served from the cache.

  • In that case we generally give the empty object a short expiration time.
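The second option (caching the null object with a short TTL) can be sketched like this. The dict stands in for Redis, and `NULL_SENTINEL`, the TTL values, and `query_db` are illustrative assumptions made up for this example:

```python
import time

NULL_SENTINEL = "__NULL__"   # marker meaning "this key does not exist"
NULL_TTL = 60                # short expiry for empty objects (seconds)
NORMAL_TTL = 3600            # normal expiry for real data

cache = {}  # key -> (value, expires_at)

def cached_lookup(key, query_db):
    now = time.monotonic()
    entry = cache.get(key)
    if entry is not None and entry[1] > now:
        value = entry[0]
        return None if value == NULL_SENTINEL else value
    value = query_db(key)
    if value is None:
        # Cache the miss itself, briefly, so repeated queries for
        # nonexistent data stop hammering the database.
        cache[key] = (NULL_SENTINEL, now + NULL_TTL)
        return None
    cache[key] = (value, now + NORMAL_TTL)
    return value
```

The short TTL on the sentinel matters: if the row is created later, the stale "does not exist" answer goes away quickly.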

3. Cache/database double-write consistency

3.1 The read path

As mentioned above when discussing cache penetration: data fetched from the database gets written back into the cache.

Reads generally follow a fixed routine:

  • If the data is in the cache, read it from the cache directly.
  • If the data is not in the cache, query the database, and write the result from the database into the cache.
  • Finally, return the data to the caller.
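The three-step routine above can be sketched in a few lines, with a dict standing in for Redis and a hypothetical `query_db` callback standing in for the database:

```python
cache = {}  # stand-in for Redis

def read(key, query_db):
    # Step 1: try the cache first.
    value = cache.get(key)
    if value is not None:
        return value
    # Step 2: on a miss, query the database and populate the cache.
    value = query_db(key)
    cache[key] = value
    # Step 3: return the data to the caller.
    return value
```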

3.2 What is the cache/database double-write consistency problem?

If we only ever query, the cached data and the database data stay consistent, so there is no problem. But what happens when we want to update? Either order of operations can leave the database and the cache inconsistent.

  • "Inconsistent" here means: the data in the cache differs from the data in the database.

In theory, as long as we set an expiration time on the key, the cache and the database are eventually consistent: once a cached entry expires it is deleted, the next read misses the cache, fetches fresh data from the database, and writes that data back into the cache.

Beyond setting an expiration time, we should do more to reduce the chance of the cache and the database becoming inconsistent.

3.3 The update path

In general, when performing an update, we have two choices:

  • Update the database first, then operate on the cache
  • Operate on the cache first, then update the database

First, be clear that whichever order we choose, we want the two operations to either both succeed or both fail. Strictly guaranteeing that turns this into a distributed-transaction problem.

So if atomicity is broken, the following cases can occur:

  • The database operation succeeds, but the cache operation fails.
  • The cache operation succeeds, but the database operation fails.

If the first step fails, we can simply return an exception; the second step is never executed.

Let's analyze the concrete cases.

3.3.1 Operating on the cache

There are two options for operating on the cache:

  • Update the cache
  • Delete the cache

Generally we choose to delete the cache, for the following reasons:

  1. In a highly concurrent environment, whether you touch the database before or after the cache, updating the cache (rather than deleting it) makes it easier to end up with inconsistent data. Deleting the cache is simpler and more direct.
  2. If every database update also had to update the cache (think of a frequently updated field), that would waste performance; it is better to delete immediately. The next read finds no cache entry, queries the database, and writes the result back into the cache (a lazy-loading approach).

Based on these two points, when updating, it is recommended to delete the cache rather than update it!

3.3.2 Update the database first, then delete the cache

The normal flow is:

  • Update the database: success.
  • Then delete the cache: also success.

If atomicity is broken:

  • The first step succeeds (database updated) but the second step fails (cache not deleted), leaving new data in the database and stale data in the cache.
  • If the first step (updating the database) fails, we can just return an exception, and no inconsistency occurs.

Even under high concurrency, the database and the cache can still become inconsistent with this strategy, but the probability is particularly low. It requires all of the following:

  • The cache entry happens to have just expired
  • Thread A queries the database and reads the old value
  • Thread B writes the new value to the database
  • Thread B deletes the cache
  • Thread A writes the old value it read into the cache

Why do we say the probability of this sequence is particularly low?

Because it requires a cache miss on the read plus a concurrent write. In practice, a database write is much slower than a read and may lock the table; for the inconsistency to occur, the read must enter the database before the write starts, yet write its value to the cache only after the write finishes. The odds of satisfying all of these conditions at once are small.

This strategy is in fact a well-known design pattern: the Cache Aside Pattern.
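The Cache Aside write path can be sketched as follows, with plain dicts standing in for the real database and for Redis (a sketch of the pattern, not a production implementation):

```python
db = {}     # stand-in for the database
cache = {}  # stand-in for Redis

def update(key, value):
    # Step 1: write the new value to the database first.
    db[key] = value
    # Step 2: invalidate (delete) the cached copy rather than update it.
    cache.pop(key, None)

def read(key):
    # Cache-aside read: cache first, fall back to the database on a miss.
    if key in cache:
        return cache[key]
    value = db.get(key)
    cache[key] = value
    return value
```

Note that the write path never puts data into the cache; the next read repopulates it lazily, matching the delete-instead-of-update reasoning above.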

A solution for when deleting the cache fails:

  • Send the key to be deleted to a message queue
  • Consume the message and extract the key to delete
  • Keep retrying the delete operation until it succeeds
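The retry loop can be sketched like this. It uses an in-process `queue.Queue` where a real system would use a message broker (e.g. RabbitMQ or Kafka), and `delete_from_cache` is a hypothetical callback that returns True on success:

```python
import queue
import time

def retry_deletes(pending, delete_from_cache, max_attempts=5, backoff=0.01):
    """Drain the queue of failed deletions, retrying each key.

    If a key still cannot be deleted after max_attempts tries, it is
    requeued and the run stops; a broker-based consumer would redeliver
    the message later instead.
    """
    while not pending.empty():
        key = pending.get()
        for _ in range(max_attempts):
            if delete_from_cache(key):
                break
            time.sleep(backoff)  # brief pause before retrying
        else:
            pending.put(key)     # still failing: requeue and stop for now
            return
```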

3.3.3 Delete the cache first, then update the database

The normal flow is:

  • Delete the cache: success.
  • Then update the database: also success.

If atomicity is broken:

  • The first step succeeds (cache deleted) but the second step fails (database not updated): the database and the cache remain consistent, since both effectively still hold the old value.
  • If the first step (deleting the cache) fails, we can just return an exception; the database and the cache remain consistent.

This looks fine, but analyzing concurrent scenarios shows there is still a problem:

  • Thread A deletes the cache
  • Thread B queries and finds the cache empty
  • Thread B queries the database and gets the old value
  • Thread B writes the old value into the cache
  • Thread A writes the new value to the database

This also leaves the database and the cache inconsistent.

A solution for this concurrent inconsistency:

  • Put the cache-delete, database-update, and cache-read operations into a queue, serializing them.
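The serialization idea can be sketched with a single worker thread draining a queue, so reads and writes can never interleave. This is a toy illustration with dicts standing in for the real stores; a production system would shard keys across queues to keep throughput acceptable:

```python
import queue
import threading

db = {}     # stand-in for the database
cache = {}  # stand-in for Redis
ops = queue.Queue()

def worker():
    """Apply operations strictly in arrival order."""
    while True:
        op = ops.get()
        if op is None:              # shutdown signal
            break
        kind, key, value = op
        if kind == "write":
            cache.pop(key, None)    # delete cache, then update database
            db[key] = value
        elif kind == "read":
            if key not in cache:    # read path: repopulate on a miss
                cache[key] = db.get(key)
        ops.task_done()

def serialized_write(key, value):
    ops.put(("write", key, value))

def serialized_read_request(key):
    ops.put(("read", key, None))
```

Because the worker finishes each write before starting the next read, the "thread B caches the old value" interleaving above cannot happen.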

3.4 Comparing the two strategies

We can see that both strategies have their advantages and disadvantages:

  • Delete the cache first, then update the database
    • Unsatisfactory under high concurrency; behaves well when atomicity is broken
  • Update the database first, then delete the cache (the Cache Aside Pattern)
    • Behaves well under high concurrency; unsatisfactory when atomicity is broken


Source: www.cnblogs.com/kyoner/p/11297488.html