Dark Horse Review Study Notes 2

1. Cache penetration

1.1 What is cache penetration?

Cache penetration means that the data requested by the client exists in neither the cache nor the database, so the cache never takes effect and every such request hits the database.

1.2 Solution to Cache Penetration

1.2.1 Caching empty objects

Caching empty objects is a simple, brute-force approach. The idea is:

When a client requests data that does not exist, it first queries Redis and finds nothing, then falls through to the database, which has nothing either: the request has penetrated the cache and hit the database directly. The concurrency a database can carry is nowhere near what Redis can, so to keep such requests away from the database we store an empty object in Redis for that key. The next time a user asks for the non-existent data, it is found in Redis; the hit is null, but the request no longer goes to the database.

Advantages: simple implementation
Disadvantages: additional memory consumption

For example, if a client passes an id for non-existent data, we cache an empty object for it; pass enough different ids and Redis fills up with this kind of junk.
But this problem can be mitigated.
The solution is to attach a ttl (validity period) when we cache the null, and to keep that ttl relatively short, say two to five minutes.
Within those minutes the cached null still shields the database when malicious users come to visit, and because the validity period is short, the junk data is cleared soon afterwards and does not cause particularly large memory consumption.
It may, however, cause short-term inconsistency: suppose a user requests an id that does not exist, so we cache null for it, and then a real record is inserted for that id. The database now has the data, but the cache still holds null, so the user's query finds null even though the data actually exists. Only when the ttl expires can users see the latest data.
As long as we control the ttl, this can be alleviated to a certain extent: the inconsistency window is acceptable as long as it is short enough. If the inconsistency really cannot be accepted, then whenever we insert a new record we can actively write it into the cache, overwriting the previously cached null.
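A minimal sketch of this flow, assuming a Spring `StringRedisTemplate`, Hutool's `JSONUtil`, and a hypothetical `Shop` entity with its `ShopMapper`; the key prefix and TTLs are illustrative:

```java
import java.util.concurrent.TimeUnit;
import org.springframework.data.redis.core.StringRedisTemplate;
import cn.hutool.json.JSONUtil;

public class ShopCacheService {

    private final StringRedisTemplate stringRedisTemplate;
    private final ShopMapper shopMapper; // hypothetical DB mapper

    public ShopCacheService(StringRedisTemplate stringRedisTemplate, ShopMapper shopMapper) {
        this.stringRedisTemplate = stringRedisTemplate;
        this.shopMapper = shopMapper;
    }

    public Shop queryById(Long id) {
        String key = "cache:shop:" + id;                    // illustrative key prefix
        String json = stringRedisTemplate.opsForValue().get(key);
        if (json != null && !json.isEmpty()) {
            return JSONUtil.toBean(json, Shop.class);       // hit with real data
        }
        if (json != null) {
            return null;   // hit the cached empty value: known non-existent id, DB is spared
        }
        Shop shop = shopMapper.selectById(id);              // miss: query the database
        if (shop == null) {
            // data truly absent: cache an empty value with a SHORT ttl (e.g. 2 minutes)
            stringRedisTemplate.opsForValue().set(key, "", 2, TimeUnit.MINUTES);
            return null;
        }
        stringRedisTemplate.opsForValue()
                .set(key, JSONUtil.toJsonStr(shop), 30, TimeUnit.MINUTES);
        return shop;
    }
}
```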

1.2.2 Bloom filter

Strictly speaking, the Bloom filter is an algorithm.
The idea is:

The principle is to add another layer of interception between the client and Redis. When a user request arrives, instead of checking Redis immediately, we first ask the Bloom filter whether the data exists. If the filter says the data does not exist, the request is rejected outright and never gets a chance to continue; if the filter says it exists, the request is allowed on to Redis.
The Bloom filter uses the idea of hashing over a huge binary (bit) array to judge whether the queried data exists. If the Bloom filter judges that it exists, the request is released and goes to Redis; even if the data in Redis has expired by then, the data should exist in the database, so after querying the database the result is written back into Redis.
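A minimal local sketch of this interception layer, assuming Guava's `BloomFilter` (a production setup might instead keep the filter on the Redis side, e.g. via a Redis module or Redisson); the capacity and error rate are made-up numbers:

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class ShopIdFilter {
    // expect up to 1,000,000 ids, ~1% false-positive rate (assumed numbers)
    private final BloomFilter<Long> filter =
            BloomFilter.create(Funnels.longFunnel(), 1_000_000, 0.01);

    // preload every valid id when the application starts
    public void preload(Iterable<Long> allIds) {
        allIds.forEach(filter::put);
    }

    // called before touching Redis: definitely-absent ids are rejected here
    public boolean mightExist(Long id) {
        return filter.mightContain(id);
    }
}
```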

1.2.3 Other solutions

Both of the above schemes are passive (that is, cache penetration has already happened and we are trying to remedy it).
But we can also take some active measures to prevent cache penetration:

  • Increase the complexity of ids to make their rules hard to guess

This helps prevent others from guessing our id rules and fabricating ids of their own.

  • Do proper basic format validation of the data
  • Strengthen user permission verification

That is, strengthen the management of user permissions: for example, which users may access us, and what rate limit applies when such a user accesses us.

  • Apply proper limits to hotspot parameters
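Format validation can be as simple as a guard at the controller entrance. A hedged sketch, where the numeric-id pattern and the `shopCacheService` name are assumptions:

```java
// Reject malformed ids before they can reach the cache or the database.
public Shop getShop(String idParam) {
    if (idParam == null || !idParam.matches("\\d{1,18}")) {
        // malformed id: fail immediately, never touch Redis or the database
        throw new IllegalArgumentException("illegal id: " + idParam);
    }
    return shopCacheService.queryById(Long.parseLong(idParam));
}
```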

2. Cache Avalanche

2.1 What is cache avalanche?

Cache avalanche means that a large number of cached keys expire at the same time, or the Redis service goes down, so that a flood of requests reaches the database and puts it under enormous pressure.

2.2 The solution to cache avalanche

  • Add a random value to the TTL of different keys

When we build a cache, we often warm it up by importing data from the database in advance, and this import is usually done in batches. Because the keys in a batch are imported at the same moment, their ttl is set to the same value, so it is very likely that they will all expire together at some point in the future, and an avalanche follows.
To solve this, when importing the preset batch data we append a random number to the ttl. For example, if the base validity period is 30 minutes, we add a random number of 1-5 minutes, so each key's validity period fluctuates between 30 and 35 minutes. We can of course widen this random range. This way the expiration times of the keys are dispersed over a period instead of all failing together.
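A minimal sketch of attaching that jitter during the batch import, assuming the same `StringRedisTemplate` as above (key layout and ranges are illustrative):

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// TTL jitter: 30-minute base plus a random 1-5 minutes, i.e. 31-35 minutes,
// so keys imported in the same batch do not all expire at the same instant.
public void warmUp(String key, String json) {
    long ttlMinutes = 30 + ThreadLocalRandom.current().nextLong(1, 6);
    stringRedisTemplate.opsForValue().set(key, json, ttlMinutes, TimeUnit.MINUTES);
}
```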

  • Use a Redis cluster to improve service availability

The cache avalanche caused by Redis going down is the most serious kind. To solve it, we must avoid Redis downtime as much as possible, that is, improve the high availability of Redis as a whole, and to do that we use Redis clusters.
For example, the Redis Sentinel mechanism: Sentinel monitors the service,
and if the master goes down, Sentinel automatically promotes one of the slaves to replace the original master, ensuring that Redis can keep serving requests normally. The master-slave setup also synchronizes data, so if the master goes down the data still exists on the slaves. Together these largely guarantee Redis's high availability.
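For reference, a Spring Data Redis client can be pointed at a Sentinel-managed deployment roughly like this (the master name and sentinel addresses are made up):

```java
import org.springframework.context.annotation.Bean;
import org.springframework.data.redis.connection.RedisSentinelConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;

// Connect through Sentinel rather than a single Redis node, so failover
// to a promoted slave is transparent to the application.
@Bean
public LettuceConnectionFactory redisConnectionFactory() {
    RedisSentinelConfiguration sentinelConfig = new RedisSentinelConfiguration()
            .master("mymaster")                 // assumed monitored master name
            .sentinel("192.168.1.10", 26379)    // assumed sentinel addresses
            .sentinel("192.168.1.11", 26379);
    return new LettuceConnectionFactory(sentinelConfig);
}
```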

  • Add degradation and rate-limiting policies to the cache business

For example, there are catastrophic accidents nobody can resist: the whole server is down, the whole machine room is down, all of Redis is down, the entire cluster is gone. How do we keep the service available then?
At this point, we can add some degradation and rate-limiting strategies to the service.
What does this mean?
It means doing some fault-tolerant handling in advance.
When we detect that Redis has failed, we should degrade the service in time, for example by failing fast and denying the request instead of continuing to forward it to the database. This sacrifices part of the service but ultimately protects the health of the entire database. How is this degradation implemented? That belongs to Spring Cloud-related content.
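To illustrate the fail-fast idea only (in practice this is usually done with a circuit breaker from the Spring Cloud ecosystem rather than a bare try/catch):

```java
// If Redis is unreachable, deny the request instead of letting it fall
// through to the database. A sketch, not a production degradation policy.
public Shop queryWithDegrade(Long id) {
    try {
        return shopCacheService.queryById(id);   // normal cache-first path
    } catch (org.springframework.data.redis.RedisConnectionFailureException e) {
        // Redis is down: fail fast, do NOT forward the query to the database
        throw new IllegalStateException("service degraded, try again later", e);
    }
}
```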


  • Add multi-level caching to the business

What is a multi-level cache?
Caches can be used in many places: not only at the application layer, but at multiple levels along the request path, so that if the Redis link collapses there are still other caches to fall back on. In layman's terms, the bulletproof vest is worn in five layers, one on top of another.
Specifically, when a request leaves the browser, the browser itself can cache, but the browser cache generally holds static data; dynamic data that must be queried from the database cannot be cached there. For that part we can cache at the reverse-proxy (nginx) level; if nginx misses, the request goes to Redis; if Redis misses, it reaches the JVM (we can also build a local cache inside the JVM); and finally the database.
Therefore, the problems caused by avalanches can be effectively mitigated by means of multi-level caching.
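A sketch of the JVM-local level of such a hierarchy, assuming Caffeine as the local cache (sizes and TTLs are illustrative):

```java
import java.time.Duration;
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import org.springframework.data.redis.core.StringRedisTemplate;

// Lookup order: JVM-local cache -> Redis -> database (DB step omitted here).
public class TwoLevelCache {
    private final Cache<String, String> local = Caffeine.newBuilder()
            .maximumSize(10_000)                      // assumed capacity
            .expireAfterWrite(Duration.ofSeconds(30)) // short local TTL
            .build();
    private final StringRedisTemplate redis;

    public TwoLevelCache(StringRedisTemplate redis) { this.redis = redis; }

    public String get(String key) {
        String v = local.getIfPresent(key);
        if (v != null) return v;               // level 1: JVM-local hit
        v = redis.opsForValue().get(key);      // level 2: Redis
        if (v != null) local.put(key, v);      // backfill the local level
        return v;                              // null -> fall through to the DB
    }
}
```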

3. Cache breakdown

3.1 What is cache breakdown?

The cache breakdown problem is also called the hotspot key problem: a key that is being accessed with high concurrency, and whose cache-rebuilding business is relatively complicated, suddenly expires, and countless requests hit the database in an instant with enormous impact.

How should we understand high-concurrency access?
The key is accessed extremely heavily; it might belong to a product running a promotion, and countless requests may be accessing this one key at the same moment.

How should we understand the complexity of the cache-rebuilding business?

Why rebuild the cache at all?
Our cache lives in Redis; after some time it is cleared and becomes invalid, and once it is invalid we must query the database again and write the result back into Redis.

But in real development, building this data from the database does not necessarily mean taking whatever one query returns and storing it straight into Redis. Some businesses are more complicated: we may need to process data from tables in multiple databases and perform various join operations after the queries, caching only the final result.
Such a business may take a long time, tens or even hundreds of milliseconds. During that whole window Redis holds no cache, so countless requests miss and are forwarded to the database, which may bring it down completely.

3.2 Solution to cache breakdown

3.2.1 Mutex locks

The mutex is easy to understand: countless requests come in and all try to rebuild the cache, and we use a lock so that only one of them actually performs the rebuild.

Specifically, if a thread misses the cache on a query, it will rebuild the cache; but to stop countless threads from rebuilding at once, we lock. After a thread finds a miss, it must first acquire the lock, and only the thread that acquires it successfully may rebuild the cache. After rebuilding, it writes the data into the cache and then releases the lock.
A thread that fails to acquire the lock retries, but not indefinitely: we let it sleep for a while and then retry by re-querying the cache. If it now hits, there is no need to acquire the lock again.

The biggest disadvantage of this scheme is that the threads wait for each other.

For example, if 1000 threads arrive at the same time, only one of them actually does the rebuilding while all the others wait. If the rebuild takes relatively long, say 200ms or even 500ms, then every thread pouring in during that window can only wait, which leads to poor performance. Hence the second solution.

3.2.2 Logical expiration

Logical expiration is not really expiring; you can think of the key as never expiring.
In this scheme, when we store data in Redis, we do not set a ttl.

Cache breakdown happens precisely because a ttl is set: the cache suddenly expires, queries miss, and a rebuild is needed.

1. But without setting a ttl, how do we know whether this cache entry has expired?
Logical expiration means that when we store a piece of data (previously we simply stored a key-value pair), we add an extra field inside the value, such as an expire field holding an expiration time.
This expiration time is not a ttl: it is the moment we add the cache plus a validity period (for example 30 minutes), and that resulting timestamp is what we store. In other words, we maintain this expiration time logically ourselves.
2. Since this key has no ttl, does that mean that once it is stored in Redis it never expires?
If we also configure an appropriate memory-eviction policy, then in theory, as long as the key has been written to Redis it can always be queried and there will be no misses.
Hot keys are usually added when we run promotions: we set them in Redis with a logical expiration time for the duration of the activity, and simply remove them once the activity ends.
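For illustration, the value stored for such a hot key might look like this (the structure anticipates the RedisData wrapper formalized in section 3.2.5; the payload is made up):

```json
{
  "expireTime": "2023-08-01T12:30:00",
  "data": { "id": 1, "name": "some hot product", "price": 99 }
}
```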

Therefore, when any thread queries such a hot product, in theory it always hits. The only thing to judge is whether the entry has logically expired; if it has, the key is probably holding old data and needs to be updated.

3. So what do we need to do next?
Rebuild the cache. But to prevent multiple threads from rebuilding at once, a thread still has to acquire the lock first, just as in the previous scheme. To avoid a long wait, however, the thread that gets the lock does not rebuild by itself: it starts an independent new thread, which queries the data, rebuilds and writes the cache (the logical expiration time must be reset after writing), and releases the lock once all of that is done. In other words, the time-consuming work is no longer done by the requesting thread itself but handed to another thread, and the lock is released when that thread finishes.

4. During the period before the new value is written, the cache still holds the old data. Once a thread has started the new rebuild thread, what does it do itself?
It simply returns the old data directly. Likewise, threads that find the entry logically expired but fail to acquire the lock do not wait; they also return the old data (they know someone else is already updating it, so returning the data they found is fine).

3.2.3 Comparison of advantages and disadvantages of the two schemes

1. Advantages of mutex

The mutex scheme has no additional memory consumption.

Logical expiration maintains an extra expiration-time field on top of the original data, which consumes extra memory; the mutex needs no such field, so its memory usage is smaller.

The mutex can guarantee strong consistency.

When a thread fetches the cache and misses, it tries to update; if it finds that someone else has already acquired the lock and is updating, it cannot read old data, because there is none: on expiry the entry is deleted (with logical expiration the entry is never deleted, which is exactly why that scheme can return old data instead of nothing). The thread therefore waits for the data to appear in the cache, and that data is necessarily the latest, so whatever it finally reads is guaranteed fresh. This gives strong consistency between the cache and the database.

2. Disadvantages of mutexes

Waiting for the lock reduces performance.
There is also a risk of deadlock.

Suppose our business needs to query multiple caches, and other businesses do the same. A thread may hold one cache's lock and then try to take another cache's lock that is held by a thread in a different business; if that thread is in turn waiting for our lock, the two wait on each other forever, which is a deadlock.

3. Advantages of logical expiration

Because logical expiration never waits, its performance is very good and concurrency is unaffected.

4. Disadvantages of logical expiration

But there are also disadvantages. It returns old data: a thread sees that the entry has expired yet still uses the data first, which causes temporary inconsistency.
The code is also more complex, since the logical expiration field and all the related checks must be maintained by ourselves.

5. Summary

Broadly speaking, both solutions address the concurrency problem during the cache-rebuild window.
The mutex makes the concurrent threads execute serially, waiting on each other, while the cache is rebuilt. For the sake of safety, this scheme guarantees data consistency but sacrifices service availability: performance drops sharply, and the service may even be unavailable while threads block.
Logical expiration guarantees availability during the rebuild window: every request is served, but what it gets may be old data inconsistent with the database, so consistency is sacrificed.

That is to say, one chooses consistency and the other chooses availability.
Neither is strictly better than the other; we pick whichever matches what we need.

3.2.4 Details to pay attention to when implementing the mutex to solve cache breakdown

This lock is not the one we usually use. With synchronized or Lock, the semantics are: if you get the lock you execute, and if you don't you keep waiting. But here the logic for the failure path must be defined by ourselves, so we can no longer use those lock types.
1. We need a custom lock.

2. So how should this custom mutex be implemented?

A mutual-exclusion lock means that when multiple threads execute concurrently, only one can succeed and the others fail.

In Redis's string data type there is a command whose effect comes very close to this: setnx
(set the value of a key, only if the key does not exist; that is, the assignment to the key executes if and only if the key does not exist, and does nothing if the key already exists).

3. Why does it achieve mutual exclusion?
First we use setnx to assign a value to a key; assume the key is named lock and represents a lock.
Now some object wants to acquire the lock and runs setnx lock 1; the return value is 1, which represents success: the lock is acquired.
Then a second object tries to acquire the lock with setnx lock 2; the return value is 0.
Any further object performing the same operation still gets 0, and if we run get lock to see whether the value of lock has changed, it is still the initially set 1.

This shows that setnx only writes when the key does not exist; if the key already exists, nothing is written.

If hundreds or thousands of concurrent threads execute setnx together, only the first one succeeds. Once it has written, every other thread's setnx returns 0, i.e. failure. This is exactly the mutual exclusion described above (only one succeeds, the others all fail).
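The interaction above as an annotated redis-cli transcript:

```
127.0.0.1:6379> SETNX lock 1
(integer) 1        # first caller wins the lock
127.0.0.1:6379> SETNX lock 2
(integer) 0        # every later caller fails
127.0.0.1:6379> GET lock
"1"                # the value is still the first writer's
127.0.0.1:6379> DEL lock
(integer) 1        # releasing the lock = deleting the key
```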

This is one scheme for a custom lock, and it is in fact the basic principle of a distributed lock; of course, a real distributed lock is much more complicated than this.

4. We can acquire the lock with setnx, but what about releasing it? Acquiring the lock assigns a value to the key, and releasing it is actually very simple: just delete
the key (del lock). After the deletion, the next setnx someone executes will succeed.

But unexpected situations can arise. For example, after setnx sets the lock, the program runs into a problem for some reason and nobody ever executes the delete/release action; the lock may then never be released.
So when we set the lock with setnx, we usually also give it a validity period, for example ten seconds, while our business normally finishes within one second. If the business executes normally, the lock is released explicitly; if the service fails because of some exception, the lock would otherwise never be released, but after ten seconds it expires and is released automatically.
In other words, when we set the lock we also set a validity period as a safety net, avoiding the deadlock caused by a lock that is never released.
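In practice the lock value and its safety-net expiry should be set in a single atomic command, which Redis's SET supports via the NX and EX options:

```
127.0.0.1:6379> SET lock 1 EX 10 NX
OK                 # lock acquired AND given a 10-second safety-net TTL
```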

5. How do we implement this in code?
Before implementing the business, declare two methods representing the acquisition and release of the lock.
The Spring method corresponding to setnx is setIfAbsent.

Note that we cannot directly return the result of setIfAbsent (on the command line this result is 0 or 1, but Spring converts it to a Boolean for us); we need to convert it to the primitive type before returning. If the Boolean were returned directly it would be auto-unboxed, and unboxing can produce a null pointer (unboxing calls booleanValue() under the hood, so if the flag is null a NullPointerException occurs). We can therefore use the isTrue method of the utility class BooleanUtil, which returns true only for true; both false and null yield false.
(Why is there a wrapper class at all? Because boolean by itself is a primitive type; the wrapped Boolean is a class, and many operations require a class.)
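A minimal sketch of the two helpers, assuming Spring Data Redis and Hutool's `BooleanUtil` (key name, value, and the 10-second TTL are illustrative):

```java
import java.util.concurrent.TimeUnit;
import org.springframework.data.redis.core.StringRedisTemplate;
import cn.hutool.core.util.BooleanUtil;

public class SimpleRedisLock {
    private final StringRedisTemplate stringRedisTemplate;

    public SimpleRedisLock(StringRedisTemplate t) { this.stringRedisTemplate = t; }

    private boolean tryLock(String key) {
        // setIfAbsent == SETNX, with the safety-net TTL set atomically
        Boolean flag = stringRedisTemplate.opsForValue()
                .setIfAbsent(key, "1", 10, TimeUnit.SECONDS);
        // BooleanUtil.isTrue avoids the NPE that direct unboxing could throw
        return BooleanUtil.isTrue(flag);
    }

    private void unlock(String key) {
        stringRedisTemplate.delete(key);   // releasing = deleting the key
    }
}
```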

6. Notes:

  • When the lock is acquired successfully, check again whether the Redis cache exists by then, i.e. do a DoubleCheck; if it is present there is no need to rebuild the cache.
  • Retrying recursively after failing to acquire the lock risks a stack overflow, so retry with a sleep in a loop instead.
  • Exceptions must be handled as well: whether or not one occurs, the lock must be released, so the release belongs in a finally block.

All three points appear in the sketch below.
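Putting it together, a sketch of the whole mutex flow observing these notes; it reuses the tryLock/unlock helpers above (assumed to be in the same service class) and the same assumed `Shop`/`shopMapper` types, with illustrative key prefixes:

```java
public Shop queryWithMutex(Long id) {
    String key = "cache:shop:" + id;
    String lockKey = "lock:shop:" + id;
    while (true) {
        String json = stringRedisTemplate.opsForValue().get(key);
        if (json != null) {
            return json.isEmpty() ? null : JSONUtil.toBean(json, Shop.class);
        }
        if (!tryLock(lockKey)) {
            try {
                Thread.sleep(50);          // sleep then loop: no recursion
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
            continue;                      // re-query the cache (retry)
        }
        try {
            // DoubleCheck: another thread may have rebuilt while we locked
            json = stringRedisTemplate.opsForValue().get(key);
            if (json != null) {
                return json.isEmpty() ? null : JSONUtil.toBean(json, Shop.class);
            }
            Shop shop = shopMapper.selectById(id);   // rebuild from the DB
            if (shop == null) {
                // combine with the empty-object trick: short TTL for the null marker
                stringRedisTemplate.opsForValue().set(key, "", 2, TimeUnit.MINUTES);
                return null;
            }
            stringRedisTemplate.opsForValue()
                    .set(key, JSONUtil.toJsonStr(shop), 30, TimeUnit.MINUTES);
            return shop;
        } finally {
            unlock(lockKey);               // always release, even on error
        }
    }
}
```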

3.2.5 Details to pay attention to when implementing logical expiration to solve cache breakdown

Logical expiration is not real expiration. It requires us to add an extra expiration-time field when storing the data in Redis; the key itself gets no ttl, so its expiry is not controlled by Redis but judged by our own program, which makes the business logic considerably more complicated.

Theoretically, as long as logical expiration is set there are no misses.
The key itself never expires, so once it is added to the cache we can consider it to exist forever, unless the activity ends and we delete it manually. Such hot keys are usually products participating in a promotion: we cache them in advance and set a logical expiration time at that moment, so in theory all hot keys are preloaded and stay present until the event ends. When we query, we therefore don't really need to judge hit-or-miss; if we genuinely find that the cache does not exist, it can only mean one thing: this product is not in a promotion and is not a hot key.

Our core logic therefore assumes a cache hit by default.

On a hit, we need to judge whether the entry has expired, that is, check its logical expiration time.
If it has not expired, the data is returned to the front end directly.
If it has expired, the cache needs rebuilding, but not everyone who arrives may rebuild it, so here too we first try to acquire the mutex and then judge whether the acquisition succeeded. On failure, we return the old data; on success, we start an independent thread to rebuild the cache (query the database, write the data into the cache, set the logical expiration time, and finally release the mutex) while we ourselves return the old data.

It is recommended to use a thread pool to run the independent rebuild thread rather than creating a thread by hand; hand-created threads perform poorly because they must be created and destroyed frequently, so we create a thread pool.
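A sketch of this read path, assuming the tryLock/unlock helpers from 3.2.4, the RedisData wrapper defined below, and a saveDataToRedis rebuild method defined at the end of this section:

```java
import java.time.LocalDateTime;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import cn.hutool.json.JSONObject;
import cn.hutool.json.JSONUtil;

// fixed pool size of 10 is an assumption
private static final ExecutorService CACHE_REBUILD_POOL = Executors.newFixedThreadPool(10);

public Shop queryWithLogicalExpire(Long id) {
    String key = "cache:shop:" + id;
    String json = stringRedisTemplate.opsForValue().get(key);
    if (json == null) {
        return null;                        // never preloaded: not a hot key
    }
    RedisData redisData = JSONUtil.toBean(json, RedisData.class);
    Shop shop = JSONUtil.toBean((JSONObject) redisData.getData(), Shop.class);
    if (redisData.getExpireTime().isAfter(LocalDateTime.now())) {
        return shop;                        // not logically expired: return as-is
    }
    // logically expired: try to win the rebuild, but never wait for it
    String lockKey = "lock:shop:" + id;
    if (tryLock(lockKey)) {
        CACHE_REBUILD_POOL.submit(() -> {
            try {
                saveDataToRedis(id, 30L);   // rebuild + reset the logical expire time
            } finally {
                unlock(lockKey);            // release only after the rebuild finishes
            }
        });
    }
    return shop;                            // winners and losers both return old data
}
```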

How do we attach this logical expiration time to the data?
We could add a logical-expiration field directly to the entity class, so that the field is written to the cache together with the entity's data, but this approach is not good enough, because it modifies the original code and business logic.

Therefore, we can define a new object RedisData with a field LocalDateTime expireTime, which is the logical expiration time we set.
Now we want our entity class to carry this logical-expiration attribute.
1. The first option is to have the entity class inherit from RedisData, so that the entity naturally has the attribute; but this still modifies our entity class, so it remains somewhat intrusive.

2. The other option is to add an Object field named data to RedisData. RedisData then carries an expiration time plus embedded data, where data is whatever we want to store in Redis (i.e. the cached value); it is a universal wrapper for cached data. This option requires no modification of the original entity class at all. (See the sketch below.)
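A minimal version of that wrapper (plain getters/setters; the real project might use Lombok instead):

```java
import java.time.LocalDateTime;

// Option 2: a universal wrapper holding the logical expire time plus
// whatever entity we want to cache, leaving the entity class untouched.
public class RedisData {
    private LocalDateTime expireTime;
    private Object data;

    public LocalDateTime getExpireTime() { return expireTime; }
    public void setExpireTime(LocalDateTime expireTime) { this.expireTime = expireTime; }
    public Object getData() { return data; }
    public void setData(Object data) { this.data = data; }
}
```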

As mentioned above, we import the cache for hot data in advance. In real development, a back-office management system would push hot data into the cache ahead of time; since we don't have a back-office system here, we use a unit test method to add hot data to the cache, which amounts to doing the cache warm-up in advance.
But first we need to define a saveDataToRedis method in the service.
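A hedged sketch of that method plus the warm-up unit test (the id, the 30-minute window, and the `shopService` name are assumptions):

```java
public void saveDataToRedis(Long id, Long expireMinutes) {
    Shop shop = shopMapper.selectById(id);                 // 1. query the DB
    RedisData redisData = new RedisData();                 // 2. wrap with a logical expire time
    redisData.setData(shop);
    redisData.setExpireTime(LocalDateTime.now().plusMinutes(expireMinutes));
    stringRedisTemplate.opsForValue()                      // 3. write WITHOUT a ttl
            .set("cache:shop:" + id, JSONUtil.toJsonStr(redisData));
}

@Test
void preloadHotData() {
    shopService.saveDataToRedis(1L, 30L);                  // cache warm-up in advance
}
```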
