Cache processing flow in an application
For a foreground request, the back end first reads from the cache and, on a hit, returns the result directly. On a miss, it queries the database, updates the cache with the result, and returns it. If the database has no result either, an empty result is returned directly.
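The read flow above (cache first, database on a miss, then backfill) can be sketched as follows. This is a minimal in-memory illustration; the `cache`/`db` maps and key names are stand-ins, not the article's actual infrastructure:

```java
import java.util.HashMap;
import java.util.Map;

public class CacheAsideDemo {
    // In-memory stand-ins for the cache and the database (hypothetical)
    static Map<String, String> cache = new HashMap<>();
    static Map<String, String> db = new HashMap<>();

    static String getData(String key) {
        String value = cache.get(key);   // 1. try the cache first
        if (value != null) {
            return value;                // cache hit: return directly
        }
        value = db.get(key);             // 2. cache miss: query the database
        if (value != null) {
            cache.put(key, value);       // 3. backfill the cache for later reads
        }
        return value;                    // 4. may be null if the DB has nothing
    }

    public static void main(String[] args) {
        db.put("user:1", "Alice");
        System.out.println(getData("user:1")); // miss -> DB -> backfill
        System.out.println(getData("user:1")); // now served from the cache
    }
}
```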
The flow looks simple; however, in a distributed application, once a cache is introduced, the cache design must account for cache breakdown, cache penetration, and cache avalanche, along with their solutions, to keep the system robust.
Cache breakdown
Cache breakdown: the data is not in the cache but does exist in the database (usually because the cached entry has expired). With many concurrent users, they all miss the cache at the same time and go to the database simultaneously, causing an instantaneous spike in database load that can overwhelm, or even bring down, the database.
solution
- Set hotspot data to never expire: set the cache entry to never expire and have a scheduled task asynchronously reload the data and refresh the cache. This suits extreme scenarios, such as particularly high traffic. When using it, consider how long the business can tolerate stale data and how to handle failures: if the refresh task fails, the cache stays dirty until the next successful refresh.
- Add a mutex: among concurrent requests, only the first thread (or the first few threads) acquires the lock and queries the database; the other threads block while waiting for the lock. Once the first thread has written the data to the cache, subsequent requests read it from the cache directly. The mutex can be a Java API lock, a JVM-level `synchronized` block, or a distributed lock; any choice works as long as it sharply reduces the number of requests that reach the database. Note that the lock should be taken at the granularity of the cache key.
Pseudo-code using a distributed lock, for reference only:
```java
public Object getData(String key) throws InterruptedException {
    Object value = redis.get(key);
    // Cache miss: the cached value has expired
    if (value == null) {
        // lockRedis: a Redis instance dedicated to locking
        // "empty": the lock value can be anything
        if (lockRedis.set(key, "empty", "PX", lockExpire, "NX")) {
            try {
                // Query the database and write the result to the cache,
                // so other threads can read straight from the cache
                value = getDataFromDb(key);
                redis.set(key, value, "PX", expire);
            } catch (Exception e) {
                // exception handling
            } finally {
                // Release the lock
                lockRedis.delete(key);
            }
        } else {
            // Sleep 50 ms, then retry
            Thread.sleep(50);
            return getData(key);
        }
    }
    return value;
}
```
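The other solution above, hotspot data that never expires with a background refresh, might look like the following sketch. The key name, refresh interval, and `loadFromDb` stub are illustrative assumptions, not part of the article:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HotKeyRefresher {
    // Cache entries never expire; a scheduled task reloads them periodically.
    static Map<String, String> cache = new ConcurrentHashMap<>();

    static String loadFromDb(String key) {
        // Stand-in for the real database query (hypothetical)
        return "value-of-" + key;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Refresh the hot key every 5 seconds; readers always hit the cache,
        // never the database, so an expiry stampede cannot happen.
        scheduler.scheduleAtFixedRate(
                () -> cache.put("hot:item", loadFromDb("hot:item")),
                0, 5, TimeUnit.SECONDS);

        Thread.sleep(200);  // give the first refresh time to run
        System.out.println(cache.get("hot:item"));
        scheduler.shutdown();
    }
}
```

If a refresh run fails, readers keep seeing the previous (possibly stale) value, which is exactly the data-inconsistency trade-off the bullet point warns about.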
Cache penetration
Cache penetration: a query finds the data neither in the cache nor in the database. If users keep issuing such requests, every one misses the cache and goes to the database, causing an instantaneous spike in database load that can overwhelm or even crush the database.
solution
- Interface validation: normal traffic may occasionally access a nonexistent key, but a large volume of such requests usually does not occur naturally, so this pattern most likely indicates a malicious attack. Add a validation layer at the outermost boundary: user authentication, data legality checks, and so on. For example, perform basic validation on the `id` and directly reject requests with `id <= 0`.
- Cache null values: when a key is found in neither the cache nor the database, write a `key -> null` entry into the cache with a short TTL, such as 30 seconds (setting it too long would keep legitimate new data from being served). This prevents an attacker from repeatedly hammering the database with the same nonexistent `id`.
- Bloom filter: load all keys that could legitimately be accessed into a Bloom filter. Requests for keys the filter rejects are dropped immediately; only keys the filter accepts go on to query the cache and the database.
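The null-caching idea can be sketched as below. The sentinel value, TTL numbers, and map-based cache are assumptions for illustration, not the article's actual setup:

```java
import java.util.HashMap;
import java.util.Map;

public class NullCacheDemo {
    // Sentinel stored for keys confirmed absent in the DB (hypothetical marker)
    static final String NULL_MARKER = "__NULL__";
    static Map<String, String> cache = new HashMap<>();
    static Map<String, Long> expireAt = new HashMap<>();
    static Map<String, String> db = new HashMap<>();

    static String getData(String key) {
        Long deadline = expireAt.get(key);
        if (deadline != null && deadline > System.currentTimeMillis()) {
            String cached = cache.get(key);
            // A cached NULL_MARKER means "known missing": skip the database
            return NULL_MARKER.equals(cached) ? null : cached;
        }
        String value = db.get(key);
        if (value != null) {
            put(key, value, 10 * 60_000);   // real entries: longer TTL
        } else {
            put(key, NULL_MARKER, 30_000);  // missing keys: short TTL (~30 s)
        }
        return value;
    }

    static void put(String key, String value, long ttlMillis) {
        cache.put(key, value);
        expireAt.put(key, System.currentTimeMillis() + ttlMillis);
    }
}
```

Repeated requests for the same nonexistent key are answered from the cached marker for 30 seconds, so they never reach the database.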
Bloom filter
A Bloom filter has this characteristic: if it judges that an element does not exist, the element definitely does not exist; if it judges that the element exists, it most probably exists, but there is a small chance it does not. This false-positive probability is controllable: we can tune it lower or higher depending on our needs.
A Bloom filter consists of a `bitSet` and a set of hash functions (algorithms). It is a highly space-efficient probabilistic data structure, mainly used to determine whether an element exists in a set.
At initialization, every bit of the `bitSet` is set to 0, and the hash functions are defined, for example 3 of them: `hash1`, `hash2`, `hash3`.
Writing process
When we want to write a value, take `jionghui` as an example:
- First, compute `jionghui` with the 3 hash functions, yielding `bitSet` indexes 1, 7, and 10.
- Set those 3 indexes of the `bitSet` to 1.
Suppose we have two other values, `java` and `diaosi`. Following the same process with the 3 hash functions gives:
- `java`: computed `bitSet` indexes 1, 7, 11
- `diaosi`: computed `bitSet` indexes 4, 10, 11
Query process
When we want to query a value, again take `jionghui` as an example:
- First, compute `jionghui` with the 3 hash functions, yielding `bitSet` indexes 1, 7, and 10.
- Check whether all 3 of those `bitSet` indexes are 1. If any of the 3 is not 1, the value definitely does not exist; if all 3 are 1, the value may exist, but is not guaranteed to.
In fact, the example above already illustrates the problem. After inserting only `jionghui` and `diaosi`, the `bitSet` indexes set to 1 are 1, 4, 7, 10, and 11. If we then add the value `java`, the set of 1-bits is still those same 5 indexes. So when the 1-bits are 1, 4, 7, 10, and 11, we cannot tell whether `java` has been inserted or not.
The root cause is that different values can produce the same index under the hash functions, so a value's marked bits may have been set by other values. This is why a Bloom filter can only determine that a value may exist, never that it must exist. Conversely, if the bits computed for a value are not all 1, the value certainly does not exist; that part is guaranteed.
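The write and query processes described above can be sketched as a toy Bloom filter. The bit-array size and hash seeds here are arbitrary choices for illustration, not the article's parameters:

```java
import java.util.BitSet;

public class SimpleBloomFilter {
    // A toy Bloom filter: one BitSet plus 3 hash functions (seeds are arbitrary)
    private final BitSet bits = new BitSet(1 << 16);
    private final int[] seeds = {7, 31, 131};

    private int index(String value, int seed) {
        int h = 0;
        for (char c : value.toCharArray()) {
            h = h * seed + c;                  // simple polynomial hash
        }
        return (h & 0x7fffffff) % bits.size(); // map to a bit index
    }

    public void add(String value) {
        for (int seed : seeds) {
            bits.set(index(value, seed));      // mark one bit per hash function
        }
    }

    public boolean mightContain(String value) {
        for (int seed : seeds) {
            if (!bits.get(index(value, seed))) {
                return false;                  // any 0 bit: definitely absent
            }
        }
        return true;                           // all bits 1: probably present
    }
}
```

Note that `mightContain` returning `true` is only probabilistic, exactly as the text explains; production code would size the bit array and number of hash functions from the expected element count and target false-positive rate (as libraries like Guava's `BloomFilter` do).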
Cache avalanche
Cache avalanche: a large portion of the cache fails at the same moment, for example because the cache service goes down, so a flood of requests hits the database directly. This can bring down the entire system, hence the name avalanche.
solution
- Spread out expiration times: since the problem is a large number of cache entries expiring at once, the simplest fix is to keep them from expiring at once. Add a random offset to each entry's TTL when writing it, so the keys' expiration times are spread out instead of clustering at the same moment.
- Hotspot data never expires: the same approach as for cache breakdown; again, focus on the refresh interval and on handling data-load failures.
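Spreading out expiration times amounts to adding random jitter to the TTL. A minimal sketch, where the 30-minute base and 5-minute maximum jitter are illustrative numbers, not the article's:

```java
import java.util.concurrent.ThreadLocalRandom;

public class JitteredTtl {
    // Base TTL plus a random offset, so keys written together
    // do not all expire at the same moment.
    static long ttlMillis(long baseMillis, long maxJitterMillis) {
        long jitter = ThreadLocalRandom.current().nextLong(maxJitterMillis + 1);
        return baseMillis + jitter;
    }

    public static void main(String[] args) {
        long base = 30 * 60_000;      // 30-minute base TTL (assumed)
        long maxJitter = 5 * 60_000;  // up to 5 extra minutes (assumed)
        for (int i = 0; i < 3; i++) {
            System.out.println("expire in ms: " + ttlMillis(base, maxJitter));
        }
    }
}
```

The jittered value would be passed as the expiry argument when writing to the cache (e.g., the `PX` parameter of a Redis `SET`).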
Reference: https://blog.csdn.net/v123411739/article/details/115058811