Redis Chapters 5 and 6 - Redis Concurrent Cache Architecture and Performance Optimization

1. Cache design

(1) Cache penetration

Cache penetration refers to querying data that does not exist at all, so neither the cache layer nor the storage layer gets a hit. Usually, for fault-tolerance reasons, data that cannot be found in the storage layer is not written to the cache layer.
Cache penetration means every query for non-existent data has to go to the storage layer, which defeats the purpose of using the cache to protect the back-end storage.

There are two basic causes of cache penetration:
First, a problem in the business code or the data itself.
Second, malicious attacks, crawlers, and the like producing a large number of empty hits.

Solutions to the cache penetration problem:
(1) Cache empty objects, and set an expiration time for this cache (see the sketch below).
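A minimal sketch of caching an empty object with Jedis; the jedis client is assumed to exist already, and queryFromDb is a hypothetical database helper:

    public String getData(String key) {
        String value = jedis.get(key);
        if (value != null) {
            // an empty string is our cached "empty object": the data is known not to exist
            return "".equals(value) ? null : value;
        }
        value = queryFromDb(key);                      // hypothetical DB lookup
        if (value == null) {
            jedis.setex(key, 60, "");                  // cache the empty object with a short expiration
            return null;
        }
        jedis.setex(key, 600, value);
        return value;
    }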
(2) Use a Bloom filter. For malicious attacks that request large amounts of non-existent data, cache penetration can also be handled by filtering with a Bloom filter first. A Bloom filter can generally filter out non-existent data and stop such requests from reaching the back end. When a Bloom filter says a value exists, the value may not actually exist; when it says a value does not exist, it definitely does not exist.
A Bloom filter is simply a large bit array plus several different unbiased hash functions. "Unbiased" means the hash values of elements are distributed relatively uniformly.
When a key is added to the Bloom filter, each hash function hashes the key to an integer, which is taken modulo the length of the bit array to obtain a position; each hash function produces a different position. Setting all of these positions in the bit array to 1 completes the add operation.
When asking the Bloom filter whether a key exists, the same positions are computed, and the filter checks whether all of those bits in the bit array are 1. As long as one bit is 0, the key definitely does not exist in the Bloom filter. If they are all 1, the key is very likely to exist but not guaranteed to, because those bits may have been set to 1 by other keys. The sparser the bit array, the more accurate the answer; the more crowded it is, the less accurate it becomes.
This method suits scenarios where the data hit rate is not high, the data is relatively fixed and real-time requirements are low (usually the data set is large). Code maintenance is relatively complex, but the cache space used is very small.
The core usage is roughly as follows: first create a Bloom filter through Redisson and give it a name; most importantly, initialize the size of the filter, which depends on how much data your Redis needs to cache, and set an acceptable error rate, because a Bloom filter cannot perfectly filter out non-existent keys.
Then put every data key that will be cached into the Bloom filter; afterwards, checking a key is equivalent to asking whether the element exists in that set.
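A minimal sketch with Redisson's Bloom filter API; the filter name, expected capacity and error rate below are illustrative values:

    import org.redisson.Redisson;
    import org.redisson.api.RBloomFilter;
    import org.redisson.api.RedissonClient;
    import org.redisson.config.Config;

    public class BloomFilterSketch {
        public static void main(String[] args) {
            Config config = new Config();
            config.useSingleServer().setAddress("redis://127.0.0.1:6379");
            RedissonClient redisson = Redisson.create(config);

            // create (or look up) a Bloom filter and give it a name
            RBloomFilter<String> bloomFilter = redisson.getBloomFilter("productKeyFilter");
            // expected number of elements and acceptable false-positive rate
            bloomFilter.tryInit(100_000_000L, 0.03);

            // whenever data is written, put its key into the filter as well
            bloomFilter.add("product:1001");

            // before querying the cache or the database, ask the filter first
            if (!bloomFilter.contains("product:9999")) {
                System.out.println("key definitely does not exist, reject the request");
            }

            redisson.shutdown();
        }
    }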
Note: to use a Bloom filter, you need to load all existing data into it in advance, and also add new keys to it whenever data is inserted. A Bloom filter cannot delete data; if you need to remove entries, you must reinitialize it.

(2) Cache breakdown
When a large number of cache entries, or one and the same hot cache entry, expire at the same time, a flood of requests can bypass the cache and hit the database simultaneously, which may overload the database or even bring it down. To avoid this, when adding a batch of cache entries, it is best to spread their expiration times over a period of time rather than letting them all expire at the same moment.
There are generally two solutions for cache breakdown (see the sketch below):
First, give this batch of keys randomized expiration times, so they do not all expire at the same instant.
Second, add a distributed lock: the thread that acquires the lock fetches the data and caches it in Redis, and the remaining threads then read it from Redis.
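A minimal sketch of the randomized-expiration idea with Jedis; the jedis client and the productJsonByKey batch of data to cache are assumptions, and the TTL values are illustrative (the distributed-lock approach is sketched under section 2 below):

    // bulk-load a batch of entries with jittered expiration times
    int baseTtlSeconds = 600;                               // 10-minute base expiration
    Random random = new Random();
    for (Map.Entry<String, String> entry : productJsonByKey.entrySet()) {
        int jitter = random.nextInt(300);                   // spread expirations over an extra 0-5 minutes
        jedis.setex(entry.getKey(), baseTtlSeconds + jitter, entry.getValue());
    }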

(3) Cache avalanche
Because the cache layer carries a large number of requests, it effectively protects the storage layer. But if the cache layer cannot provide service for some reason (for example, it cannot sustain the high concurrency, or the cache design is poor, such as a large number of requests accessing a bigkey and sharply reducing the concurrency the cache can support), then a large number of requests hit the storage layer, the number of calls to the storage layer surges, and the storage layer can go down in a cascade.

To prevent and solve the cache avalanche problem, we can start from the following three aspects.
1) Ensure high availability of cache layer services, such as using Redis Sentinel or Redis Cluster.
2) Rely on isolation components to provide circuit breaking, rate limiting and degradation for the back end, for example the Sentinel or Hystrix rate-limiting and degradation components.
For example, with service degradation we can treat different data differently. When the business accesses non-core data (such as e-commerce product attributes, user information, etc.), temporarily stop querying this data from the cache and directly return a predefined default (degraded) value, a null value, or an error message; when the business accesses core data (such as e-commerce product inventory), still allow it to query the cache, and if the cache is missing, continue reading from the database (see the sketch after this list).
3) Rehearse in advance. Before the project goes online, rehearse what happens to the application and the back-end load after the cache layer goes down, identify the possible problems, and prepare contingency plans on that basis.
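Going back to the degradation example in point 2), a rough sketch of degrading non-core data; the degradationEnabled flag, the jedis client, and the deserialize/loadFromDb/defaultAttributes helpers are all hypothetical:

    // non-core data: when degradation is switched on (e.g. the cache layer is unhealthy),
    // return a predefined default instead of querying the cache or hammering the database
    public ProductAttributes getProductAttributes(String productId) {
        if (degradationEnabled) {
            return defaultAttributes(productId);            // predefined degraded value
        }
        String cached = jedis.get("product:attr:" + productId);
        if (cached != null) {
            return deserialize(cached);
        }
        return loadFromDb(productId);                       // normal cache miss: fall back to the DB
    }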

2. Hotspot cache key reconstruction optimization

Developers use the "cache + expiration time" strategy to speed up reads and writes and to ensure data is refreshed regularly. This pattern can meet most needs.
However, if two problems occur at the same time, the consequences for the application can be fatal:

  • The current key is a hot key (for example a piece of popular entertainment news, or a product whose traffic suddenly surges, such as banlangen during the epidemic), and the concurrency is very high
  • Rebuilding the cache cannot be completed in a short time; it may involve complex computation, such as complex SQL, multiple IO operations, multiple dependencies, etc.

At the moment the cache expires, a large number of threads try to rebuild it at the same time, which increases the load on the back end and may even crash the application.
The main idea for solving this problem is to prevent a large number of threads from rebuilding the cache simultaneously.
We can use a mutex (or a read-write lock) to solve it: only one thread is allowed to rebuild the cache, while the other threads wait for that thread to finish and then read the data from the cache again (see the sketch below).
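A minimal sketch of this idea using a Redisson distributed lock; the redisson and jedis clients are assumed to exist, and queryFromDb/serialize/deserialize are hypothetical helpers:

    public Product getProduct(String productId) {
        String cacheKey = "product:" + productId;
        String json = jedis.get(cacheKey);
        if (json != null) {
            return deserialize(json);                       // cache hit: most threads return here
        }
        RLock lock = redisson.getLock("lock:rebuild:" + cacheKey);
        lock.lock();
        try {
            // double check: another thread may have rebuilt the cache while we were waiting
            json = jedis.get(cacheKey);
            if (json != null) {
                return deserialize(json);
            }
            Product product = queryFromDb(productId);       // the expensive rebuild, done by one thread only
            jedis.setex(cacheKey, 600, serialize(product));
            return product;
        } finally {
            lock.unlock();
        }
    }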

3. Inconsistency between cache and database data

  1. Double-write inconsistency

Explanation: thread 1 updates the database, but for various reasons (network delay, business logic, etc.) has not yet updated the cache. In the meantime thread 2 updates the same data and updates the cache. Finally thread 1 updates the cache, overwriting thread 2's newer value, so the data becomes inconsistent.

  2. Read-write concurrency inconsistency
    Explanation: thread 1 updates the data and deletes the cache. Thread 3 then queries the cache, finds nothing (because thread 1 deleted it), reads the database and intends to update the cache, but gets stuck for various reasons such as network jitter or business logic. Meanwhile thread 2 updates the same data and deletes the cache again, yet thread 3 still writes the old data it read earlier back into the cache.

Solutions:
1. For data with little chance of concurrent access (such as per-user order data, user profile data, etc.), this problem hardly needs to be considered and cache inconsistency rarely occurs; adding an expiration time to the cached data so that reads refresh it periodically is enough.
2. Even under high concurrency, if the business can tolerate brief cache inconsistency (for product names, product category menus, etc.), caching plus an expiration time still covers most caching needs.
3. If cache inconsistency cannot be tolerated, add a distributed read-write lock to guarantee that concurrent read-write and write-write operations are queued in order, while read-read access is effectively lock-free (see the sketch below).
4. You can also use Alibaba's open-source canal to update the cache promptly by listening to the database binlog, but introducing new middleware increases the complexity of the system.
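A minimal sketch of the read-write-lock approach with Redisson; the clients and the serialize/deserialize/updateDb/queryFromDb helpers are assumed as in the earlier sketches, and the key names are illustrative:

    // write path: update the database and the cache under the write lock
    public void updateProduct(Product product) {
        RReadWriteLock rwLock = redisson.getReadWriteLock("rwlock:product:" + product.getId());
        rwLock.writeLock().lock();
        try {
            updateDb(product);
            jedis.setex("product:" + product.getId(), 600, serialize(product));
        } finally {
            rwLock.writeLock().unlock();
        }
    }

    // read path: read-read is not blocked, read-write is queued in order
    public Product getProduct(String productId) {
        RReadWriteLock rwLock = redisson.getReadWriteLock("rwlock:product:" + productId);
        rwLock.readLock().lock();
        try {
            String json = jedis.get("product:" + productId);
            return json != null ? deserialize(json) : queryFromDb(productId);
        } finally {
            rwLock.readLock().unlock();
        }
    }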
Summary:

  • The purpose of introducing a cache is to improve performance, and the solutions above all assume a read-heavy, write-light workload. If the workload is both read-heavy and write-heavy and cache inconsistency cannot be tolerated, there is no point in using a cache at all; operate on the database directly.
  • Of course, if the database cannot withstand the pressure, the cache can also be used as the primary store for reads and writes, with data synchronized to the database asynchronously and the database kept only as a backup.
  • The data placed in the cache should be data that does not require strong real-time consistency. Do not pile up over-design and control logic just to use a cache while guaranteeing absolute consistency; that only increases the complexity of the system!

4. Bigkey hazards!

In Redis, a string can be up to 512 MB, and a container data structure (such as a hash, list, set or zset) can hold about 4 billion (2^32 - 1) elements. In practice, however, a key is generally considered a bigkey in either of the following two situations.

  1. String type: "big" means the single value is very large; a value over 10 KB is generally considered a bigkey.
  2. Non-string types (hash, list, set, sorted set): "big" means the number of elements is very large.
    As a rule of thumb, keep string values within 10 KB, and keep the number of elements in a hash, list, set or zset under 5,000.

For non-string bigkeys, do not use del to delete them; use hscan, sscan or zscan to delete them gradually (see the sketch below). Also guard against a bigkey being deleted automatically when it expires (for example, a zset with 2 million members that is set to expire in 1 hour will trigger a del when it expires, which causes blocking).
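A minimal sketch of progressive deletion of a big hash with Jedis (imports omitted); the key name and batch size are illustrative, and sets and zsets follow the same pattern with sscan/zscan plus srem/zrem:

    // delete a big hash one batch of fields at a time instead of a single blocking DEL
    String bigHashKey = "user:profile:big";
    ScanParams params = new ScanParams().count(100);
    String cursor = ScanParams.SCAN_POINTER_START;             // "0"
    do {
        ScanResult<Map.Entry<String, String>> page = jedis.hscan(bigHashKey, cursor, params);
        List<Map.Entry<String, String>> entries = page.getResult();
        if (!entries.isEmpty()) {
            String[] fields = entries.stream().map(Map.Entry::getKey).toArray(String[]::new);
            jedis.hdel(bigHashKey, fields);                     // remove this batch of fields
        }
        cursor = page.getCursor();
    } while (!"0".equals(cursor));
    jedis.del(bigHashKey);                                      // the key is now small (or already gone)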

The hazards of bigkey:

  • 1. Cause Redis to block
    Reading, rewriting or deleting a huge key takes a long time, and since Redis executes commands on a single thread, other requests are blocked while that command runs.

  • 2. Network congestion
    A bigkey means a large amount of network traffic on every access. Suppose a bigkey is 1 MB and the client access rate is 1,000 requests per second; that generates 1,000 MB of traffic per second, a disaster even for a server with an ordinary gigabit NIC (about 128 MB/s in bytes). Moreover, servers are usually deployed with multiple instances per machine, so one bigkey may affect other instances as well, with unpredictable consequences.

  • 3. Expiration deletion
    A bigkey may look harmless (only simple commands such as hget, lpop or zscore are executed against it), but if it has an expiration time, it will be deleted when it expires. If Redis 4.0's asynchronous expiration deletion (lazyfree-lazy-expire yes) is not enabled, that deletion can block Redis.

How bigkeys arise:
Generally speaking, bigkeys are produced by poor program design or by failing to anticipate the scale of the data.

How to optimize bigkey

  1. Split it up (see the sketch after this list):
    a big list: list1, list2, ... listN
    a big hash: store the data in segments, for example split a key holding 1 million entries into 200 keys, each holding 5,000 entries
  2. If a bigkey is unavoidable, consider whether all of its elements really need to be fetched every time (for example, sometimes hmget is enough instead of hgetall); the same goes for deletion, try to handle it in a gentle, gradual way.
  3. [Recommendation]: control the life cycle of keys; Redis is not a trash can. Use expire to set expiration times (and, if conditions permit, spread the expiration times out to prevent mass expiration at the same moment).
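A minimal sketch of splitting a big hash into buckets; the bucket count, key names and jedis client are illustrative assumptions:

    // spread the fields of one huge hash across 200 smaller hashes
    private static final int BUCKETS = 200;

    private String bucketKey(String baseKey, String field) {
        int bucket = Math.abs(field.hashCode() % BUCKETS);
        return baseKey + ":" + bucket;                  // e.g. user:profile:37
    }

    public void put(String field, String value) {
        jedis.hset(bucketKey("user:profile", field), field, value);
    }

    public String get(String field) {
        return jedis.hget(bucketKey("user:profile", field), field);
    }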

5. Use of redis connection pool

Accessing Redis through a connection pool both improves efficiency and keeps the number of connections under control. The standard usage is sketched below:
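A minimal sketch of the standard JedisPool usage; the host, port and parameter values are illustrative:

    JedisPoolConfig config = new JedisPoolConfig();
    config.setMaxTotal(100);    // maximum number of connections in the pool
    config.setMaxIdle(20);      // maximum number of idle connections kept in the pool
    config.setMinIdle(10);      // minimum number of idle connections kept ready
    JedisPool pool = new JedisPool(config, "127.0.0.1", 6379);

    // try-with-resources returns the connection to the pool automatically
    try (Jedis jedis = pool.getResource()) {
        jedis.set("hello", "world");
        System.out.println(jedis.get("hello"));
    }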
1) maxTotal: the maximum number of connections

2) maxIdle and minIdle
Example configuration: maxTotal = 100, maxIdle = 20, minIdle = 10

  • The Redis connection pool is lazily loaded: connections are created only when they are needed, and are placed into the pool as they are created.
  • When connections need to be created, the pool can grow up to maxTotal = 100. After the load subsides, idle connections are gradually released back down to maxIdle = 20; only with special configuration are they released further, down to minIdle = 10. In general, maxIdle serves as the steady-state number of connections.
  • minIdle also has another purpose: it makes it convenient to warm up the connection pool.
  • If a large number of requests will arrive immediately after the system starts, you can warm up the Redis connection pool, for example by quickly creating some connections and executing simple commands such as ping(), so that the number of idle connections in the pool quickly reaches minIdle (see the sketch below).

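A rough warm-up sketch, reusing the pool and config objects from the earlier sketch:

    // create minIdle connections up front and run a cheap command on each one
    List<Jedis> warmed = new ArrayList<>();
    for (int i = 0; i < config.getMinIdle(); i++) {
        Jedis jedis = pool.getResource();   // forces a real connection to be established
        jedis.ping();                       // cheap command to verify the connection works
        warmed.add(jedis);
    }
    // close them only now, so they all go back into the pool as idle connections
    for (Jedis jedis : warmed) {
        jedis.close();
    }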

6. Redis has three clearing strategies for expired keys

  1. Passive (lazy) deletion: when an expired key is read or written, the lazy deletion strategy is triggered and the expired key is deleted on the spot.
  2. Active deletion: since lazy deletion cannot guarantee that cold data is removed in time, Redis also periodically and actively evicts a batch of expired keys.
  3. When the memory in use exceeds the maxmemory limit, the active memory-eviction (cleanup) strategy is triggered.

The active cleanup strategy had a total of 6 memory-eviction policies before Redis 4.0, and 2 more were added in 4.0, for a total of 8. They are based on two algorithms:
The LRU algorithm (Least Recently Used) evicts the data that has not been accessed for the longest time, using the last access time as the reference.
The LFU algorithm (Least Frequently Used) evicts the data that has been accessed least frequently in the recent period, using the access count as the reference.

Configure maxmemory-policy (the default is noeviction) according to your own business type; volatile-lru is recommended (see the sketch below).
If the maximum memory (maxmemory) is not set, then once Redis's memory usage exceeds the physical memory limit, data in memory starts to swap frequently with the disk, which causes Redis performance to drop sharply.
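A minimal sketch of setting these two values at runtime through Jedis (equivalent to the maxmemory and maxmemory-policy directives in redis.conf; the memory limit is illustrative):

    jedis.configSet("maxmemory", "4gb");
    jedis.configSet("maxmemory-policy", "volatile-lru");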
When Redis runs in master-slave mode, only the master node executes the expiration deletion policy; it then synchronizes the delete operation "del key" to the slave nodes so the data is removed there as well.


Origin: blog.csdn.net/qq_39944028/article/details/131039041