【Redis】Solving Redis Application Problems

1. Cache penetration

Problem Description

The data corresponding to a key does not exist in the data source, so every request for this key misses the cache and falls through to the data source, which may overwhelm it. For example, if a non-existent user ID is used to look up user information, the data exists in neither the cache nor the database. If a hacker exploits this vulnerability to attack, the database may be overwhelmed.

Solution

Cache penetration concerns data that does not exist in the cache and cannot be found in the data source. Because the cache is only written passively on a miss, and for fault-tolerance reasons data that cannot be found in the storage layer is not written to the cache, every request for this non-existent data goes to the storage layer, which defeats the purpose of caching.

Cache empty values: If a query returns an empty result (regardless of why the data does not exist), we still cache the empty result (null), but set a very short expiration time for it, no more than five minutes.
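A minimal sketch of this idea, assuming the Jedis client; the `NULL_PLACEHOLDER` sentinel, the `queryDatabase` helper, and the key names are illustrative, not from the original post:

```java
import redis.clients.jedis.Jedis;

public class NullCachingExample {
    private static final String NULL_PLACEHOLDER = "<null>"; // sentinel meaning "no such row"

    public static String getUser(Jedis jedis, String userId) {
        String key = "user:" + userId;
        String cached = jedis.get(key);
        if (cached != null) {
            // A cached placeholder means we already know the DB has no such row.
            return NULL_PLACEHOLDER.equals(cached) ? null : cached;
        }
        String dbValue = queryDatabase(userId); // stand-in for the real DB call
        if (dbValue == null) {
            // Cache the empty result with a short TTL (300 s = 5 minutes).
            jedis.setex(key, 300, NULL_PLACEHOLDER);
            return null;
        }
        jedis.setex(key, 3600, dbValue); // normal entries can live longer
        return dbValue;
    }

    private static String queryDatabase(String userId) {
        return null; // placeholder: pretend the user does not exist
    }
}
```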

Set an accessible list (whitelist): Use the bitmaps type to define an accessible list, with the list id as the offset in the bitmap. On each access, the requested id is checked against the bitmap; if the id is not in the bitmap, the request is intercepted and access is denied.
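A sketch of the bitmap whitelist, again assuming Jedis and a hypothetical key name; note the bitmap's memory footprint grows with the maximum numeric id:

```java
import redis.clients.jedis.Jedis;

public class WhitelistExample {
    private static final String WHITELIST_KEY = "whitelist:users"; // assumed key name

    // Add a numeric user id to the whitelist by setting its bit.
    public static void allow(Jedis jedis, long userId) {
        jedis.setbit(WHITELIST_KEY, userId, true);
    }

    // Intercept the request unless the id's bit is set.
    public static boolean isAllowed(Jedis jedis, long userId) {
        return jedis.getbit(WHITELIST_KEY, userId);
    }
}
```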

Use a Bloom filter: (The Bloom filter was proposed by Bloom in 1970. It is essentially a very long binary vector (bitmap) plus a series of random mapping functions (hash functions). A Bloom filter can test whether an element is in a set. Its advantage is that its space efficiency and query time far exceed those of general algorithms; its disadvantages are a certain false-positive rate and the difficulty of deletion.) Hash all data that could possibly exist into a sufficiently large bitmap; data that definitely does not exist is intercepted by this bitmap, avoiding query pressure on the underlying storage system.
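As one concrete (client-side) way to apply this, here is a sketch using Guava's `BloomFilter`, which is an assumption on my part; the post does not name a library, and a server-side alternative is the RedisBloom module:

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

public class BloomFilterExample {
    public static void main(String[] args) {
        // Sized for ~1M ids with a ~1% false-positive rate.
        BloomFilter<String> existingIds = BloomFilter.create(
                Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);

        existingIds.put("user:1001"); // pre-load every id that actually exists

        // Ids the filter has definitely never seen are rejected before
        // they reach the cache or the database.
        if (!existingIds.mightContain("user:9999")) {
            System.out.println("intercepted: id cannot exist");
        }
    }
}
```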

Perform real-time monitoring: When the Redis hit rate starts to drop rapidly, inspect the objects and data being accessed, and work with operations staff to set up a blacklist and restrict the service.

2. Cache breakdown

Problem Description

The data corresponding to the key exists, but it has expired in Redis. If a large number of concurrent requests arrive at this moment, each request that finds the cache expired will generally load the data from the back-end DB and reset it into the cache. The large volume of concurrent requests may instantly overwhelm the back-end DB.

Solution

A key may be accessed with very high concurrency at certain points in time, making it very "hot" data. The problem to consider here is the cache being "broken down" at the moment this key expires.

(1) Pre-set popular data: Before Redis's peak access period, store popular data in Redis in advance and increase the expiration time of these popular keys.

(2) Real-time adjustment: Monitor on-site which data is popular and adjust the key expiration time in real time

(3) Use a mutex lock: (note that locking reduces efficiency)

  1. When the cache misses (the value read back is empty), do not load from the DB immediately.
  2. Instead, first use a cache operation that returns a success value (such as Redis's SETNX) to set a mutex key.
  3. If the operation succeeds, perform the load-from-DB operation, restore the cache, and finally delete the mutex key.
  4. If the operation fails, another thread is already loading from the DB; the current thread sleeps for a while and then retries the whole get-cache method.
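A minimal sketch of these four steps, assuming Jedis; the lock TTL, sleep interval, and `queryDatabase` helper are illustrative choices:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class MutexRebuildExample {
    public static String getWithMutex(Jedis jedis, String key) throws InterruptedException {
        while (true) {
            String value = jedis.get(key);
            if (value != null) {
                return value; // cache hit
            }
            String lockKey = "lock:" + key;
            // NX = only set if absent; EX = auto-expire so a crashed loader
            // cannot hold the mutex forever.
            String ok = jedis.set(lockKey, "1", SetParams.setParams().nx().ex(10));
            if ("OK".equals(ok)) {
                try {
                    String dbValue = queryDatabase(key); // stand-in for the real DB call
                    jedis.setex(key, 3600, dbValue);     // restore the cache
                    return dbValue;
                } finally {
                    jedis.del(lockKey);                  // release the mutex
                }
            }
            Thread.sleep(50); // another thread is loading; wait and retry
        }
    }

    private static String queryDatabase(String key) {
        return "value-from-db"; // placeholder
    }
}
```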

3. Cache avalanche 

Problem Description

The data corresponding to the keys exists, but it has expired in Redis. If a large number of concurrent requests arrive at this time and find the cache expired, they will generally load the data from the back-end DB and reset it into the cache, and the large concurrent requests may instantly overwhelm the back-end DB.

The difference between a cache avalanche and cache breakdown is that an avalanche involves many keys expiring at once, while breakdown concerns a single key.

Solution

The avalanche effect of cache invalidation has a terrible impact on the underlying system!

  1. Build a multi-level cache architecture: Nginx cache + Redis cache + other caches (Ehcache, etc.).
  2. Use locks or queues: Ensure that large numbers of threads do not read from and write to the database all at once, so that massive concurrent requests do not fall through to the underlying storage when keys expire. Not suitable for high-concurrency situations.
  3. Set an expiration flag to update the cache: Record whether the cached data is about to expire (with some lead time); if so, trigger a notification so another thread updates the actual key's cache in the background.
  4. Spread out cache expiration times: For example, add a random value to the base expiration time, such as 1-5 minutes at random, so that expiration times repeat less often and collective failure is hard to trigger (see the sketch after this list).
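A minimal sketch of option 4, assuming Jedis; the base TTL and jitter range are illustrative values:

```java
import java.util.concurrent.ThreadLocalRandom;
import redis.clients.jedis.Jedis;

public class JitteredTtlExample {
    // Base TTL of one hour plus a random 1-5 minute offset, so keys written
    // at the same time do not all expire at the same time.
    public static void setWithJitter(Jedis jedis, String key, String value) {
        int jitterSeconds = ThreadLocalRandom.current().nextInt(60, 301); // 60-300 s
        jedis.setex(key, 3600 + jitterSeconds, value);
    }
}
```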

4. Distributed lock

Problem Description

With the growth of the business, the original single-machine system has evolved into a distributed cluster. Since a distributed system is multi-threaded, multi-process, and spread across different machines, the concurrency-control lock strategies that worked in the single-machine case no longer apply, and the plain Java API cannot provide distributed-lock capability. To solve this problem, a cross-JVM mutual-exclusion mechanism is needed to control access to shared resources. This is the problem that distributed locks solve!

Mainstream implementation solutions for distributed locks

1. Implement distributed locks based on database

2. Based on cache (Redis, etc.)

3. Based on Zookeeper

Each distributed lock solution has its own pros and cons:

1. Performance: Redis is the highest.

2. Reliability: ZooKeeper is the highest.
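As a minimal sketch of the Redis-based approach (option 2 in the list above), assuming Jedis: acquire with SET NX EX and a per-holder token, and release with a compare-and-delete Lua script so an expired lock re-acquired by another client is never deleted by mistake. The key name, TTL, and token scheme are illustrative:

```java
import java.util.Collections;
import java.util.UUID;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class RedisLockExample {
    // Compare-and-delete as a Lua script so the GET check and the DEL are atomic.
    private static final String UNLOCK_SCRIPT =
            "if redis.call('get', KEYS[1]) == ARGV[1] then " +
            "  return redis.call('del', KEYS[1]) " +
            "else return 0 end";

    public static String tryLock(Jedis jedis, String lockKey, int ttlSeconds) {
        String token = UUID.randomUUID().toString(); // identifies this lock holder
        String ok = jedis.set(lockKey, token, SetParams.setParams().nx().ex(ttlSeconds));
        return "OK".equals(ok) ? token : null; // null = lock held by someone else
    }

    public static boolean unlock(Jedis jedis, String lockKey, String token) {
        Object result = jedis.eval(UNLOCK_SCRIPT,
                Collections.singletonList(lockKey),
                Collections.singletonList(token));
        return Long.valueOf(1L).equals(result);
    }
}
```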

 


Origin: blog.csdn.net/weixin_41477928/article/details/123557190