Cache Penetration and Cache Avalanche

Cache Penetration

What is cache penetration?

A typical caching system looks queries up by key; on a cache miss it falls through to the back-end store (such as a DB). If the key has no corresponding value in the back end either, and the key is requested with high concurrency, every request passes straight through the cache and hits the back end, putting heavy pressure on it. This is called cache penetration.


How to avoid it?

1: Cache the result even when the query comes back empty, with a short expiration time; alternatively, clear the cached empty entry once real data for the key is inserted.

2: Filter out keys that cannot exist. Put all plausible keys into a large bitmap (e.g. a Bloom filter) and check it before querying, rejecting unknown keys up front.

3: Even if the DB query returns nothing, cache a null value for the key, so that repeated misses are absorbed by the cache instead of hitting the DB.


Cache Avalanche

What is a cache avalanche?

When the cache server restarts, or a large number of cached entries expire within a short window, the resulting flood of misses all lands on the back-end system (such as the DB) at once. This is called a cache avalanche.

How to avoid it?

1: After a cache entry expires, control the number of threads that read the database and rebuild the cache, using a lock or a queue. For example, allow only one thread to query the data and write the cache for a given key, while other threads wait.

2: Set different expiration times for different keys, so that cache invalidations are spread as evenly as possible over time.

3: Use a second-level cache: A1 is the primary cache and A2 is the copy. When A1 misses, fall back to A2. Set A1's expiration time short and A2's long. One concrete warm-up scheme along these lines:

A. Based on the dimensions or scenarios of business statistics, create a template table whose rows are stored in JSON format;

B. Through a scheduling platform, periodically visit all the keys one by one, saving the values to both the template table and the cache cluster;

C. Keep repeating step B to maintain the freshness and completeness of the key-values;

D. User requests hit our cache first. If the cache has expired or the cache server has restarted, the latest hot data is fetched directly from the database template table and re-cached, which effectively reduces the pressure on the primary database tables;

E. This also serves as a cache warm-up scheme.


Distributed cache system

Problems faced by distributed cache systems

Cache consistency issues

1: Keeping the cache consistent with the underlying data. This is especially important when the underlying data is both readable and writable.

2: Consistency between caches in a hierarchy. To maximize the cache hit rate, caches are often layered: a global cache and second-level caches, where the global cache can be composed of the second-level caches.

3: Consistency among multiple cache replicas. To keep the system highly available, the cache layer is often backed by two storage systems (such as memcached, redis, etc.).

Cache Penetration and Cache Avalanche

Described above.


Eviction of cached data

There are two strategies for evicting cached data: (1) periodically clean up expired entries; (2) when a user request arrives, check whether the cached entry it uses has expired; if so, fetch fresh data from the underlying system and update the cache.

Both have their trade-offs. The drawback of the first is that maintaining a large number of cached keys is cumbersome. The drawback of the second is that every user request must check for invalidation, which makes the logic more complex. Which solution to use can be weighed according to your own application scenario.

 

Common ways to decide whether a cached entry is stale: 1. an estimated expiration time; 2. a version number (must be monotonically increasing; a timestamp is a natural choice); 3. an interface for manually clearing the cache.
