Redis cache avalanche, penetration, breakdown

Cache avalanche

  • Cause
    We all know that Redis cannot cache all of a system's data, so cached data needs an expiration time, and Redis removes expired keys with two strategies combined: lazy deletion (an expired key is left alone until it is accessed; each time a key is fetched from the key space, Redis checks whether it has expired — if so it deletes the key, otherwise it returns it) plus periodic deletion. If a large batch of cached data shares the same expiration time and Redis deletes it all at once, these cache entries fail simultaneously and every request falls through to the database. This is the cache avalanche: the cache drops out and all requests go to the database. If a cache avalanche occurs, it is likely to crush our database and paralyze the entire service!

  • Solution
    Add a random value to the expiration time when caching, which greatly reduces the number of keys expiring at the same moment. It bears emphasizing first that the impact of a cache avalanche on the underlying system is severe.
  1. Disperse the cache expiration times. A simple fix is to add a random offset to the original expiration time, for example 1 to 5 minutes at random, so that caches do not all expire at the same moment; this avoids the cache avalanche.
  2. Build a highly available architecture and try to prevent Redis from going down.
  3. After Redis goes down, use a local cache plus rate limiting to keep the DB from being overwhelmed.
  4. Enable Redis persistence, so that after Redis goes down it can automatically reload data from disk on restart and recover quickly.
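The randomized-expiration idea in point 1 can be sketched as follows. This is a minimal illustration, not the article's own code: `cache_set` and `fake_redis` are hypothetical names, and an in-memory dict stands in for Redis (with a real client the write would be something like `redis.setex(key, ttl, value)`):

```python
import random

BASE_TTL = 30 * 60      # base expiration: 30 minutes, in seconds
JITTER_RANGE = 5 * 60   # random extra: 1 second up to 5 minutes

# In-memory stand-in for Redis: key -> (value, ttl_seconds).
fake_redis = {}

def jittered_ttl(base=BASE_TTL, jitter=JITTER_RANGE):
    """Base TTL plus a random offset, so keys written together
    do not all expire at the same moment."""
    return base + random.randint(1, jitter)

def cache_set(key, value):
    ttl = jittered_ttl()
    fake_redis[key] = (value, ttl)   # real Redis: setex(key, ttl, value)
    return ttl

# Keys cached in one batch now expire at scattered times.
ttls = [cache_set(f"user:{i}", i) for i in range(1000)]
```

A batch of keys written at the same instant ends up with expiration times spread over a 5-minute window instead of a single cliff.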

Cache breakdown

  • Cause
    Some keys have an expiration time set, and when these keys are hit by massive concurrent traffic at certain moments, you need to consider the problem of cache "breakdown". The difference from an avalanche is that breakdown concerns the cache of a single key, while an avalanche concerns the caches of many keys. Simply put: when a key is under high concurrent access at some point in time and its cache happens to have just expired, all requests fall through to the DB. This momentary burst of concurrency may overwhelm the DB; the phenomenon is called cache breakdown.
  • Solution
  1. Background refresh : Define a background job (timed task) dedicated to refreshing cache data. For example, if a cached item expires after 30 minutes, the job refreshes it every 29 minutes (writing data from the database back into the cache). This scheme is easy to understand, but it increases system complexity. It suits services with relatively fixed keys and coarse cache granularity; for scattered keys it is less suitable and more complicated to implement.
  2. Check and update : Store the cache key's expiration time (an absolute time) in the cache alongside the value (you can concatenate it into the value, add a new field, or use a separate key; any method works as long as the two stay associated). After each get, compare the stored expiration time with the current system time. If expiration time - current system time <= 1 minute (a configurable value), actively refresh the cache. This keeps the cached data always up to date (as in scheme one, the data effectively never expires). This scheme can still fail in special cases: suppose the cache expires at 12:00, no get request arrives in the one-minute window from 11:59 to 12:00, and a burst of concurrent requests happens to arrive right at 12:00 — then it is a tragedy. This case is extreme, but not impossible, because "high concurrency" may itself be a periodic burst at a particular moment.
  3. Hierarchical cache : Use an L1 (first-level) cache and an L2 (second-level) cache, where L1 has a short expiration time and L2 a long one. A request first tries the L1 cache; on an L1 miss it takes a lock, only one thread acquires the lock, and that thread reads the data from the database and updates both L1 and L2, while the other threads read the data from the L2 cache and return. This approach works by avoiding simultaneous cache invalidation, combined with the lock mechanism. Consequently, when data is updated, only the L1 cache should be evicted; L1 and L2 must not be evicted at the same time. The L2 cache may hold dirty data, so the business must be able to tolerate this short-lived inconsistency, and the scheme may waste some extra cache space.
  4. Mutual exclusion lock : When the cache for a key misses, the first request takes a lock before querying the DB and writing the result into the cache; all other requests for that key must wait. Once the first request has set the cache and released the lock, the others fetch the data directly from the cache. This prevents the burst from hammering the database's read/write capacity.
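The mutex-lock scheme in point 4 can be sketched like this. It is a minimal single-process illustration under stated assumptions: `load_from_db` is a hypothetical loader, a dict stands in for Redis, and a `threading.Lock` stands in for a distributed lock (with real Redis you would use something like `SET lock:key token NX EX 10`):

```python
import threading

cache = {}                          # stand-in for Redis
db = {"hot_key": "value-from-db"}   # hypothetical backing store
db_reads = 0                        # counts actual database hits
lock = threading.Lock()             # real Redis: SET lock:key token NX EX 10

def load_from_db(key):
    global db_reads
    db_reads += 1
    return db[key]

def get_with_mutex(key):
    value = cache.get(key)
    if value is not None:
        return value
    with lock:                      # only one thread rebuilds the cache
        value = cache.get(key)      # double-check: another thread may
        if value is None:           # have rebuilt it while we waited
            value = load_from_db(key)
            cache[key] = value
    return value

# Simulate a burst of concurrent requests on an expired hot key.
threads = [threading.Thread(target=get_with_mutex, args=("hot_key",))
           for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The double-check inside the lock is what guarantees that, of the 50 concurrent requests, only the first one actually touches the database; the rest are served from the freshly rebuilt cache.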

Cache penetration

  • Cause
    Cache penetration refers to querying data that certainly does not exist. Because the cache misses and, for fault tolerance, data that cannot be found in the database is not written to the cache, every request for this non-existent data goes to the database, and the cache loses its purpose: a large volume of requests miss the cache and land on the database.
    If cache penetration occurs, it may crush our database and paralyze the entire service!
  • Solution
  1. Since the request parameters are illegal (each request asks for data that does not exist), we can use a BloomFilter (or a compressed filter) to intercept such requests up front; illegal requests are never allowed to reach the database layer!
  2. When the database lookup also comes up empty, cache an empty object for this key (with a short expiration time). The next identical request is then served from the cache instead of the database.

Origin www.cnblogs.com/yangk1996/p/12686237.html