Interviewer: Your resume says you have distributed development experience. How can you not understand cache avalanche?

As Java programmers, we often run into caching questions in job interviews.

Zhang Gong is a Java programmer. He recently interviewed at a well-known Internet company, where the interviewer asked:

What is the difference between cache penetration and cache avalanche?

Zhang Gong normally just uses caches without studying them in depth, and he has only a general idea of distributed development, so for a while he could not answer. The interviewer said: "Your resume says you have distributed development experience. How can you not even understand cache avalanche?"

Hearing this, Zhang Gong felt rather embarrassed.

A cache is an important component of a distributed system. It mainly solves the performance problem of accessing hot data under high concurrency and large data volumes, providing fast, high-performance data access.

Caches live in memory, and memory naturally supports high concurrency. Cache penetration, cache breakdown, and cache avalanche are the three classic caching problems.

1. Cache penetration

When a user queries a piece of data, the system normally checks the cache first. On a cache miss, it queries the persistence-layer database; if nothing is found there either, it returns empty. By itself, this is not a problem.

But when many users' requests all miss the cache and go straight to the database, they put heavy pressure on the database.

For example, suppose the table's id is auto-incrementing, starting from 1, but the ids in the users' requests are all negative. None of them can be found in the cache, so every request falls through to the persistence-layer database, putting heavy pressure on it. This is cache penetration.

Solution:

Whenever the database lookup comes back empty, write a null marker to the cache, e.g. `set -999 unknown`, and give it an expiration time. The next time the same key is accessed, the value is served directly from the cache until it expires.
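The idea can be sketched as follows. This is a minimal local illustration, not a production implementation: a `ConcurrentHashMap` stands in for Redis, and the hypothetical `dbLookup` simulates the persistence layer (with a real Redis client you would also attach a short expiration, e.g. `SET key unknown EX 60`).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Null-value caching sketch: cache "known to be absent" so repeated
// bad ids never reach the database.
class NullValueCache {
    private static final String NULL_MARKER = "unknown";
    private final Map<Long, String> cache = new ConcurrentHashMap<>();

    // Simulated database: only positive ids exist.
    private String dbLookup(long id) {
        return id > 0 ? "user-" + id : null;
    }

    public String get(long id) {
        String cached = cache.get(id);
        if (cached != null) {
            // Cache hit; the marker means "known to be absent".
            return NULL_MARKER.equals(cached) ? null : cached;
        }
        String value = dbLookup(id);
        // Cache the miss as well; a real Redis client would also
        // set an expiration time here.
        cache.put(id, value != null ? value : NULL_MARKER);
        return value;
    }

    public static void main(String[] args) {
        NullValueCache c = new NullValueCache();
        System.out.println(c.get(-999)); // miss, DB returns nothing, marker cached
        System.out.println(c.get(-999)); // answered from cache, no DB call
        System.out.println(c.get(1));
    }
}
```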

2. Cache breakdown

Cache breakdown means that a very hot key, under concentrated high-concurrency access, expires at some moment. The instant it expires, the flood of requests breaks through the cache and lands directly on the database, which can be overwhelmed within minutes.

Is there a solution? Of course. Different scenarios call for different approaches:

  • If the cached data is almost never updated, you can set the hot data to never expire.

  • If the cached data is updated infrequently and rebuilding the cache is quick, you can use a distributed mutex (based on middleware such as Redis) or a local mutex to ensure that only one or a few requests hit the database and rebuild the cache; once the lock is released, the remaining threads read the new cache.

  • If the cached data is updated frequently, or the cache refresh takes a long time, a scheduled task can proactively rebuild the cache before it expires, or extend the cache's expiration time, so that all requests keep hitting the cache.
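The mutex approach from the second bullet can be sketched like this. It is a local-lock simplification, assuming a hypothetical `loadFromDb` and a `ReentrantLock` in place of a real distributed lock (e.g. Redis `SET NX`); the double-check after acquiring the lock is what keeps every thread but one away from the database.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Mutex-protected rebuild for a hot key: only one thread queries the
// database; the rest wait on the lock and then read the fresh cache.
class HotKeyCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final ReentrantLock rebuildLock = new ReentrantLock();
    private int dbCalls = 0; // counts actual database hits

    private String loadFromDb(String key) {
        dbCalls++;
        return "value-for-" + key;
    }

    public String get(String key) {
        String v = cache.get(key);
        if (v != null) return v;          // fast path: cache hit
        rebuildLock.lock();               // only one rebuilder at a time
        try {
            v = cache.get(key);           // double-check: someone may have rebuilt it
            if (v == null) {
                v = loadFromDb(key);
                cache.put(key, v);
            }
            return v;
        } finally {
            rebuildLock.unlock();
        }
    }

    public int dbCalls() { return dbCalls; }
}
```

With a distributed lock, the `lock()`/`unlock()` pair would become an atomic lock-key write in Redis with a timeout, so a crashed rebuilder cannot block everyone forever.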

3. Cache avalanche

Cache avalanche refers to the cache layer failing and being unable to work normally, so that all requests reach the storage layer. The storage layer's call volume surges, and it goes down as well.

Suppose a system receives 5,000 requests per second at daily peak, of which the cache handles 4,000, but then the cache machine unexpectedly goes down.

With the cache service down, all 5,000 requests per second fall directly on the database. The database cannot cope; it raises alarms and then goes down too. Without a contingency plan, this situation can drive people crazy.

This is the cache avalanche.

Solutions for cache avalanche:

  • Beforehand: make Redis highly available (master-slave + sentinel, or Redis Cluster) to avoid a total crash;

  • During the event: use a local ehcache cache plus hystrix rate limiting & degradation to keep the database from going down;

  • Afterward: enable Redis persistence, so that after a restart the data is automatically loaded from disk and the cache is restored quickly.

A user sends a request. The system first checks the local ehcache cache; on a miss it checks Redis; if neither ehcache nor Redis has the data, it queries the database and writes the result back into both ehcache and Redis.
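The tiered lookup just described can be sketched as below. Plain `HashMap`s stand in for ehcache and Redis, and the hypothetical `db` method stands in for the database; only the lookup-and-backfill order is the point.

```java
import java.util.HashMap;
import java.util.Map;

// Tiered lookup: local cache -> shared cache -> database,
// back-filling both caches on the way out.
class TieredLookup {
    private final Map<String, String> local = new HashMap<>();  // ehcache stand-in
    private final Map<String, String> shared = new HashMap<>(); // Redis stand-in

    private String db(String key) { return "db:" + key; }

    public String get(String key) {
        String v = local.get(key);
        if (v != null) return v;      // 1. local cache hit
        v = shared.get(key);
        if (v == null) {
            v = db(key);              // 3. fall back to the database
            shared.put(key, v);       // back-fill the shared cache
        }
        local.put(key, v);            // 2./back-fill the local cache
        return v;
    }
}
```

Even if Redis goes down entirely, requests for keys already in the local ehcache layer never leave the application process, which is exactly why this combination softens an avalanche.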

A rate-limiting component such as Sentinel can cap how many requests pass through per second. What about the rest that do not pass? They are degraded: the system can return default values or a friendly prompt.
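To make the limit-then-degrade idea concrete, here is a minimal fixed-window limiter. It is a toy sketch of the concept, not the Sentinel API: requests beyond the per-second budget get a friendly fallback instead of reaching the database.

```java
// Fixed-window rate limiter with a degradation path: at most
// maxPerWindow requests per second are processed; the rest are
// answered with a friendly default instead of hitting the database.
class SimpleLimiter {
    private final int maxPerWindow;
    private long windowStart = System.currentTimeMillis();
    private int count = 0;

    SimpleLimiter(int maxPerWindow) { this.maxPerWindow = maxPerWindow; }

    public synchronized String handle(String request) {
        long now = System.currentTimeMillis();
        if (now - windowStart >= 1000) { // a new one-second window begins
            windowStart = now;
            count = 0;
        }
        if (count < maxPerWindow) {
            count++;
            return "processed:" + request;  // allowed through to the backend
        }
        return "please try again later";    // degraded: friendly default
    }
}
```

Real components use smoother schemes (sliding windows, token buckets) and distributed counters, but the contract is the same: bound the per-second load and give the overflow a graceful answer.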

This approach has the following advantages:

The database will not go down, because the rate-limiting component guarantees that only a bounded number of requests reach it each second.

As long as the database stays up, at least some user requests can be processed.

As long as some requests can be processed, the system is still running. For users this may mean waiting: a page may fail to refresh the first few times but succeed after several clicks. The experience is not great, but the system survives.

To sum up:

When it comes to caching, we should keep summarizing and accumulating experience in our daily work, filling in the gaps and continually improving our knowledge system.

Given the author's limited level, shortcomings in this article are inevitable; criticism and corrections are welcome.

-END-


Origin blog.csdn.net/X8i0Bev/article/details/108945550