How to solve Redis cache breakdown (invalidation), cache penetration, and cache avalanche?


Author | Code

Source | Code Brother Bytes

Raw data is stored in a DB (such as MySQL, HBase, etc.), but a DB has low read/write performance and high latency.

For example, on a 4-core, 8 GB machine, MySQL reaches roughly TPS = 5,000 and QPS = 10,000, with an average read/write latency of 10~100 ms.

Using Redis as a caching system makes up for the DB's shortcomings. "Code Brother" ran a Redis performance test on his MacBook Pro 2019 as follows:

$ redis-benchmark -t set,get -n 100000 -q
SET: 107758.62 requests per second, p50=0.239 msec
GET: 108813.92 requests per second, p50=0.239 msec

TPS and QPS both reach about 100,000, so we introduce a cache architecture: the original data is stored in the database, and a copy is kept in the cache.

When a request comes in, we first try to get the data from the cache; if it is there, we return it directly.

If the cache has no data, we read it from the database, write it to the cache, and then return the result.
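The read/write-back flow above is the classic cache-aside pattern. Here is a minimal sketch, using in-memory HashMaps as stand-ins for Redis and the database (the class and method names are made up for this example, not a real client API):

```java
import java.util.HashMap;
import java.util.Map;

public class CacheAside {
    // Stand-ins for Redis and the database
    static Map<String, String> cache = new HashMap<>();
    static Map<String, String> db = new HashMap<>();

    static String getData(String id) {
        // 1. Try the cache first
        String value = cache.get(id);
        if (value != null) {
            return value;
        }
        // 2. Cache miss: read from the database
        value = db.get(id);
        if (value != null) {
            // 3. Write back to the cache for subsequent requests
            cache.put(id, value);
        }
        return value;
    }
}
```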

Is this seamless? No: an improperly designed cache can lead to serious consequences. This article introduces three common problems in the use of caches and their solutions:

  • Cache breakdown (invalidation);

  • Cache penetration;

  • Cache avalanche.

Cache breakdown (failure)

Under high concurrent traffic, the data being accessed is hot data: the requested data exists in the DB, but the copy stored in Redis has expired, so the backend needs to load the data from the DB and write it back to Redis.

Keywords: a single hot key, high concurrency, key expiration

Under such high concurrency, the DB may be overwhelmed, making the service unavailable. As shown below:

Cache breakdown


Solution

Expiration time + random value

For hot data, we can simply not set an expiration time, so that all requests are served from the cache and Redis's high throughput is fully utilized.

Or add a random value to the expiration time.

When designing the cache expiration time, use the formula: expiration time = base time + random time.

That is, when business data of the same kind is written to the cache, a random offset is added on top of the base expiration time, so that keys expire spread out over time instead of all at once, avoiding a sudden spike of pressure on the DB.
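The formula can be sketched as a small helper; the base TTL and the random range below are illustrative values, not recommendations:

```java
import java.util.concurrent.ThreadLocalRandom;

public class ExpireTime {
    // expiration time = base time + random time
    static int ttlSeconds(int baseSeconds, int maxRandomSeconds) {
        // Add 0..maxRandomSeconds-1 extra seconds so keys written together
        // do not all expire at the same instant
        return baseSeconds + ThreadLocalRandom.current().nextInt(maxRandomSeconds);
    }
}
```

The resulting value would then be passed as the TTL when writing the key to Redis.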

Warm up

Store the popular data in Redis in advance, and set its expiration time to a very large value.

Use a mutex lock

When a cache miss caused by expiration is detected, the data is not loaded from the database immediately.

Instead, a distributed lock is acquired first; the database query and the cache write are performed only after the lock is acquired successfully. If acquiring the lock fails, another thread is already querying the database, so the current thread sleeps for a while and then retries.

This way, only one request goes to the database to read the data.

The pseudo code is as follows:

public Object getData(String id) {
    String desc = redis.get(id);
    // Cache miss: the cached copy has expired
    if (desc == null) {
        // Mutex: only one request can acquire the lock (e.g. SET lockName NX EX)
        if (redis.setnx(lockName, "1")) {
            try {
                // Load the data from the database
                desc = getFromDB(id);
                // Write it to Redis with a 24-hour expiration
                redis.set(id, desc, 60 * 60 * 24);
            } catch (Exception ex) {
                LogHelper.error(ex);
            } finally {
                // Always release the lock at the end
                redis.del(lockName);
            }
        } else {
            // Otherwise sleep 200 ms, then try to get the data again
            Thread.sleep(200);
            return getData(id);
        }
    }
    return desc;
}

Cache penetration

Cache penetration: a special kind of request queries data that does not exist at all, i.e., the data is in neither Redis nor the database.

As a result, every such request penetrates through to the database; the cache becomes mere decoration, putting a lot of pressure on the database and affecting normal service.

As shown below:


cache penetration

Solution

  • Cache an empty value: when the requested data exists in neither Redis nor the database, cache a default value (for example: None) for that key. Later queries for the same key return the null/default value directly.

  • Bloom filter: synchronize the ID into a Bloom filter when the data is written to the database. If a requested ID is not in the Bloom filter, the data definitely does not exist in the database, so the database query can be skipped.
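The first option, caching an empty value, can be sketched as follows; the sentinel string and the map-based stores are stand-ins for a real Redis client and database:

```java
import java.util.HashMap;
import java.util.Map;

public class NullValueCache {
    static final String NULL_MARKER = "None";   // sentinel for "does not exist"
    static Map<String, String> cache = new HashMap<>();
    static Map<String, String> db = new HashMap<>();

    static String getData(String id) {
        String value = cache.get(id);
        if (value != null) {
            // A cached sentinel means the key is known not to exist
            return NULL_MARKER.equals(value) ? null : value;
        }
        value = db.get(id);
        // Cache the sentinel for missing keys so later requests
        // do not penetrate to the database
        cache.put(id, value == null ? NULL_MARKER : value);
        return value;
    }
}
```

In practice the sentinel should be cached with a short TTL, so that if the key is later actually written to the database it becomes visible quickly.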

BloomFilter needs to hold the full set of keys, so the total key count should not be too large; ideally under 1 billion keys, since 1 billion keys occupy roughly 3.5 GB of memory.

Let's talk about how a Bloom filter works.

The BloomFilter algorithm first allocates a memory area as a bit array, with all bits initially set to 0.

When adding an element, k independent hash functions are computed over it, and the k mapped bit positions are all set to 1.

To check whether a key exists, the same k hash functions compute k positions; if all of them are 1, the key may exist, otherwise it definitely does not.

As shown below:


Bloom filter

Hash functions can collide, so a Bloom filter has false positives.

The false positive rate here is the probability that BloomFilter judges a key to exist when it actually does not, because the filter stores only hash values of keys, not the keys themselves.

So there is some probability of keys whose contents differ but whose hashed positions all coincide.

A key that BloomFilter judges to be non-existent is 100% non-existent. Proof by contradiction: if the key existed, every one of its hashed positions would have been set to 1, not 0. Conversely, a key that the filter judges to exist does not necessarily exist.
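The add/check procedure described above can be sketched with a BitSet and k positions derived from two base hashes (the common Kirsch-Mitzenmacher double-hashing trick); the bit-array size and k here are illustrative, not tuned for a target false-positive rate:

```java
import java.util.BitSet;

public class SimpleBloomFilter {
    private final BitSet bits;
    private final int m;   // number of bits in the array
    private final int k;   // number of hash functions

    SimpleBloomFilter(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    // Derive the i-th position from two base hashes of the key
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9e3779b9;
        return Math.floorMod(h1 + i * h2, m);
    }

    void add(String key) {
        for (int i = 0; i < k; i++) {
            bits.set(position(key, i));      // set all k positions to 1
        }
    }

    boolean mightContain(String key) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(position(key, i))) {
                return false;                // any 0 bit => definitely absent
            }
        }
        return true;                         // all bits 1 => possibly present
    }
}
```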

Cache Avalanche

Cache avalanche means that a large number of requests cannot be processed in the Redis cache system, and all requests hit the database, resulting in a surge in database pressure and even downtime.

There are two main reasons for this:

  • A large amount of hot data expires at the same time, resulting in a large number of requests that need to query the database and write to the cache;

  • Redis is down, and the cache system is abnormal.

A large amount of cached data expires at the same time

The data is stored in the cache system with an expiration time set, but a large amount of data happens to expire at the same moment.

All those requests are then sent to the database to fetch the data; if concurrency is high, the pressure on the database surges.

A cache avalanche occurs when a large amount of data becomes invalid at the same time, while a cache breakdown (invalidation) is when a single piece of hot data expires. This is their biggest difference.

As shown below:


Cache Avalanche - Massive cache invalidation at the same time

Solution

Add random value to expiration time

To avoid setting the same expiration time for a large amount of data, use: expiration time = base time + random time (a smallish random offset, such as 1~5 minutes).

This way, hot data will not all become invalid at the same moment, while the expiration times stay close enough together to still meet the business requirements.

Interface rate limiting

When accessing non-core data, add rate-limiting protection to the query interface, for example 10,000 req/s.

When a core data interface is accessed and the cache misses, the query is allowed to go to the database and the result is written back to the cache.

This way, only part of the requests reach the database, reducing the pressure.

Rate limiting means controlling, at the entry point of the business system, the number of requests admitted per second, to avoid too many requests being sent to the database.
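Rate limiting as described can be sketched with a simple fixed-window counter; production systems usually use a token-bucket or sliding-window limiter (often a gateway feature), so treat this as an illustration only:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class FixedWindowLimiter {
    private final int limit;          // max requests admitted per window
    private final long windowMillis;  // window length
    private final AtomicInteger count = new AtomicInteger();
    private volatile long windowStart;

    FixedWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.windowStart = System.currentTimeMillis();
    }

    synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Start a new window once the current one has elapsed
        if (now - windowStart >= windowMillis) {
            windowStart = now;
            count.set(0);
        }
        // Admit the request only while the window's quota remains
        return count.incrementAndGet() <= limit;
    }
}
```

A request that fails `tryAcquire()` would be rejected or served a fallback instead of reaching the database.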

As shown below:


Cache Avalanche - Rate Limiting

Redis downtime

A Redis instance can support 100,000 QPS, while a database instance has only 1,000 QPS.

Once Redis goes down, it will cause a large number of requests to hit the database, resulting in a cache avalanche.

Solution

There are two solutions to cache avalanches caused by cache system failures:

  • Service circuit breaking and interface rate limiting;

  • Build a high-availability cache cluster system.

Service circuit breaking and rate limiting

In the business system, under high concurrency a circuit breaker sacrifices part of the service in order to ensure the overall availability of the system.

Circuit breaking means that when the data obtained from the cache is found to be abnormal, fallback data is returned to the front end directly, preventing all the traffic from hitting the database and bringing it down.

Circuit breaking and rate limiting are ways to reduce the impact on the database once a cache avalanche has already occurred.
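The circuit-breaking idea can be sketched as a minimal failure-counting state machine; real implementations are more elaborate (e.g. a half-open state with timed recovery), and the threshold here is arbitrary:

```java
public class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN }

    private final int failureThreshold;
    private int failureCount = 0;
    private State state = State.CLOSED;

    SimpleCircuitBreaker(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    // Record a failed cache/DB call; trip open once the threshold is hit
    synchronized void recordFailure() {
        failureCount++;
        if (failureCount >= failureThreshold) {
            state = State.OPEN;
        }
    }

    // A successful call resets the breaker
    synchronized void recordSuccess() {
        failureCount = 0;
        state = State.CLOSED;
    }

    // While open, callers return fallback data instead of hitting the DB
    synchronized boolean allowRequest() {
        return state == State.CLOSED;
    }
}
```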

Build a highly available cache cluster

Therefore, the cache system should be built as a Redis high-availability cluster: if a Redis master node fails, a slave node can be promoted to master and continue to provide the cache service, avoiding a cache avalanche caused by a cache instance going down.

Summary

  • Cache penetration: the database does not have the data at all, so requests go straight to the database and the cache system is useless.

  • Cache breakdown (invalidation): the database has the data and the cache should have it, but the cached copy has expired; the protective barrier of Redis is broken through and requests go straight to the database.

  • Cache avalanche: a large amount of hot data cannot be served from the Redis cache (a large swath of hot keys expired, or Redis is down), so all traffic hits the database, putting it under great pressure.

