Cache penetration and cache avalanche explained

Cache

Common databases such as Oracle and MySQL store their data on disk. To speed up queries and reduce the load on the database, a query first goes to the cache; if the data is not found in the cache, the query goes to the database. Databases usually have their own internal caching mechanism, and an external cache such as Redis is often placed in front of them as well.
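
The read path described above can be sketched roughly as follows. This is only a minimal illustration: the in-memory `cache` dict stands in for an external cache such as Redis, and `get_with_cache` and `load_from_db` are hypothetical names, not part of any library.

```python
from typing import Callable, Optional

# Assumption: a plain dict stands in for an external cache such as Redis.
cache: dict = {}


def get_with_cache(key: str, load_from_db: Callable[[str], Optional[str]]) -> Optional[str]:
    """Look in the cache first; fall back to the database on a miss."""
    if key in cache:
        return cache[key]            # cache hit: the database is not touched
    value = load_from_db(key)        # cache miss: query the database
    if value is not None:
        cache[key] = value           # populate the cache for later reads
    return value
```

Note that a key that exists nowhere is never stored, so every lookup for it reaches the database. That is exactly the situation the next section deals with.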

Cache penetration

First, what is cache penetration

Normally, a lookup in the cache has some chance of missing, and in that case we query the database. But if the requested data exists in neither the cache nor the database, every such request is guaranteed to reach the database.

This kind of query is called cache penetration: the cache lookup always misses, so the request always hits the database.

Problems caused by cache penetration

If someone maliciously queries for data that does not exist, a large number of requests are generated, and all of them end up hitting the database. The database may be unable to withstand the load and go down.

Solutions

1. Cache null values

When the query for a key returns an empty result, put a null value for that key in the cache. On the next access the cache immediately shows that the key is invalid, and the database is not queried.

However, this method is not perfect:
First, caching null values takes extra memory. We can give them a relatively short expiration time so that the cache removes them automatically once it has passed. Null values should also be kept separate from normal values; otherwise, when space runs short, the cache may evict normal values before it evicts null values, and an attacker could exploit this.
Second, if a key has been cached as a null value and the database later gains a record for that key, the cached null value has to be cleared in some way. If Redis is the cache, the entry can simply be updated directly in Redis.
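
A rough sketch of null-value caching with a short expiration time might look like this. The dict-based `cache`, the TTL values, and the names `get_with_null_caching` and `on_db_insert` are all assumptions made for illustration; with Redis the same idea would use its own expiration support.

```python
import time

_MISSING = object()    # sentinel meaning "this key is known not to exist"
NULL_TTL = 60.0        # assumed short lifetime (seconds) for cached nulls
VALUE_TTL = 3600.0     # assumed longer lifetime for real values

cache: dict = {}       # key -> (value, expires_at); stand-in for a real cache


def get_with_null_caching(key, load_from_db, now=time.monotonic):
    entry = cache.get(key)
    if entry is not None and entry[1] > now():
        value = entry[0]
        # A cached null answers the request without touching the database.
        return None if value is _MISSING else value
    value = load_from_db(key)
    if value is None:
        cache[key] = (_MISSING, now() + NULL_TTL)   # remember the miss briefly
    else:
        cache[key] = (value, now() + VALUE_TTL)
    return value


def on_db_insert(key, value, now=time.monotonic):
    """Overwrite a cached null when the key is later added to the database."""
    cache[key] = (value, now() + VALUE_TTL)
```

The short `NULL_TTL` bounds how long a stale null can survive, and `on_db_insert` covers the case where the key appears in the database before the null has expired.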

2. Bloom filter

Simply put, a Bloom filter is a data structure that can tell you that something "definitely does not exist or may exist". The details of Bloom filters are not repeated here. We place a Bloom filter in front of the cache: each query first asks the Bloom filter whether the key exists. If it does not, we return immediately; if the key may exist, we go on to query the cache.
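
To make the flow concrete, here is a small, simplified Bloom filter placed in front of the cache. It is a sketch under assumptions: the bit-array size, the number of hashes, and the salted SHA-256 hashing are arbitrary choices, and `BloomFilter` and `lookup` are hypothetical names. A production system would more likely use an existing Bloom filter implementation.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key: str):
        # Derive several bit positions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "may exist".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def lookup(key, bloom, cache, load_from_db):
    if not bloom.might_contain(key):
        return None                  # definitely absent: skip cache and database
    if key in cache:
        return cache[key]
    value = load_from_db(key)        # may still be a false positive
    if value is not None:
        cache[key] = value
    return value
```

Every key written to the database also has to be added to the filter with `add`, otherwise valid keys would be rejected.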

Second, cache avalanche

When a large part of the cache becomes invalid at the same moment, a flood of requests goes straight to the database. The database cannot carry the load and goes down.

Solutions

  1. Use a cache cluster so that the caching service is highly available.
  2. Use Hystrix for rate limiting (Hystrix is a dependency-isolation tool for Java that helps manage thread pools, so that each resource runs in its own separate thread pool).
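
The limiting idea in point 2 can be illustrated without Hystrix itself. The sketch below is not Hystrix and does not use its API; it is a generic, hypothetical `DbGuard` that caps how many database calls run at once and hands a fallback to everyone else, so that a wave of cache misses cannot all reach the database together.

```python
import threading


class DbGuard:
    """Caps concurrent database calls; excess callers get a fallback value."""

    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, query, fallback=None):
        # Do not wait for a slot: reject immediately when the limit is reached.
        if not self._slots.acquire(blocking=False):
            return fallback
        try:
            return query()
        finally:
            self._slots.release()
```

A caller would wrap the database read, for example `guard.call(lambda: load_from_db(key), fallback=None)`, and treat the fallback as "service busy, try again later".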

Hot data expiration

As mentioned above, we generally give data in the cache an expiration time, after which the cached data becomes invalid. When a request asks for that data again, the system first queries the database and then rebuilds the cache entry.

But for some hot data (say, two celebrities suddenly going public with their relationship), suppose the cache entry expires at some moment while a large number of requests keep arriving. The system then rebuilds the cache, but rebuilding may take a relatively long time, so a large number of threads end up rebuilding the cache at once. The back-end system comes under too much pressure and may go down.

Solutions

  1. Use a mutex. Only one thread is allowed to rebuild the cache; other requests block until the rebuild is complete and then query the cache again. This method is relatively simple, but it carries a risk of deadlock and of blocking up the thread pool.
  2. Do not set an expiration time at all, so that data in the cache never becomes invalid by expiring. Instead, store a "logical expiration time" with it; once that time has passed, start a thread that queries the database for the latest content and then updates the cached value.
    This eliminates the hot data expiration problem at its root; the only downside is that data consistency cannot be guaranteed.
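
Both approaches can be sketched in a few lines. This is a simplified, single-process illustration: the dict-based `cache`, the single global locks, and the function names are assumptions, and a distributed setup would need a lock shared between processes rather than `threading.Lock`.

```python
import threading
import time

cache: dict = {}                      # key -> (value, logical_expires_at)
_rebuild_lock = threading.Lock()      # used by approach 1
_refresh_lock = threading.Lock()      # used by approach 2


def get_with_mutex(key, load_from_db):
    """Approach 1: one thread rebuilds a missing entry; the others wait."""
    entry = cache.get(key)
    if entry is not None:
        return entry[0]
    with _rebuild_lock:               # other threads block here
        entry = cache.get(key)        # re-check: it may have been rebuilt already
        if entry is None:
            entry = (load_from_db(key), float("inf"))
            cache[key] = entry
    return entry[0]


def _refresh(key, load_from_db, ttl, now):
    try:
        cache[key] = (load_from_db(key), now() + ttl)
    finally:
        _refresh_lock.release()


def get_with_logical_expiry(key, load_from_db, ttl=30.0, now=time.monotonic):
    """Approach 2: entries are never evicted; stale ones are refreshed in the background."""
    entry = cache.get(key)
    if entry is None:                 # cold start: load once, synchronously
        entry = (load_from_db(key), now() + ttl)
        cache[key] = entry
        return entry[0]
    value, expires_at = entry
    # Past the logical expiry: let at most one background thread refresh it.
    if now() >= expires_at and _refresh_lock.acquire(blocking=False):
        threading.Thread(target=_refresh, args=(key, load_from_db, ttl, now),
                         daemon=True).start()
    return value                      # may be stale until the refresh finishes
```

The second function shows the consistency trade-off directly: callers are never blocked, but until the background refresh completes they keep receiving the old value.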

Origin blog.csdn.net/qq_28743877/article/details/104721672