Cache
Common databases such as Oracle and MySQL store all of their data on disk. To speed up queries and reduce load on the database, we first query a cache; if the data is not in the cache, we then query the database. Databases usually have their own internal caching mechanisms, and an external cache such as Redis is often placed in front of them as well.
Cache penetration
What is cache penetration?
Normally, a cache query may miss, and we then fall back to the database. But if the requested data exists in neither the cache nor the database, every query for it goes all the way to the database, one hundred percent of the time.
This is called cache penetration: because the cache lookup always fails, the request always hits the database.
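The read path described above is the classic cache-aside pattern. Here is a minimal sketch in Python, using plain dicts as stand-ins for Redis and the database (the key names are made up for illustration):

```python
# Cache-aside read path: check the cache first, fall back to the
# database on a miss, then populate the cache for the next reader.
cache = {}
db = {"user:1": "alice"}

def get(key):
    if key in cache:                 # cache hit
        return cache[key]
    value = db.get(key)              # cache miss: query the database
    if value is not None:
        cache[key] = value           # populate the cache for next time
    return value
```

Note that a request for a key absent from both layers (say, `get("user:999")`) reaches the database every single time, since nothing is ever cached for it — this is exactly the penetration problem.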
Problems caused by cache penetration
If someone maliciously queries keys that do not exist, a large number of requests will all hit the database, which may be unable to withstand the pressure and go down.
Solutions
1. Cache null values
When a query for a key returns an empty result, put a null value for that key into the cache. The next request for the same key immediately learns that the key is invalid and does not query the database.
However, this approach is not perfect:
First, caching null values costs extra memory. We can give them a relatively short expiration time, after which the cache evicts them automatically. Null values should also be kept separate from normal values; otherwise, when space runs low, the cache may evict normal entries first while the null ones remain, and this weakness could be exploited by an attacker.
Second, if a key is cached with a null value and the database later gains a record for that key, the stale null must somehow be cleared. With a Redis cache, we can simply write the updated data to Redis directly when the database changes.
2. Bloom filter
Simply put, a Bloom filter is a data structure that can tell you that an element "definitely does not exist, or may exist." The details of how Bloom filters work are beyond the scope of this article. We place a Bloom filter in front of the cache: at query time, we first ask the Bloom filter whether the key exists. If it definitely does not, we return immediately; if it may exist, we go on to query the cache.
The query flow is: Bloom filter first, then cache, then database.
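A minimal Bloom filter can be built from a bit array and a few hash probes. The sketch below derives its k probe positions from salted SHA-256 digests; the sizing parameters are illustrative, not tuned:

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)   # all bits start at 0

    def _positions(self, key):
        # Derive k probe positions from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False -> key definitely absent; True -> key may be present.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```

A `False` answer is guaranteed correct, so requests for nonexistent keys can be rejected before touching the cache or database; a `True` answer carries a small false-positive rate, which is why the cache is still consulted afterward.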
Cache avalanche
When a large number of cache entries miss at the same moment, a flood of requests rushes straight to the database. The database cannot handle the load and simply goes down.
Solutions
- Use a cache cluster to keep the caching service highly available.
- Use Hystrix for rate limiting and isolation (Hystrix is a Java dependency-isolation tool that manages thread pools so that each resource runs separately in its own thread pool).
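The isolation idea behind the second bullet can be shown without Hystrix itself. The sketch below is not Hystrix's API; it is a simplified semaphore-based "bulkhead" that caps concurrent calls to the database and sheds overflow immediately instead of letting requests pile up:

```python
import threading

class Bulkhead:
    """Cap concurrent calls to a fragile dependency; reject overflow
    with a fallback instead of queueing it (a much-simplified version
    of the isolation/limiting that Hystrix provides)."""

    def __init__(self, max_concurrent=10):
        self._sem = threading.Semaphore(max_concurrent)

    def call(self, fn, *args, fallback=None):
        if not self._sem.acquire(blocking=False):
            return fallback          # shed load: the database is protected
        try:
            return fn(*args)
        finally:
            self._sem.release()
```

During an avalanche, calls beyond the concurrency cap get the fallback (for example, an error page or stale data) rather than adding to the pressure on the database.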
Hot data expiration
As mentioned above, we generally give cached data an expiration time, after which the cached entry becomes invalid. When that data is queried again, the system first queries the database and then rebuilds the cache, as shown in the figure.
But consider a piece of hot data (say, two celebrities suddenly announce an affair). Suppose its cache entry expires at some moment while many follow-up requests keep arriving. The system will rebuild the cache, but the rebuild may take quite a long time, so a large number of threads all try to rebuild it at once, putting too much pressure on the backend and possibly bringing it down. As shown in the figure.
Solutions
- Use a mutex. This allows only one thread to rebuild the cache; other requests block until the rebuild completes and then query the cache again. The method is fairly simple, but it carries a risk of deadlock and of tying up the thread pool. As shown in the figure.
- Stop setting a real expiration time, so cached data never "expires away"; instead attach a "logical expiration time." Once the logical expiration time passes, start a background thread that queries the database for the latest content and updates the cached value.
This eliminates the hot-data expiration problem at the root; the only downside is that strong data consistency cannot be guaranteed.
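The mutex approach from the first bullet can be sketched with a lock and a double-check, so that of all the threads that see the miss, only one performs the expensive rebuild (dicts again stand in for Redis and the database):

```python
import threading

cache = {}
db = {"hot": "v1"}
rebuild_lock = threading.Lock()

def get(key):
    value = cache.get(key)
    if value is not None:
        return value                 # fast path: cache hit
    with rebuild_lock:               # only one thread rebuilds
        value = cache.get(key)       # double-check: another thread may
        if value is not None:        # have rebuilt it while we waited
            return value
        value = db.get(key)          # expensive rebuild happens once
        cache[key] = value
        return value
```

In a distributed deployment the local lock would be replaced by a distributed one (for example, a Redis `SET key value NX PX ttl`), and the blocking-while-waiting behavior is where the deadlock and thread-pool risks mentioned above come from.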
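The logical-expiration approach from the second bullet can be sketched as follows. Entries are never evicted; each carries a logical timestamp, and a stale read returns the old value immediately while a background thread refreshes it. This sketch assumes keys are preloaded via `load` before being read:

```python
import threading
import time

cache = {}          # key -> (value, logical_expire_at); never evicted
db = {"hot": "v1"}
LOGICAL_TTL = 10    # logical lifetime in seconds (assumed value)

def load(key):
    # Query the database and stamp a fresh logical expiration time.
    cache[key] = (db.get(key), time.time() + LOGICAL_TTL)

def get(key):
    value, logical_expire = cache[key]   # assumes key was preloaded
    if time.time() >= logical_expire:
        # Logically stale: refresh in the background, but answer the
        # caller right away with the old value.
        threading.Thread(target=load, args=(key,), daemon=True).start()
    return value    # may briefly be stale -> only eventual consistency
```

Because readers never wait on a rebuild, hot keys cannot trigger a thundering herd; the price, as noted above, is that callers may briefly see stale data.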