Using a Bloom filter to solve Redis cache penetration [Absolutely easy to understand]

1. Cache penetration

Before talking about Bloom filters, let's first understand what cache penetration is.
In a high-concurrency environment, a certain key is requested again and again, but it cannot be found in the cache, so our program goes to read it from the database. Eh? It's not in the database either! The data simply doesn't exist, end of story. As a result, a large number of invalid requests hit the database while the cache stays empty. For invalid requests like these, such as an ID of -1 or an extremely large ID that does not exist, the user is likely to be an attacker. The attack puts excessive pressure on the database and, in severe cases, directly affects online business.

Solution:
① Add validation at the interface layer, for example directly rejecting requests whose id is less than 0;
② Cache empty query results: when a query returns nothing, put null into the cache and set an expiration time;
③ Use a Bloom filter: hash all data that can exist into a sufficiently large bitmap, so that requests for data that definitely does not exist are intercepted by the bitmap, sparing the underlying storage system the query pressure.
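
Solutions ① and ② can be sketched roughly as follows. This is only a minimal illustration: the dictionaries `DATABASE` and `cache` are hypothetical stand-ins for the real database and Redis, and the names `get_user`, `NULL_TTL_SECONDS` and `VALUE_TTL_SECONDS` (and their values) are assumptions made for the example.

```python
import time

# Hypothetical stand-ins: one dict plays the database, another plays Redis.
DATABASE = {1: "alice", 2: "bob"}
cache = {}  # key -> (value, expires_at)
stats = {"db_hits": 0}  # counts how often the database is really queried

NULL_TTL_SECONDS = 60    # assumed short TTL for cached "not found" results
VALUE_TTL_SECONDS = 600  # assumed longer TTL for real values


def query_database(user_id):
    stats["db_hits"] += 1
    return DATABASE.get(user_id)


def get_user(user_id, now=None):
    now = time.time() if now is None else now
    # Solution 1: reject obviously invalid ids at the interface layer.
    if not isinstance(user_id, int) or user_id < 0:
        return None
    # Solution 2: a cached entry may hold None, meaning "known to be absent".
    entry = cache.get(user_id)
    if entry is not None and entry[1] > now:
        return entry[0]
    value = query_database(user_id)
    ttl = VALUE_TTL_SECONDS if value is not None else NULL_TTL_SECONDS
    cache[user_id] = (value, now + ttl)
    return value
```

With this sketch, repeated requests for the same non-existent id reach the database only once per TTL window. The weakness is that an attacker who keeps changing the id still fills the cache with null entries, which is where the Bloom filter comes in.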

2. Working principle of the Bloom filter


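Note: the Bloom filter is only used to determine whether a piece of data exists in the database. It is not used to look up the data itself, and it needs to be used together with Redis!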

(1) First, we can think of it as a binary (bit) array, in which the storage position of a piece of data is obtained by computing its hash value. Initially every bit is 0, meaning no data has been recorded;

(2) The Bloom filter itself does not store the data. Instead, it runs each piece of data in the database through a hash function and maps it into a fixed-length bit array, where the mapped position changes from 0 to 1;

(3) Querying works the same way: first compute the hash value, then check the bit at the corresponding position. If it is 1, the data is considered to exist; if it is 0, the data does not exist;

(4) Loading data into the Bloom filter requires a hash function to compute the storage position, and we all know that hash values collide. Doesn't that mean that when two pieces of data have the same hash value, one of which exists and the other does not, the non-existent one can slip through as well? This is a shortcoming of the Bloom filter: false positives;

(5) What can be done about false positives? If we use only one hash function to compute the storage position, the probability of a collision is too high. Therefore, the Bloom filter uses several different hash functions to compute several hash values, which means one piece of data occupies more than one position. If the Bloom filter uses k hash functions internally, each piece of data maps to k array positions, and all k positions are set to 1. When querying, we no longer check just one position: all k positions must be 1 for the data to be considered present. As long as one of them is 0, the data does not exist.
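
Points (1) to (5) can be put together in a toy implementation. This is a sketch for understanding only, not how any particular library does it: deriving the k positions by salting SHA-256 with an index is an assumption chosen for simplicity, and the class and method names are made up for the example.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: m bits, k hash positions per item."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, all 0 at the start

    def _positions(self, item):
        # Derive k positions by salting SHA-256 with the index i
        # (an illustrative choice, standing in for k different hash functions).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        # Adding sets every one of the k positions to 1.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True only if all k positions are 1; a single 0 means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=3)
bf.add("user:1")
```

The method is called `might_contain` on purpose: a `True` answer can still be a false positive, while a `False` answer is always reliable.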

(6) In practice, we do not set the number of hash functions directly. Implementations typically provide an interface in which we specify the filter length and the false positive rate, and the filter derives the number of hash functions from the false positive rate. Of course, the lower the false positive rate is set, the more hash functions are needed.
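
To get a feel for how such an interface might derive its parameters, here is a sketch based on the standard Bloom filter sizing formulas, m = -n·ln(p) / (ln 2)² bits and k = (m/n)·ln 2 hash functions. It assumes the "length" we pass in is the expected number of elements n; the function name `optimal_size` is hypothetical, and real libraries may round or cap the values differently.

```python
import math


def optimal_size(expected_items, false_positive_rate):
    """Return (bit count m, hash count k) from the standard sizing formulas."""
    if expected_items <= 0:
        raise ValueError("expected_items must be positive")
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be strictly between 0 and 1")
    # m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits
    m = math.ceil(
        -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    )
    # k = (m / n) * ln 2, rounded to the nearest integer, at least 1
    k = max(1, round(m / expected_items * math.log(2)))
    return m, k
```

For one million expected elements, a 1% false positive rate works out to about 9.6 million bits and 7 hash functions, while 0.1% needs about 14.4 million bits and 10 hash functions. A rate of exactly 0 is rejected because ln(0) is undefined, that is, it would require unbounded space.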

(7) You may also feel that the more hash functions there are, the more accurate the final query will be, so shouldn't the false positive rate be set as small as possible? And if smaller is better, why not just set it to 0? No, no, no. Have you thought about how many hash functions that would need? How much array space? How much computation time? Yes, the price of pursuing accuracy is giving up performance;

(8) Everything above concerns adding data to and querying a Bloom filter. Bloom filters do not support deletion. Although we use multiple hash functions to compute the storage positions and try to make the combination of bits for each piece of data unique, any single bit is generally shared by more than one element. That flag bit is not yours alone. If you set it to 0 when deleting your element, what about the other data? Haven't you deleted theirs too? So for Bloom filters, deletion is not feasible.
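
The deletion problem can be reproduced with a deliberately tiny filter. In this sketch the sizes `M` and `K`, the key names, and the SHA-256 based `positions` helper are all assumptions for illustration. The code searches for a second key that shares at least one bit with the first, then "deletes" it naively by clearing its bits.

```python
import hashlib

M, K = 16, 3  # deliberately tiny so that elements share bits


def positions(item):
    # Illustrative hashing scheme: SHA-256 salted with the index i.
    return {
        int.from_bytes(
            hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()[:8], "big"
        ) % M
        for i in range(K)
    }


def might_contain(bits, item):
    return all(bits[p] for p in positions(item))


kept = "user:0"
# Find another key that shares at least one bit position with `kept`.
removed = next(
    f"user:{n}" for n in range(1, 1000) if positions(f"user:{n}") & positions(kept)
)

bits = [0] * M
for item in (kept, removed):
    for p in positions(item):
        bits[p] = 1

# Naive "delete": clear every bit that `removed` maps to.
for p in positions(removed):
    bits[p] = 0

# `kept` was never deleted, yet the filter now reports it as absent.
lost = not might_contain(bits, kept)
```

After the naive delete, `lost` is `True`: the filter now gives a false negative for `kept`, which breaks the one guarantee a Bloom filter is supposed to offer.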

3. Query process together with Redis

Finally, let's talk about the overall query process.
First, the data in the database is registered in the Bloom filter when the filter is initialized. After that, every query request from the front end first passes through the Bloom filter. If the flag bits are all 1, the data is considered to exist, so we go to Redis to look it up. If Redis has it, it is returned directly. If not, we query the database, return the result to the front end, and write it into the cache. If any one of the flag bits is not 1, the data does not exist. If we did nothing about such requests, a large number of them would hit the database, and this cache penetration would put excessive pressure on it and seriously affect normal business. Therefore, for this kind of non-existent data, the Bloom filter intercepts the request directly: the database is not queried, and an error message is returned to the front end.
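
The whole flow can be sketched in a few lines. Everything here is a simplified stand-in: the dictionaries `database` and `cache` play the roles of the real database and Redis, the `BloomFilter` class is a toy version with an assumed SHA-256 based hashing scheme, and `lookup` returns `None` where a real service would return an error message to the front end.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter used only to illustrate the query flow."""

    def __init__(self, m=1 << 16, k=4):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _positions(self, key):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


# Hypothetical stand-ins for the real database and Redis.
database = {"user:1": "alice", "user:2": "bob"}
cache = {}
stats = {"db_queries": 0}

# Initialization: register every key that exists in the database.
bloom = BloomFilter()
for key in database:
    bloom.add(key)


def lookup(key):
    # Step 1: a 0 in any position means the key is definitely not in the database.
    if not bloom.might_contain(key):
        return None  # intercepted without touching the cache or the database
    # Step 2: check the cache.
    if key in cache:
        return cache[key]
    # Step 3: fall back to the database and write the result into the cache.
    stats["db_queries"] += 1
    value = database.get(key)
    if value is not None:
        cache[key] = value
    return value
```

Note that a key which passes the filter because of a false positive still falls through to the database and comes back empty, so in practice the Bloom filter is often combined with the null-caching approach from section 1.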



Origin blog.csdn.net/m0_52861684/article/details/133299826