Bloom filter and cache breakdown

Background

The company's user center receives a large number of user requests. To prevent cache breakdown, we need a caching strategy that filters out malicious requests.

Cache breakdown (also commonly called cache penetration) is easy to describe: someone maliciously requests user IDs that do not exist in the database, so every request misses the cache and falls through to the database, and a large volume of such requests can bring the database down (the cache here is Redis).

Requirements design

  • Redis cannot stop these requests, so they reach the database. Because Redis holds no data for the ID, the usual strategy is to read the database and, if the row is found, return it and write it back to the cache with an expiration time. Here, though, there are many requests, each with a different userId that does not exist. If we backfilled a cache entry for every one of them as it arrived, Redis would come under heavy pressure within a very short time.

  • Below we briefly introduce the Bloom filter

Bloom filter

Introduction

  • It stores a set of distinct items, but not the items themselves, only an identifier for each one (values computed by hash functions). It therefore occupies very little space and offers fast lookups, at the cost of a small error rate. When the Bloom filter says a value does not exist, it definitely does not exist; when it says a value exists, the value may in fact not exist (a false positive). For values the Bloom filter has already seen, it never makes a mistake.
  • So we change our approach and synchronize a copy of the database's IDs into a Bloom filter kept alongside the cache. When a request arrives, we check the filter first: if the userId is not in it, we return immediately. Otherwise we ask Redis for the value of the key, and go to the database only if Redis does not have it.
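The lookup order described above can be sketched as follows. This is only an illustration under loud assumptions: plain dicts stand in for Redis and for the database, a Python set stands in for the Bloom filter, and the names get_user, known_ids, and stats are made up for this example.

```python
# Stand-ins for illustration only: dicts play the roles of the database and
# of Redis, and a plain set plays the role of the Bloom filter.
database = {1: "alice", 2: "bob"}
cache = {}
known_ids = set(database)      # in practice: a Bloom filter loaded with every userId
stats = {"db_reads": 0}        # counts how often we fall through to the database


def get_user(user_id):
    # Step 1: an ID the filter has never seen cannot be in the database.
    if user_id not in known_ids:
        return None
    # Step 2: try the cache.
    if user_id in cache:
        return cache[user_id]
    # Step 3: cache miss, read the database and backfill the cache.
    stats["db_reads"] += 1
    value = database.get(user_id)
    if value is not None:
        cache[user_id] = value  # real code would also set an expiration time
    return value


get_user(1)       # first call reads the database and fills the cache
get_user(1)       # second call is served from the cache
get_user(999)     # unknown ID is rejected before the cache or database is touched
```

With a real Bloom filter in place of the set, step 1 can occasionally let a nonexistent ID through (a false positive), in which case the request simply continues to steps 2 and 3 as it would have without the filter.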

Data structure

  1. The underlying data structure is a fixed-length bit array. Each time the Bloom filter stores a value, it computes k hash functions and sets the bit at each resulting position from 0 to 1. When a value is looked up, for example before adding it again, the same positions are checked. If any of those bits is not 1, the value has definitely not been stored before.

  2. A Bloom filter does not allow elements to be deleted, because the element you delete may share slots with other elements, and clearing its bits corrupts them as well. For example, suppose each key maps to 3 slots and key1 maps to F, G, and H. If you delete key1 by resetting F, G, and H to 0, then any key2 that shares one of those slots will wrongly be reported as absent.

  3. What are the time and space complexity of a contains check? Because the bit array is contiguous, locating one slot is O(1). With k hash functions, the time complexity is O(k), and since k is fixed this is still constant.
    Insertion and lookup therefore both cost O(k).

  4. Compare this with a set: a set also answers membership in O(1), but its space consumption is very large. Each hash corresponds to its own position (think of how HashMap hashes its keys), so hash positions cannot be shared between values, and the set must also store the values themselves. The Bloom filter only needs to store the fingerprint of each value (the array indexes computed by multiple hash functions), and that is the main source of the space gap. By commonly quoted estimates, the Bloom filter saves about 90% of the space.

  5. The error rate of the Bloom filter can be estimated in advance, so you can reach an acceptable error rate from the estimated number of stored elements. The sizing calculation takes two inputs: the expected number of elements n and the error rate f. From these it produces two outputs: the length l of the bit array, which is the required storage space in bits, and the optimal number of hash functions k. The number of hash functions directly affects the error rate, and the optimal number gives the lowest error rate.

k=0.7*(l/n) # approximately
f=0.6185^(l/n) # ^ denotes exponentiation, i.e. math.pow
  6. With all that said, I did write a Bloom filter in code, but I would not dare use it in production. Since the Bloom filter is this useful, is there a ready-made one that can be used directly?
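For illustration only, the bit array, the k hash positions, and the two formulas above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the code the author wrote: the class name BloomFilter, the method names, and the choice of deriving the k positions from one SHA-256 digest (double hashing) are all made up for this example. The sizing inverts f=0.6185^(l/n) to get l, then applies k=0.7*(l/n).

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter; names and hashing scheme are illustrative assumptions."""

    def __init__(self, expected_items, error_rate):
        # Invert f = 0.6185^(l/n) to get bits per element, then k = 0.7 * (l/n).
        bits_per_item = math.log(error_rate) / math.log(0.6185)
        self.size = max(1, math.ceil(expected_items * bits_per_item))   # l
        self.hash_count = max(1, round(0.7 * bits_per_item))            # k
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item):
        # Derive k positions from two 64-bit halves of one SHA-256 digest
        # (an assumption; the article does not say which hash functions to use).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.hash_count)]

    def add(self, item):
        # Set every bit the item maps to.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


bloom = BloomFilter(expected_items=1000, error_rate=0.01)
for user_id in range(1000):
    bloom.add(user_id)

# Probe 10,000 IDs that were never added and count how many slip through.
false_positives = sum(bloom.might_contain(uid) for uid in range(1000, 11000))
```

Every added ID is reported as possibly present, while false_positives counts never-added IDs that the filter wrongly accepts; with the sizing above it should land near the requested 1% of probes. There is deliberately no remove method, for the slot-sharing reason given in item 2.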

Using the Bloom filter directly in Redis 4.0

  1. Since Redis 4.0, a Bloom filter is available as a plug-in. From a client such as Jedis you can call bf.add, bf.exists, bf.madd, and bf.mexists directly to insert and look up values. Redisson is also very capable these days and provides a Bloom filter of its own.
  2. With the Redis plug-in you can also specify the expected number of elements and the error rate by creating the filter explicitly with bf.reserve; if the key already exists, bf.reserve returns an error. It takes three parameters: key, error_rate, and initial_size. The lower the error rate, the more space is required. The initial_size parameter is the expected number of elements. When the actual number exceeds this value, the false positive rate increases, so the initial estimate should be accurate with high probability.
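The commands above can be sent from Python as raw commands, roughly as sketched below. Assumptions are stated loudly: the helper names ensure_filter, add_user_id, and may_exist are made up; the client is expected to offer exists and execute_command the way a redis-py client does; and InMemoryStub is a hypothetical stand-in that only mimics the replies so the sketch runs without a server. Against a real server you would pass a redis.Redis client connected to a Redis instance with the Bloom filter plug-in loaded.

```python
def ensure_filter(client, key, error_rate, initial_size):
    # BF.RESERVE returns an error if the key already exists, so check first.
    if not client.exists(key):
        client.execute_command("BF.RESERVE", key, error_rate, initial_size)


def add_user_id(client, key, user_id):
    # True if the item was newly added, False if it may already be present.
    return bool(client.execute_command("BF.ADD", key, user_id))


def may_exist(client, key, user_id):
    # False means definitely absent; True means possibly present.
    return bool(client.execute_command("BF.EXISTS", key, user_id))


class InMemoryStub:
    """Hypothetical stand-in for a Redis client; it stores exact items, not bits."""

    def __init__(self):
        self.filters = {}
        self.commands = []

    def exists(self, key):
        return 1 if key in self.filters else 0

    def execute_command(self, name, key, *args):
        self.commands.append((name, key) + args)
        if name == "BF.RESERVE":
            self.filters[key] = set()
            return "OK"
        items = self.filters.setdefault(key, set())
        if name == "BF.ADD":
            is_new = args[0] not in items
            items.add(args[0])
            return int(is_new)
        if name == "BF.EXISTS":
            return int(args[0] in items)
        raise ValueError("command not supported by this stub: " + name)


client = InMemoryStub()
ensure_filter(client, "user:ids", 0.01, 100000)
ensure_filter(client, "user:ids", 0.01, 100000)   # second call is a no-op
first_add = add_user_id(client, "user:ids", "1001")
known = may_exist(client, "user:ids", "1001")
unknown = may_exist(client, "user:ids", "9999")
```

Checking exists before bf.reserve is one simple way to avoid the "key already exists" error; it is not atomic, so concurrent callers would still need to handle that error.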

Summary

  1. We designed a simple use of a Bloom filter to intercept malicious requests.
  2. A Bloom filter is a probabilistic data structure that stores fingerprints of distinct data. Adds and lookups cost O(k) for k hash functions, which is effectively O(1), and it takes about 90% less space than a set. However, it can misjudge data it has not seen, and there is a single cause for that error: hash collision.
  3. Redis 4.0 supports Bloom filters through a plug-in, and Redisson provides one as well.

References

  • https://www.cnblogs.com/allensun/archive/2011/02/16/1956532.html (Principle of Bloom Filter)
  • https://www.bookstack.cn/read/redisson-wiki-en/spilt.8.6.-%E5%88%86%E5%B8%83%E5%BC%8F%E5%AF%B9%E8%B1%A1.md (Redisson's use of Bloom filter)

Other applications of Bloom filters

  • News recommendation: articles the user has already browsed should not be recommended again. How can a Bloom filter be used to achieve this?
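One possible answer, sketched under assumptions rather than given as the definitive design: keep one small Bloom filter per user holding the IDs of articles already browsed, and drop any candidate the filter reports as possibly seen. The function names, the fixed NUM_BITS and NUM_HASHES values, and the use of a Python integer as the bit array are all choices made for this illustration.

```python
import hashlib

NUM_BITS = 1 << 16    # illustrative filter size, not tuned
NUM_HASHES = 7        # illustrative number of hash positions


def _positions(article_id):
    # Derive the bit positions from one SHA-256 digest (an assumed scheme).
    digest = hashlib.sha256(article_id.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1
    return [(h1 + i * h2) % NUM_BITS for i in range(NUM_HASHES)]


def mark_seen(seen_bits, article_id):
    # Return the user's bit array with this article's positions set.
    for pos in _positions(article_id):
        seen_bits |= 1 << pos
    return seen_bits


def probably_seen(seen_bits, article_id):
    return all((seen_bits >> pos) & 1 for pos in _positions(article_id))


def recommend(seen_bits, candidates):
    # Keep only candidates the filter says were definitely not browsed.
    return [a for a in candidates if not probably_seen(seen_bits, a)]


history = 0
for article in ["news-1", "news-2", "news-3"]:
    history = mark_seen(history, article)

fresh = recommend(history, ["news-2", "news-4", "news-3", "news-5"])
```

The error direction suits this use: an article the user has browsed is never recommended again, and a false positive only means that an unseen article is occasionally skipped.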


Origin blog.csdn.net/weixin_40413961/article/details/108135235