Microservice solutions: BloomFilter (Bonus Part 2)

Bloom filter

Scenario analysis


How can this problem be solved? This is where the Bloom filter comes in. There are Redis-based implementations, the one in Guava, and hand-rolled ones.

At its core, a Bloom filter is a bit vector (a bit array); here we simply treat it as an array.

First, we initialize the array with a fixed length. Then, for each key, we run one or more hash functions over it (for example, three) and set the array element at each resulting index to 1.

Later, when a request for a key arrives, the same three hashes are computed and the bits at those indexes are ANDed together. If the result is 1, the key probably exists, so the request is allowed through to the cache or database; if it is 0, the key definitely does not exist and we return immediately. However, if the hash results of one key collide with those of other keys, a key that was never inserted can still find all of its bits set to 1. That is a misjudgment, i.e. a false positive.
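To make the insert and query steps concrete, here is a minimal hand-rolled sketch in Java (the "self-implemented" option). The class name SimpleBloomFilter, the FNV-1a hash, and the double-hashing scheme index_i = h1 + i * h2 (mod m) are assumptions chosen for illustration; this is not the original author's code and not Guava's internals.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical minimal Bloom filter (illustration only).
// Assumption: the k indexes come from double hashing, index_i = h1 + i * h2 (mod m),
// where h1 and h2 are the low and high halves of a 64-bit FNV-1a hash.
public class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;      // m: number of bits in the array
    private final int numHashes; // k: number of hash functions

    public SimpleBloomFilter(int size, int numHashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.numHashes = numHashes;
    }

    // 64-bit FNV-1a hash over the key's UTF-8 bytes
    private static long fnv1a64(String key) {
        long hash = 0xcbf29ce484222325L;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= 0x100000001b3L;
        }
        return hash;
    }

    // i-th array index for a key; floorMod keeps it in [0, size)
    private int index(long hash, int i) {
        int h1 = (int) hash;
        int h2 = (int) (hash >>> 32);
        return Math.floorMod(h1 + i * h2, size);
    }

    // Insert: set the bit at each of the k indexes to 1
    public void put(String key) {
        long hash = fnv1a64(key);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(hash, i));
        }
    }

    // Query: AND the k bits together; a single 0 means "definitely absent"
    public boolean mightContain(String key) {
        long hash = fnv1a64(key);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(hash, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 20, 3);
        filter.put("user:1001");
        System.out.println(filter.mightContain("user:1001")); // true: inserted keys are always found
        System.out.println(filter.mightContain("user:9999")); // very likely false, but a false positive is possible
    }
}
```

A java.util.BitSet keeps memory at roughly one bit per array slot. A production filter would also derive m and k from the expected number of keys and the target false positive rate instead of hard-coding them.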

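import org.junit.Test;

public class HashCollisionTest {
    // Suppose our Bloom filter used String.hashCode() as its hash function:
    // two different strings can produce the same hash value (a hash collision).
    @Test
    public void testVoid() {
        System.out.println("Ea".hashCode());
        System.out.println("FB".hashCode());
    }
}
// Output:
// 2236
// 2236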

This is the key property of a Bloom filter: a "no" is always a real no, but a "yes" is not necessarily a real yes.
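To see how the collision above turns into a false positive, here is a deliberately naive sketch: a hypothetical filter (CollisionDemo is an illustrative name) that uses String.hashCode() as its only hash function. Since "Ea" and "FB" both hash to 2236, inserting "Ea" makes the filter claim "FB" is present too, while a key whose bit was never set is correctly rejected.

```java
import java.util.BitSet;

// Hypothetical single-hash filter that uses String.hashCode() as its ONLY hash
// function, to show how the "Ea"/"FB" collision becomes a false positive.
public class CollisionDemo {
    private static final int SIZE = 1 << 16; // number of bits, arbitrary for the demo
    private final BitSet bits = new BitSet(SIZE);

    private static int index(String key) {
        // floorMod keeps the index non-negative even when hashCode() is negative
        return Math.floorMod(key.hashCode(), SIZE);
    }

    public void put(String key) {
        bits.set(index(key));
    }

    public boolean mightContain(String key) {
        return bits.get(index(key));
    }

    public static void main(String[] args) {
        CollisionDemo filter = new CollisionDemo();
        filter.put("Ea");
        System.out.println(filter.mightContain("Ea"));    // true: really inserted
        System.out.println(filter.mightContain("FB"));    // true: never inserted, false positive
        System.out.println(filter.mightContain("Hello")); // false: definitely not inserted
    }
}
```

Real Bloom filters reduce (but never eliminate) this effect by using several independent hash positions per key, so a false positive requires all of them to collide at once.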

Implementation


There are many BloomFilter implementations; here we use the simplest option, Guava. Add the Maven dependency:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>25.1-jre</version>
</dependency>
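
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import org.junit.Test;

import java.nio.charset.StandardCharsets;

public class GuavaBloomFilterTest {
    @Test
    public void bloomFilter() {
        // Expected number of insertions (the filter's capacity)
        int size = 100000;
        // Desired false positive probability
        double fpp = 0.001;
        BloomFilter<String> bloomFilter =
                BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), size, fpp);
        // Fill the filter with "0".."99999". Every one of these is guaranteed to be reported
        // as present; numbers >= 100000 were never inserted, yet may still be reported as
        // present (a false positive)
        for (int i = 0; i < 100000; i++) {
            bloomFilter.put(String.valueOf(i));
        }

        int count = 0;
        // Query the 100000 numbers 100000..199999 (none were inserted)
        // and count how many the filter wrongly reports as present
        for (int j = 100000; j < 200000; j++) {
            if (bloomFilter.mightContain(String.valueOf(j))) {
                count++;
            }
        }
        // Observed false positive rate
        System.out.println(count / 100000.0);
    }
}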

// Output
0.00112
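The measured 0.00112 is close to the requested fpp of 0.001. For intuition, the textbook formulas for a Bloom filter holding n keys in m bits with k hash functions are m = -n ln p / (ln 2)^2, k = (m / n) ln 2, and p ≈ (1 - e^(-kn/m))^k. The sketch below (BloomSizing is a hypothetical helper, not taken from Guava's source) simply evaluates them for n = 100000 and p = 0.001, which works out to roughly 1.44 million bits (about 175 KB) and k = 10.

```java
// Hypothetical helper evaluating the textbook Bloom filter sizing formulas:
//   m = -n * ln(p) / (ln 2)^2     number of bits
//   k = (m / n) * ln 2            number of hash functions
//   p ≈ (1 - e^(-k*n/m))^k        expected false positive rate
public class BloomSizing {
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    static double falsePositiveRate(long n, long m, int k) {
        return Math.pow(1 - Math.exp(-(double) k * n / m), k);
    }

    public static void main(String[] args) {
        long n = 100000;  // same expected insertions as the Guava test
        double p = 0.001; // same fpp as the Guava test
        long m = optimalBits(n, p);
        int k = optimalHashes(n, m);
        System.out.printf("m = %d bits (~%d KB), k = %d, p = %.5f%n",
                m, m / 8 / 1024, k, falsePositiveRate(n, m, k));
    }
}
```

Note how the bit count grows with ln(1/p): asking for a lower false positive rate costs memory and extra hash computations per lookup.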



Source: blog.csdn.net/weixin_42126468/article/details/106163501