Bloom filter
Scenario analysis
How do we solve this problem? This is where Bloom filters come in. You can use one backed by Redis, use the one in Guava, or implement your own.
First of all, a Bloom filter is a bit vector or bit array; here we treat it as an array.
We initialize this array with a fixed length. To add a key, we run a hash algorithm on it, usually several times with different hash functions. In the figure the key is hashed 3 times, and the array element at each resulting index is set to 1.
Afterwards, when the same key is requested again, the bits at its three hashed indexes are ANDed together. If the result is 1, the key is treated as existing and we go on to query the cache or the database. If the result is 0, the key definitely does not exist and we return immediately. If the hash results of one key collide with the hash results of other keys, a key that was never added is reported as existing. This is a false positive.
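The steps above can be sketched as a small self-implemented filter. This is only an illustration under stated assumptions: the class name SimpleBloomFilter, its method names, and the seeded FNV-1a style hash are all hypothetical choices, not taken from Redis or Guava.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical minimal Bloom filter; names and hash function are illustrative only.
class SimpleBloomFilter {
    private final BitSet bits;   // the bit array, every bit starts at 0
    private final int size;      // number of bits in the array
    private final int hashCount; // how many times each key is hashed

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // i-th hash of the key: FNV-1a with a per-round seed (an assumption;
    // any family of reasonably independent hash functions would do).
    private int index(String key, int i) {
        int h = 0x811C9DC5 ^ (i * 0x9E3779B9);
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 16777619;
        }
        return Math.floorMod(h, size);
    }

    // Adding a key sets the bit at each of its hashed indexes to 1.
    void put(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(key, i));
        }
    }

    // Lookup ANDs the bits at the hashed indexes: a single 0 bit means "definitely absent".
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true; // all bits are 1: present, or a false positive
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 3);
        filter.put("user:42");
        System.out.println(filter.mightContain("user:42")); // true: an added key is always found
    }
}
```

A key that was added always passes the check, because put set exactly the bits that mightContain reads. A key that was never added passes only if other keys happened to set all of its bits.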
// Suppose my Bloom filter used String's hashCode() as its hash function; hash collisions would then occur
@Test
public void testVoid() {
    System.out.println("Ea".hashCode());
    System.out.println("FB".hashCode());
}
// Output
// 2236
// 2236
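This is the rule to remember: a "no" is always a no, but a "yes" is not necessarily a yes. A Bloom filter never produces false negatives, but it can produce false positives.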
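To make the rule concrete, here is a minimal sketch that plugs the "Ea" / "FB" collision into a one-hash filter. The class name HashCodeFilterDemo, the array size of 4096, and the extra key "Eb" are assumptions chosen for illustration.

```java
import java.util.BitSet;

// Hypothetical demo: a filter with a single hash function, String.hashCode().
class HashCodeFilterDemo {
    static final int SIZE = 4096; // illustrative array length

    // Map a key to one index in the bit array.
    static int index(String key) {
        return Math.floorMod(key.hashCode(), SIZE);
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(SIZE);
        bits.set(index("Ea")); // only "Ea" is ever added

        System.out.println(bits.get(index("Ea"))); // true: an added key is always found
        System.out.println(bits.get(index("FB"))); // true: false positive, same hashCode 2236
        System.out.println(bits.get(index("Eb"))); // false: hashCode 2237, definitely not added
    }
}
```

Using several independent hash functions, as described earlier, makes this kind of collision less likely, because all of a key's bits must collide at once.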
Implementation
There are many BloomFilter implementations; here we use the simplest one, Guava.
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>25.1-jre</version>
</dependency>
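import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import org.junit.Test;

@Test
public void bloomFilter() {
    // Expected number of insertions the filter is sized for
    int size = 100000;
    // Desired false positive probability
    double fpp = 0.001;
    BloomFilter<String> bloomFilter = BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), size, fpp);
    // Fill the filter: numbers below 100000 are definitely present; numbers from
    // 100000 up were never added, but may still be reported as present (false positives)
    for (int i = 0; i < 100000; i++) {
        bloomFilter.put("" + i);
    }
    int count = 0;
    // Query the 100000 numbers from 100000 to 199999 and count each one reported as present
    for (int j = 100000; j < 200000; j++) {
        if (bloomFilter.mightContain("" + j)) {
            count++;
        }
    }
    // Observed false positive rate
    System.out.println(count / 100000.0);
}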
// Output
0.00112
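The measured rate of 0.00112 is close to the configured 0.001. To get a feel for what those two parameters cost, the sketch below applies the standard textbook sizing formulas for a Bloom filter, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The class and method names are hypothetical, and that Guava sizes its filter in exactly this way is an assumption, not something the test above shows.

```java
// Hypothetical helper applying the textbook Bloom filter sizing formulas.
class BloomSizing {
    // Number of bits m for n expected insertions and false positive probability p.
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // Number of hash functions k for n insertions into m bits.
    static int optimalHashCount(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 100000;  // same expected insertions as the Guava test
        double p = 0.001; // same target false positive probability
        long m = optimalBits(n, p);
        System.out.println(m + " bits, " + optimalHashCount(n, m) + " hash functions");
    }
}
```

For these inputs the formulas give roughly 1.44 million bits (about 180 KB) and 10 hash functions, which is far smaller than storing the 100000 keys themselves.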
References
Bloom Filter (to be continued): https://www.cnblogs.com/noKing/p/9352377.html
Bloom Filters in Plain Language: https://www.cnblogs.com/CodeBear/p/10911177.html