1. Install pybloom_live (pip install pybloom-live)
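from pybloom_live import BloomFilter
# Create a Bloom filter object
# The error rate (false positive rate) of a Bloom filter is the probability that an element NOT in the set is wrongly reported as being in it
bf = BloomFilter(capacity=10000, error_rate=0.001)
# Add elements to the Bloom filter
bf.add("apple")
bf.add("banana")
bf.add("orange")
# Check whether elements are in the set (`in` calls __contains__)
print("apple" in bf)  # True
print("grape" in bf)  # False (false positives are possible but rare)
print(bf.__getstate__())  # Inspect the filter's internal state (a dict of its attributes)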
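# Open the file for binary writing (it is created if it does not exist)
with open('output.txt', 'wb') as f:
    # Write the Bloom filter to the file
    bf.tofile(f)
# print(len(bf.bitarray))  # number of bits in the underlying bit array
# Open the same file for binary reading
with open('output.txt', 'rb') as f2:
    # Restore the Bloom filter from the file
    bf2 = BloomFilter.fromfile(f2)
print(bf2.__getstate__())  # Should show the same parameters and element count as bf
print("apple" in bf2)  # True
print("grape" in bf2)  # False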
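pybloom_live offers more than is shown here, for example ScalableBloomFilter (which grows as elements are added), union() and intersection() (also available as the | and & operators), and copy().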
Measured storage overhead with error_rate set to 0.001:
With capacity 10,000 and 3 elements added by bf.add, the storage overhead is 18 KB.
With capacity 100,000 and 3 elements added, the storage overhead is 176 KB.
With capacity 100,000 and 10,000 elements added, it is still 176 KB: the overhead does not change with the number of elements added.
For comparison, storing the same 10,000 elements directly in a .txt file takes 38 KB.
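The flat overhead follows from how the bit array is sized: it is allocated up front from capacity and error_rate alone, so adding elements only flips bits that already exist. The sketch below assumes the standard sizing formula m = ⌈−n·ln p / (ln 2)²⌉; pybloom_live splits its bits into slices and rounds slightly differently, and its saved file also carries a small header, so exact byte counts may differ a little. The function names bloom_bits and bloom_bytes are hypothetical.

```python
import math


def bloom_bits(capacity: int, error_rate: float) -> int:
    """Bits needed to hold `capacity` elements at false positive rate `error_rate`."""
    # Standard sizing formula: m = -n * ln(p) / (ln 2)^2, rounded up.
    return math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))


def bloom_bytes(capacity: int, error_rate: float) -> int:
    # The bit array is stored packed, 8 bits per byte.
    return math.ceil(bloom_bits(capacity, error_rate) / 8)


if __name__ == "__main__":
    # The size depends only on capacity and error_rate, not on how many
    # elements are actually added.
    for n in (10_000, 100_000):
        size = bloom_bytes(n, 0.001)
        print(f"capacity={n}: {size} bytes = {size / 1024:.1f} KiB")
```

For error_rate = 0.001 this gives 17,972 bytes (about 17.6 KiB) for capacity 10,000 and 179,720 bytes (about 175.5 KiB) for capacity 100,000, in line with the 18 KB and 176 KB measured above.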
The Bloom filter is an extremely space-efficient probabilistic data structure that uses a bit array and several hash functions to test whether an element is in a set. Its time and space complexity are as follows:
Time complexity: adding an element or checking whether it is in the set takes O(k), where k is the number of hash functions. The k hash functions map the element to k positions in the bit array, and those k bits are set to 1 (on insert) or checked for 1 (on lookup).
Space complexity: the space used by a Bloom filter is determined by the size of its bit array. If the bit array has m bits, the space complexity is O(m).
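The O(m) bound leaves open how large m must be. The standard analysis, assuming the k hash functions behave like independent uniform hashes, relates m, k, the number of inserted elements n and the false positive rate p:

```latex
\begin{align}
p &\approx \left(1 - e^{-kn/m}\right)^{k} \\
k_{\text{opt}} &= \frac{m}{n}\ln 2 \quad\Longrightarrow\quad p = \left(\tfrac{1}{2}\right)^{k_{\text{opt}}} \\
m &= -\frac{n \ln p}{(\ln 2)^2}
\end{align}
```

At the optimal k, e^{-kn/m} = 1/2, so p = 2^{-k}; taking logarithms and substituting k = (m/n) ln 2 gives the last line. For p = 0.001 this is about 14.4 bits per element with k ≈ 10, so m grows linearly with the capacity n rather than with the size of the elements themselves.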
Note that a Bloom filter can produce false positives: an element that is not in the set may be reported as being in it. It never produces false negatives: if the filter reports that an element is not in the set, the element is definitely not in the set.
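To make the O(k) operations and the one-sided error concrete, here is a minimal from-scratch Bloom filter in Python. It is an illustrative sketch, not how pybloom_live is implemented: the class name SimpleBloomFilter is hypothetical, and deriving the k positions from a single SHA-256 digest by double hashing (g_i(x) = h1(x) + i·h2(x) mod m) is an assumption of this example.

```python
import hashlib
import math
from typing import List


class SimpleBloomFilter:
    """Minimal Bloom filter: a bit array of m bits plus k hash positions per item."""

    def __init__(self, capacity: int, error_rate: float) -> None:
        # Size the bit array and choose k with the standard formulas.
        self.m = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)  # all bits start at 0

    def _positions(self, item: str) -> List[int]:
        # Double hashing: g_i(x) = h1(x) + i * h2(x) mod m, for i = 0..k-1.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # forced odd, so never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        # Insertion sets k bits: O(k).
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # Lookup checks k bits: O(k). Any 0 bit means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    bf = SimpleBloomFilter(capacity=10_000, error_rate=0.001)
    words = [f"word{i}" for i in range(10_000)]
    for w in words:
        bf.add(w)

    # No false negatives: every inserted item is reported as present.
    assert all(w in bf for w in words)

    # False positives are possible: count hits among items never inserted.
    trials = 20_000
    hits = sum(f"other{i}" in bf for i in range(trials))
    print(f"m={bf.m} bits, k={bf.k}, observed false positive rate={hits / trials:.4f}")
```

Every inserted item is always found, while the observed false positive rate on items that were never inserted should land near the configured error_rate.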
Software engineering student Xiao Shi
20230914