Bloom filter in Python 3: pybloom_live usage example and storage overhead

1. Install pybloom_live (for example with: pip install pybloom_live), then use it as follows

from pybloom_live import BloomFilter

# Create a Bloom filter object
# The error rate (false positive rate) of a Bloom filter is the probability
# that an element not in the set is wrongly reported as being in it
bf = BloomFilter(capacity=10000, error_rate=0.001)

# Add elements to the Bloom filter
bf.add("apple")
bf.add("banana")
bf.add("orange")

# Check whether elements are in the set
print("apple" in bf)  # True
print("grape" in bf)  # False (unless a false positive occurs)

print(bf.__getstate__())  # Inspect the Bloom filter's state

# Open the file for binary writing, creating it if it does not exist
with open('output.txt', 'wb') as f:
    # Write the Bloom filter to the file
    bf.tofile(f)

# print(len(bf.bitarray))

# Open the existing file for binary reading
with open('output.txt', 'rb') as f2:
    # Restore the Bloom filter from the file
    bf2 = BloomFilter.fromfile(f2)
    print(bf2.__getstate__())  # Prints the state of the restored filter
    print("apple" in bf2)  # True
    print("grape" in bf2)  # False (unless a false positive occurs)

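Besides add, the in test, tofile and fromfile, the BloomFilter class has other methods as well, such as copy, union and intersection.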

Storage overhead of the file written by tofile, with the error rate set to 0.001:

Capacity 10,000, 3 elements added with bf.add: 18 KB

Capacity 100,000, 3 elements added with bf.add: 176 KB

Capacity 100,000, 10,000 elements added with bf.add: 176 KB

The overhead does not change with the number of elements added; it depends only on the capacity and the error rate chosen when the filter is created.

For comparison, storing these 10,000 elements directly in a txt file takes 38 KB.
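The measured sizes can be roughly reproduced with a little arithmetic. The sketch below assumes that the filter allocates its whole bit array up front using the standard sizing formula for a Bloom filter split into one slice per hash function; this is my understanding of how pybloom_live sizes its filters, not something the measurements above prove. The function name estimated_filter_bytes is made up for this example, and the real file will also contain a small header.

```python
import math


def estimated_filter_bytes(capacity: int, error_rate: float) -> int:
    """Estimate the bit-array size in bytes (hypothetical helper, not a pybloom_live API)."""
    # Number of hash functions, one slice of the bit array per hash (assumed scheme).
    num_slices = math.ceil(math.log2(1.0 / error_rate))
    # Bits per slice from the standard sizing formula n * |ln p| / (k * (ln 2)^2).
    bits_per_slice = math.ceil(
        capacity * abs(math.log(error_rate)) / (num_slices * math.log(2) ** 2)
    )
    total_bits = num_slices * bits_per_slice
    return math.ceil(total_bits / 8)


for capacity in (10_000, 100_000):
    size = estimated_filter_bytes(capacity, 0.001)
    print(capacity, size, "bytes, about", math.ceil(size / 1024), "KB")
```

Under these assumptions the estimate comes out at about 18 KB for a capacity of 10,000 and about 176 KB for 100,000, in line with the file sizes measured above. The number of elements actually added does not appear in the formula, which is why the size stays the same.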

The Bloom filter is an extremely space-efficient probabilistic data structure that uses a bit array and several hash functions to determine whether an element is in a set. Its time complexity and space complexity are as follows:

Time complexity: Checking whether an element is in the set takes O(k) time, where k is the number of hash functions, because the element has to be mapped to k positions in the bit array by the k hash functions and each of these positions checked to see whether it is 1. Adding an element is also O(k), since it sets the same k positions to 1.

Space complexity: The space complexity of a Bloom filter depends on the size of the bit array. If the bit array has m bits, the space complexity is O(m).
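To make the O(k) and O(m) costs concrete, here is a minimal toy Bloom filter using only the standard library. It is a sketch for illustration, not how pybloom_live is implemented: the class name ToyBloomFilter, the choice of SHA-256 with double hashing, and the values m=143,780 and k=10 are all assumptions made for this example.

```python
import hashlib


class ToyBloomFilter:
    """Toy Bloom filter: m bits of storage and k bit positions per element."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # O(m) space, all bits start at 0

    def _positions(self, item: str) -> list:
        # Derive k positions from two halves of one SHA-256 digest (double hashing).
        # This hashing scheme is an assumption of the sketch.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        # Set k bits: O(k) time.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # Read k bits: O(k) time. Any 0 bit proves the item was never added.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


toy = ToyBloomFilter(m=143_780, k=10)
for fruit in ("apple", "banana", "orange"):
    toy.add(fruit)

print("apple" in toy)  # True: an added element is always reported as present
print("grape" in toy)  # expected False; a false positive is possible but very unlikely here
```

Both add and the in test touch exactly k positions regardless of how many elements are already stored, and the bytearray never grows after construction.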

It should be noted that a Bloom filter may produce false positives: an element that is not in the set may be misjudged as being in the set. It will never produce a false negative, however: if the filter reports that an element is not in the set, then the element is definitely not in the set.
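How likely a false positive is can be estimated with the classic approximation p ≈ (1 - e^(-kn/m))^k, where m is the number of bits, k the number of hash functions and n the number of elements added. The sketch below evaluates it for assumed values m=143,780 and k=10, which is roughly what an 18 KB filter with an error rate of 0.001 would use; the exact internals of pybloom_live may differ slightly.

```python
import math


def false_positive_rate(m_bits: int, k_hashes: int, n_items: int) -> float:
    """Classic approximation of a Bloom filter's false positive probability."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


m, k = 143_780, 10  # assumed values for illustration
print(false_positive_rate(m, k, 3))       # nearly empty filter: vanishingly small
print(false_positive_rate(m, k, 10_000))  # filter at capacity: close to 0.001
```

The rate only approaches the configured error rate as the filter fills up to its capacity. With just 3 elements in a filter sized for 10,000, a false positive is extremely unlikely.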

Software engineering student Xiao Shi

20230914


Origin blog.csdn.net/u013288190/article/details/132875664