Introduction to Bloom Filter

A Bloom filter is a space-efficient data structure used to test whether a given element is contained in a large set of data.

1. Basic principles

1. Storage process

  1. Set all the bits of a bit array a of length m to 0;

  2. For each element e to be stored, compute k mutually independent hash functions h_1(e), ..., h_k(e);

  3. If h_i(e) = x, where 1≤i≤k and 1≤x≤m, set the x-th bit of the bit array a to 1.

2. Search process

  1. For an element to be looked up, compute the same k hash functions and record the results in a second bit array b of length m: the t hashed positions (t≤k) are 1 in b, and all other bits are 0;

  2. Compare the bit array a with the array b. If every bit that is 1 in b is also 1 in a, the element is judged to be in the set; otherwise it is definitely not in the set.
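
The storage and search steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a production implementation. The class name BloomFilter, the method names, and the use of salted SHA-256 digests to stand in for the k independent hash functions are all assumptions made for this example. Positions are 0-based here, whereas the text counts from 1.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: an m-bit array and k hash functions."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = [0] * m  # storage step 1: every bit starts at 0

    def _positions(self, item: str):
        # Assumption: the i-th hash function is SHA-256 over "i:item".
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Storage steps 2 and 3: set the bit at every hashed position.
        for x in self._positions(item):
            self.bits[x] = 1

    def might_contain(self, item: str) -> bool:
        # Search: the element can only be present if all its bits are 1.
        return all(self.bits[x] == 1 for x in self._positions(item))


bf = BloomFilter(m=1024, k=3)
bf.add("apple")
bf.add("banana")
print(bf.might_contain("apple"))   # True: stored elements are always found
print(bf.might_contain("cherry"))  # normally False, but may be a false positive
```

The lookup checks the hashed positions directly instead of building a second array b; the result is the same comparison described in the search process.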

2. Misjudgment

Example:
  Element A sets positions 1, 3, and 5 in array a to 1, and element B sets positions 5, 7, and 9 in array a to 1. Now suppose you look up element C, which was never stored.
  After the three hash functions are computed, element C sets positions 1, 5, and 9 to 1 in array b.
  Comparing array a with array b shows that bits 1, 5, and 9 are all 1 in a, so the wrong conclusion is reached: element C is in the set.

A Bloom filter can produce misjudgments (false positives) but never missed judgments (false negatives). That is, an element that is not in the set may be judged to be present, but an element that is in the set is always judged correctly.
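
The example with elements A, B, and C can be replayed in a few lines of Python. This sketch skips the hash functions and uses the positions from the example directly. The helper names add and might_contain are hypothetical, and index 0 of the list is left unused so that the positions match the 1-based numbering in the text.

```python
m = 10
bits = [0] * (m + 1)  # index 0 is unused so positions are 1-based


def add(positions):
    """Set the bit at each given position to 1."""
    for x in positions:
        bits[x] = 1


def might_contain(positions):
    """Return True only if every given position is already 1."""
    return all(bits[x] == 1 for x in positions)


add([1, 3, 5])  # element A
add([5, 7, 9])  # element B

c_positions = [1, 5, 9]  # element C was never added
print(might_contain(c_positions))  # True: a false positive
print(might_contain([2, 5, 9]))    # False: bit 2 is 0, so definitely absent
```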

3. Parameter determination

  The relationship between the array size m, the collection size n, the number of hash functions k, and the misjudgment rate p:

$p = \left(1 - e^{-kn/m}\right)^{k}$

  The optimal number of hash functions k is:
$k = \frac{m}{n} \ln 2$

  The array size m, which determines the memory usage, for a target misjudgment rate p:
$m = -\frac{n \ln p}{(\ln 2)^{2}}$
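
As a rough worked example of these parameter formulas, the Python sketch below computes m and k for a chosen n and p, then plugs them back into the misjudgment-rate formula. The function names and the sample values n = 1,000,000 and p = 0.01 are assumptions chosen for illustration. Rounding m up and k to the nearest integer is one reasonable convention, not the only one.

```python
import math


def optimal_m(n: int, p: float) -> int:
    """Array size: m = -n * ln(p) / (ln 2)^2, rounded up to whole bits."""
    return math.ceil(-n * math.log(p) / (math.log(2) ** 2))


def optimal_k(m: int, n: int) -> int:
    """Number of hash functions: k = (m / n) * ln 2, rounded, at least 1."""
    return max(1, round(m / n * math.log(2)))


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Misjudgment rate: p = (1 - e^(-k*n/m))^k."""
    return (1 - math.exp(-k * n / m)) ** k


n, p = 1_000_000, 0.01  # illustrative values only
m = optimal_m(n, p)
k = optimal_k(m, n)
print(m, k, false_positive_rate(m, n, k))
```

For these sample values the result is about 9.6 million bits (roughly 1.2 MB) and 7 hash functions, with a misjudgment rate very close to the 1% target.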

4. Improvement

  A standard Bloom filter supports only adding and querying elements. Elements cannot be deleted, because resetting a bit to 0 could also erase a bit that other elements rely on.

  The improved method is called the Counting Bloom Filter: each position in the array is a small counter made of several bits instead of a single bit. When an element is added, the counter at each of its hashed positions is increased by 1; when an element is deleted, each of those counters is decreased by 1. A position counts as set while its counter is greater than 0.
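
A Counting Bloom Filter along these lines might look like the Python sketch below. It is an illustration under stated assumptions: the class and method names are hypothetical, salted SHA-256 digests stand in for the k hash functions, and plain Python integers stand in for the fixed-width counters a real implementation would use. Only elements that were actually added should be removed; removing anything else corrupts the counters.

```python
import hashlib


class CountingBloomFilter:
    """Sketch of a counting Bloom filter: each slot is a counter, not a bit."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, item: str):
        # Assumption: the i-th hash function is SHA-256 over "i:item".
        # A set is used so a repeated position is counted only once.
        return {
            int.from_bytes(
                hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()[:8], "big"
            ) % self.m
            for i in range(self.k)
        }

    def add(self, item: str) -> None:
        for x in self._positions(item):
            self.counters[x] += 1

    def might_contain(self, item: str) -> bool:
        return all(self.counters[x] > 0 for x in self._positions(item))

    def remove(self, item: str) -> bool:
        # Refuse to decrement if the item cannot be present.
        if not self.might_contain(item):
            return False
        for x in self._positions(item):
            self.counters[x] -= 1
        return True


cbf = CountingBloomFilter(m=1024, k=3)
cbf.add("apple")
cbf.add("banana")
cbf.remove("apple")
print(cbf.might_contain("banana"))  # True: deleting "apple" leaves "banana" intact
```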

Origin blog.csdn.net/michael_f2008/article/details/78459453