Bloom Filters and Hash Algorithms

  A hash algorithm, also known as a fingerprint or digest algorithm, maps a plaintext string of arbitrary length to a shorter, fixed-length data string (the hash value). The mainstream hash algorithms in use today are the MD5 series and the SHA series.
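
  As a minimal sketch of the "arbitrary length in, fixed length out" property, the snippet below hashes a short and a long message with Python's standard hashlib module. The helper name digest_hex and the sample messages are illustrative choices, not part of the original post.

```python
import hashlib

def digest_hex(algorithm: str, text: str) -> str:
    """Return the hex digest of `text` (UTF-8 encoded) for a hashlib algorithm name."""
    h = hashlib.new(algorithm)
    h.update(text.encode("utf-8"))
    return h.hexdigest()

short_msg = "hi"
long_msg = "a much longer plaintext " * 1000

for alg in ("md5", "sha1", "sha256"):
    # The digest length depends only on the algorithm, not on the input length.
    print(alg, len(digest_hex(alg, short_msg)), len(digest_hex(alg, long_msg)))
```

  This prints 32, 40 and 64 hex characters (128, 160 and 256 bits) for MD5, SHA-1 and SHA-256 respectively, for both inputs.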

  A good hash algorithm should have four properties: fast forward computation, difficult reversal, input sensitivity, and collision resistance.

    Fast forward computation: given the plaintext and the hash algorithm, the hash value can be computed in limited time with limited resources.

    Difficult reversal: given a hash value, it is very hard to recover the plaintext from it within a reasonable amount of time.

    Input sensitivity: any change to the original input, however small, should produce a very different hash value.

    Collision resistance: it is hard to find two different plaintexts whose hash values are equal. Collision resistance is divided into weak and strong collision resistance. If, for a given plaintext, it is infeasible to find another plaintext that collides with it, the algorithm has weak collision resistance; if it is infeasible to find any two plaintexts that collide, the algorithm has strong collision resistance.
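
  Two of these properties can be observed directly. The sketch below, assuming SHA-256 from Python's hashlib, counts how many output bits change when one input character changes (input sensitivity), and then runs a birthday search against a hypothetical toy hash that keeps only the top 16 bits of SHA-256, to show that collision resistance depends on the output being long enough. The truncated hash is deliberately weak and is not a real algorithm.

```python
import hashlib
import itertools

def sha256_bits_differ(a: bytes, b: bytes) -> int:
    """Count how many of the 256 output bits differ between SHA-256(a) and SHA-256(b)."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")

def truncated_hash(data: bytes, nbits: int) -> int:
    """Hypothetical toy hash: keep only the top `nbits` bits of SHA-256(data)."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - nbits)

def find_collision(nbits: int):
    """Birthday search: hash distinct inputs until two share a truncated hash value.

    By the pigeonhole principle this stops within 2**nbits + 1 inputs;
    on average it needs only about 2**(nbits / 2) inputs.
    """
    seen = {}
    for i in itertools.count():
        msg = f"message-{i}".encode()
        h = truncated_hash(msg, nbits)
        if h in seen:
            return seen[h], msg
        seen[h] = msg

# Input sensitivity: a one-character change flips roughly half of the 256 bits.
print(sha256_bits_differ(b"hello", b"hellp"))

# A 16-bit output is far too short: two different inputs with the same hash turn up quickly.
m1, m2 = find_collision(16)
print(m1, m2, truncated_hash(m1, 16) == truncated_hash(m2, 16))
```

  Finding a collision this way for the full 256-bit output would require on the order of 2^128 attempts, which is why real algorithms use long digests.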

  Because a hash algorithm maps arbitrary content to a fixed-length string, and the probability that different contents map to the same string is very low, it forms a good "content → index" relationship. Given an array used to store content, one can construct a suitable hash function whose values never exceed the size of the array, which enables fast content-based lookup and answers the question "is this element in the set?". However, because the hash values are confined to the size of the array, many hash collisions occur and performance drops sharply. This led to the Bloom filter, which is built on hash algorithms.
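
  The single-hash approach described above can be sketched as a bit array indexed by one hash value modulo the array size. The class name SingleHashSet and the use of SHA-256 as the hash function are illustrative assumptions. Members always test positive, while non-members test positive whenever they land on an occupied slot.

```python
import hashlib

class SingleHashSet:
    """Toy membership test: one hash function over an m-slot array."""

    def __init__(self, m: int):
        self.m = m
        self.bits = bytearray(m)  # one byte per slot for clarity: 0 = empty, 1 = occupied

    def _index(self, item: str) -> int:
        # Map arbitrary content to an index in [0, m).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        self.bits[self._index(item)] = 1

    def might_contain(self, item: str) -> bool:
        # True may be a false positive (another item hashed to the same slot);
        # False is always correct.
        return self.bits[self._index(item)] == 1

s = SingleHashSet(m=64)
members = [f"user{i}" for i in range(20)]
for x in members:
    s.add(x)

false_positives = sum(s.might_contain(f"other{i}") for i in range(1000))
print(all(s.might_contain(x) for x in members), false_positives)
```

  With 20 items in only 64 slots, a sizable fraction of the slots is occupied, so a noticeable share of the 1000 non-members is wrongly reported as present.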

  A Bloom filter uses multiple hash functions to improve space utilization. For a given input, several hash functions compute several addresses, and the bit at each of these addresses is set to 1. A lookup repeats the same computation and checks the corresponding positions in the array: if all of them are 1, the input is present with high probability. In other words, applying Hash1, Hash2, ..., HashK to the content yields positions h1, h2, ..., hk; if all of these positions are 1, the content is very likely in the set.
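
  A minimal Bloom filter sketch follows. It assumes the k hash functions Hash1..HashK are derived by prefixing the input with the function index before applying SHA-256; this salting scheme is an illustrative choice, not the only way to obtain k hash functions, and the class and method names are hypothetical.

```python
import hashlib

class BloomFilter:
    """Bloom filter over an m-bit array using k hash functions (illustrative sketch)."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # packed bit array, all bits initially 0

    def _positions(self, item: str):
        # Assumed construction: Hash_i(x) = SHA-256(i || x) mod m, for i = 0..k-1.
        data = item.encode("utf-8")
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)  # set bit p to 1

    def might_contain(self, item: str) -> bool:
        # All k bits set: present with high probability. Any 0 bit: definitely absent.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

bf = BloomFilter(m=1024, k=5)
for word in ("apple", "banana", "cherry"):
    bf.add(word)
print(bf.might_contain("banana"))  # True: every inserted item is found
print(bf.might_contain("durian"))  # almost certainly False, though a false positive is possible
```

  Packing the bits into a bytearray keeps memory at m/8 bytes, which is the point of the structure: only bits are stored, never the items themselves.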

  

  We can only say "with high probability" because a single hash algorithm and a Bloom filter rest on the same idea: both encode the content, but because storage is limited, collisions may occur. Both methods can therefore produce false positives (reporting that an absent element is present), but neither produces false negatives (reporting that an inserted element is absent). In practice, however, with the same amount of memory and a well-chosen number of hash functions, a Bloom filter's false positive rate is much lower than that of a single hash algorithm.
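
  The size of that gap can be estimated with the standard approximation p ≈ (1 - e^(-kn/m))^k for n items stored in m bits with k hash functions, which assumes the hash functions behave like independent, uniformly random functions. Setting k = 1 gives the single-hash case, and the rate is roughly minimized at k ≈ (m/n) ln 2. The function names below are illustrative.

```python
import math

def false_positive_rate(m: int, n: int, k: int) -> float:
    """Approximate false positive rate: (1 - e^(-k*n/m))^k, assuming ideal random hashes."""
    return (1.0 - math.exp(-k * n / m)) ** k

def optimal_k(m: int, n: int) -> int:
    """Number of hash functions that approximately minimizes the rate: (m/n) * ln 2."""
    return max(1, round(m / n * math.log(2)))

m, n = 1000, 100           # 10 bits of memory per stored element
k_best = optimal_k(m, n)   # round(6.93...) = 7
print(f"single hash (k=1): {false_positive_rate(m, n, 1):.4f}")
print(f"Bloom filter (k={k_best}): {false_positive_rate(m, n, k_best):.4f}")
```

  With 10 bits per element, the single hash gives a false positive rate of about 0.095, while seven hash functions bring it down to about 0.008, roughly an order of magnitude lower for the same memory.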

Origin www.cnblogs.com/yytxdy/p/12168019.html