3.4 Bloom filter 

1. What is a Bloom filter? (Determining that a key definitely does not exist)

1. A Bloom filter is essentially a data structure: a compact and rather ingenious probabilistic one.

2. Its defining features are efficient inserts and queries. It can tell you that something "definitely does not exist" or "may exist."

3. Compared with traditional data structures such as List, Set, and Map, it is more efficient and takes up less space. The drawback is that the results it returns are probabilistic, not exact.

Uses:

  1. Bloom filters are very widely used in NoSQL databases.

  2. When a user queries a row, the in-memory Bloom filter first filters out the large number of requests for rows that do not exist; only the remaining requests go on to query the disk.

  3. When a Bloom filter says a value does not exist, it definitely does not exist, so it can significantly reduce the number of database I/O requests.
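
The read path described above can be sketched as follows. This is a minimal illustration, not how any particular database implements it: the `RowStore` class, its dictionary standing in for the disk, the filter size `m`, and the use of salted SHA-256 as the hash functions are all assumptions made for the example.

```python
import hashlib


class RowStore:
    """Toy store: an in-memory Bloom filter in front of a simulated disk."""

    def __init__(self, m=1 << 16, k=3):
        self.m = m                  # number of slots in the filter (assumed size)
        self.k = k                  # number of hash functions (assumed count)
        self.bits = bytearray(m)    # one byte per slot, for readability
        self.disk = {}              # stands in for rows stored on disk
        self.disk_reads = 0         # counts simulated disk lookups

    def _positions(self, key):
        # Derive k positions by salting SHA-256 with the function number.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def put(self, key, row):
        self.disk[key] = row
        for pos in self._positions(key):
            self.bits[pos] = 1

    def get(self, key):
        if not all(self.bits[pos] for pos in self._positions(key)):
            return None             # definitely absent: no disk read needed
        self.disk_reads += 1        # possibly present: fall through to disk
        return self.disk.get(key)


store = RowStore()
store.put("row:1", {"name": "alice"})
print(store.get("row:1"), store.disk_reads)
```

A lookup on a filter that has never seen the key usually returns without touching the disk at all; only a false positive costs a wasted read, and the answer is still correct because the disk has the final say.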

2. Scenarios

1 ) Scenario 1 (recommending news to users)

1. News the user has already read is certain to be filtered out; of the news the user has not read, a very small portion may also be filtered out (false positives).

2. This fully guarantees that the news pushed to the user contains no repeats.

2 ) Scenario 2 (URL deduplication in a crawler)

1. In a crawler system, URLs need to be deduplicated so that pages that have already been crawled are not crawled again.

2. When the URLs number in the tens of millions, holding them all in a set wastes a lot of space.

3. Using a Bloom filter can significantly reduce the storage consumed by deduplication, but it also causes the crawler to miss a small number of pages (URLs wrongly reported as already seen).
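
To put rough numbers on the space saving, the sketch below applies the standard sizing formulas for a Bloom filter, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The figures of 10 million URLs and a 1% false-positive target are assumptions chosen for illustration, not values from the original text.

```python
import math


def bloom_size(n, p):
    """Return (bits, hash_count) for n expected keys and false-positive rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


n_urls = 10_000_000      # assumed number of URLs
target_fp = 0.01         # assumed acceptable false-positive rate

m, k = bloom_size(n_urls, target_fp)
megabytes = m / 8 / 1_000_000
print(f"{m} bits (~{megabytes:.1f} MB), {k} hash functions")
```

Under these assumptions the filter needs roughly 12 MB and 7 hash functions. Storing the URLs themselves at, say, 50 bytes each would take about 500 MB before any container overhead.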

3. Bloom filter principle

1. In Redis, each Bloom filter corresponds to a data structure made up of one large bit array and several different unbiased hash functions.

2. As in the figure, f, g, and h are such hash functions ("unbiased" means the hash spreads elements fairly evenly, so that the position each element maps to in the bit array is close to random).

Add: adding a value to the Bloom filter

  1) When a key is added to the Bloom filter, each of the hash functions f, g, and h computes an integer index from the key, which is then taken modulo the length of the bit array to get a position.

  2) Each hash function yields its own position (usually a different one); setting the bits at all of these positions to 1 completes the add operation.
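
The add steps can be sketched in a few lines. The array length `M` and the seeded SHA-256 calls standing in for f, g, and h are assumptions for illustration; a real filter would pack the slots into actual bits and would typically use faster non-cryptographic hashes.

```python
import hashlib

M = 64                        # array length (hypothetical, tiny for demonstration)
SEEDS = (b"f", b"g", b"h")    # stand-ins for the hash functions f, g, h


def positions(key, m=M):
    """Hash the key with each seeded function, then take the result modulo m."""
    result = []
    for seed in SEEDS:
        digest = hashlib.sha256(seed + key.encode("utf-8")).digest()
        result.append(int.from_bytes(digest[:8], "big") % m)
    return result


def add(bits, key):
    """Set the slot at every computed position to 1."""
    for pos in positions(key, len(bits)):
        bits[pos] = 1


bits = bytearray(M)           # one byte per slot for readability
add(bits, "user:42")
print(positions("user:42"), sum(bits))
```

Two of the three functions can occasionally land on the same slot, so an add sets at most three slots here, not always exactly three.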

Query: looking up a value in the Bloom filter

  1) When a key is queried, the same hash functions first compute integer indexes, which are again taken modulo the length of the bit array.

  2) If any of these positions holds 0, the key definitely does not exist; if all of them hold 1, the key very probably exists.

  3) In this way the Bloom filter filters out, in memory, the large number of requests for rows that do not exist; only the rest go on to query the disk, which reduces I/O operations.
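
A minimal sketch of the query rule, assuming a small `BloomFilter` class written for this example (the class name, the default sizes, and the salted SHA-256 hashing are not from the original text):

```python
import hashlib


class BloomFilter:
    def __init__(self, m=1 << 16, k=3):
        self.m = m                  # number of slots (assumed size)
        self.k = k                  # number of hash functions (assumed count)
        self.bits = bytearray(m)    # one byte per slot, for readability

    def _positions(self, key):
        # Same hashing for add and query, so positions always match.
        return [
            int.from_bytes(
                hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()[:8], "big"
            ) % self.m
            for i in range(self.k)
        ]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # Any 0 means "definitely absent"; all 1s means "probably present".
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
for name in ("alice", "bob", "carol"):
    bf.add(name)

print(bf.might_contain("alice"))
print(bf.might_contain("mallory"))
```

The first lookup always reports the key as possibly present, because a Bloom filter has no false negatives. The second is very likely, but not guaranteed, to report absence.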

Delete: not supported

  1) The Bloom filter as described so far supports only the add and isExist operations.

  2) How can deletion be supported? The answer is counting deletion: each slot has to store a count rather than the original single bit, which increases memory usage.

  3) An add increments the count stored in the slot at each corresponding index by one, and a delete decrements it by one; to check whether a key exists, see whether the counts at all of its indexes are greater than 0.
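
The counting variant can be sketched like this. The `CountingBloomFilter` class, its sizes, and the salted SHA-256 hashing are assumptions for illustration; production implementations usually use small fixed-width counters (for example 4 bits per slot) instead of a Python list of integers.

```python
import hashlib


class CountingBloomFilter:
    def __init__(self, m=1024, k=3):
        self.m = m                  # number of slots (assumed size)
        self.k = k                  # number of hash functions (assumed count)
        self.counts = [0] * m       # a counter per slot instead of a single bit

    def _positions(self, key):
        return [
            int.from_bytes(
                hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()[:8], "big"
            ) % self.m
            for i in range(self.k)
        ]

    def add(self, key):
        for pos in self._positions(key):
            self.counts[pos] += 1   # add: increment every slot for the key

    def might_contain(self, key):
        return all(self.counts[pos] > 0 for pos in self._positions(key))

    def remove(self, key):
        if not self.might_contain(key):
            return False            # definitely absent: nothing to delete
        for pos in self._positions(key):
            self.counts[pos] -= 1   # delete: decrement every slot for the key
        return True


cbf = CountingBloomFilter()
cbf.add("page-1")
cbf.remove("page-1")
print(cbf.might_contain("page-1"))
```

Deletion is only safe for keys that were actually added. Removing a key that merely looks present because of a false positive would decrement counters belonging to other keys and could introduce false negatives.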

Origin www.cnblogs.com/lihouqi/p/12664269.html