Bloom Filter

Background

Hash functions are used very widely in computing, especially for fast data lookup and in cryptography.

A hash function maps data from a large set onto a much smaller set of values. These smaller values are called hash values or hash codes.
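
As a small illustration, the sketch below maps arbitrary strings onto just m possible integer values. Using SHA-256 from Python's standard hashlib module and the helper name bucket_of are choices made for this example only; the text above does not prescribe a particular hash function.

```python
import hashlib

def bucket_of(key: str, m: int) -> int:
    """Map an arbitrary string onto one of m small integer values (0 .. m-1)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Read the first 8 bytes of the digest as an unsigned integer, then reduce modulo m.
    return int.from_bytes(digest[:8], "big") % m

# 100 distinct inputs, but only m = 8 possible hash values.
keys = [f"user-{i}" for i in range(100)]
values = {bucket_of(k, 8) for k in keys}
print(sorted(values))  # every value lies in range(8)
```

Because many inputs share few outputs, different keys inevitably end up with the same hash value; this is the collision problem discussed below.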

A hash table is a data structure that accesses records directly by their key. In other words, it maps the key's hash value to a position in a table and stores the record there, which speeds up lookup. A typical schematic is shown below.

[Figure: schematic of a typical hash table]
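
A minimal hash table sketch in Python follows. It resolves collisions by separate chaining (each slot holds a list of key/value pairs); the text above does not say which collision strategy it has in mind, so chaining is an assumption, and the class name ChainedHashTable is hypothetical. Python's built-in hash() stands in for the hash function.

```python
class ChainedHashTable:
    """A minimal hash table: m slots, each holding a list of (key, value) pairs."""

    def __init__(self, m: int = 16):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def _index(self, key) -> int:
        # hash() maps the key to an int; % m reduces it to a slot index in 0 .. m-1.
        return hash(key) % self.m

    def put(self, key, value) -> None:
        bucket = self.slots[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # key already present: overwrite its value
                return
        bucket.append((key, value))  # new key (possibly sharing the slot with others)

    def get(self, key, default=None):
        # Only the one slot the key hashes to needs to be scanned.
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        return default

t = ChainedHashTable(8)
t.put("apple", 3)
t.put("pear", 4)
t.put("apple", 5)
print(t.get("apple"), t.get("pear"), t.get("plum"))
```

A lookup only scans the single slot the key hashes to, which is where the speedup comes from; the more keys share a slot, the longer that scan becomes.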

This simple hash table has a problem, however: hash collisions. Assume the hash function is good (it spreads keys uniformly) and the table is an array of length m. With n keys stored, a new key lands in an already occupied slot with probability of roughly n/m, so if we want to keep the collision rate down to, say, 1%, the table can hold only about m × 1% elements. Clearly this is not space-efficient.
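
The 1% figure can be checked with a back-of-the-envelope calculation. This sketch assumes uniform hashing: with n keys stored, at most n of the m slots are occupied, so n/m bounds the chance that a new key collides. The function names are illustrative only, and the target is given as an integer percentage so the arithmetic stays exact.

```python
def collision_bound(n: int, m: int) -> float:
    """Upper bound on the chance that a new key hits an occupied slot,
    assuming uniform hashing and n keys stored in an m-slot table."""
    return n / m

def max_keys(m: int, percent: int) -> int:
    """Largest n for which the bound n/m stays at or below percent/100."""
    return m * percent // 100

m = 1_000_000
n = max_keys(m, 1)
print(n, collision_bound(n, m))  # -> 10000 0.01
```

In other words, a table with a million slots could hold only about ten thousand keys at this collision rate, leaving 99% of the slots empty.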

Bloom Filter Overview

The Bloom filter was proposed by Burton Howard Bloom in 1970. It consists of a very long binary vector together with a series of random mapping functions (hash functions). A Bloom filter can be used to test whether an element is in a set. Its advantage is that both its space efficiency and its query time far exceed those of general-purpose algorithms. Bloom filters are widely used in applications that need fast membership queries, for example:

Google's well-known distributed database Bigtable uses Bloom filters to check for rows or columns that do not exist, reducing the number of disk I/O operations needed for lookups.

Many key-value systems, such as HBase, Accumulo, and LevelDB, also use Bloom filters to speed up queries. Values are generally stored on disk, and disk access is slow, but a Bloom filter can quickly tell whether a key might have a corresponding value, which avoids many unnecessary disk I/O operations. The cost is that introducing a Bloom filter consumes some memory.

Bloom Filter Principle

To determine whether an element is in a set, the usual approach is to store all the elements and then decide by comparison. Lists, trees, and similar data structures all follow this idea. But as the number of elements in the set grows, they need more and more storage, and lookups become slower and slower.

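A Bloom filter is based on an m-bit vector (b1, ..., bm) whose bits are all initially 0, together with k hash functions (h1, ..., hk), each of which maps an element to a position in the range 1 to m. To insert an element x, set the bits at positions h1(x), ..., hk(x) to 1. To test whether an element is in the set, check the same k positions: if any of them is 0, the element is definitely not in the set; if all of them are 1, the element is probably in the set. The schematic below shows a Bloom filter into which x, y, and z have been inserted, followed by a query for whether w is in the set.

[Figure: a Bloom filter after inserting x, y, and z, with a query testing whether w is in the set]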
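
The insert and query steps above can be sketched in Python as follows. This is a minimal sketch under several assumptions: bit positions are 0-based (0 to m-1) rather than the 1 to m used above; instead of k truly independent hash functions, it derives k positions from a single SHA-256 digest using double hashing (position i = h1 + i·h2 mod m), a common substitution; and the class and method names are hypothetical.

```python
import hashlib

class BloomFilter:
    """A minimal Bloom filter: an m-bit vector plus k derived hash positions."""

    def __init__(self, m: int, k: int):
        self.m = m                             # number of bits
        self.k = k                             # number of hash positions per element
        self.bits = bytearray((m + 7) // 8)    # every bit starts at 0

    def _positions(self, item: str):
        # Double hashing (an assumption, not from the text): split one SHA-256
        # digest into two 64-bit values h1, h2 and use h1 + i*h2 mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # make the step odd, hence nonzero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        # Insert: set all k bits for this item to 1.
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # Query: False means definitely absent; True means probably present.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

bf = BloomFilter(m=1024, k=7)
for s in ["x", "y", "z"]:
    bf.add(s)
print(bf.might_contain("x"))  # True: inserted elements are never reported missing
print(bf.might_contain("w"))  # usually False, but a false positive is possible
```

A key-value store of the kind mentioned earlier would call might_contain before reading from disk and skip the read whenever it returns False.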

The Bloom filter's disadvantages are as obvious as its advantages, and the false positive rate is one of them: as more elements are inserted, the false positive rate rises. On the other hand, if the number of elements is small, an ordinary hash table is good enough.
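
To see how the false positive rate grows with the number of inserted elements n, the sketch below evaluates the standard approximation p ≈ (1 - e^(-kn/m))^k, which assumes the hash functions behave independently and uniformly. It also computes the k that minimizes this approximation, (m/n)·ln 2. With 10 bits per element (m/n = 10) and k = 7 the estimate comes out at roughly 0.8%. The function names are illustrative.

```python
import math

def false_positive_rate(m: int, n: int, k: int) -> float:
    """Approximate false positive rate after inserting n elements into an
    m-bit Bloom filter with k hash functions: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k

def optimal_k(m: int, n: int) -> int:
    """Number of hash functions minimizing the approximation: (m/n) * ln 2, rounded."""
    return max(1, round(m / n * math.log(2)))

m, k = 10_000, 7
for n in (100, 500, 1000, 2000):
    # The rate climbs steadily as more elements are inserted into the same m bits.
    print(n, round(false_positive_rate(m, n, k), 4))
```

For a fixed m and k the rate rises monotonically with n, which is why m is usually sized from the expected number of elements and the acceptable error rate.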

Summary: Bloom filters are typically used where you need to determine quickly whether an element belongs to a set but do not require the answer to be 100% correct. Note also that introducing a Bloom filter costs some memory.

Source: www.cnblogs.com/shuzhiwei/p/11316547.html