7. [Redis series] Advanced Redis applications: Bloom filter

Take Toutiao (Today's Headlines) as an example. It keeps recommending new articles to us, and every recommendation has to be deduplicated so that articles we have already seen are filtered out. How does Toutiao do this deduplication? The HyperLogLog structure covered earlier can count distinct items, but it cannot tell us whether a particular article has already been seen, because there is no pfcontains command. Is there a better solution?

The Bloom filter that Redis provides is designed for exactly this deduplication problem. It deduplicates while saving about 90% of the space, at the cost of a small false-positive rate.

What is a Bloom filter

A Bloom filter can be understood as a slightly imprecise set structure. When you use its contains method to decide whether an object is present, it may give a wrong answer. That does not make it wildly inaccurate, though: as long as the parameters are set reasonably, the error can be kept within a controlled range.

When the Bloom filter says a value exists, the value may in fact not exist; when it says a value does not exist, the value definitely does not exist. It is like a person who says they know you when they may not, but who, when they say they do not know you, definitely does not. In the scenario above, the Bloom filter accurately filters out all the content that has already been seen. A small portion of unseen content is filtered out as well (the false positives), but the rest of the new content is recognised correctly. This guarantees that content the user has already seen is never recommended again.

Bloom filter in Redis

Since Redis 4.0, the Bloom filter has been provided in the form of a plugin (a loadable module).

Basic usage

The Bloom filter has two basic commands: bf.add adds an element, and bf.exists checks whether an element exists. bf.add accepts only one element at a time. To add several elements at once, use bf.madd; to check several elements at once, use bf.mexists.

127.0.0.1:6379> bf.add codehole user1
(integer) 1
127.0.0.1:6379> bf.add codehole user2
(integer) 1
127.0.0.1:6379> bf.add codehole user3
(integer) 1
127.0.0.1:6379> bf.exists codehole user1
(integer) 1
127.0.0.1:6379> bf.exists codehole user2
(integer) 1
127.0.0.1:6379> bf.exists codehole user3
(integer) 1
127.0.0.1:6379> bf.exists codehole user4
(integer) 0
127.0.0.1:6379> bf.madd codehole user4 user5 user6
1) (integer) 1
2) (integer) 1
3) (integer) 1
127.0.0.1:6379> bf.mexists codehole user4 user5 user6 user7
1) (integer) 1
2) (integer) 1
3) (integer) 1
4) (integer) 0

Custom parameters

A Bloom filter is created automatically with default parameters the first time you add to it. Redis also lets you set custom parameters: before adding anything, create the filter explicitly with the bf.reserve command. If the key already exists, bf.reserve reports an error. bf.reserve takes three parameters: key, error_rate, and initial_size. The lower the error rate, the more space is needed. initial_size is the expected number of elements; when the actual number exceeds it, the false-positive rate rises.

The default parameters are error_rate 0.01 and initial_size 100.
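
To see why a lower error rate costs more space, and why the error rate climbs once the filter holds more elements than planned, the textbook sizing formulas for a classic Bloom filter can be worked through in Python. This is only a sketch: the function names bloom_size and estimated_error_rate are made up for illustration, and the formulas are the standard theoretical ones, not necessarily what the Redis module does internally.

```python
import math


def bloom_size(expected_items, error_rate):
    """Return (bits, hash_count) for a classic Bloom filter.

    Assumption: textbook formulas, not the Redis module's own sizing.
      bits       = -n * ln(p) / (ln 2)^2
      hash_count = (bits / n) * ln 2
    """
    bits = math.ceil(-expected_items * math.log(error_rate) / (math.log(2) ** 2))
    hash_count = max(1, round(bits / expected_items * math.log(2)))
    return bits, hash_count


def estimated_error_rate(bits, hash_count, items):
    """Approximate false-positive rate once `items` elements are stored."""
    return (1 - math.exp(-hash_count * items / bits)) ** hash_count


if __name__ == "__main__":
    # Same values as the defaults mentioned above: error_rate 0.01, initial_size 100.
    bits, hash_count = bloom_size(100, 0.01)
    print("bits:", bits, "hash functions:", hash_count)

    # A stricter error rate needs more bits for the same number of elements.
    print("bits at 0.001:", bloom_size(100, 0.001)[0])

    # Storing more elements than planned pushes the error rate up.
    for items in (50, 100, 200, 400):
        print(items, "items ->", round(estimated_error_rate(bits, hash_count, items), 4))
```

Under these formulas the estimated error rate stays near the configured 1% at 100 elements and grows quickly beyond that, which matches the behaviour described for initial_size.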

How the Bloom filter works

Now that we have covered the basic usage, let's look at how it is implemented.

Each Bloom filter is really a large bit array plus several different unbiased hash functions (unbiased meaning that the hash values are spread evenly). When a key is added, each hash function hashes the key to an integer, and that integer is taken modulo the length of the bit array to get a position. Each hash function yields its own position, and all of these positions are set to 1.
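
The add step described above can be sketched in a few lines of Python. This is a minimal illustration, not the Redis module's implementation: the class name SimpleBloomFilter, the use of seeded SHA-256 as the family of hash functions, and the default sizes are all assumptions chosen for the example. The exists method is the natural counterpart of add (check that every position is 1) and is included so the sketch can be tried out.

```python
import hashlib


class SimpleBloomFilter:
    """Toy Bloom filter: a bit array plus several hash functions."""

    def __init__(self, size_bits=959, hash_count=7):
        # Sizes are illustrative assumptions, not Redis defaults.
        self.size_bits = size_bits
        self.hash_count = hash_count
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key):
        # Simulate several hash functions by prefixing a seed to the key.
        for seed in range(self.hash_count):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            # Hash to an integer, then take it modulo the bit array length.
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        # Set every computed position to 1.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def exists(self, key):
        # True only if every position is 1; this can be a false positive.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


if __name__ == "__main__":
    bf = SimpleBloomFilter()
    for user in ("user1", "user2", "user3"):
        bf.add(user)
    print(bf.exists("user1"))  # an added key is always reported as present
    print(bf.exists("user4"))  # usually False, but a false positive is possible
```

Because add only ever turns bits on, a key that was added is always found again, while a key that was never added is reported as present only when other keys happen to have set all of its positions.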

Origin www.cnblogs.com/lonelyxmas/p/12515049.html