Redis advanced data structures: HyperLogLog, Bloom filter

Advanced data structures in Redis
5. HyperLogLog
HyperLogLog: an advanced Redis data structure for cardinality statistics (counting distinct elements, such as unique visitors). It gives an approximate, deduplicated count with a standard error of 0.81%.
1. Usage
pfadd: add elements to the count (used like sadd on a set; duplicates are counted only once)
pfcount: get the approximate count (used like scard; returns the count directly)
pfmerge: merge several HyperLogLogs into a new one whose count is that of their union
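To make the three commands concrete, here is a simplified in-memory HyperLogLog in Python. It is an illustrative sketch, not Redis's implementation: the class and method names are hypothetical, SHA-1 stands in for the 64-bit MurmurHash that Redis uses, and only the dense layout (16,384 registers) with the textbook estimator and small-range correction is modeled. add, count and merge play the roles of pfadd, pfcount and pfmerge.

```python
import hashlib
import math

P = 14                              # index bits -> 2**14 = 16384 registers, as in Redis
M = 1 << P
ALPHA = 0.7213 / (1 + 1.079 / M)    # standard HyperLogLog bias constant for large M


def _hash64(value: str) -> int:
    # 64-bit hash taken from SHA-1 (an assumption; Redis uses MurmurHash64A)
    return int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")


class HyperLogLog:
    """Hypothetical dense-only HyperLogLog for illustration."""

    def __init__(self):
        self.registers = [0] * M

    def add(self, *values: str) -> None:  # analogue of PFADD
        for v in values:
            h = _hash64(v)
            idx = h & (M - 1)   # low 14 bits choose the register
            rest = h >> P       # remaining 50 bits
            # rank = 1 + number of trailing zero bits in `rest` (at most 51)
            rank = 1
            while rest & 1 == 0 and rank <= 64 - P:
                rest >>= 1
                rank += 1
            self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> int:  # analogue of PFCOUNT
        z = sum(2.0 ** -r for r in self.registers)
        estimate = ALPHA * M * M / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * M and zeros:
            # small-range correction: linear counting on the empty registers
            estimate = M * math.log(M / zeros)
        return round(estimate)

    def merge(self, *others: "HyperLogLog") -> "HyperLogLog":  # analogue of PFMERGE
        merged = HyperLogLog()
        # the union keeps, for every register, the largest rank seen by any input
        merged.registers = [max(rs) for rs in zip(self.registers, *(o.registers for o in others))]
        return merged
```

Because each register only keeps a maximum, adding a duplicate never changes the count, and merging is just a register-wise maximum, which is why pfmerge is cheap.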

2. Caveats when using HyperLogLog
HyperLogLog is not well suited to per-user statistics: a dense HyperLogLog needs 12 KB of storage, so keeping one per user becomes prohibitively expensive when there are many users.
Redis mitigates this: while the count is small it uses a sparse representation, and once that representation grows past a threshold (the hll-sparse-max-bytes setting) it converts to the dense representation, which occupies 12 KB.
Redis implements HyperLogLog with 16,384 (2^14) buckets (registers), each counted independently.
Each register stores a value maxbits and needs 6 bits, which can represent a maximum of 63, so the dense representation of each key takes (2^14) * 6 / 8 = 12,288 bytes = 12 KB.
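The 12 KB figure, and the 0.81% error quoted at the start, both follow from the register count. A quick check in Python (the 1.04/sqrt(m) error formula is the textbook HyperLogLog bound, not something stated in this post):

```python
import math

REGISTERS = 2 ** 14      # number of buckets (registers)
BITS_PER_REGISTER = 6    # enough to store values up to 2**6 - 1 = 63

dense_bytes = REGISTERS * BITS_PER_REGISTER // 8
standard_error = 1.04 / math.sqrt(REGISTERS)   # textbook HyperLogLog standard error

print(dense_bytes, dense_bytes / 1024)   # 12288 12.0
print(f"{standard_error:.4%}")           # 0.8125%
```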

6. Bloom filter (advanced data structure)
Mainly used for deduplication over large amounts of data, such as news feeds.
Bloom filter features:
saves a lot of space when deduplicating (more than 90%)
slightly inaccurate (false positives are possible)
What is a Bloom filter
Bloom filter: an imprecise set; when you use contains to check whether an object exists, it may give a wrong answer.
If the Bloom filter says a value does not exist, it definitely does not exist; if it says a value exists, it may not actually exist.

Bloom filter algorithm: like a hash set, it determines whether an element (key) is in a set.
Algorithm:

  1. First, k hash functions are needed; each one maps a key to an integer.
  2. At initialization, an array of n bits is allocated, with every bit set to 0.
  3. To add a key to the set, compute its k hash values with the k hash functions and set the corresponding positions in the bit array to 1.
  4. To check whether a key is in the set, compute its k hash values with the same k hash functions and look up the corresponding bits; if all of them are 1, the key is judged to be in the set.
    Advantages: the keys themselves are not stored, which saves space.
    Disadvantages:
      1. When the algorithm judges a key to be in the set, there is a certain probability that it is actually not (a false positive).
      2. Elements cannot be deleted, because a bit set to 1 may be shared by several keys.
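The four steps map directly onto a small Python class. This is an illustrative sketch rather than the RedisBloom implementation: the class name is hypothetical, and the k hash functions are simulated by salting SHA-256 with the function index, then taking the result modulo the array length.

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter: an n-bit array plus k hash functions."""

    def __init__(self, n_bits: int, k: int):
        self.n_bits = n_bits
        self.k = k
        self.bits = bytearray((n_bits + 7) // 8)  # step 2: every bit starts at 0

    def _positions(self, key: str):
        # Step 1: simulate k hash functions by salting one hash with the index i
        # (an assumption for illustration), then reduce modulo the array length.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, key: str) -> None:
        # Step 3: set the k positions to 1
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def exists(self, key: str) -> bool:
        # Step 4: False means definitely absent; True means probably present
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))
```

exists returning False is always correct; True only means "probably present". There is deliberately no remove method: clearing a bit could erase evidence of other keys that hash to the same position.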

Bloom filter in Redis
The Bloom filter in Redis is provided by the RedisBloom module and is backed by a bit array (bitmap). With Docker you can try the Bloom filter in Redis directly.
Commands:

docker run -d -p 6379:6379 --name bloomfilter redislabs/rebloom
docker exec -it bloomfilter redis-cli

Basic Bloom filter usage:
bf.add: add one element
bf.exists: check whether one element exists
bf.madd: add multiple elements
bf.mexists: check whether multiple elements exist
bf.reserve: create a filter explicitly; returns an error if the key already exists
It takes three parameters:
key
error_rate: the desired false positive rate; the larger it is, the less space is needed (default 0.01)
initial_size: the expected number of elements; when the actual number of elements exceeds it, the false positive rate rises,
so set it large enough in advance to avoid exceeding it (default 100)
Note:
If initial_size is too large, storage space is wasted; if it is too small, the actual false positive rate rises above error_rate.
Estimate the number of elements as accurately as possible before use, and add some headroom.
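The trade-off between error_rate, initial_size and memory can be estimated with the standard Bloom filter formulas m = -n * ln(p) / (ln 2)^2 bits and k = (m / n) * ln 2 hash functions. The Python sketch below uses those textbook formulas (the function names are hypothetical, and RedisBloom's internal sizing may differ) to show the default settings and what happens when twice as many elements as planned are inserted.

```python
import math


def bloom_size(expected_items: int, error_rate: float) -> tuple[int, int]:
    """Textbook optimal bit count m and hash count k for n items at false positive rate p."""
    m = math.ceil(-expected_items * math.log(error_rate) / math.log(2) ** 2)
    k = max(1, round(m / expected_items * math.log(2)))
    return m, k


def false_positive_rate(m: int, k: int, inserted: int) -> float:
    """Approximate false positive rate of an m-bit, k-hash filter holding `inserted` items."""
    return (1 - math.exp(-k * inserted / m)) ** k


m, k = bloom_size(100, 0.01)          # initial_size=100, error_rate=0.01
print(m, k)                           # 959 7
print(false_positive_rate(m, k, 100)) # about 1%, as planned
print(false_positive_rate(m, k, 200)) # about 16% once initial_size is doubled
```

This makes the note above concrete: a filter sized for 100 elements stays near its 1% target at 100 elements, but its false positive rate climbs steeply once the real count overshoots initial_size.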

Bloom filter principle
Bloom filter: in Redis, its data structure is a large bit array plus several different unbiased hash functions.
Unbiased means the hash values of elements are distributed fairly uniformly, so each element is mapped to a fairly random position in the bit array.
Adding an element (add):
When a key is added to the Bloom filter, each hash function hashes the key to an integer index value,
which is then taken modulo the array length to get a position. Each hash function yields a different position; setting all of these positions to 1 completes the add.
Checking whether a key exists:
Compute the same positions as for add and check whether all of them are 1. If any bit is 0, the key does not exist; if all are 1, the key probably exists but is not guaranteed to.

7. Simple rate limiting
Rate limiting: when system capacity is limited, it prevents unplanned request volume from overloading the system.
Simple rate limiting: use a sliding time window of fixed width, stored in a Redis zset whose score is the request timestamp. The window marks out a time range;
only the records inside the window need to be kept, and records outside it can be discarded.
Disadvantage: if many requests are recorded within the window, the zset consumes a lot of space.
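A minimal in-memory Python sketch of the sliding-window idea follows. A deque of timestamps per user and action stands in for the zset (the class name and key format are hypothetical); in Redis the same steps would typically be ZADD with the timestamp as score, ZREMRANGEBYSCORE to drop entries older than the window, and ZCARD to count what is left, ideally in one pipeline or MULTI block. The sketch assumes timestamps arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical in-memory analogue of the zset approach."""

    def __init__(self, period: float, max_count: int):
        self.period = period          # window width in seconds
        self.max_count = max_count    # requests allowed inside one window
        self.events = defaultdict(deque)  # key -> accepted timestamps, oldest first

    def is_allowed(self, user: str, action: str, now: float) -> bool:
        window = self.events[f"hist:{user}:{action}"]
        # Drop records that fell out of the window (like ZREMRANGEBYSCORE key 0 now-period)
        while window and window[0] <= now - self.period:
            window.popleft()
        if len(window) >= self.max_count:  # like ZCARD key >= limit
            return False
        window.append(now)                 # like ZADD key now now
        return True
```

This sketch records only accepted requests; a common variant records every attempt, including rejected ones. Either way, memory grows with the number of requests kept per window, which is exactly the drawback noted above.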

Funnel rate limiting: the funnel's capacity is limited; the outflow rate of the spout represents the maximum rate the system allows for this action, and the remaining space represents how many more actions can be performed right now.
Redis has a rate-limiting module, Redis-Cell, which uses the funnel algorithm and provides an atomic rate-limiting command.
Command: cl.throttle
cl.throttle alvin:reply 15 30 60 1
Meaning: user alvin may reply at most 30 times per 60 s, and the funnel's initial capacity is 15; the trailing 1 is the quota consumed by this call (optional, defaults to 1).
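The funnel idea can be sketched in plain Python. This is a hypothetical single-process model, not Redis-Cell's code: capacity corresponds to the initial capacity (15 in the example) and leaking_rate to 30 actions per 60 s. Redis-Cell performs the equivalent check atomically inside the server, which this sketch does not attempt.

```python
class Funnel:
    """Hypothetical in-memory funnel (leaky bucket) for illustration."""

    def __init__(self, capacity: int, leaking_rate: float, now: float = 0.0):
        self.capacity = capacity          # funnel size: burst allowed up front
        self.leaking_rate = leaking_rate  # quota drained per second (the allowed rate)
        self.left_quota = capacity        # free space currently left in the funnel
        self.leaking_ts = now             # time up to which leakage has been credited

    def _make_space(self, now: float) -> None:
        # Credit whatever has leaked out of the spout since leaking_ts.
        delta_quota = int((now - self.leaking_ts) * self.leaking_rate)
        if delta_quota < 1:
            return  # less than one unit leaked; keep waiting so fractions accumulate
        self.left_quota = min(self.capacity, self.left_quota + delta_quota)
        # Advance only by the time actually converted into quota, keeping the remainder.
        self.leaking_ts += delta_quota / self.leaking_rate

    def watering(self, quota: int, now: float) -> bool:
        self._make_space(now)
        if self.left_quota >= quota:
            self.left_quota -= quota
            return True
        return False


_funnels: dict[str, Funnel] = {}


def is_action_allowed(user: str, action: str, capacity: int,
                      leaking_rate: float, now: float) -> bool:
    # One funnel per user and action, created lazily on first use.
    key = f"{user}:{action}"
    if key not in _funnels:
        _funnels[key] = Funnel(capacity, leaking_rate, now)
    return _funnels[key].watering(1, now)
```

With capacity 15 and a rate of 30 / 60 = 0.5 per second, 15 replies succeed immediately, the 16th is rejected, and one more becomes possible every 2 seconds after that.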


Origin blog.csdn.net/alvin_666/article/details/89433695