Basic concepts and usage of Redis HyperLogLog

Concept

HyperLogLog is a probabilistic data structure in Redis for estimating cardinality, that is, the number of distinct elements in a set (deduplicated counting). Its advantage is that the memory needed stays small and fixed no matter how many elements are added. In Redis, each HyperLogLog key needs at most 12 KB of memory and can estimate the cardinality of sets with up to 2^64 elements, at the cost of a small error (a standard error of about 0.81%)³.

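The principle of HyperLogLog is to hash each input element, use part of the hash value to select one of a fixed number of registers, and record in that register the longest run of leading zero bits seen in the rest of the hash. The cardinality is then estimated from the harmonic mean of the values derived from all registers. Averaging over many registers is what keeps the error rate low. Redis uses 16384 registers of 6 bits each (16384 × 6 bits = 12 KB) and, to save memory at small cardinalities, stores them in a sparse encoding until a dense one is needed¹².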
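To make the principle concrete, here is a minimal pure-Python sketch of a HyperLogLog. It is an illustration only, not Redis's implementation: the class name `TinyHyperLogLog` is hypothetical, SHA-1 is an arbitrary choice of hash for this sketch (Redis uses its own 64-bit hash), and only the basic small-range correction is applied.

```python
import hashlib
import math


class TinyHyperLogLog:
    """Minimal HyperLogLog sketch (illustrative; not Redis's implementation)."""

    def __init__(self, p=14):
        self.p = p                     # number of hash bits used as register index
        self.m = 1 << p                # number of registers (16384 when p = 14)
        self.registers = [0] * self.m
        # Bias-correction constant for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # 64-bit hash taken from the first 8 bytes of SHA-1 (assumption of this sketch).
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        bits = 64 - self.p
        index = h >> bits                      # top p bits select a register
        rest = h & ((1 << bits) - 1)           # remaining bits act as coin tosses
        # Rank = number of leading zeros in `rest`, plus one.
        rank = bits - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self):
        # Raw estimate based on the harmonic mean of 2^register.
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return round(self.m * math.log(self.m / zeros))
        return round(raw)


hll = TinyHyperLogLog()
for i in range(10_000):
    hll.add(f"user-{i}")
print(hll.count())  # an estimate close to 10000, not necessarily exact
```

Adding the same element again never changes a register, which is why duplicates do not affect the estimate.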

(1) Redis HyperLogLog | Rookie Tutorial. https://www.runoob.com/redis/redis-hyperloglog.html
(2) Approaching the source code: the magical HyperLogLog - Zhihu. https://zhuanlan.zhihu.com/p/58519480
(3) Excellent Cardinality Statistics Algorithm - HyperLogLog - Zhihu. https://zhuanlan.zhihu.com/p/462973469

Basic use

Counting UV (unique visitors) for one day
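A minimal sketch of counting one day's UV with PFADD and PFCOUNT. It assumes one HyperLogLog key per day named `uv:<day>`, which is a hypothetical naming scheme; the helper names `record_visit` and `daily_uv` are also made up for this example. `client` can be any object exposing redis-py's `pfadd` and `pfcount`, such as `redis.Redis(...)`.

```python
def record_visit(client, day, user_id):
    """Record that user_id visited on the given day (PFADD)."""
    # The key name "uv:<day>" is a hypothetical convention for this sketch.
    return client.pfadd(f"uv:{day}", user_id)


def daily_uv(client, day):
    """Return the approximate number of unique visitors for one day (PFCOUNT)."""
    return client.pfcount(f"uv:{day}")


# Example usage against a local server (requires the redis package):
#   import redis
#   r = redis.Redis(host='localhost', port=6379, db=0)
#   for user in ['u1', 'u2', 'u1', 'u3']:
#       record_visit(r, '2023-05-01', user)
#   print(daily_uv(r, '2023-05-01'))
```

Repeat visits by the same user do not increase the count, because the HyperLogLog only tracks distinct elements.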


Counting UV (unique visitors) across two days
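A minimal sketch of combining several days. It assumes each day's visitors are stored in a HyperLogLog key named `uv:<day>` (a hypothetical naming scheme), and the helper names are made up for this example. PFCOUNT with several keys returns the cardinality of their union, and PFMERGE stores that union under a new key. `client` can be any object exposing redis-py's `pfcount` and `pfmerge`, such as `redis.Redis(...)`.

```python
def uv_over_days(client, days):
    """Approximate unique visitors across several days without storing a merged key."""
    # PFCOUNT with several keys counts the union of the HyperLogLogs.
    return client.pfcount(*[f"uv:{day}" for day in days])


def merge_days(client, dest_key, days):
    """Store the union of several daily HyperLogLogs in dest_key (PFMERGE)."""
    client.pfmerge(dest_key, *[f"uv:{day}" for day in days])
    return client.pfcount(dest_key)


# Example usage against a local server (requires the redis package):
#   import redis
#   r = redis.Redis(host='localhost', port=6379, db=0)
#   print(uv_over_days(r, ['2023-05-01', '2023-05-02']))
#   print(merge_days(r, 'uv:2023-05-01_02', ['2023-05-01', '2023-05-02']))
```

A user who visits on both days is counted once in the combined figure, so the two-day UV is usually smaller than the sum of the daily UVs.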


Python code example: operating on a Redis HyperLogLog

The following is a basic usage code example of HyperLogLog:

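import redis

# Connect to a local Redis server (default port, database 0).
r = redis.Redis(host='localhost', port=6379, db=0)

# PFADD: add five distinct elements to the HyperLogLog stored at key 'hll'.
r.pfadd('hll', 'a', 'b', 'c', 'd', 'e')
# PFCOUNT: print the estimated number of distinct elements.
print(r.pfcount('hll'))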

In this example, we use the Python Redis client to connect to the Redis server, add elements to the HyperLogLog with pfadd, and compute the cardinality with pfcount. We added 5 distinct elements, so the output is 5. In general pfcount returns an estimate, but for a set this small it normally matches the exact count.

Calculating the memory size of the value stored at a key in Redis

Redis has a built-in command that reports the memory used by a key and its value.

Starting from version 4.0, Redis provides the MEMORY USAGE command that allows you to view a specific key size. The specific example is as follows:

redis_host:6379 > set knowledge dict
OK
redis_host:6379 > memory usage knowledge
(integer) 59

The command returns the number of bytes of memory occupied by the specified key and its value. If the key holds a nested type (a collection type rather than a string), you can use the SAMPLES option to specify how many elements to sample; the default is 5, and SAMPLES 0 samples every element. For instance, if the key knowledgedict is a hash with 10,000 elements, Redis measures the sampled elements, averages their size, and extrapolates to estimate the total size of the key.

Python code: measuring the memory used by a value

In redis-py, the MEMORY USAGE command is exposed as the memory_usage method, so the memory occupied by a key and its value can be measured directly from Python. The following code reports the memory used by a string key:

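import redis

# Connect to a local Redis server (default port, database 0).
r = redis.Redis(host='localhost', port=6379, db=0)

# MEMORY USAGE key SAMPLES 1: bytes used by 'key' and its value (None if the key is missing).
print(r.memory_usage('key', samples=1))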

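In this example, we use the Python Redis client to connect to the Redis server and call memory_usage, which wraps the MEMORY USAGE command. The argument 1 is the SAMPLES count, not the length of the string, and it only matters for nested types. The call returns the number of bytes used by the key and its value, or None if the key does not exist. In the original run, where the key held a string of length 1, the output was 43; the exact number depends on the Redis version and on the key and value.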


Bernoulli probability experiment

A Bernoulli probability experiment (Bernoulli trial) is a random experiment with only two possible outcomes: success or failure. The trials are independent of each other, and the probability of success is the same in every trial. Such experiments are used to study the probability of an event occurring.

For example, tossing a coin is a Bernoulli experiment. The probability of the coin coming up heads is 0.5 and the probability of tails is also 0.5. The tosses are independent of each other, and every toss has the same probability of coming up heads or tails. HyperLogLog builds on this model: each bit of an element's hash value can be treated as one fair coin toss.
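The coin-toss experiment can be simulated in a few lines. This is a minimal sketch using Python's standard `random` module; the function name `toss_coin` and the seed value are arbitrary choices for illustration.

```python
import random


def toss_coin(n, p_heads=0.5, seed=None):
    """Simulate n independent Bernoulli trials and return the number of heads."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n) if rng.random() < p_heads)


n = 10_000
heads = toss_coin(n, seed=42)
# With a fair coin the observed fraction should be close to 0.5.
print(f"heads: {heads}/{n} = {heads / n:.3f}")
```

The more tosses are simulated, the closer the observed fraction of heads tends to be to the underlying probability.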

Maximum Likelihood Estimation

Maximum likelihood estimation is a parameter estimation method: given a data set, it finds the parameter values that maximize the likelihood function. We assume that the data were drawn from a known family of probability distributions whose parameters are unknown, and we choose the parameter values that are most likely to have produced the data.
Let's look at an example. Suppose we have a coin and we do not know whether it is fair. We toss it 10 times and count the heads, and we get 7 heads and 3 tails.

We can describe this experiment using the binomial distribution, which has two parameters: n, the number of trials, and p, the probability of success (heads) in each trial. Here n=10 is known and p is the parameter we want to estimate. A fair coin corresponds to p=0.5.

The likelihood of the observation as a function of p is L(p) = C(10,7) * p^7 * (1-p)^3. If the coin is fair, L(0.5) = 120/1024, about 0.117. If instead p=0.7, L(0.7) is about 0.267. Maximizing L(p) over all values of p gives p = 7/10 = 0.7, the observed fraction of heads. So the maximum likelihood estimate is p=0.7: a coin biased toward heads explains this data better than a fair one.
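The 7-heads-in-10-tosses example can be worked through numerically. This is a minimal sketch using only the standard library; the function name `binomial_likelihood` and the grid of candidate values are choices made for illustration.

```python
from math import comb


def binomial_likelihood(p, heads=7, tosses=10):
    """Probability of seeing `heads` heads in `tosses` tosses if P(heads) = p."""
    return comb(tosses, heads) * p ** heads * (1 - p) ** (tosses - heads)


# Likelihood of the data under the "fair coin" hypothesis.
fair = binomial_likelihood(0.5)

# Grid search over candidate values of p for the maximum-likelihood estimate.
candidates = [i / 100 for i in range(1, 100)]
p_hat = max(candidates, key=binomial_likelihood)

print(f"L(0.5)   = {fair:.4f}")
print(f"p_hat    = {p_hat:.2f}")
print(f"L(p_hat) = {binomial_likelihood(p_hat):.4f}")
```

The grid search is only for demonstration; for the binomial distribution the maximum has the closed form heads/tosses.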


Harmonic mean

The harmonic mean is one way of averaging a set of values and is generally used for averaging rates. It is obtained by taking the reciprocal of every value, computing the arithmetic mean of those reciprocals, and then taking the reciprocal of that mean. The result equals the number of values divided by the sum of their reciprocals¹.

For example, suppose you travel from A to B in two legs of equal length, the first at speed v1 and the second at speed v2. Your average speed is then the harmonic mean of the two speeds: 2/(1/v1+1/v2). This is because the two distances are equal, so you compute the time needed for each leg, add the times, and divide the total distance by the total time.
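The speed example can be checked with a short script. This is a minimal sketch; the speeds 40 and 60 are example values chosen only for illustration, and `average_speed` is a hypothetical helper name. The standard library's `statistics.harmonic_mean` computes the same quantity.

```python
from statistics import harmonic_mean


def average_speed(v1, v2):
    """Average speed over two legs of equal distance travelled at v1 and v2."""
    return 2 / (1 / v1 + 1 / v2)


v1, v2 = 40, 60  # example speeds, chosen only for illustration
print(average_speed(v1, v2))      # harmonic mean computed by hand
print(harmonic_mean([v1, v2]))    # same value from the standard library
print((v1 + v2) / 2)              # arithmetic mean, which overstates the average speed
```

The harmonic mean is lower than the arithmetic mean here because more time is spent on the slower leg.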

Origin blog.csdn.net/a772304419/article/details/130560217