Counting UV with HyperLogLog

Table of contents

Features

Operations

PFADD: add elements

PFCOUNT: estimate the cardinality

PFMERGE: merge HyperLogLogs

Application scenarios

Example: counting UV (unique visitors)

Example: daily and monthly active users

Summary


Features

1. HyperLogLog is a general algorithm; it is not specific to Redis.

2. However large the number of input elements, the space needed to estimate the cardinality (the number of distinct elements) stays fixed and small: in Redis, each HyperLogLog key costs at most 12 KB of memory. This is in stark contrast to a set, which consumes more memory with every element it holds when used to compute a cardinality.

3. HyperLogLog only estimates the cardinality from the input elements and does not store the elements themselves, so unlike a set, it cannot return the elements that were added.

4. At its core is a cardinality estimation algorithm, so the result is an approximation rather than an exact value: the standard error of the estimate is 0.81%.

5. A HyperLogLog does not occupy the full 12 KB from the start. It begins in a sparse representation that takes very little space; only when the count grows and the sparse representation exceeds a size threshold (configurable with hll-sparse-max-bytes) is it converted, once, into the dense representation, which occupies 12 KB.
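
Both numbers in the list above follow from the dense layout Redis uses, 2^14 registers of 6 bits each, together with the standard error formula 1.04/sqrt(m) from the HyperLogLog paper, where m is the number of registers:

```latex
\begin{aligned}
m &= 2^{14} = 16384 \ \text{registers} \\
\text{dense size} &= m \times 6\ \text{bits} = 98304\ \text{bits} = 12288\ \text{bytes} = 12\ \text{KB} \\
\text{standard error} &\approx \frac{1.04}{\sqrt{m}} = \frac{1.04}{128} \approx 0.0081 = 0.81\%
\end{aligned}
```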

When absolute accuracy is not required, a probabilistic algorithm is a good solution. It does not store the data set itself; instead, it estimates the cardinality with probabilistic and statistical methods, which saves a great deal of memory while keeping the error within a known range.
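
To make the idea concrete, here is a minimal HyperLogLog in Python. It is a sketch, not Redis's implementation: it hashes with SHA-1 instead of the MurmurHash64A function Redis uses, applies the classic estimator with the linear-counting correction from the original HyperLogLog paper (Redis's own estimator may differ in detail), and keeps registers as plain Python ints rather than packed 6-bit fields, so it does not reproduce the 12 KB footprint. It does use 2^14 registers, like Redis, and never stores the elements themselves.

```python
import hashlib
import math

P = 14                             # 2**14 = 16384 registers, the same count Redis uses
M = 1 << P
ALPHA = 0.7213 / (1 + 1.079 / M)   # bias-correction constant (valid for M >= 128)


def hash64(element: str) -> int:
    """64-bit hash of an element (SHA-1 here; an assumption, Redis uses MurmurHash64A)."""
    return int.from_bytes(hashlib.sha1(element.encode()).digest()[:8], "big")


class HyperLogLog:
    def __init__(self) -> None:
        # Each register keeps the largest "rank" seen among elements hashed to it.
        self.registers = [0] * M

    def add(self, element: str) -> None:
        h = hash64(element)
        index = h & (M - 1)            # low P bits choose the register
        rest = h >> P                  # remaining 64 - P bits
        # rank = 1-based position of the lowest set bit of `rest`
        rank = (rest & -rest).bit_length() if rest else 64 - P + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self) -> int:
        # Raw estimate: normalized harmonic mean of 2**register over all registers.
        estimate = ALPHA * M * M / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * M and zeros:
            # Small-range correction (linear counting) while many registers are still empty.
            estimate = M * math.log(M / zeros)
        return round(estimate)


hll, exact = HyperLogLog(), set()
for i in range(10_000):
    ip = f"10.0.{i // 256}.{i % 256}"
    for _ in range(3):                 # every IP visits three times
        hll.add(ip)
        exact.add(ip)

print(len(exact), hll.count())         # exact count vs. HyperLogLog estimate
```

The estimate should land close to 10000, while the register array stays at 16384 entries no matter how many IPs are added; the exact set, by contrast, grows with every new IP.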

Operations

Redis provides three commands for HyperLogLog:

PFADD: add elements

pfadd key element [element ...]

Adding an element that is already present does not change the estimated count. For example, once xiaoming has been added to the key class:4, adding xiaoming again leaves the key's internal registers unchanged, and PFADD returns 0 (it returns 1 only when at least one internal register was altered).
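
The reply PFADD gives in this situation can be modeled as follows. This sketch assumes a toy SHA-1-based hash as a stand-in for Redis's hash function, and `pfadd` and `register_update` are hypothetical helpers working on a plain list, not a Redis client API. Because a register only ever keeps the maximum rank it has seen, a second add of the same element hashes to the same register with the same rank and cannot raise it.

```python
import hashlib

P = 14
M = 1 << P                              # 16384 registers


def register_update(element: str) -> tuple[int, int]:
    """Return (register index, rank) for an element using a toy 64-bit hash."""
    h = int.from_bytes(hashlib.sha1(element.encode()).digest()[:8], "big")
    rest = h >> P
    rank = (rest & -rest).bit_length() if rest else 64 - P + 1
    return h & (M - 1), rank


def pfadd(registers: list[int], *elements: str) -> int:
    """Mimic PFADD's reply: 1 if at least one register was raised, else 0."""
    changed = 0
    for element in elements:
        index, rank = register_update(element)
        if rank > registers[index]:
            registers[index] = rank
            changed = 1
    return changed


class4 = [0] * M                        # stands in for the key class:4
print(pfadd(class4, "xiaoming"))        # 1: an empty register was raised
print(pfadd(class4, "xiaoming"))        # 0: same register, same rank, nothing changes
```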

PFCOUNT: estimate the cardinality

pfcount key [key ...]

With a single key, PFCOUNT returns the approximate cardinality of that HyperLogLog (0 if the key does not exist). With multiple keys, it returns the approximate cardinality of the union of all the given HyperLogLogs, computed by merging them into a temporary HyperLogLog.

PFMERGE: merge HyperLogLogs

pfmerge destkey sourcekey [sourcekey ...]

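PFMERGE merges the source HyperLogLogs into destkey; if destkey already exists, it is treated as one of the sources. The cardinality of the merged HyperLogLog approximates the cardinality of the union of the sets observed by the source HyperLogLogs.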

Then run PFCOUNT on destkey to read the estimate.
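
Multi-key PFCOUNT and PFMERGE rest on the same step: a register-wise maximum. Each register holds the largest rank among the elements hashed to it, so taking the maximum across sources gives exactly the registers a single HyperLogLog fed the union would hold. The sketch below models this on plain lists, with a dict standing in for the Redis keyspace; the function names and the toy 4-register arrays are illustrative assumptions, not the real 16384-register encoding.

```python
def merge_registers(*sources: list[int]) -> list[int]:
    """Register-wise maximum over all sources (returns a new list)."""
    return [max(values) for values in zip(*sources)]


def pfcount_union_registers(store: dict[str, list[int]], *keys: str) -> list[int]:
    """Model of multi-key PFCOUNT: merge into a temporary array, sources left as they are."""
    return merge_registers(*(store[k] for k in keys))   # assumes every key exists


def pfmerge(store: dict[str, list[int]], destkey: str, *sourcekeys: str) -> None:
    """Model of PFMERGE: an existing destkey counts as one of the sources."""
    sources = [store[k] for k in sourcekeys]            # assumes every source key exists
    if destkey in store:
        sources.append(store[destkey])
    store[destkey] = merge_registers(*sources)


# Toy 4-register arrays; real Redis HyperLogLogs have 16384 registers.
store = {"day1": [0, 3, 1, 0], "day2": [2, 1, 1, 0], "week": [0, 0, 0, 4]}
print(pfcount_union_registers(store, "day1", "day2"))   # [2, 3, 1, 0]
pfmerge(store, "week", "day1", "day2")
print(store["week"])                                    # [2, 3, 1, 4]
```

Since the merge is exact at the register level, a merged count carries only the usual HyperLogLog estimation error and nothing extra.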

Application scenarios

Example: counting UV (unique visitors)

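Store the IP of every visitor in a HyperLogLog key for the current day.

To get the UV for a single day, run PFCOUNT on that day's key.

To get the UV over several days, the daily values cannot simply be added up, because a visitor who comes back on more than one day would be counted more than once. Merge the daily keys with PFMERGE and run PFCOUNT on the result, or pass all the daily keys to PFCOUNT, which merges them into a temporary HyperLogLog.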
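
As a sketch of the key layout, the helpers below only build command argument lists, assuming a hypothetical `uv:YYYY-MM-DD` key per day; the helper names are illustrative. Each list could then be sent with whatever Redis client you use (redis-py, for example, accepts raw commands through `execute_command(*args)`).

```python
from datetime import date, timedelta
from typing import Optional


def uv_key(day: date) -> str:
    # Hypothetical naming scheme: one HyperLogLog per day, e.g. "uv:2023-07-20".
    return f"uv:{day.isoformat()}"


def record_visit(day: date, ip: str) -> list[str]:
    # Re-adding the same IP on the same day leaves that day's UV unchanged.
    return ["PFADD", uv_key(day), ip]


def uv_for_range(start: date, end: date, destkey: Optional[str] = None) -> list[list[str]]:
    """Commands that estimate the UV from start to end, both days inclusive."""
    keys = [uv_key(start + timedelta(days=i)) for i in range((end - start).days + 1)]
    if destkey is None:
        # One multi-key PFCOUNT estimates the union without storing it.
        return [["PFCOUNT", *keys]]
    # PFMERGE stores the union so later reads are a single-key PFCOUNT.
    return [["PFMERGE", destkey, *keys], ["PFCOUNT", destkey]]


print(record_visit(date(2023, 7, 20), "192.168.0.1"))
print(uv_for_range(date(2023, 7, 20), date(2023, 7, 22)))
```

A single multi-key PFCOUNT is convenient for ad hoc ranges; storing the union with PFMERGE pays off when the same range, such as a week, is read repeatedly.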

Example: daily and monthly active users

Omitted.
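
The original leaves this case out. Assuming it follows the same pattern as the UV case, with user IDs in place of IPs, one possible layout keeps a HyperLogLog per day for daily active users and merges a month's daily keys for monthly active users. The `active:` key names and all helper functions below are hypothetical, and like the UV sketch they only build command argument lists.

```python
import calendar
from datetime import date


def active_key(day: date) -> str:
    # Hypothetical naming scheme: IDs of users active on one day, e.g. "active:2023-07-20".
    return f"active:{day.isoformat()}"


def record_activity(day: date, user_id: str) -> list[str]:
    return ["PFADD", active_key(day), user_id]


def daily_active_command(day: date) -> list[str]:
    # DAU: a single-key PFCOUNT on that day's HyperLogLog.
    return ["PFCOUNT", active_key(day)]


def monthly_active_commands(year: int, month: int) -> list[list[str]]:
    # MAU: merge every daily key of the month into one monthly key, then count it.
    days_in_month = calendar.monthrange(year, month)[1]
    daily_keys = [active_key(date(year, month, d)) for d in range(1, days_in_month + 1)]
    monthly_key = f"active:{year:04d}-{month:02d}"
    return [["PFMERGE", monthly_key, *daily_keys], ["PFCOUNT", monthly_key]]


print(daily_active_command(date(2023, 7, 20)))
print(monthly_active_commands(2023, 7)[1])
```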

Summary

The goal of this post was to get familiar with HyperLogLog and build a basic understanding of it.


Origin blog.csdn.net/wai_58934/article/details/131833747