Features
1. HyperLogLog is a general-purpose algorithm; it is not unique to Redis.
2. Even when the number of input elements is very large, the space required to calculate the cardinality is fixed and small: in Redis, each HyperLogLog key costs at most 12 KB of memory. This is in stark contrast to sets, where counting cardinality consumes more memory as more elements are added.
3. HyperLogLog only calculates the cardinality from the input elements; it does not store the elements themselves, so unlike a set it cannot return the individual elements.
4. At its core is a cardinality estimation algorithm, so the final value carries some error: the result is an approximation with a standard error of 0.81% (1.04/√m, with m = 2^14 = 16384 registers in Redis).
5. A key does not occupy 12 KB immediately. Storage starts in a sparse encoding that takes very little space; only when the count grows and the sparse representation exceeds a threshold (hll-sparse-max-bytes, 3000 bytes by default) is it converted, once, to the dense encoding, which occupies 12 KB.
When absolute accuracy is not required, a probabilistic algorithm is a good solution. A probabilistic algorithm does not store the data set itself; instead it estimates the cardinality using probabilistic and statistical methods. This saves a great deal of memory while keeping the error within a controlled range.
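The idea can be sketched in a few lines: hash each element, use some bits of the hash to pick a register, keep the maximum "leading-zero rank" seen per register, and estimate the cardinality from a harmonic mean of the registers. This is a simplified model of the published HyperLogLog estimator, not Redis's actual implementation (which adds further bias corrections):

```python
import hashlib
import math

P = 14          # Redis uses 2**14 = 16384 registers -> ~0.81% std error
M = 1 << P

def add(registers, value):
    """Hash the element; the low P bits pick a register, the rest give a rank."""
    h = int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")
    idx, rest = h & (M - 1), h >> P
    rank = (64 - P) - rest.bit_length() + 1   # leading zeros + 1
    registers[idx] = max(registers[idx], rank)

def count(registers):
    """Harmonic-mean estimate, with linear counting for small cardinalities."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * M and zeros:              # small-range correction
        return M * math.log(M / zeros)
    return raw
```

Adding 10,000 distinct strings and calling `count` yields an estimate within a few percent of 10,000, while the structure itself never grows beyond the fixed 16,384 registers.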
Operations
Redis provides three commands for HyperLogLog:
pfadd: add elements
pfadd key element [element ...]
When pfadding an element that already exists, the estimated count does not change. For example, after adding xiaoming above, adding xiaoming again leaves the internal storage of the class:4 key unchanged.
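This idempotence follows directly from how the structure works: the same element always hashes to the same register and the same rank, so re-adding it can never raise a register. A minimal sketch (a toy model, not Redis's code; the return value mimics PFADD, which reports whether any register changed):

```python
import hashlib

P, M = 14, 1 << 14   # register layout mirroring Redis's 2**14 registers

def pfadd_model(regs, value):
    """Toy PFADD: returns True only if a register actually changed."""
    h = int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")
    idx, rest = h & (M - 1), h >> P
    rank = (64 - P) - rest.bit_length() + 1
    if rank > regs[idx]:
        regs[idx] = rank
        return True
    return False

regs = [0] * M
first = pfadd_model(regs, "xiaoming")    # updates a register
second = pfadd_model(regs, "xiaoming")   # same hash, same rank: no change
```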
pfcount: count
pfcount key [key ...]
When given multiple keys, it returns the approximate cardinality of the union of all the given HyperLogLogs, computed by merging them into a temporary HyperLogLog.
pfmerge: merge
pfmerge destkey sourcekey [sourcekey ...]
The cardinality of the merged HyperLogLog is close to the cardinality of the union of all the sets observed by the input HyperLogLogs. After merging, run pfcount on the destination key to get the combined count.
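The reason merging works is that taking the register-wise maximum of two HyperLogLogs yields exactly the registers you would get by adding every element of both into one structure. A small self-contained sketch (toy model, not Redis internals):

```python
import hashlib

P, M = 14, 1 << 14

def add(regs, value):
    h = int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")
    idx, rest = h & (M - 1), h >> P
    regs[idx] = max(regs[idx], (64 - P) - rest.bit_length() + 1)

def merge(dest, *sources):
    """Toy PFMERGE: the union of HyperLogLogs is the register-wise max."""
    for src in sources:
        for i, r in enumerate(src):
            if r > dest[i]:
                dest[i] = r

a, b = [0] * M, [0] * M
for v in ("alice", "bob"):
    add(a, v)
for v in ("bob", "carol"):
    add(b, v)

merged = [0] * M
merge(merged, a, b)

# Adding the union of elements directly produces identical registers.
direct = [0] * M
for v in ("alice", "bob", "carol"):
    add(direct, v)
```

Because the merge is lossless at the register level, counting after a merge is exactly as accurate as counting a single HyperLogLog that saw all the elements.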
Application Scenarios
Counting UV (unique visitors)
Add each day's visiting IPs to a per-day HyperLogLog key.
To get the UV for a single day, simply run pfcount on that day's key.
To get the UV across several days, first pfmerge the daily keys, then pfcount the merged key.
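The key layout above can be modeled with exact Python sets for clarity. In Redis, each daily key would instead be a HyperLogLog updated with PFADD and read with PFCOUNT/PFMERGE, costing at most 12 KB per key regardless of traffic; the `uv:<date>` key names here are illustrative, not prescribed:

```python
# Exact-set model of the daily-UV key layout (Redis would use HyperLogLogs).
daily_uv = {}

def record_visit(date, ip):
    daily_uv.setdefault(f"uv:{date}", set()).add(ip)   # PFADD uv:<date> <ip>

def uv_for_day(date):
    return len(daily_uv.get(f"uv:{date}", set()))      # PFCOUNT uv:<date>

def uv_for_range(dates):
    merged = set()                                     # PFMERGE then PFCOUNT
    for d in dates:
        merged |= daily_uv.get(f"uv:{d}", set())
    return len(merged)

record_visit("2024-06-01", "1.2.3.4")
record_visit("2024-06-01", "5.6.7.8")
record_visit("2024-06-02", "1.2.3.4")   # repeat visitor is deduplicated
```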
Counting daily and monthly active users
(Omitted; the approach is the same as the UV case above, with per-day and per-month keys.)
Summary
This covers the basics of HyperLogLog: what it is, its fixed memory cost and error bound, and the pfadd/pfcount/pfmerge commands Redis provides for it.