This article introduces how to use Redis HyperLogLog to implement UV (unique visitor) statistics.
background
First, let's clarify what UV means. UV (unique visitors) is the number of distinct people who visit and browse a web page. It is similar to an article's read count, except that a person who visits multiple times is counted only once.
Implementing this kind of deduplicated counting directly in back-end application code is of little practical use, so I recommend using Redis HyperLogLog
to implement the statistics. Some readers may already be asking: why not use a Redis set instead? Let's first look at the use cases of HyperLogLog,
a Redis data type that is often overlooked but very useful.
HyperLogLog
Redis HyperLogLog (HLL) is a cardinality estimation algorithm that approximates the number of distinct elements in large datasets. Its results are close to exact, while it uses very little storage space.
HyperLogLog estimates cardinality with a probabilistic algorithm. It hashes each element, uses some of the hash bits to pick one of a fixed number of registers (16384 in Redis), and keeps in that register the longest run of zero bits seen in the remaining hash bits. The registers are then combined into an approximation of the cardinality. In Redis, you can create many HyperLogLog keys, each tracking the cardinality of a different collection.
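To make the description above concrete, here is a minimal HyperLogLog sketch in Python. It is a simplified illustration of the technique, not Redis's implementation: Redis also uses 16384 registers, but it has a different hash function, its own estimator and sparse/dense encodings. The class name `TinyHLL` and the SHA-1-based hash are assumptions made for this example.

```python
import hashlib
import math


class TinyHLL:
    """Simplified HyperLogLog: 2**p registers, each holding the max rank seen."""

    def __init__(self, p: int = 14):
        self.p = p
        self.m = 1 << p                      # number of registers (16384 when p=14)
        self.registers = [0] * self.m

    def _hash64(self, element: str) -> int:
        # Assumption: 64 bits of SHA-1 serve as a uniformly distributed hash.
        digest = hashlib.sha1(element.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, element: str) -> None:
        x = self._hash64(element)
        index = x >> (64 - self.p)                # first p bits choose the register
        rest = x & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # rank = number of leading zeros in the remaining bits, plus one
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def merge(self, other: "TinyHLL") -> None:
        # The union of two sketches is the register-wise maximum,
        # the same idea Redis uses for PFMERGE.
        if self.p != other.p:
            raise ValueError("sketches must have the same number of registers")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self) -> int:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)          # bias-correction constant for m >= 128
        harmonic = sum(2.0 ** -r for r in self.registers)
        estimate = alpha * m * m / harmonic
        zeros = self.registers.count(0)
        # Small-range correction: use linear counting while many registers are empty.
        if estimate <= 2.5 * m and zeros > 0:
            estimate = m * math.log(m / zeros)
        return round(estimate)
```

Adding an element that is already present can never raise a register, which is why duplicates do not change the count. Because a merge only keeps the larger value of each register, two sketches can be combined without access to the original elements.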
Here are some common operations using Redis HyperLogLog:
- PFADD key element [element ...]: adds one or more elements to a HyperLogLog data structure.
  Example: PFADD hllset "element1" "element2" "element3"
- PFCOUNT key [key ...]: returns the estimated cardinality of the HyperLogLog data structure. When several keys are given, it returns the estimated cardinality of their union.
  Example: PFCOUNT hllset
- PFMERGE destkey sourcekey [sourcekey ...]: merges multiple HyperLogLog data structures into a single HyperLogLog stored at destkey.
  Example: PFMERGE mergedset hllset1 hllset2
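To show how these three commands map onto UV counting, here is a hedged Python sketch written against the redis-py client, whose `Redis` class provides `pfadd`, `pfcount` and `pfmerge` methods. The key scheme `uv:{page}:{date}` and all helper names are hypothetical choices for illustration, not part of Redis.

```python
from datetime import date
from typing import Iterable, Protocol


class HLLClient(Protocol):
    """The redis-py methods these helpers rely on (redis.Redis provides them)."""
    def pfadd(self, name: str, *values: str) -> int: ...
    def pfcount(self, *sources: str) -> int: ...
    def pfmerge(self, dest: str, *sources: str) -> bool: ...


def uv_key(page: str, day: date) -> str:
    # Hypothetical key scheme: one HyperLogLog per page per day.
    return f"uv:{page}:{day.isoformat()}"


def record_visit(client: HLLClient, page: str, day: date, user_id: str) -> None:
    # PFADD: repeated visits by the same user do not raise the count.
    client.pfadd(uv_key(page, day), user_id)


def daily_uv(client: HLLClient, page: str, day: date) -> int:
    # PFCOUNT on a single key: estimated distinct visitors for that day.
    return client.pfcount(uv_key(page, day))


def uv_over_days(client: HLLClient, page: str, days: Iterable[date]) -> int:
    # PFCOUNT on several keys estimates the size of their union, so a user
    # who visits on several days is still counted once.
    return client.pfcount(*(uv_key(page, d) for d in days))


def archive_uv(client: HLLClient, page: str, days: Iterable[date], dest: str) -> None:
    # PFMERGE stores the union under dest, e.g. for a weekly or monthly total.
    client.pfmerge(dest, *(uv_key(page, d) for d in days))
```

With a running Redis server you would pass something like `redis.Redis(host="localhost", port=6379, decode_responses=True)` as `client`. Keeping one key per page per day means PFCOUNT over several daily keys gives a deduplicated multi-day figure without storing any extra data.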
Redis HyperLogLog is well suited to deduplicating or counting massive amounts of data. It uses very little storage (at most about 12 KB per key, no matter how many elements are added), runs fast, and has a fixed standard error of about 0.81%. However, because it is based on a probabilistic algorithm, its counts carry some error, so it cannot be used where exact counts are required.
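The 12 KB and 0.81% figures follow from Redis's fixed layout of 16384 registers of 6 bits each. The short Python calculation below reproduces them. The set-size line only answers the "why not a set" question in rough terms: it is a hypothetical lower bound that assumes 16-byte user IDs and ignores Redis's per-member overhead.

```python
import math

REGISTERS = 16384        # 2**14 registers in Redis's HyperLogLog
BITS_PER_REGISTER = 6    # each register holds a value in 0..63

# HyperLogLog's standard error is about 1.04 / sqrt(m).
std_error = 1.04 / math.sqrt(REGISTERS)            # 0.008125, i.e. about 0.81%

# Dense representation: 16384 registers * 6 bits.
dense_bytes = REGISTERS * BITS_PER_REGISTER // 8    # 12288 bytes = 12 KB

# Hypothetical lower bound for an exact set of 1,000,000 user IDs,
# assuming 16-byte IDs and ignoring per-member overhead.
set_bytes_lower_bound = 1_000_000 * 16              # 16,000,000 bytes, about 15 MB
```

The HyperLogLog stays at 12 KB whether it has seen a thousand or a billion visitors, while a set grows with every new member.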
Therefore, based on the above, UV statistics, where a small error is acceptable, are a very good fit for HyperLogLog.
command line test
Enough theory; let's test it on the command line first.
The output shows that elements added more than once are indeed counted only once. Next, let's demonstrate it with code.
code testing
Here is my test code.
The logic is to insert visits from 1,000,000 users in batches and then read the UV value. Across multiple test runs, my results were all around 1,001,048,
that is, about 1,000 (roughly 0.1%) more than the true count. This small error does not affect UV evaluation and statistics.
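The original test code is not included in this text. As a rough stand-in, here is a hypothetical Python sketch of the same experiment: it adds user IDs in batches with redis-py-style `pfadd` calls and then reads the count with `pfcount`. The ID format `user:{i}`, the batch size of 1000 and the key name are assumptions. It also computes the relative error of the result reported above.

```python
def load_visits(client, key: str, total_users: int, batch_size: int = 1000) -> int:
    """Add user IDs 0..total_users-1 in batches, then return PFCOUNT for key.

    client: any object with redis-py's pfadd/pfcount methods (e.g. redis.Redis).
    """
    for start in range(0, total_users, batch_size):
        end = min(start + batch_size, total_users)
        # PFADD accepts many elements per call, so one call per batch
        # saves round trips compared with one call per user.
        client.pfadd(key, *(f"user:{i}" for i in range(start, end)))
    return client.pfcount(key)


def relative_error(estimate: int, actual: int) -> float:
    return (estimate - actual) / actual


# The result reported above: about 1,001,048 for 1,000,000 users.
print(f"{relative_error(1_001_048, 1_000_000):.4%}")  # 0.1048%
```

Against a real server you could call `load_visits(redis.Redis(), "uv:test", 1_000_000)`. A relative error of about 0.1% is well inside the roughly 0.81% standard error, which for 1,000,000 users corresponds to about 8,000 visitors.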
That's all for today's sharing. Thank you for reading.
Together with shigen, every day is different!