How to count website visits

This article shows how to use Redis HyperLogLog to count UV (unique visitors).

Background

First, let's clarify what UV means. UV (unique visitors) is the number of distinct people who visit a web page. It is similar to an article's view count, with one difference: a person who visits several times is counted only once.
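
Here is a tiny Python illustration of the difference, using a hypothetical visit log of user IDs. Page views (PV) count every visit, while UV counts each user once. Counting UV exactly this way means keeping every distinct ID in a set.

```python
# Hypothetical visit log: one user ID per page view.
visits = ["u1", "u2", "u1", "u3", "u2", "u1"]

pv = len(visits)       # page views: every visit counts
uv = len(set(visits))  # unique visitors: repeat visits collapse into one entry

print(f"PV={pv}, UV={uv}")
```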

Counting UV therefore requires deduplication, and doing it by hand in back-end code is not worthwhile. I recommend Redis HyperLogLog instead. Some readers may already be asking: why not use a Redis Set? Let's first look at the scenarios HyperLogLog is designed for. It is an often overlooked but very useful Redis data type.

HyperLogLog

Redis HyperLogLog (HLL) is a cardinality estimation algorithm that approximates the number of distinct elements in large datasets. Its counts are approximate, with a standard error of 0.81%, but it uses very little storage: at most about 12 KB per key, no matter how many elements are added.
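
To put "very little storage" in numbers: Redis's HyperLogLog uses 2^14 = 16384 registers of 6 bits each, and the HyperLogLog paper gives a standard error of about 1.04/√m for m registers. This short Python calculation reproduces the 12 KB and 0.81% figures. An exact Set, by contrast, has to store every distinct user ID, so its memory grows with the number of visitors.

```python
import math

# Parameters of Redis's HyperLogLog (fixed in the implementation, not configurable).
PRECISION_BITS = 14                # hash bits used to pick a register
REGISTERS = 2 ** PRECISION_BITS    # 16384 registers
BITS_PER_REGISTER = 6              # each register holds a value up to 63

dense_bytes = REGISTERS * BITS_PER_REGISTER // 8   # size of the dense encoding
std_error = 1.04 / math.sqrt(REGISTERS)            # standard error of the estimate

print(dense_bytes)          # 12288 bytes, i.e. 12 KB
print(f"{std_error:.2%}")   # about 0.81%
```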

HyperLogLog estimates cardinality with a probabilistic algorithm. It hashes each element, uses part of the hash to select one of a fixed number of registers, and stores in that register the largest "position of the first 1 bit" seen in the rest of the hash. The cardinality is then estimated from a harmonic mean over all registers. In Redis, each HyperLogLog lives under its own key, and several of them can be counted or merged together to estimate the cardinality of their union.
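
A minimal Python sketch of the idea, for illustration only. It is not Redis's implementation: Redis differs in its hash function, its register encoding, and its estimator details. The class name and the hash (the first 8 bytes of SHA-1) are assumptions. With p = 14 it uses the same 16384 registers that Redis does.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog for illustration; not Redis's actual implementation."""

    def __init__(self, p: int = 14):
        self.p = p                  # hash bits used to choose a register
        self.m = 1 << p             # number of registers (16384 for p = 14)
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(element: str) -> int:
        # Assumption: first 8 bytes of SHA-1 as a 64-bit hash (Redis uses a different hash).
        return int.from_bytes(hashlib.sha1(element.encode("utf-8")).digest()[:8], "big")

    def add(self, element: str) -> None:
        h = self._hash64(element)
        width = 64 - self.p
        index = h >> width                      # top p bits select the register
        rest = h & ((1 << width) - 1)           # remaining bits of the hash
        rank = width - rest.bit_length() + 1    # position of the first 1 bit, from 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other: "HyperLogLog") -> None:
        # Union of two sketches: keep the larger value in each register.
        if other.m != self.m:
            raise ValueError("cannot merge sketches with different precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self) -> int:
        alpha = 0.7213 / (1 + 1.079 / self.m)   # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:       # small range: use linear counting
            return round(self.m * math.log(self.m / zeros))
        return round(raw)
```

Adding an element that was already seen cannot raise any register, so duplicates never change the count. The merge() method reflects the idea behind PFMERGE: the register-wise maximum of two sketches is exactly the sketch of the union of their inputs.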

Here are some common operations using Redis HyperLogLog:

  1. PFADD key element [element ...]: Adds one or more elements to a HyperLogLog.
    Example: PFADD hllset "element1" "element2" "element3"

  2. PFCOUNT key [key ...]: Returns the estimated cardinality of a HyperLogLog. Given several keys, it returns the estimated cardinality of their union.
    Example: PFCOUNT hllset

  3. PFMERGE destkey sourcekey [sourcekey ...]: Merges multiple HyperLogLogs into a single HyperLogLog stored at destkey.
    Example: PFMERGE mergedset hllset1 hllset2
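
From application code, the same three commands are available through a Redis client. This Python sketch assumes the redis-py package, whose client exposes pfadd, pfcount, and pfmerge methods. It also assumes a hypothetical key layout of one HyperLogLog per page per day. The function names are illustrative and do not come from the original article.

```python
from datetime import date, timedelta


def uv_key(page: str, day: date) -> str:
    """Hypothetical key layout: one HyperLogLog per page per day."""
    return f"uv:{page}:{day.isoformat()}"


def record_visit(client, page: str, day: date, user_id: str) -> None:
    """PFADD: a user who visits again leaves the estimate unchanged."""
    client.pfadd(uv_key(page, day), user_id)


def daily_uv(client, page: str, day: date) -> int:
    """PFCOUNT on one key: estimated number of distinct users that day."""
    return client.pfcount(uv_key(page, day))


def weekly_uv(client, page: str, last_day: date) -> int:
    """PFMERGE the 7 daily keys ending at last_day, then PFCOUNT the merged key."""
    daily_keys = [uv_key(page, last_day - timedelta(days=i)) for i in range(7)]
    merged_key = f"uv:{page}:week:{last_day.isoformat()}"
    client.pfmerge(merged_key, *daily_keys)   # missing daily keys count as empty
    return client.pfcount(merged_key)
```

For example, with r = redis.Redis() connected to a local server, calling record_visit(r, "home", date.today(), "u1") twice leaves daily_uv(r, "home", date.today()) unchanged after the second call. If the merged weekly key is not needed afterwards, client.pfcount(*daily_keys) returns the same union estimate without storing anything.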

Redis HyperLogLog is well suited to deduplicating or counting massive amounts of data. It uses little memory, runs fast, and has a fixed standard error of 0.81%. Because it is based on a probabilistic algorithm, however, its results carry some error, so it should not be used where exact counts are required.

UV statistics, which can tolerate a small error, are therefore a good fit for HyperLogLog.

Command line test

With that background, let's first try it out on the command line.

The count shows that duplicate elements are indeed counted only once. Next, let's demonstrate it with code.

Code testing

Let me just show my test code.

The logic is to insert visits from 1,000,000 users in batches and then read the UV count. Across multiple runs, my results were all around 1,001,048, about 1,000 more than the true value (an error of roughly 0.1%). This does not affect UV statistics.
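
The original test code is not reproduced in this text. Here is a Python approximation of the experiment as described, under stated assumptions: a redis-py client passed in as client, a hypothetical key name "uv:test", generated user IDs, and a batch size of 1000 (none of these come from the original).

```python
def batches(total: int, size: int):
    """Yield hypothetical user IDs in lists of at most `size`, `total` IDs in all."""
    for start in range(0, total, size):
        yield [f"user:{i}" for i in range(start, min(start + size, total))]


def relative_error(estimate: int, actual: int) -> float:
    """Signed relative error: 0.001 means the estimate is 0.1% too high."""
    return (estimate - actual) / actual


def run_experiment(client, key: str = "uv:test", total: int = 1_000_000,
                   size: int = 1000) -> int:
    """Add `total` distinct users to one HyperLogLog in batches, then PFCOUNT it."""
    client.delete(key)                # start from an empty key
    for batch in batches(total, size):
        client.pfadd(key, *batch)     # one PFADD call per batch of distinct users
    return client.pfcount(key)
```

With a Redis server running, run_experiment(redis.Redis()) performs the one-million-user test. Plugging in the result reported above, relative_error(1001048, 1_000_000) is about 0.001, roughly 0.1%, well inside Redis's 0.81% standard error.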

That's all for today. Thanks for reading.

With shigen, every day is different!

Origin blog.csdn.net/weixin_55768452/article/details/132706514