A real Pinduoduo interview question: how to count unique visitors with Redis

Reading this article takes about 2.8 minutes.

Author: brisk Min annoying chatters

As everyone knows, Pinduoduo pays remarkably well and spares no effort when poaching talent: developers with about three years of experience and reasonably good skills have been given offers of 30K.

Of course, Pinduoduo is also known for heavy overtime. Six-day work weeks are the norm and a typical working day runs more than 12 hours, which is quite hard.

Without further ado, today let's look at a real question from a Pinduoduo backend interview, a simple architecture question:

Pinduoduo has hundreds of millions of users. For a given page, how would you use Redis to count the number of users who visit it?

Use Hash

Hash is one of the basic data structures in Redis. Under the hood, Redis maintains a hash table with chaining: keys are mapped to different slots of the table, and when keys collide, the colliding entries are chained into a linked list.

When a user visits, we use the user id if the user is logged in. If the user is not logged in, the front-end page can randomly generate a key to identify the user.

On each visit we can use the HSET command. The key can be the URI joined with the date, the field can be the user id or the random identifier, and the value can simply be set to 1.

When we want the number of visitors to a particular page on a given day, HLEN returns the final result directly.
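The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: `client` is assumed to be an object exposing redis-py style `hset` and `hlen` methods (for example a `redis.Redis` instance), and the `uv:` key prefix and the function names are hypothetical.

```python
from datetime import date


def hash_key(uri: str, day: date) -> str:
    """Build the hash key from the page URI and the date (key layout is an assumption)."""
    return f"uv:{uri}:{day.isoformat()}"


def record_hash_visit(client, uri: str, day: date, visitor_id: str) -> None:
    # HSET key field 1: the field is the user id or the random identifier.
    # A repeat visit overwrites the same field, so each visitor is stored once.
    client.hset(hash_key(uri, day), visitor_id, 1)


def count_hash_visitors(client, uri: str, day: date) -> int:
    # HLEN returns the number of fields, i.e. the number of distinct visitors.
    return client.hlen(hash_key(uri, day))
```

Because every visitor occupies one field in the hash, the count is exact, but memory grows with the number of distinct visitors.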

Pros: simple, easy to implement, very convenient to query, and highly accurate.

Disadvantages: it takes up too much memory, and performance drops as the number of keys grows. It is fine for a small site, but a site like Pinduoduo with hundreds of millions of PV certainly cannot sustain it.

Use Bitset

We know that a 32-bit int, if used only to record an id, can record just one user. But if we look at it in binary and let each bit represent one user, a single int can represent 32 users at once, a 32-fold saving in space!
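To make the idea concrete, here is a small sketch of packing up to 32 users into one integer, with bit i standing for the user whose id is i. The function names are hypothetical and chosen only for illustration.

```python
def mark_user(bits: int, user_id: int) -> int:
    """Return bits with the bit for user_id switched on (ids 0..31 only)."""
    if not 0 <= user_id < 32:
        raise ValueError("user_id must fit in a 32-bit word")
    return bits | (1 << user_id)


def has_user(bits: int, user_id: int) -> bool:
    """Check whether the bit for user_id is set."""
    return (bits >> user_id) & 1 == 1


def count_users(bits: int) -> int:
    """Count the set bits, i.e. the number of distinct users recorded."""
    return bin(bits).count("1")


word = 0
for uid in (0, 5, 31, 5):  # user 5 visits twice but is counted once
    word = mark_user(word, uid)
print(count_users(word))  # prints: 3
```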

For scenarios with large amounts of data, using a bitset saves a lot of memory.

For users who are not logged in, we can also use a hash algorithm to turn the user's identifier into a numeric id.
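One possible way to derive such a numeric id is sketched below. The choice of SHA-256 and the size of the id space (`ID_SPACE`) are assumptions made for this example; the article does not name a hash algorithm.

```python
import hashlib

# Assumed id space: 2**27 offsets, which corresponds to a 16 MiB bitmap.
ID_SPACE = 1 << 27


def anonymous_id(identifier: str, space: int = ID_SPACE) -> int:
    """Hash a visitor identifier (e.g. a random front-end key) to an offset in [0, space)."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    # Take the first 8 bytes as an unsigned integer and fold it into the id space.
    return int.from_bytes(digest[:8], "big") % space
```

Two different identifiers can land on the same offset, so counts built on these ids can come out slightly low.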

A bitset is very memory-efficient: for 100 million users it needs only 100000000/8/1024/1024, which is approximately 12 MB of memory.
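The arithmetic can be checked with a few lines. This sketch assumes user ids are used directly as bit offsets starting from 0, so the size is determined by the highest id; the helper names are hypothetical.

```python
def bitmap_bytes(max_user_id: int) -> int:
    """Bytes needed for a bitmap whose highest offset is max_user_id."""
    return max_user_id // 8 + 1


def to_mib(n_bytes: int) -> float:
    """Convert bytes to mebibytes (divide by 1024 twice)."""
    return n_bytes / 1024 / 1024


dense = bitmap_bytes(100_000_000 - 1)  # ids 0 .. 99,999,999
print(dense, round(to_mib(dense), 2))  # prints: 12500000 11.92
```

Note that the size depends on the highest id rather than on how many users actually visited, which is why sparse ids can make a bitset expensive.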

Redis provides the SETBIT command for this, which is very convenient.

On the product page we can keep calling SETBIT to mark that a user has visited the page, and use GETBIT to check whether a particular user has visited.

At the end, BITCOUNT gives the number of visitors to the page for each day.
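A minimal sketch of these three commands working together might look like this. `client` is assumed to expose redis-py style `setbit`, `getbit` and `bitcount` methods (for example a `redis.Redis` instance); the `uv:bits:` key prefix and the function names are hypothetical.

```python
from datetime import date


def bitmap_key(uri: str, day: date) -> str:
    """One bitmap per page per day (key layout is an assumption)."""
    return f"uv:bits:{uri}:{day.isoformat()}"


def record_bitmap_visit(client, uri: str, day: date, user_id: int) -> None:
    # SETBIT key offset 1: the numeric user id is used as the bit offset.
    client.setbit(bitmap_key(uri, day), user_id, 1)


def has_visited(client, uri: str, day: date, user_id: int) -> bool:
    # GETBIT key offset: 1 if this user has visited the page that day.
    return client.getbit(bitmap_key(uri, day), user_id) == 1


def count_bitmap_visitors(client, uri: str, day: date) -> int:
    # BITCOUNT key: the number of set bits, i.e. the number of distinct visitors.
    return client.bitcount(bitmap_key(uri, day))
```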

Pros: a small memory footprint, convenient queries, and the ability to check whether a specific user has visited.

Disadvantages: the data may be slightly off, because for users who are not logged in, different identifiers may be hashed to the same id; the alternative is to maintain a mapping for non-logged-in users, which adds overhead. Also, if the users are very sparse, the memory used may be greater than with the first method.

Using a probabilistic algorithm

For a site like Pinduoduo, a page can receive a great deal of traffic. If the count does not need to be exact, a probabilistic algorithm will do. In fact, when we count a site's UV, two totals just over 100 million that differ only slightly tell us the same thing.

Redis already ships with the HyperLogLog algorithm, which is a cardinality estimation algorithm.

The characteristic of this kind of algorithm is that it generally does not store the specific values; instead it stores some data from which a probabilistic estimate is computed.

When a user visits the site, we use the PFADD command to record the visit, and at the end PFCOUNT gives the final result. Because this is only a probabilistic algorithm, there may be an error of about 0.81%.
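The same flow with HyperLogLog can be sketched as below. `client` is assumed to expose redis-py style `pfadd` and `pfcount` methods (for example a `redis.Redis` instance); the `uv:hll:` key prefix and the function names are hypothetical.

```python
from datetime import date


def hll_key(uri: str, day: date) -> str:
    """One HyperLogLog per page per day (key layout is an assumption)."""
    return f"uv:hll:{uri}:{day.isoformat()}"


def record_hll_visit(client, uri: str, day: date, visitor_id: str) -> None:
    # PFADD key element: adding the same visitor again does not change the estimate.
    client.pfadd(hll_key(uri, day), visitor_id)


def estimate_visitors(client, uri: str, day: date) -> int:
    # PFCOUNT key: an approximate count of distinct visitors, not an exact one.
    return client.pfcount(hll_key(uri, day))
```

Against a real Redis server the returned number is an estimate, so code that consumes it should not rely on an exact match.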

Pros: a very small memory footprint; a single key needs only 12 KB. This is particularly suitable for a site like Pinduoduo with a huge number of users.

Disadvantages: a query for a specific user may give a wrong answer, since the specific data is not stored, and the total count also carries some error.

Those are three common ways to use Redis to count the users visiting a site.

 

Original link:

www.toutiao.com/i6695734985246114312/

 

 


This article first appeared on the WeChat public account "Programmers growth path".


Origin www.cnblogs.com/gdjk/p/11008010.html