Distributed Systems: The Three Derived Data Structures of Redis

introduction

Speaking of Redis data structures, you are probably familiar with the five basic data types: String, Hash, List, Set, and Sorted Set. Beyond these, there are three derived data structures that people come across far less often: bitmaps, HyperLogLog, and Geo. In my view, these three are only icing on the cake; I have never actually used them in a real project. Let's look at how each one is defined and what it is used for.

bitmaps

definition

A bitmap is really just a String, but one whose individual bits you can operate on. These bit operations have their own commands, which differ from the regular Redis commands for working with a String. You can think of it like this:

A bitmap is an array whose unit is the bit; each element of the array can store only 0 or 1.

Here's an example. Suppose we do a set operation with key w and value h by executing the following commands:

127.0.0.1:6379> set w h
OK
127.0.0.1:6379> get w
"h"

The ASCII code of h is 0110 1000 in binary. Next, you can use the bit command getbit to read out the value of each bit.

127.0.0.1:6379> getbit w 0 # use getbit to read bit 0 of w
(integer) 0
127.0.0.1:6379> getbit w 1 # use getbit to read bit 1 of w
(integer) 1
127.0.0.1:6379> getbit w 2 # use getbit to read bit 2 of w
(integer) 1
127.0.0.1:6379> getbit w 3 # use getbit to read bit 3 of w
(integer) 0
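
To make the bit numbering concrete, here is a minimal Python sketch of how getbit appears to index into a string: offset 0 is the most significant bit of the first byte. The function below is a hypothetical stand-in written for illustration, not Redis code.

```python
def getbit(value: bytes, offset: int) -> int:
    """Return the bit at `offset`, counting from the most significant bit of byte 0."""
    byte_index, bit_index = divmod(offset, 8)
    if byte_index >= len(value):
        return 0  # offsets past the end of the string read as 0
    return (value[byte_index] >> (7 - bit_index)) & 1


# 'h' is 0x68, i.e. 0110 1000 in binary
bits = [getbit(b"h", i) for i in range(8)]
print(bits)  # [0, 1, 1, 0, 1, 0, 0, 0]
```

The first four values match the getbit results shown in the session above.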

use

A claim you often see online is that this structure is used to count the number of active users in a given period, and that a bitmap saves space compared with a traditional set. This use has many limitations, though, which I will come to later. First, the usual explanation. Suppose there are 30 users, five of whom logged in on 2018-10-04, with userid = 2, 4, 8, 11, 12. We take users:2018-10-04 as the key, and its value records the users' login information. To record that these five users logged in, we set bits 2, 4, 8, 11, and 12 of the value to 1 by executing the following commands:

127.0.0.1:6379> setbit users:2018-10-04 2 1
(integer) 0
127.0.0.1:6379> setbit users:2018-10-04 4 1
(integer) 0
127.0.0.1:6379> setbit users:2018-10-04 8 1
(integer) 0
127.0.0.1:6379> setbit users:2018-10-04 11 1
(integer) 0
127.0.0.1:6379> setbit users:2018-10-04 12 1
(integer) 0

Now suppose you want to find out whether the user with userid = 11 logged in on 2018-10-04. Execute the following command:

127.0.0.1:6379> getbit users:2018-10-04 11
(integer) 1

A result of 1 means the user logged in; a result of 0 means the user did not. If you want to count how many users logged in on 2018-10-04, execute the following command:

127.0.0.1:6379> bitcount users:2018-10-04
(integer) 5
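
The setbit, getbit, and bitcount steps above can be mimicked in plain Python to see what is being stored. This is only a sketch: the Bitmap class is a hypothetical in-memory stand-in, not how Redis implements strings, and it assumes the same most-significant-bit-first numbering as getbit.

```python
class Bitmap:
    """Hypothetical in-memory stand-in for a Redis string used as a bitmap."""

    def __init__(self) -> None:
        self.data = bytearray()

    def setbit(self, offset: int, bit: int) -> int:
        """Set the bit at `offset` and return its previous value."""
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self.data):
            # grow the value with zero bytes so the offset fits
            self.data.extend(b"\x00" * (byte_index + 1 - len(self.data)))
        mask = 1 << (7 - bit_index)
        old = 1 if self.data[byte_index] & mask else 0
        if bit:
            self.data[byte_index] |= mask
        else:
            self.data[byte_index] &= ~mask & 0xFF
        return old

    def getbit(self, offset: int) -> int:
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self.data):
            return 0
        return (self.data[byte_index] >> (7 - bit_index)) & 1

    def bitcount(self) -> int:
        """Count how many bits are set to 1."""
        return sum(bin(byte).count("1") for byte in self.data)


logins = Bitmap()  # plays the role of the key users:2018-10-04
for userid in (2, 4, 8, 11, 12):
    logins.setbit(userid, 1)

print(logins.getbit(11))  # 1
print(logins.bitcount())  # 5
print(len(logins.data))   # 2 bytes are enough for offsets 0..15
```

The last line hints at where the space saving comes from: five logins among offsets 0 to 15 fit in two bytes.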

The command above reports that 5 users logged in on 2018-10-04. OK, that is about as much as you will find in most online write-ups. Let me add that userid = 2, 4, 8, 11, 12 here should be understood as offsets. For example, if a userid in a real project is 1000002, the offset could be 2. You can adapt this flexibly in your own project. However, the approach has a limitation. If the userids in your project are generated as UUIDs, how do you turn them into offsets? Do you have to find a hash function to generate the offset? And even if you find a suitable hash function, can you guarantee that no hash collision ever occurs and gives two users the same offset? So it is enough just to be aware that this technique exists.
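
The collision concern can be shown with a small Python sketch. Everything here is an assumption made for illustration: the userids are UUIDs generated with uuid5 from made-up names, and offset_for is a hypothetical mapping that hashes the userid with SHA-256 and reduces it modulo the number of available bit positions.

```python
import hashlib
import uuid


def offset_for(userid: str, slots: int) -> int:
    """Hypothetical mapping from a UUID string to a bitmap offset in [0, slots)."""
    digest = hashlib.sha256(userid.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % slots


def count_collisions(userids, slots: int) -> int:
    """Count userids whose offset was already taken by an earlier userid."""
    seen = set()
    collisions = 0
    for userid in userids:
        offset = offset_for(userid, slots)
        if offset in seen:
            collisions += 1
        seen.add(offset)
    return collisions


# 300 distinct, made-up UUID userids squeezed into 256 bit positions
userids = [str(uuid.uuid5(uuid.NAMESPACE_DNS, f"user-{i}.example")) for i in range(300)]
collisions = count_collisions(userids, slots=256)
print(collisions)  # at least 44, because 300 users cannot fit in 256 distinct offsets
```

Every collision means two users share one bit, so bitcount would undercount the active users. A much larger offset space makes collisions rarer but does not rule them out.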

HyperLogLog

definition

HyperLogLog is not a data structure but an algorithm, one that can count the number of distinct elements (the cardinality) using very little memory. The algorithm may sound unfamiliar, but in Java there is a library called stream-lib that also implements HyperLogLog. Let me say something about the principle behind it. I don't want to haul out a lengthy mathematical paper that would only bore everyone. The "Hyper" here means "super", and its predecessor is the LogLog algorithm. I won't try to show off here; grasping the essence is enough. Consider the following dialogue:

Me: "Xiao Qu, suppose I flip a coin five times per round and play a lot of rounds. Across all those rounds, the longest streak I saw was 2 tails in a row followed by 1 head. Can you guess how many rounds I played?"
Xiao Qu: "Not many. Seven or eight at most."
Me: "Wow, so clever. How did you work that out?"
Xiao Qu: "Simple. Heads and tails each have probability 1/2, so two tails in a row followed by a head is just 1/2 * 1/2 * 1/2, isn't it?"
Me: "And what if the longest streak was 4 tails in a row followed by a head?"
Xiao Qu: "Then you must have played a lot of rounds!"
Me: "Clever indeed!"

The chat above is the usual daily banter between me and my colleague Xiao Qu. Any resemblance to real events is purely coincidental!
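
The arithmetic behind the guess can be written out in a few lines of Python. This is a simplified reading of the dialogue, assuming we only ask how likely a round is to begin with a given number of tails followed by a head; the function names are made up for illustration.

```python
from fractions import Fraction


def streak_probability(tails: int) -> Fraction:
    """Probability that a round starts with `tails` tails followed by one head."""
    return Fraction(1, 2) ** (tails + 1)


def expected_rounds(tails: int) -> int:
    """Rough number of rounds needed before such a streak is expected once."""
    return int(1 / streak_probability(tails))


for tails in (2, 4):
    print(tails, streak_probability(tails), expected_rounds(tails))
# 2 1/8 8
# 4 1/32 32
```

Two tails then a head points to roughly 8 rounds, which is the "seven or eight" in the dialogue; four tails then a head points to roughly 32.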

Well, that's the principle! The real estimation algorithm is more complicated than this, though, and an estimate made this simply has a fairly large error. The pseudo-code is given below.

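Input: a set
Output: the number of distinct elements in the set
Algorithm:
     max = 0
     for each element in the set:
               hashCode = hash(element)
               num = number of consecutive leading 0s in the binary representation of hashCode
               if num > max:
                   max = num
     the final result is 2 to the power (max + 1)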
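
The pseudo-code can be turned into a runnable Python sketch. The choices below are assumptions made for illustration and are not what Redis does internally: the hash is the first 32 bits of SHA-1, and the helper names are made up. As the text says, a single-register estimate like this is crude.

```python
import hashlib

HASH_BITS = 32  # assumption: work with a 32-bit hash value


def hash32(element: str) -> int:
    """Map an element to a 32-bit integer using the first 4 bytes of SHA-1."""
    digest = hashlib.sha1(element.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def leading_zeros(value: int, width: int = HASH_BITS) -> int:
    """Number of consecutive 0 bits at the front of a `width`-bit value."""
    return width - value.bit_length()


def estimate_cardinality(elements) -> int:
    """Single-register estimate: 2 ** (longest run of leading zeros + 1)."""
    max_zeros = 0
    for element in elements:
        max_zeros = max(max_zeros, leading_zeros(hash32(element)))
    return 2 ** (max_zeros + 1)


ips = [f"10.0.{i // 256}.{i % 256}" for i in range(1000)]
print(estimate_cardinality(ips))      # always a power of two, so only a rough guess
print(estimate_cardinality(ips * 3))  # duplicates do not change the estimate
```

Because only the maximum is kept, adding the same element again never changes the result, which is what makes the approach suitable for counting distinct values.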

It should be noted that

hashCode = hash(element)

maps your input element to a long binary string. Once the element has been mapped to a long binary string, that string can be compared to the coin-flip results I described above. As for why the final result uses (max + 1), you can look up the literature; after all, this article is about Redis, not about this algorithm. Moreover, the algorithm went through a series of later refinements, such as splitting the set into m parts, averaging the results of the m parts (avg), and using 2 to the power (avg + 1) to estimate the number of distinct elements. Interested readers can look into these on their own!

use

This structure can keep all kinds of counts while using very little memory, such as the number of registered IPs or the number of IPs visiting each day. Of course, there is an error! The figure given in the official Redis documentation is a standard error of 0.81%. Usage is also very simple, as shown below.

127.0.0.1:6379> pfadd ips:2018-10-04 "127.0.0.1" "127.0.0.2" "127.0.0.3" "127.0.0.4" 
(integer) 1
127.0.0.1:6379> pfcount ips:2018-10-04
(integer) 4

The above shows that about 4 IPs logged in to the system on 2018-10-04. Here is a chart found online comparing its space usage with that of a traditional set structure:

[Figure: space usage of HyperLogLog compared with a traditional set]

Once again, note that this structure has an error! For example, if you pfadd one million items, pfcount may return something like 999,756!
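
The 0.81% figure can be reproduced with a short calculation. This sketch assumes the usual HyperLogLog standard-error formula, 1.04 / sqrt(m), and that Redis uses m = 16384 registers; treat both as assumptions to check against the Redis documentation rather than something taken from this article.

```python
import math

REGISTERS = 16384  # assumption: 2**14 registers per HyperLogLog key


def standard_error(registers: int) -> float:
    """Standard error of a HyperLogLog estimate with `registers` registers."""
    return 1.04 / math.sqrt(registers)


error = standard_error(REGISTERS)
print(f"{error:.2%}")            # 0.81%
print(round(1_000_000 * error))  # 8125, the typical deviation for one million items
```

Against that yardstick, a pfcount of 999,756 for one million items is off by only 244, comfortably inside the expected error.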

Geo

definition

Geo can be used to store longitude and latitude, calculate the distance between two places, and run range queries. Under the hood it is implemented with a zset.

use

There are mainly the following six commands:

  • geoadd: Add the coordinates of a geographic location.
  • geopos: Get the coordinates of a geographic location.
  • geodist: Get the distance between two locations.
  • georadius: Get the set of locations within a specified radius of a given longitude/latitude coordinate.
  • georadiusbymember: Get the set of locations within a specified radius of a given stored location.
  • geohash: Get the geohash value of a geographic location.

Here I simply reproduce an example from the official documentation; if you are interested, you can look it up yourself. First, add two coordinates to the key:

redis> GEOADD Sicily 13.361389 38.115556 "Palermo" 15.087269 37.502669 "Catania"
(integer) 2

Next, calculate the distance between the two coordinates (the result is in meters by default):

redis> GEODIST Sicily Palermo Catania
"166274.15156960039"
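
To get a feel for where a figure like this comes from, here is a sketch of the haversine great-circle formula in Python. It is an approximation written for illustration, not Redis's own code; the Earth radius constant is my understanding of the value Redis uses and should be treated as an assumption.

```python
import math

EARTH_RADIUS_M = 6372797.560856  # assumption: spherical Earth radius in meters


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two longitude/latitude points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


palermo = (13.361389, 38.115556)  # (longitude, latitude), as passed to GEOADD
catania = (15.087269, 37.502669)

distance = haversine_m(*palermo, *catania)
print(round(distance / 1000), "km")  # roughly 166 km, in line with GEODIST above
```

Small differences from the GEODIST output are to be expected, since Redis stores coordinates in an encoded form with limited precision.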

Finally, find which coordinates lie within 100 km and within 200 km of longitude/latitude (15, 37):

redis> GEORADIUS Sicily 15 37 100 km
1) "Catania"

redis> GEORADIUS Sicily 15 37 200 km
1) "Palermo"
2) "Catania"
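
The radius query can be imitated with a naive Python sketch that computes the distance to every stored point and keeps those within the radius. This is a plain linear scan for illustration only; Redis answers the query through its zset-backed index rather than by scanning. The georadius function and the Earth radius constant are assumptions.

```python
import math

EARTH_RADIUS_KM = 6372.797560856  # assumption: spherical Earth radius in km

SICILY = {
    "Palermo": (13.361389, 38.115556),  # (longitude, latitude)
    "Catania": (15.087269, 37.502669),
}


def distance_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Haversine great-circle distance in kilometers."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def georadius(points: dict, lon: float, lat: float, radius_km: float) -> list:
    """Return the names of all points within `radius_km` of (lon, lat)."""
    return [name for name, (plon, plat) in points.items()
            if distance_km(lon, lat, plon, plat) <= radius_km]


print(georadius(SICILY, 15, 37, 100))  # ['Catania']
print(georadius(SICILY, 15, 37, 200))  # ['Palermo', 'Catania']
```

Catania is roughly 56 km from (15, 37) and Palermo roughly 190 km, which is why the second city only appears once the radius grows to 200 km.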


Origin juejin.im/post/5d47eb4f6fb9a06ade10f6a1