introduction
When it comes to Redis data structures, you are probably familiar with the five basic data types: String, Hash, List, Set, and Sorted Set. In addition, there are three derived data structures that people rarely touch: bitmaps, hyperloglog, and geo. In my opinion these three are only icing on the cake; I have never actually used them in a real project. Let's look at the definition and uses of each.
bitmaps
definition
A bitmap is really just a String, but one you can operate on bit by bit. These bit operations have their own commands, so they are not quite the same as the Redis commands for ordinary Strings. You can think of a bitmap as an array whose unit is the bit: each element of the array stores either 0 or 1.
Here is an example. Suppose we do a set with key w and value h by executing the following commands
127.0.0.1:6379> set w h
OK
127.0.0.1:6379> get w
"h"
The ASCII code of h is 0110 1000. Next, you can use the bit command getbit to read the value of each bit.
127.0.0.1:6379> getbit w 0 # use getbit to read bit 0 of w
(integer) 0
127.0.0.1:6379> getbit w 1 # use getbit to read bit 1 of w
(integer) 1
127.0.0.1:6379> getbit w 2 # use getbit to read bit 2 of w
(integer) 1
127.0.0.1:6379> getbit w 3 # use getbit to read bit 3 of w
(integer) 0
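To make the bit numbering concrete, here is a minimal Python sketch that mimics what getbit does on a string value. It is only a simulation on a bytes object, not a Redis client; the function name getbit is reused for illustration. The point it shows is that offset 0 is the most significant bit of the first byte.

```python
def getbit(value: bytes, offset: int) -> int:
    """Return the bit at `offset`, counting from the most significant bit of byte 0."""
    byte_index, bit_index = divmod(offset, 8)
    if byte_index >= len(value):
        return 0  # offsets past the end of the string read as 0
    return (value[byte_index] >> (7 - bit_index)) & 1


# 'h' is 0x68, i.e. 0110 1000 in binary
bits = [getbit(b"h", i) for i in range(8)]
print(bits)  # [0, 1, 1, 0, 1, 0, 0, 0]
```

The first four values match the four getbit calls shown above.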
use
According to what you read online, this structure is used to count the number of active users in a given period, and using a bitmap saves space compared with a traditional set. However, this usage has a significant limitation, which I will come to later. First, the usual online explanation. Suppose there are 30 users, five of whom logged in on 2018-10-04, with userid = 2, 4, 8, 11, 12. We use the key users:2018-10-04, whose value records the users' login information. To record that these five users logged in, we set bits 2, 4, 8, 11, and 12 of the value to 1 by executing the following commands
127.0.0.1:6379> setbit users:2018-10-04 2 1
(integer) 0
127.0.0.1:6379> setbit users:2018-10-04 4 1
(integer) 0
127.0.0.1:6379> setbit users:2018-10-04 8 1
(integer) 0
127.0.0.1:6379> setbit users:2018-10-04 11 1
(integer) 0
127.0.0.1:6379> setbit users:2018-10-04 12 1
(integer) 0
Now, to check whether the user with userid = 11 logged in on 2018-10-04, execute the following command
127.0.0.1:6379> getbit users:2018-10-04 11
(integer) 1
A result of 1 means the user logged in; a result of 0 means the user did not. To count how many users logged in on 2018-10-04, execute the following command
127.0.0.1:6379> bitcount users:2018-10-04
(integer) 5
The command above shows that 5 users logged in on 2018-10-04. OK, that is about as far as most write-ups go. Note that userid = 2, 4, 8, 11, 12 here are really offsets. For example, if a userid in the actual project is 1000002, then the offset is 2; you can adapt this to your own project. However, this approach has a limitation. If the userids in your project are generated as UUIDs, how do you derive offsets from them? Do you have to find a hash function to generate the offsets? Even if you find a suitable one, can you guarantee that no hash collision occurs and that two users never end up with the same offset? Keep this limitation in mind.
HyperLogLog
definition
HyperLogLog is not a data structure but an algorithm that can count the number of distinct elements using very little memory. The algorithm may well be unfamiliar to you. In Java there is a library called stream-lib that also implements the HyperLogLog algorithm. A word about how the algorithm works: I don't want to drag out a lengthy mathematical paper that would bore everyone. The Hyper here means "super"; its predecessor is the LogLog algorithm. I will only give the gist, since grasping the essence is enough. Consider the following dialogue
Me: "Ditty, suppose I flip a coin five times per round and play many rounds. Across all those rounds, the longest streak I saw was 2 tails in a row followed by 1 head. Guess how many rounds I played!"
Ditty: "Not many rounds, seven or eight at most."
Me: "Wow, so clever! How did you work that out?"
Ditty: "Very simple. Heads and tails each have probability 1/2, so two tails in a row followed by one head is 1/2 * 1/2 * 1/2 = 1/8."
Me: "What if the longest streak was 4 tails in a row followed by one head?"
Ditty: "Then you must have played a lot of rounds!"
Me: "Clever indeed!"
The chat above is just the everyday banter between me and my colleague Ditty. Any similarity is purely coincidental!
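The reasoning in the dialogue can be written as a short calculation. This is only a rough sketch, assuming a fair coin and independent rounds: a streak of k tails followed by a head has probability (1/2)^(k+1), so you expect to need about 2^(k+1) rounds before seeing one.

```latex
\[
P(\underbrace{T\,T\cdots T}_{k}\,H) = \left(\tfrac{1}{2}\right)^{k+1},
\qquad
\text{expected rounds before seeing it} \approx 2^{k+1}.
\]
\[
k = 2:\; 2^{3} = 8 \text{ rounds}, \qquad k = 4:\; 2^{5} = 32 \text{ rounds}.
\]
```

This gives the "seven or eight" in the dialogue for k = 2, and it is the same shape as the 2 to the power (max + 1) estimate used by the algorithm.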
Well, that's the principle! The real estimation algorithm is more complicated than this, though, and an estimate this simple has a relatively large error. Pseudo-code for the algorithm is given below.
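Input: a set
Output: the number of distinct elements in the set
Algorithm:
    max = 0
    for each element in the set:
        hashCode = hash(element)
        num = number of consecutive leading 0s in the binary representation of hashCode
        if num > max:
            max = num
    the final result is 2 to the power (max + 1)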
Note that
hashCode = hash(element)
maps each input element to a long binary string. Once mapped, that string can be compared to the coin-flip results I described earlier. As for why the final result uses (max + 1), you can look it up in the literature; after all, this article is about Redis, not about this algorithm. The algorithm has also evolved since then: for example, the set is split into m parts, the results of the m parts are averaged (avg), and 2 to the power (avg + 1) is used to estimate the number of distinct elements. Interested readers can look into this on their own.
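Both variants described above can be sketched in Python. This is a rough illustration under several assumptions of mine, not the implementation Redis uses: the hash is SHA-1 truncated to 32 bits, the bucket is chosen from the top bits of the hash, and the bucketed estimate is multiplied by m and by the LogLog bias constant 0.39701, neither of which the article spells out. All function names are hypothetical.

```python
import hashlib

HASH_BITS = 32  # assumption: work with a 32-bit hash


def hash32(item: str) -> int:
    """Map an item to a 32-bit integer (SHA-1 truncated, chosen for illustration)."""
    digest = hashlib.sha1(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def leading_zeros(value: int, width: int) -> int:
    """Number of consecutive 0 bits at the front of a `width`-bit value."""
    return width - value.bit_length()


def naive_estimate(items) -> int:
    """The pseudo-code above: 2 to the power (max + 1)."""
    max_zeros = 0
    for item in items:
        max_zeros = max(max_zeros, leading_zeros(hash32(item), HASH_BITS))
    return 2 ** (max_zeros + 1)


def loglog_estimate(items, bucket_bits: int = 10) -> float:
    """Bucketed variant: average the per-bucket maxima, LogLog style."""
    m = 1 << bucket_bits
    rest_bits = HASH_BITS - bucket_bits
    maxima = [0] * m
    for item in items:
        h = hash32(item)
        bucket = h >> rest_bits              # top bits pick the bucket
        rest = h & ((1 << rest_bits) - 1)    # remaining bits are the "coin flips"
        maxima[bucket] = max(maxima[bucket], leading_zeros(rest, rest_bits))
    avg = sum(maxima) / m
    # 0.39701 is the asymptotic bias correction from the LogLog algorithm
    return 0.39701 * m * 2 ** (avg + 1)


items = [f"user-{i}" for i in range(50000)]
print(naive_estimate(items))          # always a power of two, often far off
print(round(loglog_estimate(items)))  # should land reasonably close to 50000
```

The naive version can only return powers of two, which is why the estimate is coarse; averaging over many buckets smooths that out.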
use
This structure can count all kinds of things with very little memory, such as the number of registered IPs or the number of IPs visiting each day. Of course, there is an error: the figure given officially by Redis is a standard error of 0.81%. Usage is also very simple, as shown below.
127.0.0.1:6379> pfadd ips:2018-10-04 "127.0.0.1" "127.0.0.2" "127.0.0.3" "127.0.0.4"
(integer) 1
127.0.0.1:6379> pfcount ips:2018-10-04
(integer) 4
The above shows that on 2018-10-04 about 4 IPs accessed the system. Here is a chart from the Internet comparing its space usage with that of a traditional set structure:
[Figure: space usage of HyperLogLog compared with a traditional set]
Note, once again, that this structure has an error! For example, if you pfadd one million items, pfcount may return something like 999,756.
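The 0.81% figure can be reproduced with a short calculation. This is a sketch, assuming the usual HyperLogLog error formula and the m = 16384 registers used by the Redis implementation:

```latex
\[
\sigma \approx \frac{1.04}{\sqrt{m}} = \frac{1.04}{\sqrt{16384}} = \frac{1.04}{128} \approx 0.0081 = 0.81\%
\]
\[
\frac{|1{,}000{,}000 - 999{,}756|}{1{,}000{,}000} = \frac{244}{1{,}000{,}000} \approx 0.024\%
\]
```

So the 999,756 in the example is off by about 0.024%, comfortably inside the stated standard error.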
Geo
definition
Geo can store longitude and latitude, calculate the distance between two places, and perform range queries. Under the hood it is implemented with a zset.
use
There are mainly the following six commands
geoadd: Adds the coordinates of a geographic location.
geopos: Gets the coordinates of a geographic location.
geodist: Gets the distance between two locations.
georadius: Gets the set of locations within a specified range of the given longitude and latitude coordinates.
georadiusbymember: Gets the set of locations within a specified range of a given stored location.
geohash: Gets the geohash value of a geographic location.
Here I simply use the example from the official documentation; if you are interested, you can look it up yourself. First, add two coordinates to the key
redis> GEOADD Sicily 13.361389 38.115556 "Palermo" 15.087269 37.502669 "Catania"
(integer) 2
Next, calculate the distance between the two coordinates (in meters by default)
redis> GEODIST Sicily Palermo Catania
"166274.15156960039"
Finally, find which coordinates lie within 100 km and within 200 km of longitude and latitude (15, 37)
redis> GEORADIUS Sicily 15 37 100 km
1) "Catania"
redis> GEORADIUS Sicily 15 37 200 km
1) "Palermo"
2) "Catania"
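The distance and radius results above can be roughly reproduced with the haversine formula. This is a Python sketch under assumptions of mine: it uses a spherical Earth with a radius of 6372797.560856 m (to my knowledge the constant in the Redis source, treat it as an assumption), it works on the raw coordinates rather than on geohash-encoded ones, and the names haversine_m, within_radius, and places are hypothetical. Small differences from the Redis output are therefore expected.

```python
import math

EARTH_RADIUS_M = 6372797.560856  # assumption: radius constant used by Redis


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two (longitude, latitude) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# name -> (longitude, latitude), same order as GEOADD
places = {
    "Palermo": (13.361389, 38.115556),
    "Catania": (15.087269, 37.502669),
}


def within_radius(lon: float, lat: float, radius_km: float) -> list:
    """Names of all stored places within `radius_km` of the given point."""
    return [name for name, (plon, plat) in places.items()
            if haversine_m(lon, lat, plon, plat) <= radius_km * 1000]


print(haversine_m(*places["Palermo"], *places["Catania"]))  # roughly 166 km, in meters
print(within_radius(15, 37, 100))  # ['Catania']
print(within_radius(15, 37, 200))  # ['Palermo', 'Catania']
```

A linear scan like this is fine for two points; Redis avoids it by encoding each position as a geohash score in the zset.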