[Repost] High-Availability Redis (6): The Swiss Army Knives: Bitmap, HyperLogLog and GEO

1. Bitmap

1.1 The bitmap concept

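First, consider an example: the string big.

The ASCII code of the letter b is 98, which is 01100010 in binary
The ASCII code of the letter i is 105, which is 01101001 in binary
The ASCII code of the letter g is 103, which is 01100111 in binary

If you set a key in Redis whose value is big, you can get the value big back, and you can also read the value of each individual bit of big's ASCII codes, that is, 0 or 1.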

For example:

127.0.0.1:6379> set hello big
OK
127.0.0.1:6379> getbit hello 0      # the 1st bit of b's binary form, which is 0
(integer) 0
127.0.0.1:6379> getbit hello 1      # the 2nd bit of b's binary form, which is 1
(integer) 1

big is 3 bytes long, which corresponds to 24 bits. The getbit command returns the value of any one of those bits.

Redis can operate on bits directly.
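The bit layout described above can be reproduced without a Redis server. The following is a minimal sketch in plain Python; the `getbit` helper is a hypothetical stand-in that only mimics how the getbit command numbers bits (offset 0 is the most significant bit of the first byte).

```python
value = b"big"

# Show each character, its ASCII code, and its 8-bit binary form.
for byte in value:
    print(chr(byte), byte, format(byte, "08b"))

# All 24 bits of the string, most significant bit of each byte first.
bits = "".join(format(byte, "08b") for byte in value)


def getbit(data: bytes, offset: int) -> int:
    """Mimic GETBIT: offsets past the end of the value read as 0."""
    byte_index, bit_index = divmod(offset, 8)
    if byte_index >= len(data):
        return 0
    return (data[byte_index] >> (7 - bit_index)) & 1


print(bits)
print(getbit(value, 0), getbit(value, 1))
```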

1.2 Common bitmap commands

1.2.1 setbit command

setbit key offset value         Set the bit at the given offset of the bitmap

example:

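127.0.0.1:6379> set hello big       # set a key-value pair with key 'hello' and value 'big'
OK
127.0.0.1:6379> setbit hello 7 1    # set the 8th bit of hello's binary form to 1; the first byte's ASCII code changes from 98 to 99, so b becomes c
(integer) 0                         # the return value is the previous value of that bit
127.0.0.1:6379> get hello           # after the change, the value of 'hello' is 'cig'
"cig"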

big is only 24 bits long. If setbit targets an offset beyond the current length, Redis extends the value:

127.0.0.1:6379> setbit hello 50 1
(integer) 0
127.0.0.1:6379> get hello
"cig\x00\x00\x00 "

The bits at offsets 24 through 49 are filled with 0, and the bit at offset 50 is set to 1.
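The two setbit calls above can be traced on an ordinary byte array. This is a sketch under the assumption that the `setbit` helper below (a hypothetical function, not part of any Redis client) follows the same bit numbering and zero-padding rules as the command.

```python
def setbit(data: bytearray, offset: int, value: int) -> int:
    """Set one bit in place and return its previous value, like SETBIT."""
    byte_index, bit_index = divmod(offset, 8)
    if byte_index >= len(data):
        # Grow the value with zero bytes so the target offset exists.
        data.extend(b"\x00" * (byte_index + 1 - len(data)))
    mask = 1 << (7 - bit_index)
    old = 1 if data[byte_index] & mask else 0
    if value:
        data[byte_index] |= mask
    else:
        data[byte_index] &= ~mask & 0xFF
    return old


data = bytearray(b"big")
old7 = setbit(data, 7, 1)       # 'b' (98) becomes 'c' (99)
after_first = bytes(data)
old50 = setbit(data, 50, 1)     # extends the value from 3 bytes to 7 bytes
print(after_first, bytes(data))
```

Offset 50 falls in byte 6 at bit position 2, so the new last byte is 00100000, which is the space character seen at the end of the get output.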

1.2.2 getbit command

getbit key offset           Get the bit at the given offset of the bitmap

example:

127.0.0.1:6379> getbit hello 25
(integer) 0
127.0.0.1:6379> getbit hello 49
(integer) 0
127.0.0.1:6379> getbit hello 50
(integer) 1

1.2.3 bitcount command

bitcount key [start end]        Count the bits set to 1 in the given range (start to end, measured in bytes; the whole value is counted if no range is given)

example:

127.0.0.1:6379> bitcount hello
(integer) 14
127.0.0.1:6379> bitcount hello 0 23
(integer) 14
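Because start and end are byte indexes, the range 0 23 above covers the whole 7-byte value, which is why both calls return 14. A small sketch in plain Python, with a hypothetical `bitcount` helper that follows the same inclusive byte-range convention:

```python
def bitcount(data: bytes, start: int = 0, end: int = -1) -> int:
    """Count set bits in the inclusive byte range [start, end].

    Negative indexes count from the end of the value.
    """
    n = len(data)
    if start < 0:
        start = max(n + start, 0)
    if end < 0:
        end = max(n + end, 0)
    end = min(end, n - 1)
    if start > end:
        return 0
    return sum(bin(byte).count("1") for byte in data[start:end + 1])


hello = b"cig\x00\x00\x00 "      # value of 'hello' after the setbit calls
total = bitcount(hello)
clamped = bitcount(hello, 0, 23)  # end is clamped to the last byte
first_byte = bitcount(hello, 0, 0)
print(total, clamped, first_byte)
```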

1.2.4 bitop command

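bitop op destkey key [key...]       Perform an and (intersection), or (union), not (negation), or xor (exclusive or) operation on one or more bitmaps and save the result in destkey
bitpos key targetBit [start] [end]  Find the position of the first bit equal to targetBit in the given range (start to end, measured in bytes; the whole value is searched if no range is given)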
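To make the four bitop operations concrete, here is a sketch that applies them to plain byte strings. The `bitop` helper and the monday/tuesday data are hypothetical and exist only for illustration; like the Redis command, the helper treats shorter inputs as zero-padded.

```python
import operator

OPS = {"and": operator.and_, "or": operator.or_, "xor": operator.xor}


def bitop(op: str, *bitmaps: bytes) -> bytes:
    """Combine bitmaps byte by byte; shorter inputs are zero-padded."""
    width = max(len(b) for b in bitmaps)
    padded = [b.ljust(width, b"\x00") for b in bitmaps]
    if op == "not":
        # NOT takes a single input and flips every bit.
        return bytes(~x & 0xFF for x in padded[0])
    result = padded[0]
    for other in padded[1:]:
        result = bytes(OPS[op](x, y) for x, y in zip(result, other))
    return result


# Hypothetical data: bit n is 1 if user n visited on that day.
monday = bytes([0b11110000])     # users 0 to 3
tuesday = bytes([0b00111100])    # users 2 to 5
both_days = bitop("and", monday, tuesday)
either_day = bitop("or", monday, tuesday)
print(format(both_days[0], "08b"), format(either_day[0], "08b"))
```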

1.3 Bitmap use case

Suppose a site has 100 million users, each user_id is a 32-bit integer, and there are 50 million unique visitors per day. How can you record which 50 million users visited the site?

1.3.1 Method 1: using a set

With a set, the memory needed to store one day of data is:

32 bits * 50000000 = 4 bytes * 50000000 = 200000000 bytes, about 200MB

Running for a month takes about 6GB of memory, and running for a year about 72GB

30 * 200MB = 6GB

1.3.2 Method 2: using a bitmap

If a user visits the site, set the bit at index user_id to 1; for users who did not visit, the bit stays 0. With this approach, the memory used per day is:

1 bit * 100000000 = 100000000 / 8 bytes = 12500000 bytes, about 12.5MB

Running for a month takes 375MB of memory, and running for a year about 4.5GB

So, in this scenario, using a bitmap saves a great deal of memory.
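The arithmetic behind the two estimates can be checked with a few lines of Python. This is a back-of-the-envelope sketch that counts only the raw payload (4 bytes per stored ID, 1 bit per possible user) and, as an assumption, ignores any per-element overhead of the data structures; it uses decimal units (1 MB = 1,000,000 bytes).

```python
TOTAL_USERS = 100_000_000      # every user_id needs one bit in the bitmap
DAILY_VISITORS = 50_000_000    # IDs actually stored in the set each day
ID_BYTES = 4                   # a 32-bit user_id
MB = 1_000_000

set_bytes_per_day = DAILY_VISITORS * ID_BYTES
bitmap_bytes_per_day = TOTAL_USERS // 8

for label, per_day in (("set", set_bytes_per_day),
                       ("bitmap", bitmap_bytes_per_day)):
    print(label,
          "day: %.1f MB" % (per_day / MB),
          "month: %.1f MB" % (30 * per_day / MB),
          "year: %.1f MB" % (360 * per_day / MB))
```

Note that the bitmap cost depends on the total number of users, not on how many of them visit, so the comparison would look different for a site with very few daily visitors.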

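1.4 Bitmap usage notes

A bitmap is a string type, so a single value can use at most 512MB of memory
setbit addresses a bit by its offset in the value, and a call with a very large offset can take a long time
A bitmap is not always the best choice; use it in scenarios where it fits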

2. HyperLogLog

2.1 About HyperLogLog

Built on the HyperLogLog algorithm, it counts distinct elements using a very small amount of space.

See the Wikipedia article on HyperLogLog.

2.2 Common Commands

pfadd key element [element...]                  Add elements to a HyperLogLog
pfcount key [key...]                            Count the distinct elements in a HyperLogLog
pfmerge destkey sourcekey [sourcekey...]        Merge multiple HyperLogLogs

example:

127.0.0.1:6379> pfadd unique_ids1 'uuid_1' 'uuid_2' 'uuid_3' 'uuid_4'       # add 4 elements to unique_ids1
(integer) 1
127.0.0.1:6379> pfcount unique_ids1         # count the elements in unique_ids1
(integer) 4
127.0.0.1:6379> pfadd unique_ids1 'uuid_1' 'uuid_2' 'uuid_3' 'uuid_10'      # add 4 more elements to unique_ids1
(integer) 1
127.0.0.1:6379> pfcount unique_ids1         # the two batches overlap, so unique_ids1 holds only 5 elements
(integer) 5
127.0.0.1:6379> pfadd unique_ids2 'uuid_1' 'uuid_2' 'uuid_3' 'uuid_4'       # add 4 elements to unique_ids2
(integer) 1
127.0.0.1:6379> pfcount unique_ids2         # count the elements in unique_ids2
(integer) 4
127.0.0.1:6379> pfadd unique_ids2 'uuid_4' 'uuid_5' 'uuid_6' 'uuid_7'       # add 4 more elements to unique_ids2
(integer) 1
127.0.0.1:6379> pfcount unique_ids2         # count again; one element is repeated across the two batches, so there are 7 elements
(integer) 7
127.0.0.1:6379> pfmerge unique_ids1 unique_ids2     # merge unique_ids2 into unique_ids1
OK
127.0.0.1:6379> pfcount unique_ids1         # the two HyperLogLogs share elements, so the merged HyperLogLog holds only 8 elements
(integer) 8
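The counts in the session above (4, 5, 4, 7, 8) are what an exact set would give for the same inputs. The sketch below replays the session with ordinary Python sets as an exact stand-in; a real HyperLogLog stores only an estimate, so this illustrates the semantics of pfadd, pfcount, and pfmerge rather than how Redis implements them.

```python
# pfadd ~ set.update, pfcount ~ len, pfmerge ~ set union
unique_ids1 = set()
unique_ids1.update(["uuid_1", "uuid_2", "uuid_3", "uuid_4"])
count1_first = len(unique_ids1)
unique_ids1.update(["uuid_1", "uuid_2", "uuid_3", "uuid_10"])
count1_second = len(unique_ids1)

unique_ids2 = set()
unique_ids2.update(["uuid_1", "uuid_2", "uuid_3", "uuid_4"])
unique_ids2.update(["uuid_4", "uuid_5", "uuid_6", "uuid_7"])
count2 = len(unique_ids2)

# Merging keeps each distinct element once.
unique_ids1 |= unique_ids2
merged_count = len(unique_ids1)
print(count1_first, count1_second, count2, merged_count)
```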

2.3 HyperLogLog memory consumption (one million unique users)

example:

127.0.0.1:6379> flushall        # clear all keys and values in Redis
OK
127.0.0.1:6379> info            # check how much memory Redis is using
...omitted
# Memory
used_memory:833528
used_memory_human:813.99K       # Redis holds no key-value pairs at this point and uses about 814K of memory
used_memory_rss:5926912
used_memory_rss_human:5.65M
used_memory_peak:924056
used_memory_peak_human:902.40K
total_system_memory:1023938560
total_system_memory_human:976.50M
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
mem_fragmentation_ratio:7.11
mem_allocator:jemalloc-3.6.0
...omitted

Run the following Python code:

import redis
import time

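client = redis.StrictRedis(host='192.168.81.101', port=6379)
key = 'unique'
start_time = time.time()
# Add one million distinct integers to the HyperLogLog stored at key
for i in range(1000000):
    client.pfadd(key, i)
# Report how long the one million pfadd calls took
print('elapsed: %.2f s' % (time.time() - start_time))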

After the Python code finishes, check how much memory Redis is using again:

127.0.0.1:6379> info
...omitted
# Memory
used_memory:849992
used_memory_human:830.07K
used_memory_rss:5939200
used_memory_rss_human:5.66M
used_memory_peak:924056
used_memory_peak_human:902.40K
total_system_memory:1023938560
total_system_memory_human:976.50M
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
mem_fragmentation_ratio:6.99
mem_allocator:jemalloc-3.6.0
...omitted

As you can see, storing 1 million items in Redis with HyperLogLog requires only

830.07K - 813.99K, about 16K

of memory, which is very little.

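Of course, there is no free lunch; HyperLogLog has clear limitations:

First, HyperLogLog has an error rate: the counts it reports are estimates and are not guaranteed to be exact
According to Wikipedia, when HyperLogLog processes 1 billion items using 1.5KB of memory, the error rate is 2%
Second, individual items cannot be read back out of a HyperLogLog. This is easy to understand: after storing 1 million items in 16KB of memory, recovering those 1 million items is clearly impossible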
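The trade-off between memory and accuracy can be estimated with the usual HyperLogLog rule of thumb, a standard error of about 1.04 / sqrt(m) for m registers. The sketch below applies it to the figures given in the Redis documentation (16384 registers, up to 12KB per key, 0.81% standard error); the 6-bit register width and the helper name are assumptions made for this illustration.

```python
import math


def hll_standard_error(registers: int) -> float:
    """Approximate relative standard error for a given register count."""
    return 1.04 / math.sqrt(registers)


REDIS_REGISTERS = 16384        # 2**14 registers per key
REGISTER_BITS = 6              # assumed register width in bits

dense_bytes = REDIS_REGISTERS * REGISTER_BITS // 8
error = hll_standard_error(REDIS_REGISTERS)
print("dense size: %d bytes" % dense_bytes)
print("standard error: %.2f%%" % (error * 100))
```

Doubling the accuracy requires roughly four times as many registers, which is why the structure stays small only when some error is acceptable.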

2.4 HyperLogLog notes

When using HyperLogLog for statistics, consider three factors:

1. Whether the problem needs to be solved with very little memory
2. Whether errors can be tolerated
3. Whether individual items need to be retrieved

3. GEO

3.1 About GEO

GEO stands for geographic location information.
It can be used to store longitude and latitude, calculate the distance between two places, and perform range calculations.

The figure above shows the distance between Beijing and Tianjin being calculated.

3.2 GEO common commands

3.2.1 geoadd command

geoadd key longitude latitude member [longitude latitude member...]     Add geographic location information

As shown in the figure, these are the longitude and latitude data for 5 cities:

127.0.0.1:6379> geoadd cities:locations 116.28 39.55 beijing        # add the longitude and latitude of Beijing
(integer) 1
127.0.0.1:6379> geoadd cities:locations 117.12 39.08 tianjin 114.29 38.02 shijiazhuang      # add the longitude and latitude of Tianjin and Shijiazhuang
(integer) 2
127.0.0.1:6379> geoadd cities:locations 118.01 39.38 tangshan 115.29 38.51 baoding          # add the longitude and latitude of Tangshan and Baoding
(integer) 2

3.2.2 geopos command

geopos key member [member...]       Get geographic location information

example:

127.0.0.1:6379> geopos cities:locations tianjin     # get the location information for Tianjin
1) 1) "117.12000042200088501"
   2) "39.0800000535766543"

3.2.3 geodist command

geodist key member1 member2 [unit]      Get the distance between two geographic locations; unit: m (meters), km (kilometers), mi (miles), ft (feet)

example:

127.0.0.1:6379> geodist cities:locations tianjin beijing km
"89.2061"
127.0.0.1:6379> geodist cities:locations tianjin baoding km
"170.8360"
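The distances returned above can be sanity-checked with the haversine great-circle formula. The following is a sketch, not a description of Redis internals: the Earth radius of 6371 km is an assumption chosen for this example, so the results agree with the geodist output only to within a fraction of a kilometer.

```python
import math

EARTH_RADIUS_KM = 6371.0   # assumed mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two lon/lat points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# (longitude, latitude) pairs taken from the geoadd calls in section 3.2.1
cities = {
    "beijing": (116.28, 39.55),
    "tianjin": (117.12, 39.08),
    "baoding": (115.29, 38.51),
}

tianjin_beijing = haversine_km(*cities["tianjin"], *cities["beijing"])
tianjin_baoding = haversine_km(*cities["tianjin"], *cities["baoding"])
print("%.1f km, %.1f km" % (tianjin_beijing, tianjin_baoding))
```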

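3.2.4 georadius and georadiusbymember commands

georadius key longitude latitude radius m|km|ft|mi [withcoord] [withdist] [withhash] [COUNT count] [asc|desc] [store key] [storedist key]
georadiusbymember key member radius m|km|ft|mi [withcoord] [withdist] [withhash] [COUNT count] [asc|desc] [store key] [storedist key]

Get the set of geographic locations within the given radius of a position
withcoord: include longitude and latitude in the results
withdist: include the distance from the center position in the results
withhash: include the geohash in the results
COUNT count: limit the number of results returned
asc|desc: sort the results by distance from the center position, ascending or descending
store key: save the location information of the results to the given key
storedist key: save the distances of the results from the center position to the given key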

example:

127.0.0.1:6379> georadiusbymember cities:locations beijing 150 km   # get the cities within 150km of Beijing
1) "beijing"
2) "tianjin"
3) "tangshan"
4) "baoding"
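A radius query of this kind amounts to keeping the members whose distance from the center is within the radius. The sketch below does this by brute force over the five cities added earlier; it is an illustration under assumptions (a haversine distance with a 6371 km Earth radius and a hypothetical `members_within` helper), whereas Redis answers the query from its geohash-encoded sorted set rather than by scanning every member.

```python
import math

EARTH_RADIUS_KM = 6371.0   # assumed mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


cities = {
    "beijing": (116.28, 39.55),
    "tianjin": (117.12, 39.08),
    "shijiazhuang": (114.29, 38.02),
    "tangshan": (118.01, 39.38),
    "baoding": (115.29, 38.51),
}


def members_within(center: str, radius_km: float):
    """Return members within radius_km of center, nearest first."""
    hits = [(haversine_km(cities[center], point), name)
            for name, point in cities.items()]
    return [name for dist, name in sorted(hits) if dist <= radius_km]


nearby = members_within("beijing", 150)
print(nearby)
```

Sorting by distance here corresponds to passing asc to the command; without that option Redis does not promise any particular order.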

3.3 GEO notes

Redis's GEO feature was added in version 3.2
GEO is implemented on top of zset
GEO has no delete command of its own

3.3.1 Deleting GEO members with the zrem command

command:

zrem key member

example:

127.0.0.1:6379> georadiusbymember cities:locations beijing 150 km
1) "beijing"
2) "tianjin"
3) "tangshan"
4) "baoding"
127.0.0.1:6379> zrem cities:locations baoding
(integer) 1
127.0.0.1:6379> georadiusbymember cities:locations beijing 150 km
1) "beijing"
2) "tianjin"
3) "tangshan"

3.4 GEO application scenarios

WeChat Shake


Origin www.cnblogs.com/oxspirt/p/11006709.html