[Redis 03] Redis commonly used data object types

There are 8 common objects in Redis:

  • String object (string)
  • List object (list)
  • Hash object (hash)
  • Set object (set)
  • Sorted set object (zset)
  • Bitmap
  • Geo
  • HyperLogLog

0, Redis object introduction

Keys and values in Redis are both represented by objects. Creating a new key-value pair creates at least two objects: a key object and a value object.

The redisObject structure in Redis is defined as follows:

typedef struct redisObject {
    unsigned type:4;        // object type: string, list, hash, set, zset
    unsigned encoding:4;    // encoding (the underlying data structure)
    unsigned lru:LRU_BITS;  // last access time, used for eviction
    int refcount;           // reference count
    void *ptr;              // pointer to the underlying data
} robj;

  • type: the object type. TYPE key prints string, list, hash, set, or zset, one for each of the 5 basic object types.
  • encoding: the underlying data structure: int, embstr, or raw for strings; hashtable (dictionary); linkedlist (doubly linked list); ziplist (compressed list); intset (integer set); skiplist (skip list plus dictionary).

One, string object (string)

A string object can use one of three encodings: int, raw, or embstr.

1. int: when the string object holds an integer value, the int encoding is used and the value is stored directly in the ptr field, which saves the memory a separate string buffer would take.

2. raw: when the string object holds a string value longer than 39 bytes, the value is stored in an SDS (simple dynamic string) and the encoding is raw.

3. embstr: used for short strings (<= 39 bytes).

The difference from raw: creating a raw-encoded object calls the memory allocator twice, once for the redisObject and once for the SDS. embstr calls it only once and places both in one contiguous block, which reduces memory fragmentation.

Note: the 39-byte limit applies to the Redis 3.0 implementation; since Redis 3.2 the embstr limit is 44 bytes.
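The three rules above can be sketched as a small decision function. This is an illustrative model only, not Redis source code: the function name string_encoding and the constant EMBSTR_LIMIT are made up for this example, and the limit uses the 39-byte threshold quoted in this article.

```python
EMBSTR_LIMIT = 39  # threshold quoted in this article; treat it as an assumption


def string_encoding(value: str) -> str:
    """Return the encoding the rules in this article would pick for a value."""
    # Canonical integers that fit in a signed 64-bit long use the int encoding.
    try:
        n = int(value)
        if str(n) == value and -2**63 <= n < 2**63:
            return "int"
    except ValueError:
        pass
    # Short strings are embedded next to the object header; longer ones use raw.
    if len(value.encode("utf-8")) <= EMBSTR_LIMIT:
        return "embstr"
    return "raw"


print(string_encoding("12345"))   # int
print(string_encoding("hello"))   # embstr
print(string_encoding("x" * 40))  # raw
```

On a real server, OBJECT ENCODING key reports the encoding actually chosen.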

Common commands:

set key value
get key
mset key1 value1 key2 value2
mget key1 key2...
incrby key integer : add integer to the value of key
decrby key integer : subtract integer from the value of key

Two, list object (list)

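The encoding of a list object is ziplist or linkedlist.

ziplist is used only when both of the following hold for the string elements stored in the list:

  • Every string is shorter than 64 bytes
  • The list holds fewer than 512 elements

Otherwise linkedlist is used. (This describes Redis 3.0; since Redis 3.2 lists are encoded as a quicklist.)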
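A minimal sketch of this two-condition check, assuming the thresholds quoted in the text. The function list_encoding and its parameter names are hypothetical and only model the rule; they are not part of Redis.

```python
def list_encoding(elements, max_len=64, max_count=512):
    """Pick ziplist only when both conditions from the text hold."""
    all_short = all(len(e.encode("utf-8")) < max_len for e in elements)
    few_enough = len(elements) < max_count
    return "ziplist" if all_short and few_enough else "linkedlist"


print(list_encoding(["1", "three", "5"]))  # ziplist
print(list_encoding(["x" * 64]))           # linkedlist (element too long)
print(list_encoding(["a"] * 512))          # linkedlist (too many elements)
```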

1. ziplist (compressed list)

RPUSH numbers 1 "three" 5

2. linkedlist (doubly linked list)

Common commands for list objects:

l = left, r = right
lpush key value1 [value2]
rpush key value1 [value2]

Three, hash object (hash)

The encoding of a hash object is ziplist or hashtable.

ziplist is used when every key and value stored in the hash is shorter than 64 bytes and the hash holds fewer than 512 key-value pairs; otherwise hashtable is used.

1. ziplist

When a compressed list stores a hash object, each pair is stored as two adjacent entries, the key first and then the value. Newly added pairs are appended to the tail of the list.

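For example, if the Redis key is player, its field-value pairs could be basketball-James, basketball-Jordan, and football-belly:

hset player basketball James
hset player basketball Jordan
hset player football belly

General form: hset key field value

Note that the second command reuses the field basketball, so it overwrites James with Jordan.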
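The ziplist layout for a hash can be pictured as one flat sequence in which each field is immediately followed by its value. The class below is a toy model under that assumption; ZiplistHash and its methods are illustrative names, and a Python list stands in for the real packed byte layout.

```python
class ZiplistHash:
    """Toy model: a flat list [field1, value1, field2, value2, ...]."""

    def __init__(self):
        self.entries = []

    def hset(self, field, value):
        # Linear scan over the fields, which sit at the even indexes.
        for i in range(0, len(self.entries), 2):
            if self.entries[i] == field:
                self.entries[i + 1] = value  # existing field: overwrite value
                return 0
        # New pairs go to the tail: field first, then value.
        self.entries += [field, value]
        return 1

    def hget(self, field):
        for i in range(0, len(self.entries), 2):
            if self.entries[i] == field:
                return self.entries[i + 1]
        return None


player = ZiplistHash()
player.hset("basketball", "James")
player.hset("basketball", "Jordan")  # same field, so the value is replaced
player.hset("football", "belly")
print(player.entries)  # ['basketball', 'Jordan', 'football', 'belly']
```

Lookups are linear scans, which is acceptable only because this encoding is limited to small hashes.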

2. hashtable

Both the keys and the values are stored as string objects.

Common commands:

Store: hset book name "Java"
       hset book price "100"
Read:  hget book name ==> Java
       hget book price ==> 100

Four, set object (set)

The encoding of a set object is intset or hashtable.

intset is used when all elements of the set are integer values and the set holds no more than 512 elements; otherwise hashtable is used.

1. intset: used when the set stores only integers.

2. hashtable: only the dictionary keys carry data. Each key is a string object holding one set element, and every value is NULL.
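An intset can be thought of as a sorted array of unique integers that is searched with binary search. The sketch below models only that idea; the class IntSet is an illustrative stand-in and ignores the real structure's variable integer width.

```python
import bisect


class IntSet:
    """Toy model of an intset: a sorted array of unique integers."""

    def __init__(self):
        self.items = []

    def add(self, n: int) -> bool:
        i = bisect.bisect_left(self.items, n)  # binary search for the slot
        if i < len(self.items) and self.items[i] == n:
            return False                       # already present
        self.items.insert(i, n)                # insert while keeping order
        return True

    def contains(self, n: int) -> bool:
        i = bisect.bisect_left(self.items, n)
        return i < len(self.items) and self.items[i] == n


numbers = IntSet()
for n in (3, 1, 2, 3):
    numbers.add(n)
print(numbers.items)        # [1, 2, 3]
print(numbers.contains(2))  # True
```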

Common commands:

Add elements: sadd numbers 1 2 3
Get all elements: smembers numbers
Remove an element: srem numbers 2
Get the number of elements: scard numbers

Five, sorted set object (zset)

The encoding of a sorted set is ziplist or skiplist.

ziplist is used when the sorted set holds fewer than 128 elements and every member is shorter than 64 bytes; otherwise skiplist is used.

1. ziplist

Each element is stored as two adjacent entries, the member followed by its score. Entries are ordered by score from smallest to largest, so elements with lower scores sit closer to the head of the list.

For example, after zadd price 8.5 apple 7.2 banana 2 cherry, the key is price and the value holds [2-cherry], [7.2-banana], [8.5-apple], in that order.

2. skiplist

With this encoding, a zset is implemented with a skip list plus a dictionary: zsl + dict.

(1) zsl: each skip list node stores the member in its object field and the score in its score field. The skip list keeps members ordered by score, which makes range queries on the sorted set efficient.

(2) dict: stores the mapping from member to score, so the score of a member can be looked up in O(1).

Note: why does a sorted set use both a skip list and a dictionary?

=> With only the skip list, looking up a member's score would degrade from O(1) to O(log n). With only the dictionary, a range query would first have to sort all elements by score, which takes at least O(n log n) time plus O(n) extra space for the sorted copy.
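The benefit of pairing the two structures can be sketched with a dictionary plus an ordered sequence. This is a simplified model, not the Redis implementation: the class SortedSet is illustrative, and a sorted Python list stands in for the skip list (its inserts are O(n), whereas a real skip list inserts in O(log n)).

```python
import bisect


class SortedSet:
    def __init__(self):
        self.scores = {}   # member -> score: the dict part
        self.ordered = []  # sorted (score, member) pairs: skip list stand-in

    def zadd(self, score, member):
        if member in self.scores:
            # Remove the old pair before inserting the updated one.
            old = (self.scores[member], member)
            self.ordered.pop(bisect.bisect_left(self.ordered, old))
        self.scores[member] = score
        bisect.insort(self.ordered, (score, member))

    def zscore(self, member):
        return self.scores.get(member)  # direct lookup through the dict

    def zrangebyscore(self, lo, hi):
        # Binary search for the first pair with score >= lo, then walk forward.
        start = bisect.bisect_left(self.ordered, (lo,))
        result = []
        for score, member in self.ordered[start:]:
            if score > hi:
                break
            result.append(member)
        return result


price = SortedSet()
price.zadd(8.5, "apple")
price.zadd(7.2, "banana")
price.zadd(2, "cherry")
print(price.zscore("apple"))      # 8.5
print(price.zrangebyscore(2, 8))  # ['cherry', 'banana']
```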

Special object types

  • BitMap
  • Geo
  • HyperLogLog

1. BitMap

A bitmap saves space and suits two-state statistics, where each item is either 0 or 1, for example check-ins: a user has either checked in or not.

A bitmap is not a separate data type but a set of bit-oriented operations defined on the string type. Because strings are binary-safe blobs with a maximum length of 512 MB, a bitmap can address up to 2^32 bits.

Bitmap commands manipulate individual bits, and their advantage is memory efficiency. Suppose we need to store the login status of 1 million users. With a bitmap, 1 million bits are enough (1 means logged in, 0 means not logged in). With plain strings, using the user ID as the key and "1" or "0" as the value, we would need 1 million separate string keys. The bitmap takes far less space.

Under the hood, a bitmap is an ordinary string: the String type is stored as a byte array, and Redis uses each bit of that array to represent the two-state value of one element. You can think of a bitmap as an array of bits.

Syntax: SETBIT key offset value

For example, to count the days on which a user checks in during a year, use one key per user and year whose value is a 365-bit bitmap. Each bit records the check-in status for one day, and 1 means checked in. Below, 11000011 is the user ID, and offsets 0, 1, 2, ... stand for day 1, 2, 3, ... of the year.

Day 1: setbit user:sign:11000011:2020 0 1

Day 2: setbit user:sign:11000011:2020 1 1

Day 3: setbit ...

Finally, count the user's check-ins for the year:

BITCOUNT user:sign:11000011:2020
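To make the bit-level behaviour concrete, here is a minimal in-memory sketch of SETBIT and BITCOUNT on a byte array. The class Bitmap is an illustrative model, not a Redis client; it follows the Redis convention that offset 0 is the most significant bit of the first byte.

```python
class Bitmap:
    def __init__(self):
        self.data = bytearray()

    def setbit(self, offset: int, value: int) -> int:
        """Set the bit at offset and return its previous value."""
        byte, bit = divmod(offset, 8)
        if byte >= len(self.data):
            # Grow the string with zero bytes, as SETBIT does.
            self.data.extend(b"\x00" * (byte + 1 - len(self.data)))
        mask = 0x80 >> bit  # offset 0 is the most significant bit of byte 0
        old = 1 if self.data[byte] & mask else 0
        if value:
            self.data[byte] |= mask
        else:
            self.data[byte] &= ~mask & 0xFF
        return old

    def bitcount(self) -> int:
        return sum(bin(b).count("1") for b in self.data)


sign = Bitmap()
sign.setbit(0, 1)  # day 1: checked in
sign.setbit(1, 1)  # day 2: checked in
sign.setbit(2, 0)  # day 3: not checked in
print(sign.bitcount())  # 2
```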

As another example, record the sign-in status of 100 million users in October. Use one key per day, 31 keys in total; each value is a bitmap with 100 million bits, where 1 means signed in. In the commands below, user01, user02, and user03 stand for the integer offsets assigned to those users.

10.01:

// sign-in status of 3 users on October 1
setbit user:sign:20201001 user01  1
setbit user:sign:20201001 user02  0
setbit user:sign:20201001 user03  1

10.02:

// sign-in status of 3 users on October 2
setbit user:sign:20201002 user01  1
setbit user:sign:20201002 user02  0
setbit user:sign:20201002 user03  1

To count the users who signed in on every day of the month, combine the 31 bitmaps with an AND operation (BITOP AND) and run BITCOUNT on the resulting bitmap. To count the total number of sign-ins instead, add up the BITCOUNT of each daily bitmap:

BITCOUNT user:sign:20201001
+
BITCOUNT user:sign:20201002
+
BITCOUNT user:sign:20201003
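The difference between AND-ing the daily bitmaps and summing their bit counts can be sketched with Python integers used as bit sets. Everything here is illustrative: the offsets 1, 2, 3 for user01, user02, user03 and the data for the third day are assumptions made for the example.

```python
from functools import reduce
from operator import and_


def bitmap_from_offsets(offsets):
    """Build an integer whose set bits are the given user offsets."""
    bits = 0
    for off in offsets:
        bits |= 1 << off
    return bits


def popcount(bits):
    return bin(bits).count("1")


# Assumed offsets: user01 -> 1, user02 -> 2, user03 -> 3.
days = {
    "20201001": bitmap_from_offsets([1, 3]),  # user01 and user03 signed in
    "20201002": bitmap_from_offsets([1, 3]),
    "20201003": bitmap_from_offsets([3]),     # hypothetical third day
}

# AND of all days, then count: users who signed in every single day.
users_every_day = popcount(reduce(and_, days.values()))
# Sum of the per-day counts: total number of sign-ins.
total_sign_ins = sum(popcount(b) for b in days.values())

print(users_every_day)  # 1
print(total_sign_ins)   # 5
```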

The memory occupied by a bitmap with 100 million bits is: 10^8 bits = 10^8 / 8 bytes = 1.25 * 10^7 bytes ≈ 1.2207 * 10^4 KB ≈ 12 MB
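The same arithmetic, written out as a quick check. The helper name bitmap_size is made up for this example.

```python
def bitmap_size(bits: int):
    """Return the size of a bitmap with the given bit count as (bytes, KB, MB)."""
    nbytes = (bits + 7) // 8  # round up to whole bytes
    return nbytes, nbytes / 1024, nbytes / 1024 / 1024


nbytes, kb, mb = bitmap_size(10**8)
print(nbytes)        # 12500000
print(round(kb, 2))  # 12207.03
print(round(mb, 2))  # 11.92
```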

Further reading, "How to use Redis bitmap operations gracefully": https://mp.weixin.qq.com/s?spm=a2c6h.12873639.0.0.5c245c15Ac1CVJ&__biz=MzAxNjM2MTk0Ng==&mid=2247484427&idx=1&sn=cb810acc286b9f85796ef4dc35587309&chksm=9bf4b4beac833da86eff09b2f68195930e5fd1c374448c7b22b16725a4ec717dacac63ba1da7&scene=21#wechat_redirect

2. HyperLogLog

HyperLogLog is a probabilistic data type used to estimate cardinality, that is, the number of distinct elements in a collection. Its biggest advantage is that the space it needs stays fixed and very small even when the collection is very large.

In Redis, each HyperLogLog needs at most 12 KB of memory to estimate the cardinality of close to 2^64 elements. This is far more space-efficient than the Set and Hash types, whose memory use grows with the number of elements.

For example, to count the unique visitors of a web page, we could use a Set to remove duplicates, but its memory grows with the number of visitors. HyperLogLog can be used instead:

Add: PFADD page1:uv user1 user2 user3 user4 user5
Query: PFCOUNT page1:uv

Implementation principle: HyperLogLog algorithm, https://blog.csdn.net/u011489043/article/details/78727128

The general idea: to count the distinct elements in a data set, each element is passed through a hash function, which turns it into a binary string of 0s and 1s. The bits of such a string behave like a series of coin flips, so patterns observed in the hashes (such as a long run of zeros) indicate roughly how many distinct elements have been seen. The result is an estimate, with a standard error of about 0.81%.
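The coin-flip idea can be illustrated with a compact textbook HyperLogLog. This is a sketch of the general algorithm under stated assumptions, not the Redis implementation: it uses SHA-1 truncated to 64 bits as the hash, 4096 registers (p = 12), and the standard small-range correction, so its accuracy differs from the Redis figure quoted above.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p                # number of registers
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash value
        idx = h >> (64 - self.p)                       # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        # Rank: position of the first 1 bit in the remaining bits (1-based).
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog()
for i in range(10000):
    hll.add(f"user{i}")
    hll.add(f"user{i}")  # duplicates do not change the estimate
estimate = hll.count()
print(round(estimate))  # close to 10000, but not exact
```

The registers occupy a fixed amount of space regardless of how many elements are added, which is the property the text describes.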

3. Geo

Geo is implemented on top of Sorted Set. The GEO type uses the GeoHash code of an element's longitude and latitude as its score in the Sorted Set.

How is the code computed?

GeoHash uses binary interval encoding: the longitude and the latitude of a coordinate pair are encoded separately, and the two codes are then combined into a single final code.

To encode one value, its range is repeatedly split in half; the value contributes a 0 bit if it falls in the left half and a 1 bit if it falls in the right half.

The final code interleaves the two: its even bit positions (0, 2, 4, ...) hold the longitude bits in order, and its odd bit positions (1, 3, 5, ...) hold the latitude bits in order.

For example, a longitude code of 11010 and a latitude code of 10111 combine into 1110011101.
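The halving and interleaving steps can be sketched as follows. This models generic GeoHash bit encoding with 5 bits per coordinate, not the exact Redis implementation; the function names and the sample coordinates (116.37, 39.86) are assumptions chosen so that the output matches the code shown above.

```python
def encode_range(value: float, lo: float, hi: float, bits: int) -> str:
    """Repeatedly halve [lo, hi): emit 0 for the left half, 1 for the right."""
    code = ""
    for _ in range(bits):
        mid = (lo + hi) / 2
        if value >= mid:
            code += "1"
            lo = mid
        else:
            code += "0"
            hi = mid
    return code


def geohash_bits(lon: float, lat: float, bits: int = 5) -> str:
    lon_code = encode_range(lon, -180.0, 180.0, bits)
    lat_code = encode_range(lat, -90.0, 90.0, bits)
    # Even positions take longitude bits, odd positions take latitude bits.
    return "".join(a + b for a, b in zip(lon_code, lat_code))


print(encode_range(116.37, -180.0, 180.0, 5))  # 11010
print(encode_range(39.86, -90.0, 90.0, 5))     # 10111
print(geohash_bits(116.37, 39.86))             # 1110011101
```

Because nearby points share long code prefixes, using the code as a Sorted Set score keeps nearby elements close together in score order.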

Summary: encodings used by each object type

  • string: int, embstr, raw
  • list: ziplist, linkedlist
  • hash: ziplist, hashtable
  • set: intset, hashtable
  • zset: ziplist, skiplist

Application scenarios for each object type

  1. string: the most common data type, used for plain key-value storage.

    For example, to store user information, use the user ID as the key and a JSON string as the value: set userid {"name":"aaa","age":"123"}

    Unique ID generation: initialize with set uniqueid 1; after that, every incr uniqueid returns the incremented ID. Because Redis executes commands on a single thread, no extra concurrency control is needed.

  2. list:

    Retry queue: tasks are added to the list, and a worker keeps checking the list for pending tasks while it executes them.

  3. set:

    Mainly deduplication. Scenarios that need deduplication, such as statistics and filtering, can use a set. A set is suitable for statistics over small data volumes.

  4. zset:

    Leaderboards, such as a movie ranking: zadd movie_rank_board 1 功夫 2 逃学威龙 3 大话西游

  5. hash:

    Flash sales, where each product of each merchant is limited to a fixed number of pieces. Use the merchant ID as the key, the ID of each product in the sale as the field, and the quantity available as the value, for example: hmset 781287 product-001 100 product-002 50. During the sale, deduct from the corresponding field with each purchase.
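The deduction step in the hash scenario could look like the following in-memory sketch. A nested dictionary stands in for the Redis hash, and deduct is a hypothetical helper, not a Redis command. Against a real server the check and the decrement would have to run atomically, which this sketch does not address.

```python
# Stand-in for: hmset 781287 product-001 100 product-002 50
stock = {"781287": {"product-001": 100, "product-002": 50}}


def deduct(merchant_id: str, product_id: str, quantity: int = 1) -> bool:
    """Deduct quantity from the product's stock; refuse if not enough is left."""
    remaining = stock[merchant_id][product_id]
    if remaining < quantity:
        return False
    stock[merchant_id][product_id] = remaining - quantity
    return True


print(deduct("781287", "product-001"))  # True
print(stock["781287"]["product-001"])   # 99
```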

Memory reclamation

Redis keeps a reference counter on each object to implement memory reclamation:

  • When an object is created, refcount = 1;
  • Each time the object gains a new user, refcount++;
  • When a user is done with the object, refcount--;
  • The object is freed when refcount reaches 0.
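The lifecycle in this list can be sketched with a tiny reference-counted object. The names RObj, incr_ref, and decr_ref are illustrative, and the freed flag merely stands in for releasing the memory.

```python
class RObj:
    def __init__(self, value):
        self.value = value
        self.refcount = 1   # a newly created object has one reference
        self.freed = False


def incr_ref(obj: RObj) -> None:
    obj.refcount += 1


def decr_ref(obj: RObj) -> None:
    obj.refcount -= 1
    if obj.refcount == 0:
        # Last reference gone: release the object.
        obj.value = None
        obj.freed = True


obj = RObj("hello")
incr_ref(obj)     # a second user takes a reference
decr_ref(obj)     # the first user is done
print(obj.freed)  # False
decr_ref(obj)     # the second user is done
print(obj.freed)  # True
```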

Value object sharing: only string objects holding integer values are shared

Suppose key A creates a string object with the integer value 100 as its value object, so A <===> 100. When key B also needs a string object with the value 100, Redis points both A and B at the same string object and increments its refcount (to 3: the count starts at 1 because the server itself references the shared object).

Note: why are string objects that hold text not shared?

==> Before Redis can use a shared object as the value of a key, it must check that the shared object is identical to the object the key wants.

Therefore the value held by a shared object cannot be too complex, or this check becomes too expensive. Redis currently shares only string objects holding integer values, for which the check is O(1).
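A minimal sketch of integer sharing, assuming a pool of pre-created objects for small integers. The pool size of 10000 and the names RObj, SHARED_LIMIT, and create_string_object are assumptions made for this illustration; the point is that an integer can be matched to its shared object by a direct index.

```python
SHARED_LIMIT = 10000  # assumed size of the shared integer pool


class RObj:
    def __init__(self, value):
        self.value = value
        self.refcount = 1  # the creator holds the first reference


# The server creates the shared objects up front and holds one reference each.
shared_integers = [RObj(i) for i in range(SHARED_LIMIT)]


def create_string_object(value) -> RObj:
    # Integers in the pool are found by index, so the check is O(1).
    if isinstance(value, int) and 0 <= value < SHARED_LIMIT:
        obj = shared_integers[value]
        obj.refcount += 1
        return obj
    return RObj(value)  # everything else gets its own object


a = create_string_object(100)  # value object for key A
b = create_string_object(100)  # value object for key B
print(a is b)       # True
print(a.refcount)   # 3

s1 = create_string_object("hello")
s2 = create_string_object("hello")
print(s1 is s2)     # False
```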


References:

"Redis Design and Implementation"

"Redis Core Technology and Actual Combat"

Origin blog.csdn.net/noaman_wgs/article/details/114989655