[Redis] Redis data structure

1. Dynamic String SDS

Keys stored in Redis are strings, and values are often strings or collections of strings. Clearly, the string is the most commonly used data structure in Redis.

Redis is written in C, so why not use C strings directly? Because C strings have several problems:

  1. Obtaining the length of a string requires computation: the array must be scanned up to the terminating \0 (the string length is the array length minus 1)
  2. Not binary safe: \0 marks the end of the array, so the array cannot contain \0 as data; otherwise reading the array stops early
  3. Not modifiable

image-20230624113024458

Therefore, Redis has built a new string structure called Simple Dynamic String (SDS for short).

For example, when the command set name 张三 is executed, Redis creates two SDSs under the hood: one containing "name" and one containing "张三" (Zhang San).


Redis is implemented in C, where SDS is a struct. Its source code is as follows:

image-20230624164559433

For example, an sds structure containing the string "name" is as follows:

image-20230624164700489

SDS is called a simple dynamic string because it can expand dynamically. Take, for example, an SDS whose content is "hi":

image-20230624164815408

Suppose we append the string ",Amy" to this SDS. Redis first allocates a new block of memory:

  • If the string after appending is smaller than 1 MB, the new space is twice the length of the resulting string, plus 1
  • If the string after appending is 1 MB or larger, the new space is the length of the resulting string + 1 MB + 1

This behavior is called memory preallocation.

image-20230624165109650
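The preallocation rule can be sketched in a few lines of Python. This is only an illustration of the two cases described above: the function name sds_new_alloc is made up for this example, and the SDS header bytes are ignored.

```python
SDS_MAX_PREALLOC = 1024 * 1024  # the 1 MB threshold from the rule above


def sds_new_alloc(new_len: int) -> int:
    """Buffer size in bytes (including the trailing NUL byte) requested for
    a string whose length after appending is new_len. Hypothetical helper."""
    if new_len < SDS_MAX_PREALLOC:
        return new_len * 2 + 1            # double the length, plus the NUL byte
    return new_len + SDS_MAX_PREALLOC + 1  # add 1 MB of slack, plus the NUL byte


# Appending ",Amy" to "hi" yields a 6-byte string.
print(sds_new_alloc(len("hi") + len(",Amy")))
```

Because the buffer is over-allocated, later appends that fit in the spare space need no new allocation.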

Advantages of SDS over C strings:

  1. Getting the length of a string takes O(1) time
  2. Supports dynamic expansion
  3. Reduces the number of memory allocations
  4. Binary safe

2. IntSet

IntSet is one implementation of the set collection in Redis. It is based on an integer array and is variable-length and ordered.

The structure is as follows:

image-20230624180637342

The encoding field has three modes, each indicating a different size for the stored integers:

image-20230624180802077

In order to facilitate the search, Redis will store all the integers in the intset in the contents array in ascending order, as shown in the following structure:

image-20230624181350976

Here every number in the array is within the range of int16_t, so the encoding used is INTSET_ENC_INT16, and the byte size occupied by each part is:

  • encoding: 4 bytes
  • length: 4 bytes
  • contents: 2 bytes * 3 = 6 bytes

2.1 IntSet Upgrade

Suppose there is an intset, the elements are {5,10,20}, and the encoding adopted is INTSET_ENC_INT16, each integer occupies 2 bytes:

image-20230624184949852

Now we add the number 50000 to it. This number exceeds the range of int16_t, so the intset automatically upgrades its encoding to a suitable size.

Using the current case to illustrate the process:

  1. Upgrade the encoding to INTSET_ENC_INT32, where each integer occupies 4 bytes, and expand the array according to the new encoding and the number of elements
  2. Copy the existing elements to their correct positions in the expanded array, in reverse order (from last to first)
  3. Put the element being added (50000) at the end of the array
  4. Finally, change the encoding attribute of the intset to INTSET_ENC_INT32 and change the length attribute to 4

image-20230624185501363
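The upgrade steps can be simulated on a raw byte buffer. The sketch below is an illustration in Python, not the Redis source: the function names are invented for this example, the encoding is represented simply by its element width in bytes, and the buffer holds only the contents array.

```python
import struct

# Element width in bytes -> little-endian struct format (illustrative mapping)
ENC_FMT = {2: "<h", 4: "<i", 8: "<q"}


def required_encoding(value: int) -> int:
    """Smallest element width (2, 4 or 8 bytes) that can hold value."""
    if -2**15 <= value < 2**15:
        return 2
    if -2**31 <= value < 2**31:
        return 4
    return 8


def upgrade_and_add(contents: bytearray, length: int, old_enc: int, value: int):
    """Widen every element to the encoding required by value, then add value.

    Assumes value does not fit in old_enc, so it is either larger than every
    existing element (goes last) or smaller than every one (goes first).
    """
    new_enc = required_encoding(value)
    prepend = 1 if value < 0 else 0
    # Grow the buffer in place to hold length + 1 elements of the new width.
    contents.extend(bytes((length + 1) * new_enc - len(contents)))
    # Copy from the last element to the first so no unread element is overwritten.
    for i in range(length - 1, -1, -1):
        (elem,) = struct.unpack_from(ENC_FMT[old_enc], contents, i * old_enc)
        struct.pack_into(ENC_FMT[new_enc], contents, (i + prepend) * new_enc, elem)
    slot = 0 if prepend else length
    struct.pack_into(ENC_FMT[new_enc], contents, slot * new_enc, value)
    return contents, length + 1, new_enc


buf = bytearray(struct.pack("<3h", 5, 10, 20))   # {5, 10, 20} as int16
buf, length, enc = upgrade_and_add(buf, 3, 2, 50000)
print(length, enc, struct.unpack("<4i", bytes(buf)))
```

Copying in reverse order is what makes the in-place widening safe: each element is written at or beyond the position it was read from, so elements that have not been copied yet are never clobbered.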

IntSet can be regarded as a special integer array with the following characteristics:

  1. Redis ensures that the elements in an IntSet are unique and ordered
  2. Its type upgrade mechanism saves memory space
  3. Lookups use binary search under the hood

3. Dict

Redis is a key-value database in which we can quickly add, delete, modify and query by key. The mapping between keys and values is implemented by Dict.

Dict consists of three parts: the hash table (DictHashTable), the hash node (DictEntry), and the dictionary (Dict).

image-20230625114548062image-20230625114727544

image-20230625115344528

image-20230625163453423

When we add a key-value pair to a Dict, Redis first computes a hash value h from the key and then uses h & sizemask to determine the array index at which the element should be stored.

Suppose the hash table has length 4 (so sizemask is 3) and we store k1=v1, where the hash value of k1 is h=1. Then 1 & 3 = 1, so k1=v1 is stored at array index 1.

image-20230625115126655

Next we store k2=v2, and the hash value of k2 is also h=1, so k2 should be stored at array index 1 as well. The new entry is inserted at the head of that slot's linked list (head insertion), which avoids traversing the list.

image-20230625163603658
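The index calculation and head insertion can be sketched as follows. This is an illustrative Python model rather than the Redis code: the class and function names are made up, and the hash value h is passed in explicitly so the example matches the numbers above.

```python
class DictEntry:
    """One node of a bucket's singly linked list (illustrative)."""

    def __init__(self, key, value, next_entry=None):
        self.key = key
        self.value = value
        self.next = next_entry


def dict_add(table, key, value, h):
    """Insert key/value into table, a list whose length is a power of two."""
    sizemask = len(table) - 1
    idx = h & sizemask
    # Head insertion: the new entry points at the old head of the bucket.
    table[idx] = DictEntry(key, value, table[idx])
    return idx


table = [None] * 4
print(dict_add(table, "k1", "v1", h=1))  # index 1
print(dict_add(table, "k2", "v2", h=1))  # index 1 again, k2 now precedes k1
```

Because the table size is a power of two, h & sizemask gives the same result as h modulo the size, but with a single bitwise operation.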


3.1 Dict expansion

The hash table in Dict is implemented as an array combined with singly linked lists. When the table holds many elements, hash conflicts inevitably increase, and if a linked list grows too long, query efficiency drops sharply.

Dict checks the load factor (LoadFactor = used/size) every time a new key-value pair is added, and expansion of the hash table is triggered when either of the following two conditions is met:

  1. The hash table's LoadFactor >= 1, and the server is not running a background process such as BGSAVE or BGREWRITEAOF
  2. The hash table's LoadFactor > 5

image-20230625165701915
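The two expansion conditions can be written as a small predicate. This is a sketch in Python; the function name and the child_running flag are assumptions made for illustration.

```python
def needs_expand(used: int, size: int, child_running: bool) -> bool:
    """Return True when the hash table should be expanded.

    used is the number of stored entries, size the number of buckets, and
    child_running says whether a background process such as BGSAVE or
    BGREWRITEAOF is active.
    """
    load_factor = used / size
    if load_factor > 5:
        return True  # expand regardless of background processes
    return load_factor >= 1 and not child_running


print(needs_expand(4, 4, child_running=False))
print(needs_expand(4, 4, child_running=True))
print(needs_expand(21, 4, child_running=True))
```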


3.2 Shrinkage of Dict

In addition to expansion, Dict also checks the load factor every time an element is deleted. When LoadFactor < 0.1, the hash table is shrunk:

image-20230625165832961

image-20230625165840449image-20230625165845232


3.3 Rehash of Dict

Whether expanding or shrinking, a new hash table must be created, which changes the size and sizemask of the hash table. Because the lookup of a key depends on sizemask, the index of every key in the old hash table must be recalculated and the key inserted into the new hash table. This process is called rehash.

The rehash of a Dict is not completed in one pass. If a Dict contained millions of entries and rehash had to finish in a single pass, it would very likely block the main thread. Therefore, the rehash is performed gradually over many steps, which is why it is called progressive rehash. The process is as follows:

  1. Calculate the size of the new hash table. The value depends on whether this is an expansion or a contraction:
    • For expansion, the new size is the first 2^n greater than or equal to dict.ht[0].used + 1
    • For contraction, the new size is the first 2^n greater than or equal to dict.ht[0].used (but not less than 4)
  2. Allocate memory according to the new size, create a dictht, and assign it to dict.ht[1]
  3. Set dict.rehashidx = 0 to mark the start of rehash
  4. On every add, delete, modify or query operation, check whether dict.rehashidx is greater than -1. If so, rehash the entry list at dict.ht[0].table[rehashidx] into dict.ht[1] and increment rehashidx. This continues until all data in dict.ht[0] has been rehashed into dict.ht[1]
  5. Assign dict.ht[1] to dict.ht[0], initialize dict.ht[1] to an empty hash table, and release the memory of the original dict.ht[0]
  6. Set rehashidx to -1, which marks the end of rehash
  7. During rehash, new entries are written directly to ht[1], while queries, modifications and deletions look in dict.ht[0] and then dict.ht[1]. This ensures that the data in ht[0] only decreases and never increases, so ht[0] eventually becomes empty as rehash proceeds.
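The steps above can be modeled with a toy dictionary that migrates one bucket per operation. This is a simplified Python sketch under several assumptions: each bucket is a Python dict standing in for the linked list, Python's built-in hash() stands in for Redis's hash function, and the class and method names are invented for this example.

```python
def next_power(n: int) -> int:
    """Smallest power of two >= n, never below 4."""
    size = 4
    while size < n:
        size *= 2
    return size


class Dict:
    def __init__(self, size: int = 4):
        self.ht = [[{} for _ in range(size)], None]  # ht[0] and ht[1]
        self.rehashidx = -1                          # -1 means "not rehashing"

    @staticmethod
    def _bucket(table, key):
        return table[hash(key) & (len(table) - 1)]

    def used(self) -> int:
        return sum(len(b) for t in self.ht if t for b in t)

    def start_rehash(self, expanding: bool = True) -> None:
        used = self.used()
        new_size = next_power(used + 1 if expanding else used)
        self.ht[1] = [{} for _ in range(new_size)]
        self.rehashidx = 0

    def _rehash_step(self) -> None:
        """Move one bucket from ht[0] to ht[1]; finish when ht[0] is drained."""
        if self.rehashidx == -1:
            return
        for key, value in self.ht[0][self.rehashidx].items():
            self._bucket(self.ht[1], key)[key] = value
        self.ht[0][self.rehashidx] = {}
        self.rehashidx += 1
        if self.rehashidx == len(self.ht[0]):
            self.ht = [self.ht[1], None]  # ht[1] becomes the new ht[0]
            self.rehashidx = -1

    def get(self, key):
        self._rehash_step()
        for table in self.ht:             # look in ht[0], then ht[1]
            if table is not None and key in self._bucket(table, key):
                return self._bucket(table, key)[key]
        return None

    def set(self, key, value) -> None:
        self._rehash_step()
        if self.rehashidx != -1:
            # During rehash, writes go to ht[1]; drop any stale copy in ht[0].
            self._bucket(self.ht[0], key).pop(key, None)
            self._bucket(self.ht[1], key)[key] = value
        else:
            self._bucket(self.ht[0], key)[key] = value
```

A short usage example: after filling four keys and calling start_rehash(), each following get or set moves one more bucket, and the dictionary returns to the non-rehashing state once every bucket of the old table has been visited.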

4. ZipList

ZipList is a special kind of "double-ended linked list" that consists of a series of specially encoded, contiguous memory blocks. Push and pop operations can be performed at either end, with a time complexity of O(1).

image-20230625183939725

Attribute  Type       Length     Purpose
zlbytes    uint32_t   4 bytes    Records the number of bytes of memory occupied by the entire compressed list.
zltail     uint32_t   4 bytes    Records the number of bytes between the tail node of the compressed list and its start address. With this offset, the address of the tail node can be determined directly.
zllen      uint16_t   2 bytes    Records the number of nodes in the compressed list. The largest count it can record is 65534 (UINT16_MAX - 1). If there are more nodes, the field is set to 65535 (UINT16_MAX) and the actual number of nodes can only be found by traversing the entire compressed list.
entry      list node  variable   The nodes contained in the compressed list. The length of each node is determined by the content it stores.
zlend      uint8_t    1 byte     Special value 0xFF (255 in decimal), used to mark the end of the compressed list.

4.1 Entry in ZipList

Unlike a node in an ordinary linked list, an Entry in a ZipList does not store pointers to the previous and next nodes, because two pointers would take 16 bytes and waste memory. Instead, it uses the following structure:

image-20230625184307623

  • previous_entry_length: the length of the previous node, occupying 1 or 5 bytes
    • If the length of the previous node is less than 254 bytes, 1 byte is used to store the length
    • If the length of the previous node is greater than or equal to 254 bytes, 5 bytes are used to store the length: the first byte is 0xfe and the remaining four bytes hold the actual length
  • encoding: records the data type of the content (string or integer) and its length, occupying 1, 2 or 5 bytes
  • content: holds the node's data, which can be a string or an integer

Note: All length values stored in a ZipList use little-endian byte order, that is, the low-order byte comes first and the high-order byte follows. For example, the value 0x1234 is stored as the byte sequence 0x34 0x12.
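To make the 1-byte versus 5-byte rule and the byte order concrete, here is a minimal Python sketch. The function name encode_prevlen is made up for this illustration.

```python
import struct


def encode_prevlen(prev_len: int) -> bytes:
    """Encode the previous_entry_length field of a ZipList entry."""
    if prev_len < 254:
        return bytes([prev_len])                  # one byte holds the length
    return b"\xfe" + struct.pack("<I", prev_len)  # 0xfe marker + 4-byte little-endian length


print(encode_prevlen(253).hex())          # one byte
print(encode_prevlen(254).hex())          # five bytes, starting with fe
print(struct.pack("<H", 0x1234).hex())    # little-endian: low byte first
```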


4.1.1 The encoding field

The encoding in a ZipList entry falls into two categories, string and integer:

  • String: if the encoding starts with "00", "01" or "10", the content is a string

Encoding                                        Encoding length  String size
|00pppppp|                                      1 byte           <= 63 bytes
|01pppppp|qqqqqqqq|                             2 bytes          <= 16383 bytes
|10000000|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt|  5 bytes          <= 4294967295 bytes

Suppose you save the string "ab", expressed in hexadecimal:

image-20230625192321159

Save the strings: "ab" and "bc", the complete representation is as follows:

image-20230625192400122
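The byte layout for the strings "ab" and "bc" can be reproduced with a short Python sketch. It is a simplified model that only handles strings of at most 63 bytes (the 1-byte |00pppppp| encoding) and entries shorter than 254 bytes; the function name build_ziplist is invented for this example.

```python
import struct

HEADER_SIZE = 10  # zlbytes (4) + zltail (4) + zllen (2)


def build_ziplist(strings):
    """Serialize short strings into ZipList bytes (illustrative subset only)."""
    entries = b""
    prev_len = 0
    tail_offset = HEADER_SIZE
    for s in strings:
        data = s.encode()
        # Keep the sketch to the 1-byte forms of both length fields.
        assert len(data) <= 63 and prev_len < 254
        entry = bytes([prev_len, len(data)]) + data  # prevlen, encoding, content
        tail_offset = HEADER_SIZE + len(entries)     # offset of the latest entry
        entries += entry
        prev_len = len(entry)
    zlbytes = HEADER_SIZE + len(entries) + 1         # +1 for zlend
    header = struct.pack("<IIH", zlbytes, tail_offset, len(strings))
    return header + entries + b"\xff"


print(build_ziplist(["ab", "bc"]).hex(" "))
```

Each entry here is 4 bytes long (1 byte of previous_entry_length, 1 byte of encoding, 2 bytes of content), so the second entry records 0x04 as the length of the first.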


  • Integer: if the encoding starts with "11", the content is an integer, and the encoding occupies only 1 byte

Encoding  Encoding length  Integer type
11000000  1 byte           int16_t (2 bytes)
11010000  1 byte           int32_t (4 bytes)
11100000  1 byte           int64_t (8 bytes)
11110000  1 byte           24-bit signed integer (3 bytes)
11111110  1 byte           8-bit signed integer (1 byte)
1111xxxx  1 byte           The value is stored directly in the xxxx bits, which range from 0001 to 1101; subtracting 1 gives the actual value (0 to 12)

Example: A ZipList contains two integer values: "2" and "5"

image-20230625194435016
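The integer encodings in the table can be illustrated with a small Python function that returns the encoding byte followed by the content bytes. The function name is made up, the previous_entry_length field is left out, and choosing the smallest encoding that fits is an assumption of this sketch.

```python
import struct


def encode_int(value: int) -> bytes:
    """Return encoding byte + content for an integer ZipList entry."""
    if 0 <= value <= 12:
        return bytes([0xF0 | (value + 1)])                # 1111xxxx, value stored inline
    if -2**7 <= value < 2**7:
        return b"\xfe" + struct.pack("<b", value)          # 8-bit signed integer
    if -2**15 <= value < 2**15:
        return b"\xc0" + struct.pack("<h", value)          # int16_t
    if -2**23 <= value < 2**23:
        return b"\xf0" + struct.pack("<i", value)[:3]      # low 3 bytes: 24-bit signed
    if -2**31 <= value < 2**31:
        return b"\xd0" + struct.pack("<i", value)          # int32_t
    return b"\xe0" + struct.pack("<q", value)              # int64_t


print(encode_int(2).hex(), encode_int(5).hex())  # f3 f6
```

For the values 2 and 5, no content bytes are needed at all: the encoding byte alone carries the value.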


4.2 The cascade update problem of ZipList

Each Entry of ZipList contains previous_entry_length to record the size of the previous node, and the length is 1 or 5 bytes:

  • If the length of the previous node is less than 254 bytes, use 1 byte to save the length value
  • If the length of the previous node is greater than or equal to 254 bytes, use 5 bytes to save the length value, and the first byte is 0xfe, and the last four bytes are the real length data

Now, suppose we have N consecutive entries with a length between 250 and 253 bytes, so the previous_entry_length attribute of the entry can be represented by one byte, as shown in the following figure:

image-20230626002225615

Now suppose we insert an entry with a length of 254 bytes at the head of the list. The previous_entry_length of the original first entry must grow from 1 byte to 5 bytes, so that entry's total length grows from 250 to 254 bytes. This in turn forces the previous_entry_length of the original second entry to grow to 5 bytes, and so on. The process repeats until it reaches an entry whose original length is less than 250 bytes, which still fits in a 1-byte length after growing.

image-20230626002529923

This series of consecutive space expansions, which occurs in this special case of ZipList, is called a cascade update. Both insertions and deletions may trigger a cascade update.
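The propagation can be simulated by tracking only entry sizes. The following Python sketch assumes every existing entry currently stores previous_entry_length in 1 byte; the function name is invented for this illustration.

```python
def cascade_updates(entry_sizes, new_head_size):
    """Count the entries that must grow after inserting a new head entry.

    entry_sizes lists the total size in bytes of each existing entry, in
    order, each assumed to use a 1-byte previous_entry_length.
    """
    updates = 0
    prev_size = new_head_size
    for size in entry_sizes:
        if prev_size < 254:
            break             # the 1-byte field is still enough, chain stops
        size += 4             # field grows from 1 byte to 5 bytes
        updates += 1
        prev_size = size
    return updates


print(cascade_updates([250] * 5, 254))             # every entry is touched
print(cascade_updates([250, 250, 100, 250], 254))  # stops after the small entry
```

In the second call the 100-byte entry still has to grow (its predecessor became 254 bytes), but at 104 bytes it no longer affects the entry after it.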


4.3 Features

Features of ZipList:

  1. The compressed list can be regarded as a "doubly linked list" laid out in contiguous memory
  2. Nodes are not connected by pointers; they are addressed using the recorded lengths of the previous node and the current node, so memory usage is low
  3. If the list holds too much data, the chain becomes too long, which may hurt query performance
  4. Adding or deleting large entries may trigger the cascade update problem

5. QuickList

Question 1: Although ZipList saves memory, its memory must be allocated as one contiguous block. When the block is large, allocating it is very inefficient. What can be done?

To alleviate this problem, we must limit the length of the ZipList and the size of each entry.

Question 2: But what if we want to store a large amount of data that exceeds the optimal upper limit of a ZipList?

We can create multiple ZipLists and store the data in fragments.

Question 3: After the data is split, it is scattered and inconvenient to manage and search. How can these ZipLists be linked together?

Redis introduced a new data structure in version 3.2, QuickList, which is a double-ended linked list in which each node is a ZipList.

image-20230626183649796

To avoid too many entries in each ZipList of a QuickList, Redis provides the configuration item list-max-ziplist-size to limit them.

  • If the value is positive, it is the maximum number of entries allowed in each ZipList
  • If the value is negative, it is the maximum memory size of each ZipList, with the following 5 cases:
    • -1: The memory usage of each ZipList cannot exceed 4 KB
    • -2: The memory usage of each ZipList cannot exceed 8 KB
    • -3: The memory usage of each ZipList cannot exceed 16 KB
    • -4: The memory usage of each ZipList cannot exceed 32 KB
    • -5: The memory usage of each ZipList cannot exceed 64 KB

The default value of list-max-ziplist-size is -2.

image-20230626184103853
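How the setting could be interpreted is sketched below in Python. The helper node_can_accept and its parameters are hypothetical and only model the rules listed above; they are not taken from the Redis source.

```python
# Negative settings map to a byte limit per ZipList.
SIZE_LIMITS = {-1: 4 * 1024, -2: 8 * 1024, -3: 16 * 1024, -4: 32 * 1024, -5: 64 * 1024}


def node_can_accept(fill: int, node_bytes: int, node_count: int, new_entry_bytes: int) -> bool:
    """Decide whether one more entry fits into a QuickList node.

    fill is the list-max-ziplist-size setting (assumed to be a positive
    number or one of -1 to -5), node_bytes and node_count describe the
    node's current ZipList, and new_entry_bytes is the size of the new entry.
    """
    if fill > 0:
        return node_count + 1 <= fill                       # limit by entry count
    return node_bytes + new_entry_bytes <= SIZE_LIMITS[fill]  # limit by bytes


print(node_can_accept(-2, 8000, 10, 100))
print(node_can_accept(-2, 8100, 10, 100))
```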


In addition to controlling the size of each ZipList, QuickList can also compress the ZipLists in its nodes. This is controlled by the configuration item list-compress-depth. Because a list is usually accessed at its head and tail, the nodes at both ends are left uncompressed. The parameter sets the number of nodes at each end that are not compressed:

  • 0: special value, meaning no compression
  • 1: the first node and the last node of the QuickList are not compressed, and the nodes in between are compressed
  • 2: the first two nodes and the last two nodes of the QuickList are not compressed, and the nodes in between are compressed
  • and so on

The default value of list-compress-depth is 0.

image-20230626184506692
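The effect of the depth setting on a list of nodes can be shown with a short Python sketch. The function name is made up for illustration.

```python
def compressed_flags(node_count: int, depth: int):
    """For each node, report whether it would be compressed.

    depth is the list-compress-depth setting: 0 disables compression,
    otherwise depth nodes at each end stay uncompressed.
    """
    if depth == 0:
        return [False] * node_count
    return [depth <= i < node_count - depth for i in range(node_count)]


print(compressed_flags(5, 1))  # only the three middle nodes are compressed
print(compressed_flags(5, 2))  # only the single middle node is compressed
```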


The struct definitions of QuickList and QuickListNode in the source code are as follows:

image-20230626184654677image-20230626184700182

The schematic diagram is as follows:

image-20230626184816710


5.1 Features

Features of QuickList:

  • It is a double-ended linked list whose nodes are ZipLists
  • Using ZipLists as nodes solves the high memory usage of a traditional linked list
  • Limiting the size of each ZipList solves the inefficiency of allocating large contiguous memory blocks
  • Intermediate nodes can be compressed, saving even more memory

6. SkipList

A SkipList (skip list) is first of all a linked list, but it differs from a traditional linked list in several ways:

  • Elements are stored in ascending order
  • A node may contain multiple pointers, each with a different span

image-20230629133051288

image-20230629133224437image-20230629133229174

The structure diagram is as follows:

image-20230629133458006


6.1 Features

Features of SkipList:

  • The skip list is a doubly linked list in which each node contains a score and an ele (element) value
  • Nodes are sorted by score; nodes with the same score are sorted by ele in dictionary order
  • Each node can contain multiple levels of pointers, and the number of levels is a random number between 1 and 32
  • Pointers at different levels span different distances to the next node; the higher the level, the larger the span
  • Insert, delete, update and lookup are as efficient as in a red-black tree, but the implementation is simpler
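The random choice of levels can be sketched as follows in Python. The maximum of 32 comes from the list above; the promotion probability of 0.25 and the function name are assumptions of this sketch.

```python
import random

MAX_LEVEL = 32  # upper bound on the number of levels, as stated above
P = 0.25        # assumed probability of promoting a node one level higher


def random_level(rng=random) -> int:
    """Pick a level between 1 and MAX_LEVEL; higher levels are rarer."""
    level = 1
    while level < MAX_LEVEL and rng.random() < P:
        level += 1
    return level


rng = random.Random(0)
levels = [random_level(rng) for _ in range(1000)]
print(min(levels), max(levels))
```

With this kind of geometric distribution most nodes have only one level, and each higher level holds a fraction of the nodes below it, which is what lets a search skip over long stretches of the list.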

7. RedisObject

In Redis, the key and value of any data type are encapsulated into a RedisObject, also called a Redis object. The source code is as follows:

image-20230629134326599

Redis chooses different encodings depending on the data stored. There are 11 encodings in total:

No.  Encoding                 Description
0    OBJ_ENCODING_RAW         raw-encoded dynamic string
1    OBJ_ENCODING_INT         string holding an integer of type long
2    OBJ_ENCODING_HT          hash table (dictionary, dict)
3    OBJ_ENCODING_ZIPMAP      obsolete
4    OBJ_ENCODING_LINKEDLIST  double-ended linked list
5    OBJ_ENCODING_ZIPLIST     compressed list
6    OBJ_ENCODING_INTSET      integer set
7    OBJ_ENCODING_SKIPLIST    skip list
8    OBJ_ENCODING_EMBSTR      embstr-encoded dynamic string
9    OBJ_ENCODING_QUICKLIST   quick list
10   OBJ_ENCODING_STREAM      Stream

The encodings used for each data type are as follows:

Data type   Encoding
OBJ_STRING  int, embstr, raw
OBJ_LIST    LinkedList and ZipList (before 3.2), QuickList (3.2 and later)
OBJ_SET     intset, HT
OBJ_ZSET    ZipList, HT, SkipList
OBJ_HASH    ZipList, HT
8. Five basic data types

8.1 String

String is the most common data storage type in Redis, and it contains three encoding methods:

  1. RAW
  2. EMBSTR
  3. INT

  • RAW encoding: based on Simple Dynamic String (SDS), with a storage limit of 512 MB. (The memory allocation function must be called twice: once for the object header and once for the SDS.)

image-20230630170801089


  • EMBSTR encoding: if the length of the stored SDS does not exceed 44 bytes, EMBSTR encoding is used. In this case the object header and the SDS occupy one contiguous block of memory, so the memory allocation function only needs to be called once, which is more efficient.

image-20230630170824685

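Why is 44 bytes the limit?

When the SDS length is 44, the entire Redis object adds up to exactly 64 bytes. The underlying memory allocator hands out memory in sizes of 2^n, and 64 is exactly one such chunk size, so no memory fragmentation is produced.

Therefore, when using the String type, it is best to keep the SDS length at or below 44 bytes.


  • INT encoding: if the stored string is an integer value within the range of LONG_MAX, INT encoding is used. The value is saved directly in the ptr pointer field of the RedisObject (a pointer is exactly 8 bytes), so no SDS is needed.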

image-20230630171930604
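The choice among the three encodings, and the 64-byte total, can be sketched in Python. The function names are invented for this example, and the byte breakdown (a 16-byte object header plus a 3-byte SDS header and a trailing NUL byte) is an assumption used to reproduce the 64-byte figure mentioned above.

```python
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1   # range of a 64-bit long
EMBSTR_LIMIT = 44                         # byte limit discussed above


def string_encoding(value: str) -> str:
    """Pick "int", "embstr" or "raw" for a string value (illustrative)."""
    try:
        n = int(value)
        # Only canonical decimal integers qualify, e.g. "007" does not.
        if LONG_MIN <= n <= LONG_MAX and str(n) == value:
            return "int"
    except ValueError:
        pass
    return "embstr" if len(value.encode()) <= EMBSTR_LIMIT else "raw"


def embstr_total_bytes(sds_len: int) -> int:
    """Assumed layout: 16-byte object header + 3-byte SDS header + data + NUL."""
    return 16 + 3 + sds_len + 1


print(string_encoding("12345"), string_encoding("a" * 44), string_encoding("a" * 45))
print(embstr_total_bytes(44))
```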


8.2 List

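The List structure in Redis resembles a double-ended linked list: elements can be operated on from both the head and the tail.

  • Before version 3.2, Redis implemented List with ZipList and LinkedList. ZipList encoding was used when the number of elements was less than 512 and each element was smaller than 64 bytes; otherwise LinkedList encoding was used.
  • Since version 3.2, Redis uses QuickList for all Lists.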

image-20230630201319351image-20230630201324505

image-20230630201339491


8.3 Set

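Set is a single-column collection in Redis with the following characteristics:

  1. Order is not guaranteed
  2. Elements are guaranteed to be unique (membership can be tested)
  3. Supports intersection, union and difference

As this shows, Set demands very efficient element lookups. The Dict data structure in Redis can satisfy this, although Dict is a two-column collection (it can store both keys and values).

Set is the set collection in Redis: it does not necessarily keep elements ordered, guarantees element uniqueness, and requires extremely efficient lookups. Set has the following two encodings:

  • HT encoding: for lookup efficiency and uniqueness, Set uses HT encoding (Dict). The key of the Dict stores the element, and every value is null.
  • IntSet encoding: when all stored data are integers and the number of elements does not exceed set-max-intset-entries, Set uses IntSet encoding to save memory. (The default value of set-max-intset-entries is 512.)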

image-20230701004445741

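Every time an element is added to a Set, the encoding is checked. If the Set has used IntSet encoding so far but the element being added is not an integer, all elements must be converted to HT encoding.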

image-20230701004754380

The process is as follows:

①: The current encoding of the Set is IntSet.

image-20230701004911481

②: Now the element m1 is added to the Set with sadd s1 m1. Because m1 is not an integer, the encoding of the Set changes.

image-20230701005110450
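The encoding switch can be modeled with a few lines of Python. This is a sketch under assumptions: the function names are invented, a Python set stands in for the stored members, and the encoding is tracked as a plain string.

```python
def is_integer(member: str) -> bool:
    """True if member is a canonical decimal integer that fits in 64 bits."""
    try:
        n = int(member)
    except ValueError:
        return False
    return str(n) == member and -2**63 <= n < 2**63


def sadd(members: set, encoding: str, member: str, max_intset_entries: int = 512) -> str:
    """Add member and return the encoding the set uses afterwards."""
    members.add(member)
    if encoding == "intset" and (
        not is_integer(member) or len(members) > max_intset_entries
    ):
        encoding = "hashtable"  # one-way conversion from IntSet to HT
    return encoding


s1 = {"5", "10", "20"}
enc = sadd(s1, "intset", "30")
print(enc)                  # still intset
print(sadd(s1, enc, "m1"))  # m1 is not an integer, so the encoding changes
```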

8.4 ZSet

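ZSet, also known as SortedSet, requires every element to specify a score value and a member value:

  • Elements can be sorted by score
  • Each member must be unique
  • The score can be looked up by member

Therefore, the underlying data structure of zset must support key-value storage, unique keys, and sorting. Which encodings can meet these requirements?

  • SkipList: supports sorting and can store both the score and the ele value (member)
  • HT (Dict): supports key-value storage and can find a value by its key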

image-20230703172422089image-20230703172428388

image-20230703172519972

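When there are few elements, the advantages of HT and SkipList are not significant, and they consume more memory. Therefore zset also uses the ZipList structure to save memory, provided that both of the following conditions are met:

  1. The number of elements is less than zset_max_ziplist_entries, which defaults to 128
  2. Every element is smaller than zset_max_ziplist_value, which defaults to 64 bytes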

image-20230703175240702

image-20230703175330381

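Because ZipList itself has no sorting capability and no concept of key-value pairs, zset has to implement both through the way it lays out entries:

  • ZipList is contiguous memory, so the score and the element are stored as two adjacent entries, with the element first and the score after it
  • The smaller the score, the closer to the head; the larger the score, the closer to the tail. Entries are arranged in ascending order of score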

image-20230703175638339
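The entry layout can be illustrated with a Python sketch that flattens member/score pairs into the order they would occupy. The function name is made up, a plain list stands in for the ZipList, and breaking score ties by member is an assumption of this example.

```python
def zset_ziplist_layout(members: dict) -> list:
    """Flatten {member: score} into [element, score, element, score, ...],
    sorted by ascending score (ties broken by member)."""
    flat = []
    for member, score in sorted(members.items(), key=lambda kv: (kv[1], kv[0])):
        flat += [member, score]  # element first, then its score
    return flat


print(zset_ziplist_layout({"m3": 30, "m1": 10, "m2": 20}))
```

Looking up a score by member in this layout means scanning the entries two at a time, which is acceptable only because the structure is used for small zsets.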


8.5 Hash

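The Hash structure is very similar to ZSet in Redis:

  • Both are key-value stores
  • Both need to retrieve a value by its key
  • Keys must be unique

The differences are as follows:

  • In zset the key is the member and the value is the score, whereas in hash both keys and values can be arbitrary
  • zset must be sorted by score, whereas hash needs no sorting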

Therefore, the encodings used by Hash are basically the same as those of ZSet, minus the SkipList that exists only for sorting:

  • The Hash structure uses ZipList encoding by default to save memory. Two adjacent entries in the ZipList hold the field and the value respectively
  • When the amount of data grows large, the Hash structure is converted to HT encoding, that is, Dict. There are two triggering conditions:
    1. The number of elements in the ZipList exceeds hash-max-ziplist-entries (default 512)
    2. The size of any entry in the ZipList exceeds hash-max-ziplist-value (default 64 bytes)

image-20230703193544826

When either of the two limits above is exceeded, the Hash structure is converted to HT encoding:

image-20230703193713050
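The two triggering conditions can be expressed as a small Python function. The function name and parameter names are invented for this sketch; the defaults mirror the values of hash-max-ziplist-entries and hash-max-ziplist-value given above.

```python
def hash_encoding(fields: dict, max_entries: int = 512, max_value: int = 64) -> str:
    """Return "ziplist" or "hashtable" for a hash with the given fields."""
    if len(fields) > max_entries:
        return "hashtable"              # too many field/value pairs
    for field, value in fields.items():
        if len(field.encode()) > max_value or len(value.encode()) > max_value:
            return "hashtable"          # some entry is too large
    return "ziplist"


print(hash_encoding({"name": "Jack", "age": "21"}))
print(hash_encoding({"bio": "x" * 65}))
```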


Origin blog.csdn.net/Decade_Faiz/article/details/131388503