Analysis of the three underlying data structures of Redis List

1. What is Redis List

As a Java developer, you have surely seen this term before: lists are used almost every day in Java development.

Redis's List is similar to LinkedList in Java. It is a linear, ordered structure that stores elements in the order in which they are pushed, so it can satisfy first-in-first-out requirements. Elements can be text or binary data.

You can use it as a queue or a stack.

2. How it works under the hood

My name is Redis. The C language has no ready-made linked list, so antirez designed an implementation specifically for me.

The underlying data structure of the List type has gone through several generations: antirez kept optimizing and created a series of data structures to store it.

Early versions used linkedlist (doubly linked list) and ziplist (compressed list) as the underlying implementations of List. Redis 3.2 introduced quicklist, which combines linkedlist and ziplist, and version 7.0 replaced ziplist with listpack.

MySQL: "Why do you have so many data structures?"

Everything antirez did was a trade-off between memory overhead and access performance. Follow me through the design ideas and shortcomings of each structure, and you will understand.

linkedlist (doubly linked list)

Before Redis version 3.2, the underlying data structure of List was implemented by linkedlist or ziplist, and ziplist was preferred for storage.

When the list object meets the following two conditions, the List will be stored in ziplist, otherwise it will be stored in linkedlist.

  • Each element of the List occupies less than 64 bytes.

  • List has fewer than 512 elements.
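The two conditions above can be sketched as a simple predicate. This is only an illustration: the function name and macro names are hypothetical (they are not Redis identifiers), and the thresholds 64 and 512 are taken from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Thresholds taken from the two conditions listed above. */
#define MAX_ZIPLIST_VALUE   64   /* every element must be shorter than this (bytes) */
#define MAX_ZIPLIST_ENTRIES 512  /* the list must hold fewer elements than this */

/* Hypothetical helper, not a Redis function: returns 1 if a list whose
 * elements have the given byte lengths would be stored as a ziplist,
 * 0 if it would be stored as a linkedlist. */
int would_use_ziplist(const size_t *elem_lens, size_t count) {
    assert(elem_lens != NULL || count == 0);
    if (count >= MAX_ZIPLIST_ENTRIES) return 0;
    for (size_t i = 0; i < count; i++) {
        if (elem_lens[i] >= MAX_ZIPLIST_VALUE) return 0;
    }
    return 1;
}
```

A list of three short strings passes the check, while a single 64-byte element is enough to push the whole list to linkedlist.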

A linked list node is represented by the listNode structure in adlist.h.

typedef struct listNode {
    // previous node
    struct listNode *prev;
    // next node
    struct listNode *next;
    // pointer to the node's value
    void *value;
} listNode;

listNode structures form a doubly linked list through their prev and next pointers. In addition, I provide the list structure in adlist.h, which holds a head pointer, a tail pointer, and a few type-specific functions that make the list polymorphic.

typedef struct list {
    // head pointer
    listNode *head;
    // tail pointer
    listNode *tail;
    // function that duplicates a node value
    void *(*dup)(void *ptr);
    // function that frees a node value
    void (*free)(void *ptr);
    // function that compares a node value with a key for equality
    int (*match)(void *ptr, void *key);
    // number of nodes in the list
    unsigned long len;
} list;

The structure of linkedlist is shown in Figure 2-5.

Figure 2-5

The characteristics of Redis' linked list implementation are summarized as follows.

  • Doubly linked: nodes have prev and next pointers, so obtaining a node's predecessor or successor is O(1).

  • Acyclic: the prev pointer of the head node and the next pointer of the tail node both point to NULL, so traversal of the list ends at NULL.

  • Head and tail pointers: through the head and tail pointers of the list structure, obtaining the head node or tail node is O(1).

  • Length counter: the len attribute of the list structure records the number of nodes, so obtaining the number of nodes is O(1).
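To make the O(1) claims concrete, here is a minimal sketch of a doubly linked list with head, tail, and len. The types node and dlist and both functions are hypothetical simplifications modeled on listNode and list; they are not the adlist.h API.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for listNode and list (hypothetical names). */
typedef struct node {
    struct node *prev;
    struct node *next;
    void *value;
} node;

typedef struct dlist {
    node *head;
    node *tail;
    unsigned long len;
} dlist;

/* Append at the tail in O(1): only the tail pointer is touched, no traversal.
 * Returns 0 on success, -1 if allocation fails. */
int dlist_push_tail(dlist *l, void *value) {
    assert(l != NULL);
    node *n = malloc(sizeof(*n));
    if (n == NULL) return -1;
    n->value = value;
    n->next = NULL;
    n->prev = l->tail;
    if (l->tail != NULL) l->tail->next = n;
    else l->head = n;          /* first node: it is also the head */
    l->tail = n;
    l->len++;
    return 0;
}

/* Remove and return the head value in O(1); returns NULL if the list is empty. */
void *dlist_pop_head(dlist *l) {
    assert(l != NULL);
    node *n = l->head;
    if (n == NULL) return NULL;
    void *value = n->value;
    l->head = n->next;
    if (l->head != NULL) l->head->prev = NULL;
    else l->tail = NULL;       /* list became empty */
    free(n);
    l->len--;
    return value;
}
```

Pushing at the tail and popping at the head gives queue behavior; popping from the same end you push to would give a stack.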

MySQL: "It doesn't seem to be a problem, why do you need a ziplist?"

You know me: I'm all about speed and saving memory. Two problems led to the birth of ziplist.

  • An ordinary linkedlist node carries two pointers, prev and next. When the stored data is small, the pointers take up more space than the data itself. That is intolerable.

  • linkedlist is a linked structure whose nodes are not contiguous in memory, so traversal is inefficient.

ziplist (compressed list)

To solve these two problems, antirez created ziplist, the compressed list: a compact data structure that occupies one contiguous block of memory and improves memory utilization.

When a list holds only a small amount of data, and each item is either a small integer or a relatively short string, I use ziplist as the underlying implementation of List.

A ziplist can contain multiple entry nodes, and each node can store an integer or a string, as shown in Figure 2-6.

Figure 2-6

  • zlbytes, occupying 4 bytes, records the total number of bytes occupied by the entire ziplist.

  • zltail, occupying 4 bytes, points to the offset of the last entry, which is used to quickly locate the last entry.

  • zllen, occupying 2 bytes, records the total number of entries.

  • entry, list element.

  • zlend, ziplist end flag, occupying 1 byte, the value is equal to 255.
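The fixed-size header can be read straight out of the byte buffer. The sketch below assumes the three header fields are stored little-endian and back to back, in the order listed above; the struct and function names are hypothetical and are not taken from ziplist.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the 10-byte ziplist header described above. */
typedef struct zl_header {
    uint32_t zlbytes;  /* total bytes occupied by the ziplist */
    uint32_t zltail;   /* offset of the last entry */
    uint16_t zllen;    /* number of entries */
} zl_header;

/* Read little-endian integers byte by byte (endianness is an assumption). */
static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t read_le16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode zlbytes (4 bytes), zltail (4 bytes) and zllen (2 bytes). */
zl_header zl_read_header(const unsigned char *zl) {
    assert(zl != NULL);
    zl_header h;
    h.zlbytes = read_le32(zl);
    h.zltail = read_le32(zl + 4);
    h.zllen = read_le16(zl + 8);
    return h;
}

/* Returns 1 if the last byte of the ziplist is the zlend marker (255). */
int zl_has_end_marker(const unsigned char *zl) {
    zl_header h = zl_read_header(zl);
    assert(h.zlbytes >= 11);  /* 10-byte header + 1-byte zlend */
    return zl[h.zlbytes - 1] == 255;
}
```

For an empty ziplist the buffer is 11 bytes long: the header, no entries, and the zlend byte.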

Because the head and tail metadata of the ziplist have fixed sizes, and the offset of the last entry is recorded in zltail in the ziplist header, the first or last element of a ziplist can be found in O(1) time.

When looking for intermediate elements, you can only traverse from the head or tail of the list, and the time complexity is O(N).

Next, let's see what the entry structure that actually stores data looks like.

Figure 2-7

Normally it consists of three parts: <prevlen> <encoding> <entry-data>.

prevlen

Records the number of bytes occupied by the previous entry. Reverse traversal depends on this field: it tells how many bytes to move back to reach the start address of the previous entry.

This field is variable-length, depending on the length of the previous entry (I really went to great lengths to save memory). The encoding rules are as follows.

  • If the previous entry is shorter than 254 bytes (255 is reserved for zlend), prevlen takes 1 byte and its value is the length of the previous entry.

  • If the previous entry is 254 bytes or longer, prevlen takes 5 bytes: the first byte is set to 254 as a marker, and the following four bytes form a 32-bit int that stores the length of the previous entry in bytes.
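The two rules above can be sketched as a small encoder. The function names are hypothetical, and writing the 4-byte length in little-endian order is an assumption made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZL_BIG_PREVLEN 254  /* marker byte for the 5-byte form */

/* Number of bytes the prevlen field needs for a given previous-entry length. */
size_t prevlen_size(uint32_t prev_entry_len) {
    return prev_entry_len < ZL_BIG_PREVLEN ? 1 : 5;
}

/* Writes the prevlen field into buf (which must have room for 5 bytes)
 * and returns the number of bytes written. */
size_t prevlen_encode(unsigned char *buf, uint32_t prev_entry_len) {
    assert(buf != NULL);
    if (prev_entry_len < ZL_BIG_PREVLEN) {
        buf[0] = (unsigned char)prev_entry_len;   /* 1-byte form */
        return 1;
    }
    buf[0] = ZL_BIG_PREVLEN;                      /* marker byte */
    for (int i = 0; i < 4; i++) {
        /* 32-bit length, least significant byte first (assumption) */
        buf[1 + i] = (unsigned char)((prev_entry_len >> (8 * i)) & 0xFF);
    }
    return 5;
}
```

Note the jump at the boundary: a previous entry of 253 bytes costs 1 byte of prevlen, while one of 254 bytes costs 5.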

encoding

In short, it indicates the type and length of the current entry. Its own size and value depend on whether the entry stores an int or a string, and on the length of the data.

The first two bits indicate the type: if they are "11", the entry stores int data; any other value means the entry stores a string.
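The type check can be sketched as a bit mask. This assumes that "the first two bits" means the two most significant bits of the first encoding byte; the function names are hypothetical.

```c
#include <assert.h>

/* Mask selecting the two most significant bits of the encoding byte. */
#define ENC_TYPE_MASK 0xC0

/* Returns 1 if the encoding byte marks an int entry (top bits "11"),
 * 0 if it marks a string entry (top bits "00", "01" or "10"). */
int entry_is_int(unsigned char encoding) {
    return (encoding & ENC_TYPE_MASK) == ENC_TYPE_MASK;
}

/* Small sanity check of the rule, callable from a test or from main(). */
void check_entry_kinds(void) {
    assert(entry_is_int(0xC0));   /* 11000000: int */
    assert(!entry_is_int(0x80));  /* 10000000: string */
}
```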

entry-data

The area where the data is actually stored. Note that when the entry stores a very small integer (0 to 12), the value is embedded in encoding itself and there is no entry-data field.

In that case the structure becomes <prevlen> <encoding>.

MySQL: "Why do you say that ziplist saves memory?"

  1. Compared with linkedlist, each entry does without the prev and next pointers.

  2. The encoding field tailors storage to each kind of data, so only as much space as needed is allocated. When the entry stores a very small integer, the value is merged into encoding and the entry-data field is omitted.

  3. Each entry-data occupies a different amount of memory. To make traversal possible, the prevlen field records the length of the previous entry. Finding an element by traversal is O(N), but this has little impact when the amount of data is small.

MySQL: "Sounds perfect. Why do you still need quicklist?"

You can't have it both ways. ziplist saves memory, but it also has shortcomings.

  • It cannot hold too many elements; otherwise query performance drops sharply, because lookups are O(N).

  • ziplist storage is contiguous. Inserting a new entry when there is not enough space requires reallocating a contiguous block of memory, and it can trigger the chain update problem.

chain update

Each entry uses prevlen to record the length of the previous entry. When a new entry A is inserted in front of entry B, B's prevlen changes. If A is 254 bytes or longer while B's prevlen was 1 byte, B's prevlen grows to 5 bytes, so B itself becomes larger. The prevlen of entry C, which follows B, may then have to change as well, and so on. This can cause a chain of updates.

Figure 2-8

Chain updates cause the ziplist's memory to be reallocated multiple times, which directly hurts ziplist performance. That is why quicklist was introduced in Redis 3.2.
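The chain update can be simulated with a simplified model in which each entry is reduced to its total size in bytes. Everything here is a hypothetical sketch rather than ziplist code: it only models growth of the prevlen field from 1 to 5 bytes after an insert at the head.

```c
#include <assert.h>
#include <stddef.h>

/* Size of the prevlen field for a previous entry of prev_len bytes. */
static size_t prevlen_field_size(size_t prev_len) {
    return prev_len < 254 ? 1 : 5;
}

/* Simplified model: entry_sizes[i] is the total size of entry i, including
 * its own prevlen field. A new entry of new_len bytes is inserted before
 * entry 0. Sizes are updated in place; the return value is the number of
 * existing entries that had to grow. */
size_t cascade_after_head_insert(size_t *entry_sizes, size_t count, size_t new_len) {
    assert(entry_sizes != NULL || count == 0);
    size_t grown = 0;
    size_t old_prev = 0;        /* entry 0 had no predecessor before the insert */
    size_t new_prev = new_len;  /* now its predecessor is the new entry */
    for (size_t i = 0; i < count; i++) {
        size_t old_size = entry_sizes[i];
        size_t old_field = prevlen_field_size(old_prev);
        size_t new_field = prevlen_field_size(new_prev);
        if (new_field <= old_field) break;  /* field still fits: the chain stops */
        entry_sizes[i] = old_size + (new_field - old_field);
        grown++;
        old_prev = old_size;        /* what entry i+1 used to record */
        new_prev = entry_sizes[i];  /* what entry i+1 must record now */
    }
    return grown;
}
```

The worst case is a run of entries that are each 250 to 253 bytes long: growing one of them by 4 bytes pushes it past the 254-byte boundary, which forces the next one to grow too.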

quicklist

quicklist is a data structure designed to balance time efficiency and space efficiency. It combines the advantages of the original linkedlist and ziplist: in essence it is still a linked list, but each node of the linked list is a ziplist.

The data structure is defined in quicklist.h. The linked list is defined by the quicklist structure, and each node by the quicklistNode structure (the source code shown is from version 6.2; version 7.0 uses listpack instead of ziplist).

quicklist is a doubly linked list, so each quicklistNode has a previous-node pointer (*prev) and a next-node pointer (*next). Each node holds a ziplist, so there is also a pointer to the ziplist, *zl.

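typedef struct quicklistNode {
    // pointer to the previous node
    struct quicklistNode *prev;
    // pointer to the next node
    struct quicklistNode *next;
    // pointer to the ziplist
    unsigned char *zl;
    // size of the ziplist in bytes
    unsigned int sz;
    // number of entries in the ziplist
    unsigned int count : 16;
    // encoding: 1 = RAW (plain uncompressed ziplist), 2 = LZF (compressed)
    unsigned int encoding : 2;
    // container type held by the node; the default, 2, means ziplist
    unsigned int container : 2;
    // whether the node's ziplist has been decompressed; 1 means it was decompressed and must be recompressed on the next operation
    unsigned int recompress : 1;
    // whether the ziplist data can be compressed; data that is too small is not compressed
    unsigned int attempted_compress : 1;
    // reserved bits
    unsigned int extra : 10;
} quicklistNode;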

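As a linked list, quicklist defines head and tail pointers for quickly locating the head and the tail of the list.

typedef struct quicklist {
    // head pointer of the linked list
    quicklistNode *head;
    // tail pointer of the linked list
    quicklistNode *tail;
    // total number of entries across all ziplists
    unsigned long count;
    // number of quicklistNodes
    unsigned long len;
    // fill limit for each node (set from list-max-ziplist-size)
    int fill : QL_FILL_BITS;
    // number of nodes at each end that are left uncompressed; 0 disables compression
    unsigned int compress : QL_COMP_BITS;
    unsigned int bookmark_count: QL_BM_BITS;
    // flexible array member: bookmarks label nodes so that a node can be located by name, giving a random-access effect
    quicklistBookmark bookmarks[];
} quicklist;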

Combining the definitions of quicklist and quicklistNode, the structure of a quicklist is shown in Figure 2-9.

Figure 2-9

Structurally, quicklist is an upgraded version of ziplist. The key to the optimization is controlling the size, or the number of elements, of each ziplist.

  • If the ziplist in each quicklistNode is too small, there is more memory fragmentation. In the extreme case each ziplist holds only one entry and the quicklist degenerates into a linkedlist.

  • If the ziplist in each quicklistNode is too large, then in the extreme case the quicklist has only one ziplist and degenerates into a ziplist, exposing the performance problems of chain updates.
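The trade-off between the two extremes can be seen in a toy model where each node only counts how many entries its ziplist would hold. The types qnode and qlist, the functions, and the per-node entry limit named fill are hypothetical simplifications; the real quicklist stores actual ziplist bytes and supports more limit modes.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy node: tracks only the number of entries its ziplist would hold. */
typedef struct qnode {
    struct qnode *prev;
    struct qnode *next;
    unsigned int count;
} qnode;

typedef struct qlist {
    qnode *head;
    qnode *tail;
    unsigned long count;  /* total entries across all nodes */
    unsigned long len;    /* number of nodes */
    unsigned int fill;    /* max entries per node (hypothetical limit) */
} qlist;

/* Push one entry at the tail. A new node is linked in only when the tail
 * node is full. Returns 0 on success, -1 if allocation fails. */
int qlist_push_tail(qlist *q) {
    assert(q != NULL && q->fill > 0);
    if (q->tail == NULL || q->tail->count >= q->fill) {
        qnode *n = calloc(1, sizeof(*n));
        if (n == NULL) return -1;
        n->prev = q->tail;
        if (q->tail != NULL) q->tail->next = n;
        else q->head = n;
        q->tail = n;
        q->len++;
    }
    q->tail->count++;
    q->count++;
    return 0;
}

/* Free every node and reset the list to empty. */
void qlist_free(qlist *q) {
    assert(q != NULL);
    qnode *n = q->head;
    while (n != NULL) {
        qnode *next = n->next;
        free(n);
        n = next;
    }
    q->head = NULL;
    q->tail = NULL;
    q->count = 0;
    q->len = 0;
}
```

With fill set to 1 every entry gets its own node, which is the linkedlist extreme; with a very large fill everything lands in one node, which is the single-ziplist extreme.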

A reasonable configuration is very important. Redis provides the list-max-ziplist-size option, which defaults to -2.

When list-max-ziplist-size is negative, it limits the memory size of the ziplist in each quicklistNode. Once a ziplist would exceed this size, new elements go into a new quicklistNode. (A positive value instead limits the number of entries per node.) The negative values have the following meanings:

  • -5: ziplist size max 64 KB on each quicklist node <--- not recommended for normal environments

  • -4: ziplist size max 32 KB on each quicklist node <--- not recommended

  • -3: ziplist size max 16 KB on each quicklist node <--- probably not recommended

  • -2: ziplist size max 8 KB on each quicklist node <--- not bad

  • -1: ziplist size max 4 KB on each quicklist node <--- not bad

The default value is -2, which is also the officially recommended value. Of course, you can adjust it according to your actual situation.
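The mapping from a negative setting to a byte limit can be sketched as a lookup table. The function name is hypothetical, and the sketch assumes 1 KB means 1024 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the per-node ziplist size limit in bytes for a negative
 * list-max-ziplist-size value (-1 to -5). Returns 0 for any other value,
 * meaning "not a byte-size limit" in this sketch. */
size_t node_size_limit_bytes(int list_max_ziplist_size) {
    static const size_t limits[] = {4096, 8192, 16384, 32768, 65536};
    assert(sizeof(limits) / sizeof(limits[0]) == 5);
    if (list_max_ziplist_size >= 0 || list_max_ziplist_size < -5) return 0;
    return limits[-list_max_ziplist_size - 1];  /* -1 -> index 0, -5 -> index 4 */
}
```

So the default of -2 caps each node's ziplist at 8192 bytes.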

MySQL: "After all this work, you still haven't solved the chain update problem."

Don't worry. You eat a meal one bite at a time and walk a road one step at a time; take strides that are too big and you will hurt yourself.

ziplist is a compact data structure that uses memory efficiently. However, each entry's prevlen keeps the length of the previous entry, so inserts or updates may trigger chain updates that hurt efficiency.

So antirez designed quicklist, a "linked list + ziplist" structure, to keep any single ziplist from growing too large and to reduce the impact of chain updates.

But ziplist is still used underneath, so the chain update problem cannot be avoided in essence. Another memory-compact data structure, listpack, was therefore designed in version 5.0, and it replaced ziplist in version 7.0.

listpack

listpack came about because a user reported a Redis crash and antirez could not find a clear cause. He suspected that it might be caused by chain updates in the ziplist structure, so he set out to design a simple and efficient data structure to replace ziplist.

MySQL: "What is listpack?"

listpack is also a compact data structure, which uses a continuous memory space to store data, and uses multiple encoding methods to represent data of different lengths to save memory space.

The source file listpack.h describes listpack as "A lists of strings serialization format": a format that serializes and stores a list of strings, in which each element can be a string or an integer.

First look at the overall structure of listpack.

Figure 2-10

It has four parts: tot-bytes, num-elements, elements, and listpack-end-byte.

  • tot-bytes, that is, total bytes, occupies 4 bytes and records the total number of bytes occupied by the listpack.

  • num-elements, occupying 2 bytes, records the number of elements in the listpack.

  • elements, listpack elements, the part that holds the data.

  • listpack-end-byte, the end flag, occupying 1 byte, the value is fixed at 255.
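As with ziplist, the header can be decoded directly from the buffer. The sketch assumes both header fields are little-endian; the struct and function names are hypothetical and are not the listpack.h API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the 6-byte listpack header described above. */
typedef struct lp_header {
    uint32_t tot_bytes;     /* total bytes occupied by the listpack */
    uint16_t num_elements;  /* number of elements */
} lp_header;

/* Decode tot-bytes (4 bytes) and num-elements (2 bytes), assuming
 * little-endian byte order. */
lp_header lp_read_header(const unsigned char *lp) {
    assert(lp != NULL);
    lp_header h;
    h.tot_bytes = (uint32_t)lp[0] | ((uint32_t)lp[1] << 8) |
                  ((uint32_t)lp[2] << 16) | ((uint32_t)lp[3] << 24);
    h.num_elements = (uint16_t)(lp[4] | (lp[5] << 8));
    return h;
}

/* Returns 1 if the final byte is the listpack-end-byte (255). */
int lp_has_end_byte(const unsigned char *lp) {
    lp_header h = lp_read_header(lp);
    assert(h.tot_bytes >= 7);  /* 6-byte header + 1-byte end marker */
    return lp[h.tot_bytes - 1] == 255;
}
```

An empty listpack is therefore 7 bytes, four bytes smaller than an empty ziplist, because there is no tail-offset field.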

MySQL: "Hey, what's the difference between this and ziplist? Don't think I won't recognize you just because you changed your name."

Hear me out! Indeed, a listpack is also composed of metadata plus the data itself. The biggest difference lies in the elements part. To solve ziplist's chain update problem, an element no longer stores the length of the previous item the way a ziplist entry does.

Figure 2-11

  • encoding-type, the encoding type of the element; integers and strings of different lengths get different encodings.

  • element-data, the data actually stored.

  • element-tot-len, the total length of encoding-type + element-data, not counting the element-tot-len field itself.

Each element records only its own length, unlike a ziplist entry, which records the length of the previous item. Modifying or adding an element therefore does not change the length of the elements after it, which solves the chain update problem.
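Reverse traversal still works without prevlen, because the length field sits at the end of each element. The toy layout below is an assumption made for illustration: the payload stands for encoding-type + element-data, element-tot-len is a single byte, and payloads are limited to 127 bytes. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy element layout: [payload][element-tot-len: 1 byte].
 * Assumption: every payload is at most 127 bytes, so its length fits
 * in a single trailing byte. */

/* Appends one element at offset end and returns the new end offset. */
size_t toy_append(unsigned char *buf, size_t cap, size_t end,
                  const unsigned char *payload, unsigned char payload_len) {
    assert(buf != NULL && payload != NULL);
    assert(payload_len <= 127);
    assert(end + payload_len + 1 <= cap);
    memcpy(buf + end, payload, payload_len);
    buf[end + payload_len] = payload_len;  /* the element's own length */
    return end + payload_len + 1;
}

/* Given the start offset of an element (or the end offset of the data),
 * returns the start offset of the element just before it. The byte right
 * before start is the previous element's own length field. */
size_t toy_prev(const unsigned char *buf, size_t start) {
    assert(buf != NULL && start > 0);
    size_t payload_len = buf[start - 1];
    assert(start >= payload_len + 1);
    return start - 1 - payload_len;
}
```

Since each length byte describes only the element it belongs to, inserting or resizing one element never forces a neighbor to rewrite its own fields.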

From linkedlist and ziplist, to quicklist ("linked list + ziplist"), and then to listpack: the design goal throughout has been to use memory efficiently while avoiding performance degradation.


Origin blog.csdn.net/weixin_44045828/article/details/130051146