The Underlying Principles of Redis Sorted Sets

1. What Are Sorted Sets

Sorted Sets, like Sets, are a collection type in which no member appears twice. The difference is that each Sorted Sets element consists of two parts, a member and a score. Each member is associated with a score of type double, and by default the members are sorted by score from smallest to largest. Members with the same score are ordered lexicographically by their member strings.
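The ordering rule above can be sketched as a comparator. This is a minimal illustration in C, not Redis code: the zpair type and the function names are hypothetical, and strcmp is assumed to be an adequate stand-in for comparing member strings byte by byte.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pair type for illustration; not a Redis structure. */
typedef struct {
    const char *member;
    double score;
} zpair;

/* qsort comparator: ascending score, ties broken by member string order. */
static int zpair_cmp(const void *a, const void *b) {
    const zpair *x = a, *y = b;
    if (x->score < y->score) return -1;
    if (x->score > y->score) return 1;
    return strcmp(x->member, y->member);
}

/* Sort an array of pairs into sorted-set order. */
static void zpair_sort(zpair *items, size_t n) {
    assert(items != NULL || n == 0);
    qsort(items, n, sizeof(items[0]), zpair_cmp);
}
```

Sorting the pairs (carol, 20), (bob, 10), (alice, 10) with zpair_sort yields alice, bob, carol: the two members with score 10 come first and are ordered by name.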

picture

Common usage scenarios:

  • Leaderboards, such as maintaining an ordered Top 10 list ranked by score in a large online game.
  • Rate limiters: a sliding-window rate limiter can be built from a sorted set.
  • Delay queues: the score stores the expiration time, so with ascending order the first element is the one that expires first.

2. Underlying Implementation

Sorted Sets store their data in one of two underlying ways.

  • listpack (ziplist before version 7.0, which listpack replaced). listpack is used when the number of elements is less than or equal to the zset-max-listpack-entries configuration value (default 128) and the byte size of every member is no greater than the zset-max-listpack-value configuration value (default 64). With listpack storage, each member and its score are stored compactly, side by side, in the listpack.
  • If these conditions are not met, a combination of skiplist + dict (hash table) is used. Each element is inserted into both the skiplist and the dict (hash table), which trades space for time. The hash table key stores the element's member, and the value stores the score associated with that member.
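The two conditions can be written down as a small predicate. This is only a sketch: fits_listpack and the macro names are hypothetical, the limits are the defaults quoted above, and the comparison with the member size follows the source excerpt shown later in this article (a member longer than the limit forces skiplist + dict).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Default limits quoted in the text; both are configurable in Redis. */
#define SKETCH_MAX_LISTPACK_ENTRIES 128
#define SKETCH_MAX_LISTPACK_VALUE 64

/* Hypothetical helper: true if a sorted set with `count` elements whose
 * longest member is `longest_member_len` bytes can stay in a listpack. */
static bool fits_listpack(size_t count, size_t longest_member_len) {
    return count <= SKETCH_MAX_LISTPACK_ENTRIES &&
           longest_member_len <= SKETCH_MAX_LISTPACK_VALUE;
}

/* Boundary cases under the default limits. */
static void fits_listpack_examples(void) {
    assert(fits_listpack(128, 64));  /* exactly at both limits */
    assert(!fits_listpack(129, 10)); /* one element too many */
    assert(!fits_listpack(5, 65));   /* one member too long */
}
```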

MySQL: "That is to say, listpack is suitable for scenarios where the number of elements is small and the content of the elements is not large."

Yes, the purpose of using listpack storage is to save memory. Sorted Sets can support efficient range queries precisely because of the skiplist. For example, the time complexity of the ZRANGE command is O(log(n) + m), where n is the number of members and m is the number of returned results. Note that you should avoid commands that return a large number of results.

The reason for using dict is to query a single element in O(1) time, as with the ZSCORE key member command. To sum up, whenever a Sorted Sets element is inserted or updated, the corresponding data is inserted or updated in both the skiplist and the hash table, so that the two always stay consistent.

 

MySQL: "This approach is very ingenious. The skiplist handles range queries and lookups by score, while the dict hash table looks up the score for a given member in O(1) time, so both range queries and single-element queries are efficient."

The source code of sorted sets is mainly in the following two files.

  • The structures are defined in server.h.
  • The functions are implemented in t_zset.c.

First, look at how the skiplist (skip list) + dict (hash table) data structure stores data.

skiplist + dict

MySQL: "Tell me, what is a skip list?"

In essence it is an ordered linked list that supports binary-search-like lookups. A skip list adds multiple levels of index on top of the original ordered linked list and uses those indexes to search quickly. This improves not only search performance but also the performance of insertion and deletion. Its performance is comparable to red-black trees and AVL trees, while its principle and implementation are simpler than a red-black tree's. Recall the pain point of a plain linked list: lookups are slow, and O(n) time complexity is unacceptable for Redis, which prizes speed above all.

picture

If every second node of the ordered linked list is given an extra "jump" pointer to the node two positions ahead, the number of nodes a search has to visit is cut to about half, as shown in the figure below.

picture

In this way, level 0 and level 1 each form a linked list, and the level 1 list has only 2 nodes (6, 26).

Skip list node lookup

A search always starts from the highest level. If the value stored in the next node is smaller than the target, the search moves on to that next node on the same level; if the next node's value is greater than the target, the search drops to the next lower level at the current node and continues from there. For example, to search for 17, the search path follows the red arrows in the figure below.

picture

  • Start at level 1 and compare 17 with 6. Since 17 is greater, continue to the next node.
  • Compare 17 with 26. Since 17 < 26, stay at the previous node (6), drop to its level 0 linked list, compare with the next node, and find the target 17.

skiplist was inspired by this multi-level linked list idea. With the construction described above, each level has half as many nodes as the level below it, so the search resembles a binary search and its time complexity is O(log n).
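The lookup just described can be sketched in C on a fixed two-level list. Everything here is illustrative rather than Redis code: demo_node, demo_search and demo_build are hypothetical names, and the value 1 is made up to pad the list; only 6, 17 and 26 come from the walkthrough.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_LEVELS 2

/* Hypothetical node type; forward[i] is the next node on level i. */
typedef struct demo_node {
    int value;
    struct demo_node *forward[DEMO_LEVELS];
} demo_node;

/* Returns the node holding `target`, or NULL if it is absent.
 * `header` is a sentinel node that stores no value. */
static demo_node *demo_search(demo_node *header, int target) {
    assert(header != NULL);
    demo_node *x = header;
    for (int i = DEMO_LEVELS - 1; i >= 0; i--) {
        /* Move right while the next node on this level is still smaller. */
        while (x->forward[i] != NULL && x->forward[i]->value < target)
            x = x->forward[i];
        /* Otherwise drop one level, staying on the current node. */
    }
    x = x->forward[0];
    return (x != NULL && x->value == target) ? x : NULL;
}

/* Builds 1 -> 6 -> 17 -> 26 on level 0, with 6 and 26 also on level 1. */
static void demo_build(demo_node *header, demo_node nodes[4]) {
    static const int values[4] = {1, 6, 17, 26};
    for (int i = 0; i < 4; i++) {
        nodes[i].value = values[i];
        nodes[i].forward[0] = (i + 1 < 4) ? &nodes[i + 1] : NULL;
        nodes[i].forward[1] = NULL;
    }
    header->value = 0;               /* unused: the header holds no element */
    header->forward[0] = &nodes[0];
    header->forward[1] = &nodes[1];  /* level 1: header -> 6 */
    nodes[1].forward[1] = &nodes[3]; /* level 1: 6 -> 26 */
}
```

Searching for 17 on this list visits 6 on level 1, stops before 26, drops to level 0 at node 6, and finds 17 as the next node, matching the two steps listed above.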

However, this approach has a big problem when inserting data. Every newly added node disrupts the 2:1 ratio between the node counts of adjacent levels. Maintaining the ratio would require readjusting the linked lists, which costs O(n) time. To avoid this, a skiplist does not require a strict ratio between the node counts of upper and lower levels; instead it randomly picks a number of levels for each node, so inserting a node only needs to modify the pointers before and after it. The figure below is a skiplist with 4 levels. Suppose we want to search for 26; the figure shows the path the search takes.
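Picking a random number of levels per node might look like the sketch below. It is an assumption-laden illustration, not the Redis implementation: sketch_random_level is a hypothetical name, and the promotion probability and level cap are assumed values chosen for the example.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_MAX_LEVEL 32 /* assumed cap on the number of levels */
#define SKETCH_P 0.25       /* assumed chance of growing one more level */

/* Returns a level in [1, SKETCH_MAX_LEVEL]. Every node gets level 1, and
 * each further level is added with probability SKETCH_P, so higher levels
 * hold geometrically fewer nodes without any global rebalancing. */
static int sketch_random_level(void) {
    int level = 1;
    while (level < SKETCH_MAX_LEVEL && rand() < (int)(SKETCH_P * RAND_MAX))
        level++;
    assert(level >= 1 && level <= SKETCH_MAX_LEVEL);
    return level;
}
```

With a promotion probability of 0.25, roughly three quarters of the nodes stay at level 1, and the expected number of forward pointers per node is about 1 / (1 - 0.25) ≈ 1.33.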

picture

With an intuitive picture of the classic skip list in mind, let's take a look at the implementation details of skiplist in Redis. The data structure of Sorted Sets is defined as follows.

typedef struct zset {
    dict *dict;
    zskiplist *zsl;
} zset;

The zset structure has two fields: the hash table dict and the skip list zskiplist. dict was covered in the previous article, so let's focus on zskiplist.

typedef struct zskiplist {
    // Head and tail pointers, for traversal in both directions
    struct zskiplistNode *header, *tail;
    // Number of elements currently in the skip list
    unsigned long length;
    // Maximum level among the nodes in the list
    int level;
} zskiplist;

  • zskiplistNode *header, *tail: head and tail pointers, used for traversal in both directions.
  • length: the total number of nodes in the linked list. Note that a newly created zskiplist allocates an empty header node, which is not counted in length.
  • level: the maximum number of levels among all nodes in the skiplist.

Next, look at the zskiplistNode structure, which defines each node in the skiplist.

typedef struct zskiplistNode {
    sds ele;
    double score;

    struct zskiplistNode *backward;

    struct zskiplistLevel {
        struct zskiplistNode *forward;
        unsigned long span;
    } level[];

} zskiplistNode;

  • A Sorted Set stores not only the elements but also their weights. Accordingly, ele (of type sds) holds the actual content, and score (of type double) holds the weight.
  • *backward, the backward pointer, points to the previous node, which makes it convenient to traverse in reverse order from the tail node. Note that each node has only one backward pointer, so only the level 0 linked list is doubly linked.
  • level[], a flexible array of zskiplistLevel structures. The skip list is a multi-level ordered linked list whose nodes are linked by pointers on every level, so each element of the array represents one level of the skiplist.
    • *forward, the forward pointer for that level.
    • span, which records how many level 0 nodes are crossed when following this level's *forward pointer to the next node. span is used to compute an element's rank. For example, to find the rank of ele = Xiaocaiji, score = 17, you only need to add up the spans along the search path (the red path in the figure below): rank = (2 + 2) - 1 = 3, where 1 is subtracted because ranks start at 0. To compute the rank in descending order, subtract the accumulated span on the search path from the length of the skiplist: 4 - (2 + 2) = 0.
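The span arithmetic can be checked with a small sketch. It is a simplification, not the Redis rank function: rank_node, rank_of and rank_build are hypothetical names, scores are assumed to be unique so member comparison is omitted, and the scores 3, 6 and 9 are made up; only the score 17, the spans 2 + 2 and the length 4 come from the example above.

```c
#include <assert.h>
#include <stddef.h>

#define RANK_LEVELS 2

/* Hypothetical node type mirroring the forward/span pair per level. */
typedef struct rank_node {
    double score;
    struct {
        struct rank_node *forward;
        unsigned long span; /* level 0 steps covered by following forward */
    } level[RANK_LEVELS];
} rank_node;

/* Returns the 1-based position of `score`, or 0 if it is absent. */
static unsigned long rank_of(const rank_node *header, double score) {
    assert(header != NULL);
    const rank_node *x = header;
    unsigned long traversed = 0;
    for (int i = RANK_LEVELS - 1; i >= 0; i--) {
        while (x->level[i].forward != NULL &&
               x->level[i].forward->score <= score) {
            traversed += x->level[i].span; /* accumulate spans on the path */
            x = x->level[i].forward;
        }
    }
    return (x != header && x->score == score) ? traversed : 0;
}

/* Builds 3 -> 6 -> 9 -> 17 on level 0; level 1 links header -> 6 -> 17. */
static void rank_build(rank_node *header, rank_node nodes[4]) {
    static const double scores[4] = {3, 6, 9, 17};
    for (int i = 0; i < 4; i++) {
        nodes[i].score = scores[i];
        nodes[i].level[0].forward = (i + 1 < 4) ? &nodes[i + 1] : NULL;
        nodes[i].level[0].span = (i + 1 < 4) ? 1 : 0;
        nodes[i].level[1].forward = NULL;
        nodes[i].level[1].span = 0;
    }
    header->score = 0; /* unused: the header holds no element */
    header->level[0].forward = &nodes[0];
    header->level[0].span = 1;
    header->level[1].forward = &nodes[1];
    header->level[1].span = 2;
    nodes[1].level[1].forward = &nodes[3];
    nodes[1].level[1].span = 2;
}
```

For score 17 the path follows the two level 1 pointers, so rank_of returns 2 + 2 = 4. Subtracting 1 gives the 0-based ascending rank 3, and 4 - 4 = 0 gives the descending rank, as in the text.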

The diagram below shows a possible structure of a skiplist in Redis.

picture

listpack

MySQL: "According to the zset structure definition, only two data structures are used, dict and zskiplist. There is no trace of listpack."

Good question. The details of listpack storage can be seen in the zaddGenericCommand function in the source file t_zset.c. Part of the code is shown below; internally it decides whether to use listpack for storage.

void zaddGenericCommand(client *c, int flags) {
    // Some code omitted

    // Create the sorted set if the key does not exist
    zobj = lookupKeyWrite(c->db,key);
    if (checkType(c,zobj,OBJ_ZSET)) goto cleanup;
    if (zobj == NULL) {
        if (xx) goto reply_to_client;
        // If zset_max_listpack_entries == 0, or the element's byte size
        // is greater than the zset_max_listpack_value setting,
        // use skiplist + dict storage; otherwise use listpack.
        if (server.zset_max_listpack_entries == 0 ||
            server.zset_max_listpack_value < sdslen(c->argv[scoreidx+1]->ptr))
        {
            zobj = createZsetObject();
        } else {
            zobj = createZsetListpackObject();
        }
        dbAdd(c->db,key,zobj);
    }
    // Some code omitted
}

We know that listpack is a block of contiguous memory made up of multiple data items. Each sorted set element consists of two parts, member and score. When a (member, score) pair is inserted into a listpack, the member and the score are stored next to each other in a compact layout. The biggest advantage of listpack is that it saves memory; the cost is that elements can only be found by scanning in order, with O(n) time complexity, where each search step advances two data items, that is, across one member/score pair. That is why listpack is used only when the amount of data is small: it saves memory without noticeably affecting performance.
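The two-items-per-step scan can be illustrated with a simplified model. This is not the listpack format: a real listpack is an encoded byte buffer, whereas the sketch below uses a plain array of strings that alternate member and score, and flat_zset_score is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a listpack-encoded sorted set: entries alternate
 * member, score, member, score, ... Looks up `member` and, if found,
 * writes its score to *score_out and returns true. */
static bool flat_zset_score(const char *const *entries, size_t n_entries,
                            const char *member, double *score_out) {
    assert(n_entries % 2 == 0); /* entries always come in member/score pairs */
    for (size_t i = 0; i + 1 < n_entries; i += 2) { /* step over one pair */
        if (strcmp(entries[i], member) == 0) {
            *score_out = strtod(entries[i + 1], NULL);
            return true;
        }
    }
    return false; /* O(n): every pair was inspected */
}
```

With 128 pairs at most under the default limit, such a scan stays short, which is the trade-off the paragraph above describes.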

picture

Origin blog.csdn.net/qq_28165595/article/details/132073488