[Redis Series 4] How Redis Hash Objects Are Implemented: hashtable (hash table) and ziplist (compressed list)

Preface

In the previous article we analyzed the underlying storage structures of the list object (linkedlist, ziplist and quicklist), explained how they are implemented, and compared the three data structures. In this article we continue with the hash object, the third of the five commonly used Redis data types, and analyze its underlying storage structures.

Hash object

The hash object is itself a key-value storage structure, and its underlying storage can take one of two forms: ziplist and hashtable. We know that Redis itself is also a key-value database, but the outer key-value layer of Redis can only use the hashtable form (also called the outer hash), while the inner hash, just like the other data types, distinguishes between its two storage structures by encoding:

Encoding attribute       Description                                        OBJECT ENCODING returns
OBJ_ENCODING_ZIPLIST     hash object implemented with a compressed list     ziplist
OBJ_ENCODING_HT          hash object implemented with a dictionary          hashtable

hashtable

Each key-value pair in Redis is represented by a dictEntry object, and the hash table is a further wrapper around these dictEntry objects. This is the hash table structure dictht:

typedef struct dictht {
    dictEntry **table;      // hash table array
    unsigned long size;     // size of the hash table
    unsigned long sizemask; // mask used to compute index values, always equal to size - 1
    unsigned long used;     // number of nodes currently stored in the hash table
} dictht;

PS: table is an array, each element of which is a dictEntry object .
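For reference, here is the dictEntry structure that each slot of the table array points to (lightly simplified from Redis's dict.h); the next pointer is what chains colliding entries together:

typedef struct dictEntry {
    void *key;              // the key (a string object in a hash object)
    union {
        void *val;          // the value (a string object in a hash object)
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next; // next entry in the same bucket (collision chain)
} dictEntry;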

Dictionary (dict)

A dictionary is also called a symbol table, an associative array or a map. The hash table dictht is nested inside the dictionary. The following is the definition of the dictionary dict:

typedef struct dict {
    dictType *type;          // type-specific functions for this kind of dictionary
    void *privdata;          // private data that the type-specific functions may need
    dictht ht[2];            // hash tables (note that there are two of them)
    long rehashidx;          // rehash index; -1 when no rehash is in progress
    unsigned long iterators; // number of iterators currently running on this dictionary
} dict;

dictType defines a number of common callback functions; its data structure is defined as follows:

typedef struct dictType {
    uint64_t (*hashFunction)(const void *key);                             // computes a key's hash value
    void *(*keyDup)(void *privdata, const void *key);                      // duplicates a key
    void *(*valDup)(void *privdata, const void *obj);                      // duplicates a value
    int (*keyCompare)(void *privdata, const void *key1, const void *key2); // compares two keys
    void (*keyDestructor)(void *privdata, void *key);                      // destroys a key
    void (*valDestructor)(void *privdata, void *obj);                      // destroys a value
} dictType;

So when a hash object is created, we get a structure like the following diagram (some attributes are omitted):
(figure: overall structure of a hash object: a dict pointing to ht[0], whose dictEntry array holds the key-value pairs)
PS: the k and v at the tail of the hash table each store a string object.

rehash operation

ht[2] means the dict holds two hash tables, ht[0] and ht[1]. Redis uses ht[0] by default and does not allocate space for ht[1] at initialization time.

When a key is set in a hash object, Redis computes its hash value to decide which slot of the hash array (the dictEntry*[3] in the figure above) it lands in. If a hash collision occurs, more than one dictEntry ends up at the same slot and they are chained into a linked list (the most recently inserted entry always sits at the head of the list). The longer the list, the worse the performance, so to keep the hash table efficient Redis rehashes it when either of the following two conditions is met:

  • 1. The load factor is greater than or equal to 1 and dict_can_resize is set to 1.
  • 2. The load factor exceeds the forced-resize safety threshold (dict_force_resize_ratio = 5), regardless of dict_can_resize.

PS: load factor = number of nodes currently stored in the hash table / size of the hash table (i.e. ht[0].used / ht[0].size).
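A minimal sketch of that trigger logic, modeled on _dictExpandIfNeeded in Redis's dict.c (simplified here; in real Redis, dict_can_resize is temporarily switched off while a child process is saving, which is why the forced threshold exists):

// simplified sketch of the expansion check (after dict.c's _dictExpandIfNeeded)
static int dictExpandIfNeededSketch(dict *d) {
    if (dictIsRehashing(d)) return DICT_OK;   // a rehash is already in progress
    if (d->ht[0].size == 0)                   // empty dict: create the initial table
        return dictExpand(d, DICT_HT_INITIAL_SIZE);
    // load factor = used / size; expand at 1 when resizing is allowed,
    // or force the expansion once it exceeds dict_force_resize_ratio (5)
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used / d->ht[0].size > dict_force_resize_ratio)) {
        return dictExpand(d, d->ht[0].used * 2);
    }
    return DICT_OK;
}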

rehash step

Both expanding and shrinking the hash table are carried out by a rehash, which mainly goes through the following five steps:

  • 1. Allocate space for the dictionary's ht[1] hash table; its size depends on the number of nodes currently stored in ht[0] (i.e. ht[0].used).
    (a) For an expansion, the size of ht[1] is the first power of 2 (2^n) that is greater than or equal to ht[0].used * 2. For example, with used = 3 the target is 3 * 2 = 6, and 2^3 = 8 is the first power of 2 greater than or equal to 6 (2^2 = 4 < 6 while 2^3 = 8 > 6); see the sketch after this list.
    (b) For a shrink, the size of ht[1] is the first power of 2 that is greater than or equal to ht[0].used.
  • 2. Set the dictionary's rehashidx attribute to 0, indicating that a rehash is in progress.
  • 3. Recalculate the hash value of every key-value pair in ht[0] in turn and place it in the corresponding slot of ht[1]; each time one bucket of ht[0] has been fully migrated, rehashidx is incremented by 1.
  • 4. Once all key-value pairs in ht[0] have been migrated to ht[1], free ht[0], make ht[1] the new ht[0], and create a new empty ht[1] in preparation for the next rehash.
  • 5. Set the dictionary's rehashidx attribute back to -1, indicating that the rehash has finished.
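The "first power of 2 greater than or equal to the target" in step 1 is found by repeated doubling; here is a sketch along the lines of dict.c's _dictNextPower:

// sketch of how the new table size is chosen (after dict.c's _dictNextPower)
static unsigned long dictNextPowerSketch(unsigned long size) {
    unsigned long i = DICT_HT_INITIAL_SIZE; // 4 in Redis
    while (i < size)
        i *= 2;                             // 4, 8, 16, 32, ...
    return i;
}
// example: ht[0].used = 3, expansion target = 3 * 2 = 6,
// and dictNextPowerSketch(6) returns 8 = 2^3, matching the example in step 1(a)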

Progressive rehash

The rehash described above is not performed all at once; instead the key-value pairs in ht[0] are slowly rehashed into ht[1] over many operations. This is called progressive rehash. Progressive rehash avoids the heavy burst of computation that a one-shot rehash would cause, following a divide-and-conquer approach.

During a progressive rehash, new key-value pairs may still be written. Redis always inserts newly added key-value pairs into ht[1], which guarantees that the number of key-value pairs in ht[0] only ever decreases.

If a lookup has to be performed while a rehash is in progress, Redis first searches ht[0]; if the key is not found there, it then searches ht[1].
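A hedged sketch of such a lookup, modeled on dictFind in dict.c (the real function also advances the rehash by one bucket as a side effect, which is what _dictRehashStep does below; names simplified):

// sketch of a lookup while a rehash is in progress (after dict.c's dictFind)
dictEntry *dictFindSketch(dict *d, const void *key) {
    if (d->ht[0].used + d->ht[1].used == 0) return NULL; // empty dictionary
    if (dictIsRehashing(d)) _dictRehashStep(d);          // migrate one bucket
    uint64_t h = d->type->hashFunction(key);
    for (int table = 0; table <= 1; table++) {           // ht[0] first, then ht[1]
        unsigned long idx = h & d->ht[table].sizemask;
        dictEntry *he = d->ht[table].table[idx];
        while (he) {                                     // walk the collision chain
            if (key == he->key ||
                d->type->keyCompare(d->privdata, key, he->key))
                return he;
            he = he->next;
        }
        if (!dictIsRehashing(d)) break;                  // no rehash: skip ht[1]
    }
    return NULL;
}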

ziplist

Some of the characteristics of ziplist were already analyzed in detail when we covered the underlying data structures of the list object (if you want to learn more, you can click here). The difference is that a hash object stores key-value pairs, so its ziplist must also express key-value pairs: each key is stored in one entry and its value is stored in the entry immediately after it:
(figure: in the ziplist, each key entry is immediately followed by its value entry)
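A hedged sketch of how one field-value pair is appended under the ziplist encoding (ziplistPush and ZIPLIST_TAIL are the real ziplist API; the wrapper function and its name here are only for illustration):

// simplified sketch: under ziplist encoding the key and its value are pushed back-to-back
unsigned char *hashZiplistAddSketch(unsigned char *zl,
                                    unsigned char *field, unsigned int flen,
                                    unsigned char *value, unsigned int vlen) {
    zl = ziplistPush(zl, field, flen, ZIPLIST_TAIL); // append the key at the tail...
    zl = ziplistPush(zl, value, vlen, ZIPLIST_TAIL); // ...then its value right after it
    return zl;
}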

Encoding conversion of ziplist and hashtable

When a hash object satisfies both of the following conditions, it is stored with the ziplist encoding:

  • 1. Every key and value stored in the hash object is shorter than 64 bytes (this threshold can be controlled with the hash-max-ziplist-value parameter).
  • 2. The hash object holds fewer than 512 key-value pairs (this threshold can be controlled with the hash-max-ziplist-entries parameter).

As soon as either of these two conditions is no longer met, the hash object is converted to hashtable storage.
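A sketch of that check, in the spirit of hashTypeTryConversion in Redis's t_hash.c (the server.hash_max_ziplist_* fields correspond to the two configuration parameters above; names and details are simplified):

// simplified sketch of the ziplist -> hashtable conversion check
void hashTryConversionSketch(robj *o, sds field, sds value) {
    if (o->encoding != OBJ_ENCODING_ZIPLIST) return;
    // condition 1 broken: a single key or value exceeds hash-max-ziplist-value (64)
    if (sdslen(field) > server.hash_max_ziplist_value ||
        sdslen(value) > server.hash_max_ziplist_value) {
        hashTypeConvert(o, OBJ_ENCODING_HT);
        return;
    }
    // condition 2 broken: more than hash-max-ziplist-entries (512) key-value pairs
    if (hashTypeLength(o) > server.hash_max_ziplist_entries)
        hashTypeConvert(o, OBJ_ENCODING_HT);
}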

Summary

This article introduced the underlying storage structure of the hash type, the third of the five commonly used Redis data types, focusing mainly on its hashtable implementation. The ziplist encoding has the same characteristics as the ziplist used by the list object, except that each key and its value are stored next to each other. Finally, we described the conditions under which a hash object converts between the two encodings.

In the next article, we will analyze the underlying storage structure of the set object, the fourth of the five commonly used Redis data types.
Please follow me and keep learning and improving together with the lone wolf.

Origin: blog.csdn.net/zwx900102/article/details/109707329