What to do if the hash distribution in Redis is uneven

Preface

Redis is a key-value database whose keys are located by hashing, so the whole Redis keyspace can be thought of as an outer hash. It is called the outer hash because Redis also provides a hash data type internally, which we can call the inner hash. When we store data in a hash object, the data therefore passes through two layers of hash storage.

 

Hash object

The hash object itself is also a key-value storage structure, and its underlying storage can take one of two forms: ziplist (compressed list) and hashtable (hash table). The two storage structures are distinguished by the object's encoding:

Encoding attribute      | Description                                           | object encoding return value
OBJ_ENCODING_ZIPLIST    | The hash object is implemented with a compressed list | ziplist
OBJ_ENCODING_HT         | The hash object is implemented with a dictionary      | hashtable

 

hashtable

Every key-value pair in Redis is wrapped in a dictEntry object, and the hash table is built by wrapping dictEntry objects once more. This is the hash table structure dictht:

typedef struct dictht {
    dictEntry **table;        // hash table array
    unsigned long size;       // hash table size
    unsigned long sizemask;   // mask used to compute the index, always size - 1
    unsigned long used;       // number of nodes already stored in the table
} dictht;

Note: The table in the above structure definition is an array, each element of which is a dictEntry object.
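
For reference, the dictEntry wrapper itself looks roughly like this in the Redis source (dict.h); the exact fields can differ slightly between Redis versions:

#include <stdint.h>

typedef struct dictEntry {
    void *key;                  // the key
    union {
        void *val;              // the value as a pointer
        uint64_t u64;           // or as an unsigned integer
        int64_t s64;            // or as a signed integer
        double d;               // or as a double
    } v;                        // the value
    struct dictEntry *next;     // next entry in the same bucket (collision chain)
} dictEntry;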

 

dictionary

A dictionary, also known as a symbol table, an associative array or a map, has the hash table dictht nested inside it. The following is the definition of the dictionary dict:

typedef struct dict {
    dictType *type;             // type-specific functions (see below)
    void *privdata;             // private data passed to the type functions
    dictht ht[2];               // two hash tables; ht[1] is only used while rehashing
    long rehashidx;             // rehash progress, -1 when no rehash is in progress
    unsigned long iterators;    // number of iterators currently running
} dict;

Among these fields, dictType holds a set of commonly used functions, and it is defined as follows:

typedef struct dictType {
    uint64_t (*hashFunction)(const void *key);                              // computes the hash value
    void *(*keyDup)(void *privdata, const void *key);                       // duplicates a key
    void *(*valDup)(void *privdata, const void *obj);                       // duplicates a value
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);  // compares two keys
    void (*keyDestructor)(void *privdata, void *key);                       // destroys a key
    void (*valDestructor)(void *privdata, void *obj);                       // destroys a value
} dictType;
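
As a purely illustrative example (not one of the dictType instances defined in the Redis source), a dictType for NUL-terminated C-string keys could be wired up like this, using the dictType definition just shown:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical callbacks for a dictionary keyed by C strings.
static uint64_t cstr_hash(const void *key) {
    // djb2 string hash, standing in for Redis's real hash function
    uint64_t h = 5381;
    for (const unsigned char *p = key; *p; p++) h = h * 33 + *p;
    return h;
}

static int cstr_compare(void *privdata, const void *key1, const void *key2) {
    (void)privdata;
    return strcmp(key1, key2) == 0;   // nonzero when the keys are equal
}

static void cstr_destructor(void *privdata, void *key) {
    (void)privdata;
    free(key);
}

// A dictType wiring those callbacks together; unused hooks stay NULL.
static dictType cstr_dict_type = {
    cstr_hash,         // hashFunction
    NULL,              // keyDup
    NULL,              // valDup
    cstr_compare,      // keyCompare
    cstr_destructor,   // keyDestructor
    NULL               // valDestructor
};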

When we create a hash object, we can get the following diagram (some attributes are omitted):

[Figure: structure of a hash object: dict holding ht[0] and ht[1], with ht[0] pointing to a dictEntry[3] array whose chains end in NULL; some attributes omitted]

 

rehash operation

The dict structure defines an array ht[2] containing two hash tables, ht[0] and ht[1]. By default Redis uses only ht[0]; it does not use ht[1], and it does not even allocate space for ht[1].

When a field is set on a hash object, which slot of the hash array (dictEntry[3] in the figure above) it falls into is determined by computing the hash of the key. If a hash collision occurs (two keys hash to the same index), the same slot holds multiple dictEntry nodes chained into a linked list (the rightmost pointer in the figure above points to NULL). Note that the most recently inserted element always sits at the front of the list; in other words, on a collision the new node is placed at the head of the chain.
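
A minimal sketch of that insertion, reusing the dictht and dictEntry definitions above. hash_key is a stand-in for the hash function held in dictType, and this is a simplification of what dict.c really does (no rehash or duplicate-key handling):

#include <stdint.h>
#include <stdlib.h>

extern uint64_t hash_key(const void *key);   // stand-in for dictType's hashFunction

// Insert a key-value pair into a single hash table.
void ht_insert(dictht *ht, void *key, void *val) {
    // sizemask is always size - 1, so for power-of-2 sizes this is hash % size.
    unsigned long idx = hash_key(key) & ht->sizemask;

    dictEntry *entry = malloc(sizeof(dictEntry));
    entry->key = key;
    entry->v.val = val;

    // On a collision the new node is linked at the head of the bucket's chain,
    // so the most recently inserted element is always found first.
    entry->next = ht->table[idx];
    ht->table[idx] = entry;
    ht->used++;
}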

When reading data, a slot that holds multiple elements has to be traversed as a linked list, so the longer the list, the worse the performance. To keep the hash table fast, it needs to be rehashed when either of the following two conditions is met:

  • When the load factor is greater than or equal to 1 and dict_can_resize is 1.
  • When the load factor is greater than or equal to the safety threshold (dict_force_resize_ratio=5).

PS: load factor = number of used nodes in the hash table / size of the hash table (i.e. ht[0].used / ht[0].size). A sketch of this check follows.
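
Expressed as a small sketch (needs_expand is a made-up helper name; in the Redis source this check happens inside dict.c as entries are added):

// dict_can_resize and dict_force_resize_ratio mirror the globals in dict.c;
// dict_can_resize is normally 1, and the force ratio defaults to 5.
static int dict_can_resize = 1;
static unsigned int dict_force_resize_ratio = 5;

int needs_expand(unsigned long used, unsigned long size) {
    if (size == 0) return 1;                     // table not allocated yet

    double load_factor = (double)used / size;    // ht[0].used / ht[0].size

    // Rule 1: load factor >= 1 while resizing is currently allowed.
    if (load_factor >= 1.0 && dict_can_resize) return 1;

    // Rule 2: load factor has reached the safety threshold of 5,
    // so expand even when dict_can_resize is 0.
    if (load_factor >= dict_force_resize_ratio) return 1;

    return 0;
}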

 

rehash step

Both expanding and shrinking the hash table are carried out by executing a rehash, which involves allocating and releasing space and proceeds through the following five steps (a code sketch follows the list):

  1. Allocate space for the ht[1] hash table of the dictionary dict; its size depends on the number of nodes currently stored (i.e. ht[0].used): for an expansion, ht[1] is sized to the first power of 2 greater than or equal to ht[0].used * 2, and for a shrink, to the first power of 2 greater than or equal to ht[0].used.
  2. Set the dictionary's rehashidx attribute to 0, indicating that a rehash operation is in progress.
  3. Recompute the hash of every key-value pair in ht[0] in turn and place it in the corresponding slot of the ht[1] array, advancing rehashidx as the buckets of ht[0] are migrated.
  4. When all the key-value pairs in ht[0] have been migrated to ht[1], release ht[0], make ht[1] the new ht[0], and reset ht[1] to an empty table in preparation for the next rehash.
  5. Set the dictionary's rehashidx attribute back to -1, indicating that this rehash is finished, and wait for the next one.
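
Putting the steps together, here is a simplified sketch of a single rehash step that migrates one bucket, using the dict, dictht and dictEntry definitions above. hash_key again stands in for the hash function in dictType, and error handling is omitted:

#include <stdint.h>
#include <stdlib.h>

extern uint64_t hash_key(const void *key);       // stand-in for dictType's hashFunction

// Migrate one bucket of ht[0] into ht[1].
// Returns 1 while there is still work to do, 0 once the rehash is finished.
int rehash_step(dict *d) {
    if (d->rehashidx == -1) return 0;            // no rehash in progress

    if (d->ht[0].used > 0) {
        // Skip empty buckets until we reach one that holds entries.
        while (d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;

        // Move every entry in this bucket over to ht[1].
        dictEntry *de = d->ht[0].table[d->rehashidx];
        while (de) {
            dictEntry *next = de->next;
            // Recompute the index against ht[1]'s sizemask, insert at the chain head.
            unsigned long idx = hash_key(de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[idx];
            d->ht[1].table[idx] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = next;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    // Once everything has been migrated: release ht[0], promote ht[1],
    // and mark the rehash as finished (steps 4 and 5 above).
    if (d->ht[0].used == 0) {
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];
        d->ht[1].table = NULL;
        d->ht[1].size = d->ht[1].sizemask = d->ht[1].used = 0;
        d->rehashidx = -1;
        return 0;
    }
    return 1;                                    // more buckets still to migrate
}

The progressive rehash described next simply runs steps like this a little at a time instead of looping until the whole table has been migrated.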

 

Progressive rehash

Redis does not perform this rehash all at once; instead it migrates the key-value pairs from ht[0] to ht[1] gradually, over many small steps, which is why the operation is called progressive rehash. Progressive rehashing avoids the large burst of computation a one-shot rehash would cause, which is a divide-and-conquer idea.

During a progressive rehash, new key-value pairs may still be written. Redis handles this by placing all newly added key-value pairs into ht[1], which guarantees that the number of key-value pairs in ht[0] can only decrease.

While a rehash is in progress, when the server receives a read command from a client it first looks the key up in ht[0]; if it is not found there, it then looks in ht[1].
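
A sketch of that lookup order, again reusing the types above; hash_key and key_equal stand in for the hash and compare functions held in dictType:

#include <stdint.h>

extern uint64_t hash_key(const void *key);              // stand-in for dictType's hashFunction
extern int key_equal(const void *k1, const void *k2);   // stand-in for dictType's keyCompare

// Find a key while a rehash may be in progress: look in ht[0] first,
// then fall back to ht[1] only if a rehash is under way.
dictEntry *find(dict *d, const void *key) {
    if (d->ht[0].size == 0) return NULL;                // empty dictionary

    uint64_t h = hash_key(key);
    int tables = (d->rehashidx == -1) ? 1 : 2;          // consult ht[1] only during a rehash

    for (int t = 0; t < tables; t++) {
        dictEntry *de = d->ht[t].table[h & d->ht[t].sizemask];
        while (de) {
            if (key_equal(key, de->key)) return de;
            de = de->next;
        }
    }
    return NULL;                                        // not found in either table
}

New writes, by contrast, go straight into ht[1] while the rehash is running, as noted above.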

 

ziplist

Some features of the ziplist were analyzed in a previous article, which you can refer to for more detail. Note that the ziplist used by a hash object differs from the one used by a list object: since a hash object stores key-value pairs, its ziplist also holds key-value pairs, with each key placed immediately before its value:

[Figure: ziplist layout of a hash object, where each key entry is immediately followed by its value entry]

 

ziplist and hashtable encoding conversion

A hash object uses ziplist encoding for storage only while it meets both of the following conditions:

  • The length of every key and value stored in the hash object is less than or equal to 64 bytes (this threshold can be adjusted with the parameter hash-max-ziplist-value).
  • The number of key-value pairs in the hash object is less than or equal to 512 (this threshold can be controlled by the parameter hash-max-ziplist-entries).

Once either of these two conditions is no longer met, the hash object switches to hashtable encoding for storage; a sketch of the decision is shown below.
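
As a sketch of that decision (use_ziplist, entry_count and longest_element are made-up names; the real check happens inside Redis as fields are inserted, and the two limits come from the configuration parameters mentioned above):

#include <stddef.h>

// Defaults for the two thresholds, configurable via
// hash-max-ziplist-entries and hash-max-ziplist-value.
#define HASH_MAX_ZIPLIST_ENTRIES 512
#define HASH_MAX_ZIPLIST_VALUE   64

// A hash object stays on ziplist encoding only while it has few enough
// fields AND every key/value is short enough.
int use_ziplist(size_t entry_count, size_t longest_element) {
    return entry_count <= HASH_MAX_ZIPLIST_ENTRIES &&
           longest_element <= HASH_MAX_ZIPLIST_VALUE;
}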

 

Common commands for hash objects

  • hset key field value: Set a single field (key value of the hash object).
  • hmset key field1 value1 field2 value2: Set multiple fields (field-value pairs of the hash object) at once.
  • hsetnx key field value: Set the value of the field field in the hash table key to value. If the field already exists, no operation is performed.
  • hget key field: Get the value corresponding to the field field in the hash table key.
  • hmget key field1 field2: Get the value corresponding to multiple fields in the hash table key.
  • hdel key field1 field2: Delete one or more fields in the hash table key.
  • hlen key: Returns the number of fields in the hash table key.
  • hincrby key field increment: Add increment to the value of the field field in the hash table key. The increment can be a negative number. If the field is not a number, an error will be reported.
  • hincrbyfloat key field increment: Add increment to the value of the field field in the hash table key. The increment can be a negative number. If the field is not a float type, an error will be reported.
  • hkeys key: Get all fields in the hash table key.
  • hvals key: Get the values of all fields in the hash table key.

Knowing the common commands for operating hash objects, we can verify the type and encoding of the hash objects described earlier. To avoid interference from other keys, we first execute the flushall command to clear the Redis database before testing.

Then execute the following commands in sequence:

hset address country china
type address
object encoding address

This produces the following output:

[Screenshot: type address returns hash, object encoding address returns ziplist]

 

You can see that when there is only one key-value pair in our hash object, the underlying encoding is ziplist.

Now we change the hash-max-ziplist-entries parameter to 2, then restart Redis, and finally enter the following command to test:

hmset key field1 value1 field2 value2 field3 value3
object encoding key

This produces the following result:

[Screenshot: object encoding key returns hashtable]

 

As you can see, the encoding has become a hashtable.

 

Summary

This article mainly introduced hashtable, the underlying storage structure of the hash type (one of the five commonly used Redis data types), and how Redis rehashes when the hash distribution becomes uneven. Finally, we went over the common hash object commands and verified the conclusions of this article with a few examples.


Origin blog.csdn.net/Java0258/article/details/112990267