[Redis]-[Underlying data structure]-Dictionary

Preface

A dictionary, also known as a symbol table, associative array, or map, is a data structure used to hold key-value pairs.

Dictionary implementation

Redis's dictionary uses a hash table as its underlying implementation.

Hash table

typedef struct dictht {
	dictEntry **table;       // hash table array
	unsigned long size;      // size of the hash table
	unsigned long sizemask;  // size mask, always equal to size - 1
	unsigned long used;      // number of nodes in the table, i.e. the number of key-value pairs stored
} dictht;

The table attribute is an array, where each element is a pointer to a dictEntry structure. Each dictEntry structure stores a key-value pair.

Hash table node

typedef struct dictEntry {
	void *key;               // key
	union {
		void *val;
		uint64_t u64;
		int64_t s64;
	} v;                     // value
	struct dictEntry *next;  // pointer to the next hash table node, forming a linked list
} dictEntry;

As the next pointer shows, the Redis hash table uses separate chaining to resolve hash collisions. Since a chain does not keep a pointer to its tail, new nodes are inserted at the head of the list for speed, so insertion is O(1).
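
A minimal sketch of head insertion into a bucket's chain, using the dictht and dictEntry structures above; the helper name is illustrative (the real logic lives in dictAdd/dictAddRaw in Redis's dict.c):

// Insert a node at the head of bucket `index`; O(1) because no tail pointer is needed.
static void bucket_insert_head(dictht *ht, unsigned long index, dictEntry *entry)
{
	entry->next = ht->table[index];  // new node points at the current head (may be NULL)
	ht->table[index] = entry;        // new node becomes the head of the chain
	ht->used++;                      // one more key-value pair stored
}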

Dictionary

typedef struct dict {
	dictType *type;  // type-specific functions
	void *privdata;  // private data
	dictht ht[2];    // hash tables
	int rehashidx;   // rehash index; -1 when no rehash is in progress
} dict;

The type and privdata attributes are set up for different types of key-value pairs, which is how Redis creates polymorphic dictionaries.
The type attribute points to a dictType structure; each dictType stores a cluster of functions for operating on a specific type of key-value pair, and Redis sets different functions for dictionaries with different uses. The privdata attribute holds optional arguments that are passed to those type-specific functions. The ht attribute is an array containing two hash tables. Normally only ht[0] is used; ht[1] is used only during rehash.
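
For reference, the dictType structure of that era's dict.h looks roughly as follows (the exact field list varies between Redis versions):

typedef struct dictType {
	unsigned int (*hashFunction)(const void *key);                          // compute a key's hash value
	void *(*keyDup)(void *privdata, const void *key);                       // duplicate a key
	void *(*valDup)(void *privdata, const void *obj);                       // duplicate a value
	int (*keyCompare)(void *privdata, const void *key1, const void *key2);  // compare two keys
	void (*keyDestructor)(void *privdata, void *key);                       // destroy a key
	void (*valDestructor)(void *privdata, void *obj);                       // destroy a value
} dictType;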

The structure of the entire dictionary is as follows:

[Figure: a dict holding ht[0] and ht[1], where ht[0] points to an array of buckets, each bucket a chain of dictEntry nodes; ht[1] is empty outside of rehash]

Hash algorithm

When adding a new key-value pair to the dictionary:

  1. Compute the hash value from the key, using the hash function set in the dictionary:
    hash = dict->type->hashFunction(key)
  2. AND the hash value with the hash table's size mask to get the index value (x is 0 or 1, depending on whether a rehash is in progress):
    index = hash & dict->ht[x].sizemask
  3. Place the key-value pair at that index of the hash table array (using head insertion if the bucket already has a chain)

When a dictionary is used as the underlying implementation of the database, or as the underlying implementation of a hash key, Redis uses the MurmurHash2 algorithm to calculate the hash values of keys.
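
A minimal, runnable sketch of steps 1 and 2, assuming a table size that is a power of two so that size - 1 works as a bit mask; the hash function here is a simple stand-in, not MurmurHash2:

#include <stdio.h>

// Simple stand-in hash (djb2); Redis itself uses MurmurHash2 (SipHash in newer versions).
static unsigned long toy_hash(const char *key)
{
	unsigned long h = 5381;
	while (*key)
		h = h * 33 + (unsigned char)*key++;
	return h;
}

int main(void)
{
	unsigned long size = 8;                  // hash table size (a power of two)
	unsigned long sizemask = size - 1;       // always size - 1, as in dictht
	unsigned long hash = toy_hash("mykey");  // step 1: hash = hashFunction(key)
	unsigned long index = hash & sizemask;   // step 2: index = hash & sizemask
	printf("bucket index: %lu\n", index);
	return 0;
}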

rehash

To keep the load factor of the hash table within a reasonable range, when the hash table holds too many or too few key-value pairs, the program expands or shrinks the hash table accordingly by performing a rehash:

  1. Allocate space for the dictionary's ht[1] hash table. Its size depends on the operation being performed and on the number of key-value pairs currently held in ht[0] (a sketch of the size calculation follows this list):
     • For an expansion, the size of ht[1] is the first 2^n greater than or equal to ht[0].used * 2
     • For a contraction, the size of ht[1] is the first 2^n greater than or equal to ht[0].used
  2. Rehash every key-value pair stored in ht[0] into ht[1]: recompute each key's hash value and index value, then place the pair at the corresponding position in ht[1].
  3. When all key-value pairs in ht[0] have been migrated to ht[1], release ht[0], make ht[1] the new ht[0], and create a new blank hash table at ht[1], ready for the next rehash.
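
A minimal sketch of the size calculation in step 1, in the spirit of _dictNextPower in Redis's dict.c; the starting value 4 mirrors Redis's DICT_HT_INITIAL_SIZE, and the function name here is illustrative:

// Find the first power of two that is >= size.
static unsigned long next_power(unsigned long size)
{
	unsigned long i = 4;   // initial table size (DICT_HT_INITIAL_SIZE in Redis)
	while (i < size)
		i *= 2;            // keep doubling until the target is reached
	return i;
}

// Expansion:   new size = next_power(ht[0].used * 2)
// Contraction: new size = next_power(ht[0].used)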

Timing of expansion or contraction

The program automatically expands the hash table when either of the following holds: the server is not currently executing the BGSAVE or BGREWRITEAOF command and the load factor of the hash table is greater than or equal to 1; or the server is executing the BGSAVE or BGREWRITEAOF command and the load factor is greater than or equal to 5. The load factor is defined as load_factor = ht[0].used / ht[0].size

Why does the load factor required to trigger an expansion differ depending on whether the BGSAVE or BGREWRITEAOF command is being executed?

  • This is because, during the execution of the BGSAVE or BGREWRITEAOF command, Redis forks a child process of the current server process, and most operating systems implement child processes with copy-on-write. Expanding the hash table during this period would cause the parent process to write to a large number of memory pages and therefore trigger copying. So, while a child process exists, the server raises the load factor required to trigger an expansion, avoiding expansions as much as possible during that time, avoiding unnecessary memory writes, and saving as much memory as possible.

When the load factor of the hash table is less than 0.1, the program automatically starts to shrink the hash table.
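
A minimal sketch of these trigger conditions, building on the dict structure shown above; the function names and the has_child flag are illustrative (Redis implements the real checks in _dictExpandIfNeeded and htNeedsResize):

// Illustrative expansion / contraction triggers; assumes ht[0].size > 0.
static int needs_expand(const dict *d, int has_child)
{
	double load_factor = (double)d->ht[0].used / d->ht[0].size;
	int threshold = has_child ? 5 : 1;   // stricter while a BGSAVE/BGREWRITEAOF child exists
	return load_factor >= threshold;
}

static int needs_shrink(const dict *d)
{
	double load_factor = (double)d->ht[0].used / d->ht[0].size;
	return load_factor < 0.1;            // shrink once the table is mostly empty
}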

Progressive rehash

Rehashing is not done all at once in a single, centralized pass; instead it is spread out and completed incrementally over multiple steps.

The reason is that a one-off rehash is acceptable when ht[0] holds only a few key-value pairs, because it will not take much time; but if ht[0] holds a very large number of key-value pairs, say tens of millions, rehashing them all into ht[1] in one go would involve so much computation that the server might stop serving requests for a period of time.

The steps of progressive rehash are:

  1. Allocate space for ht[1] so that the dictionary holds both ht[0] and ht[1]

  2. Set the dictionary's index counter variable rehashidx to 0, indicating that the rehash work has officially begun

  3. During the rehash, every time an add, delete, lookup, or update is performed on the dictionary, the program not only carries out the requested operation but also rehashes all the key-value pairs at index rehashidx of the ht[0] hash table over to ht[1]. When that bucket has been rehashed, the program increments rehashidx by one, ready to rehash the key-value pairs at the next index (a sketch of such a step follows this list).

    Lookups performed during this period (deletes and updates also involve a lookup) are carried out on both hash tables: the key is first looked up in ht[0], and if it is not found the search continues in ht[1];
    additions, however, are always performed on ht[1], which guarantees that the number of key-value pairs in ht[0] only ever decreases until it finally becomes an empty table.

  4. As dictionary operations keep executing, eventually, at some point in time, every key-value pair in ht[0] will have been rehashed into ht[1]. At that point the program sets rehashidx back to -1, indicating that the entire rehash operation is complete
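
A minimal sketch of a single incremental step, simplified from dictRehash in Redis's dict.c (the real function also bounds how many empty buckets it will scan and takes the number of buckets to move as a parameter):

#include <stdlib.h>

// Move every entry in bucket ht[0].table[rehashidx] into ht[1], then advance rehashidx.
static void rehash_one_bucket(dict *d)
{
	if (d->rehashidx == -1) return;               // no rehash in progress

	// Skip empty buckets until one with entries is found.
	while ((unsigned long)d->rehashidx < d->ht[0].size &&
	       d->ht[0].table[d->rehashidx] == NULL)
		d->rehashidx++;

	if ((unsigned long)d->rehashidx < d->ht[0].size) {
		dictEntry *de = d->ht[0].table[d->rehashidx];
		while (de) {
			dictEntry *next = de->next;
			// Recompute the key's index for ht[1] and insert at the head of that chain.
			unsigned long idx = d->type->hashFunction(de->key) & d->ht[1].sizemask;
			de->next = d->ht[1].table[idx];
			d->ht[1].table[idx] = de;
			d->ht[0].used--;
			d->ht[1].used++;
			de = next;
		}
		d->ht[0].table[d->rehashidx] = NULL;
		d->rehashidx++;
	}

	// Once ht[0] is empty, swap the tables and mark the rehash as finished.
	if (d->ht[0].used == 0) {
		free(d->ht[0].table);
		d->ht[0] = d->ht[1];
		d->ht[1].table = NULL;
		d->ht[1].size = d->ht[1].sizemask = d->ht[1].used = 0;
		d->rehashidx = -1;
	}
}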


Origin: blog.csdn.net/Pacifica_/article/details/125347305