A dictionary, also known as a symbol table, associative array, or map, is an abstract data structure that stores key-value pairs.
As a common data structure, a dictionary is built into many programming languages. Since the C language does not provide one, Redis built its own dictionary implementation.
Dictionaries are widely used in Redis. For example, the Redis database itself uses a dictionary as its underlying implementation, and adding, deleting, updating, and querying keys in the database are all built on dictionary operations.
Besides backing the database, the dictionary is also one of the underlying implementations of the hash key. When a hash key contains many key-value pairs, or the elements in the key-value pairs are relatively long strings, Redis uses a dictionary as the underlying implementation of the hash key.
Implementation of the dictionary
hash table
The hash table used by Redis dictionaries is defined by the dict.h/dictht structure:
typedef struct dictht {
    dictEntry **table;      // hash table array
    unsigned long size;     // hash table size
    unsigned long sizemask; // mask used to compute index values,
                            // always equal to size - 1
    unsigned long used;     // number of nodes currently in the hash table
} dictht;
- The table attribute is an array. Each element of the array is a pointer to a dictEntry structure, and each dictEntry structure holds a key-value pair.
- The size attribute records the size of the hash table, that is, the size of the table array.
- The sizemask attribute, together with the hash value, determines at which index of the table array a key should be placed.
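The relationship between size and sizemask can be sketched as follows. This is a minimal illustration, not the Redis source: the helper name ht_init is hypothetical, and the sketch assumes the table size is always a power of two, so that size - 1 is a mask of all one bits.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct dictEntry dictEntry; /* opaque here; only pointers are stored */

typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

/* Hypothetical helper: set up an empty table with `size` buckets.
 * Returns 0 on success, -1 if the allocation fails. */
static int ht_init(dictht *ht, unsigned long size) {
    /* size must be a nonzero power of two so that size - 1 is an all-ones mask */
    assert(size != 0 && (size & (size - 1)) == 0);
    ht->table = calloc(size, sizeof(dictEntry *));
    if (ht->table == NULL) return -1;
    ht->size = size;
    ht->sizemask = size - 1;   /* e.g. size 4 gives mask 3 (binary 11) */
    ht->used = 0;              /* no nodes stored yet */
    return 0;
}
```

Calling ht_init(&ht, 4) leaves ht.size at 4, ht.sizemask at 3, and every bucket empty.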
hash table node
Hash table nodes are represented by dictEntry structures. Each dictEntry structure holds a key-value pair:
typedef struct dictEntry {
    void *key;              // key
    union {                 // value
        void *val;
        uint64_t u64;
        int64_t s64;
    } v;
    struct dictEntry *next; // points to the next hash table node, forming a linked list
} dictEntry;
- The key attribute holds the key of the key-value pair, and the v attribute holds the value. The value can be a pointer, a uint64_t integer, or an int64_t integer.
- The next attribute is a pointer to another hash table node. It links multiple key-value pairs whose keys map to the same index into a list, which resolves key collisions.
dictionary
Dictionaries in Redis are represented by the dict.h/dict structure:
typedef struct dict {
    dictType *type;  // type-specific functions
    void *privdata;  // private data
    dictht ht[2];    // hash tables
    int rehashidx;   // rehash index; -1 when no rehash is in progress
} dict;
- The type and privdata attributes exist so that polymorphic dictionaries can be created for different types of key-value pairs. type is a pointer to a dictType structure. Each dictType structure holds a set of functions for operating on key-value pairs of a specific type, and Redis sets different type-specific functions for dictionaries that serve different purposes. privdata holds optional arguments that are passed to those type-specific functions.
- The ht attribute is an array of two items, each of which is a dictht hash table. In general, the dictionary only uses the ht[0] hash table; ht[1] is used only while ht[0] is being rehashed.
- The rehashidx attribute records the current progress of the rehash. If no rehash is in progress, its value is -1.
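To make the idea of type-specific functions concrete, here is a small sketch of dispatching through function pointers. It is a simplified stand-in, not the real dictType: the struct name keyType, the keyCompare callback, the helper keys_match, and the djb2 string hash are all assumptions made for illustration. Only hashFunction is a name taken from the text.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for dictType: only two callbacks are modeled. */
typedef struct keyType {
    unsigned long (*hashFunction)(const void *key);
    int (*keyCompare)(void *privdata, const void *a, const void *b); /* assumed callback */
} keyType;

/* Callbacks for NUL-terminated string keys (djb2 hash, chosen for brevity). */
static unsigned long str_hash(const void *key) {
    const unsigned char *s = key;
    unsigned long h = 5381;
    while (*s) h = h * 33 + *s++;
    return h;
}
static int str_equal(void *privdata, const void *a, const void *b) {
    (void)privdata;                 /* unused by this key type */
    return strcmp(a, b) == 0;
}

/* Callbacks for keys that point at an unsigned long. */
static unsigned long ulong_hash(const void *key) {
    return *(const unsigned long *)key;
}
static int ulong_equal(void *privdata, const void *a, const void *b) {
    (void)privdata;
    return *(const unsigned long *)a == *(const unsigned long *)b;
}

static const keyType string_keys = { str_hash, str_equal };
static const keyType ulong_keys  = { ulong_hash, ulong_equal };

/* Generic code never inspects the key itself; it only calls the callbacks. */
static int keys_match(const keyType *type, void *privdata,
                      const void *a, const void *b) {
    assert(type != NULL && type->hashFunction != NULL && type->keyCompare != NULL);
    return type->hashFunction(a) == type->hashFunction(b)
        && type->keyCompare(privdata, a, b);
}
```

The same keys_match function works for both key types; which comparison runs depends only on the keyType passed in, which is the sense in which the dictionary is polymorphic.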
The following figure shows a dictionary in its normal state (no rehash in progress):
hash algorithm
When a new key-value pair is added to the dictionary, the program first computes a hash value and an index value from the key, and then places the hash table node containing the new key-value pair at that index of the hash table array.
Redis calculates the hash value and index as follows:
hash = dict->type->hashFunction(k);
index = hash & dict->ht[0].sizemask;
Suppose we want to add the key-value pair k1 and v1 to the dictionary in the figure above, and hashFunction computes a hash value of 9 for k1. Then, with a sizemask of 3:
index = 9 & 3 = 1;
Redis uses the MurmurHash2 algorithm to compute the hash of a key.
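The index calculation can be checked with a few lines of C. This is a sketch under one assumption, that the table size is a power of two; in that case masking with size - 1 gives the same result as taking the hash modulo the size. The function name bucket_index is hypothetical.

```c
#include <assert.h>

/* Map a hash value to a bucket index. `size` must be a power of two. */
static unsigned long bucket_index(unsigned long hash, unsigned long size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    unsigned long sizemask = size - 1;  /* size 4 gives mask 3 (binary 0011) */
    return hash & sizemask;             /* keeps only the low bits of the hash */
}
```

For the example in the text, bucket_index(9, 4) computes binary 1001 & 0011 = 0001, which is index 1 and equals 9 % 4.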
resolve key conflicts
When two or more keys are assigned to the same index of the hash table array, the keys are said to collide (a collision).
Redis's hash table uses separate chaining to resolve collisions. Each hash table node has a next pointer, so multiple hash table nodes can be linked through their next pointers into a singly linked list. Nodes assigned to the same index are joined in this list, which resolves the collision.
As shown in the earlier dictionary figure, the index values of the keys k0 and k1 are both 1, so the two nodes only need to be connected with a next pointer.
Because the linked list formed by dictEntry nodes has no tail pointer, for the sake of speed the program always adds the new node at the head of the list, in front of the existing nodes, so insertion has $O(1)$ complexity.
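Head insertion into a bucket's chain can be sketched as below. The node type is simplified (the value is a plain pointer rather than the union shown earlier), and the helper name chain_push_front is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified node: the value is a plain pointer instead of a union. */
typedef struct dictEntry {
    void *key;
    void *val;
    struct dictEntry *next;
} dictEntry;

/* Hypothetical helper: put a new node at the front of a bucket's chain.
 * No traversal is needed, so the cost is O(1) regardless of chain length.
 * Returns the new node, or NULL if the allocation fails. */
static dictEntry *chain_push_front(dictEntry **bucket, void *key, void *val) {
    assert(bucket != NULL);
    dictEntry *e = malloc(sizeof(*e));
    if (e == NULL) return NULL;
    e->key = key;
    e->val = val;
    e->next = *bucket;  /* the old head (possibly NULL) follows the new node */
    *bucket = e;        /* the new node becomes the head */
    return e;
}
```

If k1 is inserted first and k0 second into the same bucket, the chain reads k0, then k1: the most recently added node is always first.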
Rehash
As operations are performed, the number of key-value pairs stored in the hash table gradually grows or shrinks. To keep the load factor of the hash table within a reasonable range, the program expands or shrinks the hash table when it holds too many or too few key-value pairs. This process is called rehash.
The steps for Redis to perform rehash on the hash table of the dictionary are as follows:
- Allocate space for the dictionary's ht[1] hash table. The size of the space depends on the operation to be performed and on the number of key-value pairs currently contained in ht[0] (the value of ht[0].used):
  - If an expansion is performed, the size of ht[1] is the first $2^n$ greater than or equal to ht[0].used*2.
  - If a shrink is performed, the size of ht[1] is the first $2^n$ greater than or equal to ht[0].used.
- Rehash all key-value pairs stored in ht[0] onto ht[1]. Rehashing means recalculating the hash value and index value of each key, and then placing the key-value pair at the corresponding position of the ht[1] hash table.
- When all key-value pairs contained in ht[0] have been migrated to ht[1], release ht[0], set ht[1] as ht[0], and create a new empty hash table at ht[1].
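The sizing rule from the first step can be written out directly. This is a minimal sketch: the function names are hypothetical, and the behavior for an empty table (a result of 1) is a choice of this sketch rather than something stated in the text.

```c
#include <assert.h>
#include <limits.h>

/* Smallest power of two that is >= n. For n == 0 this sketch returns 1. */
static unsigned long next_power_of_two(unsigned long n) {
    assert(n <= ULONG_MAX / 2 + 1);  /* larger n has no representable answer */
    unsigned long size = 1;
    while (size < n) size *= 2;
    return size;
}

/* Size of ht[1] when expanding: first 2^n >= used * 2. */
static unsigned long expand_size(unsigned long used) {
    assert(used <= ULONG_MAX / 4);   /* keeps used * 2 within range */
    return next_power_of_two(used * 2);
}

/* Size of ht[1] when shrinking: first 2^n >= used. */
static unsigned long shrink_size(unsigned long used) {
    return next_power_of_two(used);
}
```

With ht[0].used equal to 4, expand_size returns 8 and shrink_size returns 4.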
For example, suppose the program wants to expand the ht[0] shown in the following figure:
- The current value of ht[0].used is 4, and $2^3 = 8$ is the first power of two greater than or equal to 4*2 = 8, so the size of the ht[1] hash table is set to 8. The following figure shows the dictionary after space has been allocated for ht[1].
- Rehash the four key-value pairs contained in ht[0] to ht[1], as shown in the following figure:
- Release ht[0] and set ht[1] as ht[0]. Then allocate a new empty hash table for ht[1]. The size of the hash table has grown from 4 to 8.
progressive rehash
As mentioned in the previous section, expanding or shrinking a hash table requires rehashing all key-value pairs in ht[0] to ht[1]. However, this rehash is not completed in a single, centralized pass; it is completed incrementally, over multiple steps.
The reason is that when the hash table stores millions or even hundreds of millions of key-value pairs, rehashing them all at once requires so much computation that it would seriously affect server performance.
Here are the steps for progressive rehash:
- Allocate space for ht[1].
- Maintain an index counter variable rehashidx in the dictionary and set its value to 0, indicating that the rehash has officially started.
- During the rehash, every time an add, delete, lookup, or update operation is performed on the dictionary, the program also rehashes all key-value pairs at index rehashidx of ht[0] to ht[1], and then increments rehashidx by 1.
- As operations on the dictionary continue, at some point all key-value pairs of ht[0] will have been rehashed to ht[1]. The rehashidx attribute is then set to -1, indicating that the rehash is complete.
While a progressive rehash is in progress, key-value pairs newly added to the dictionary are always stored in ht[1]; nothing is added to ht[0].
Progressive rehash avoids the huge amount of computation and memory operations brought by centralized rehash.
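The progressive rehash steps can be sketched as two small functions. This is an illustration under stated assumptions, not the Redis source: the type names table and entry and the functions rehash_start and rehash_step are hypothetical, keys are unsigned integers used directly as their own hash values, and each call migrates exactly one bucket even if that bucket is empty.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct entry {
    unsigned long key;       /* integer key; doubles as its own hash value */
    struct entry *next;
} entry;

typedef struct table {
    entry **buckets;
    unsigned long size;      /* always a power of two */
    unsigned long used;
} table;

typedef struct dict {
    table ht[2];
    long rehashidx;          /* -1 when no rehash is in progress */
} dict;

/* Steps 1 and 2: allocate ht[1] and set rehashidx to 0. Returns 0 on success. */
static int rehash_start(dict *d, unsigned long new_size) {
    assert(d->rehashidx == -1);
    assert(new_size != 0 && (new_size & (new_size - 1)) == 0);
    d->ht[1].buckets = calloc(new_size, sizeof(entry *));
    if (d->ht[1].buckets == NULL) return -1;
    d->ht[1].size = new_size;
    d->ht[1].used = 0;
    d->rehashidx = 0;
    return 0;
}

/* Steps 3 and 4: migrate the bucket at rehashidx, then advance the counter.
 * Returns 1 while more buckets remain, 0 once the rehash is finished. */
static int rehash_step(dict *d) {
    if (d->rehashidx == -1) return 0;
    table *src = &d->ht[0], *dst = &d->ht[1];
    assert((unsigned long)d->rehashidx < src->size);

    entry *e = src->buckets[d->rehashidx];
    while (e != NULL) {
        entry *next = e->next;
        unsigned long idx = e->key & (dst->size - 1); /* index in the new table */
        e->next = dst->buckets[idx];                  /* head insertion */
        dst->buckets[idx] = e;
        src->used--;
        dst->used++;
        e = next;
    }
    src->buckets[d->rehashidx] = NULL;
    d->rehashidx++;

    if (src->used == 0) {                 /* everything has been migrated */
        free(src->buckets);               /* release the old ht[0] */
        d->ht[0] = d->ht[1];              /* ht[1] becomes ht[0] */
        d->ht[1] = (table){NULL, 0, 0};   /* ht[1] is empty again */
        d->rehashidx = -1;
        return 0;
    }
    return 1;
}
```

In a full dictionary, rehash_step would be called from each add, delete, lookup, and update operation, so the cost of moving the table is spread across many requests instead of being paid in one pass.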