Reading Source Code with Dabin - Redis 8 - Object Encoding: Dictionary

A dictionary is an abstract data structure for storing key-value pairs. Because the C language has no built-in dictionary type, Redis implements its own.

In Redis, the database itself is implemented on top of a dictionary: CRUD operations on the database are built on operations on the dictionary.

Besides representing the database, the dictionary is also one of the underlying implementations of hash keys. When a hash key contains a relatively large number of key-value pairs, or when the elements of those pairs are relatively long strings, Redis uses a dictionary as the hash key's underlying implementation.

1 Dictionary implementation

Redis dictionaries use a hash table as their underlying implementation. A hash table can contain multiple hash table nodes, and each node holds one key-value pair of the dictionary.

1.1 Hash Table

The hash table structure used by Redis dictionaries:

typedef struct dictht {
    dictEntry **table;      // hash table array
    unsigned long size;     // hash table size
    unsigned long sizemask; // size mask used to compute indexes (always size - 1)
    unsigned long used;     // number of nodes currently in the hash table
} dictht;

  • The table attribute is an array. Each element of the array is a pointer to a dictEntry structure, and each dictEntry holds one key-value pair.
  • The size attribute records the size of the hash table, that is, the length of the table array.
  • The used attribute records the number of nodes (key-value pairs) currently in the hash table.
  • The sizemask attribute always equals size - 1. Together with a key's hash value, it determines the index of the table array where the key is placed.
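The table size is always a power of two. Masking with sizemask (size - 1) therefore keeps the low bits of the hash. This gives the same bucket as hash % size without a division. The following is a minimal sketch under that assumption. bucket_index is a hypothetical helper name, not a Redis function:

```c
/* Hypothetical helper (not a Redis function): map a hash value to a bucket
 * index the way dictht does, using sizemask = size - 1.
 * Assumes size is a power of two, as Redis keeps its table sizes. */
static unsigned long bucket_index(unsigned long hash, unsigned long size) {
    unsigned long sizemask = size - 1; /* size 4 -> mask 0b011 */
    return hash & sizemask;            /* equals hash % size for 2^n sizes */
}
```

For instance, with size 4 the mask is 3, so a hash of 6 (binary 110) lands in bucket 2.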

Figure 1 shows an empty hash table of size 4.

Figure 1 - An empty hash table of size 4

1.2 Hash table node

Hash table nodes are represented by the dictEntry structure, and each dictEntry holds one key-value pair:

typedef struct dictEntry {
    void *key;              // key
    union {
        void *val;          // value as a pointer
        uint64_t u64;       // value as an unsigned integer
        int64_t s64;        // value as a signed integer
        double d;           // value as a double
    } v;                    // value
    struct dictEntry *next; // next hash table node, forming a linked list
} dictEntry;

  • The key attribute holds the key, and the v attribute holds the value.
  • The next attribute is a pointer to another hash table node. It links together key-value pairs whose keys hash to the same index, which resolves key collisions.

Figure 2 shows how the next pointer links together two keys with the same index, k1 and k0.

Figure 2 - Keys k1 and k0 linked together
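A lookup inside one bucket simply follows the next pointers until the key matches. The sketch below makes two assumptions: keys are C strings, and they are compared with strcmp. chain_find is a hypothetical helper. Real Redis compares keys through the dictType's keyCompare function instead:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as the dictEntry structure above. */
typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;
} dictEntry;

/* Hypothetical helper: walk one bucket's chain through the next pointers
 * and return the node whose key (assumed to be a C string) matches. */
static dictEntry *chain_find(dictEntry *head, const char *key) {
    for (dictEntry *he = head; he != NULL; he = he->next) {
        if (strcmp((const char *)he->key, key) == 0)
            return he;
    }
    return NULL; /* reached the end of the chain: key not in this bucket */
}
```

The union lets a node store an integer or double directly in v instead of allocating a separate value object.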

1.3 Dictionary

A dictionary is represented by the dict structure:

typedef struct dict {
    dictType *type; // type-specific functions
    void *privdata; // private data
    dictht ht[2];   // hash tables (two of them)
    long rehashidx; // rehash progress; -1 means no rehash is in progress
    int iterators;  // number of iterators currently running
} dict;

The dictType structure is defined as follows:

typedef struct dictType {
    // compute the hash value of a key
    unsigned int (*hashFunction)(const void *key);
    // duplicate a key
    void *(*keyDup)(void *privdata, const void *key);
    // duplicate a value
    void *(*valDup)(void *privdata, const void *obj);
    // compare two keys
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    // destroy a key
    void (*keyDestructor)(void *privdata, void *key);
    // destroy a value
    void (*valDestructor)(void *privdata, void *obj);
} dictType;

The type and privdata attributes exist to support key-value pairs of different types, which makes it possible to create polymorphic dictionaries. Specifically:

  • The type attribute is a pointer to a dictType structure. Each dictType holds a set of functions for operating on key-value pairs of a particular type. Redis sets different type-specific functions for dictionaries with different purposes.
  • The privdata attribute holds optional arguments that are passed to those type-specific functions.
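To illustrate a type-specific function set, here is a hypothetical dictType for NUL-terminated string keys. The djb2 hash is a simple stand-in chosen for brevity; Redis itself uses a different hash function. The names stringKeyType, strHash, strDup, strCompare and strFree are illustrative:

```c
#include <stdlib.h>
#include <string.h>

typedef struct dictType {
    unsigned int (*hashFunction)(const void *key);
    void *(*keyDup)(void *privdata, const void *key);
    void *(*valDup)(void *privdata, const void *obj);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    void (*keyDestructor)(void *privdata, void *key);
    void (*valDestructor)(void *privdata, void *obj);
} dictType;

/* djb2 string hash: a stand-in for illustration, not the hash Redis uses. */
static unsigned int strHash(const void *key) {
    const unsigned char *s = key;
    unsigned int h = 5381;
    while (*s)
        h = h * 33 + *s++;
    return h;
}

/* Copy the key so the dictionary owns its own string. */
static void *strDup(void *privdata, const void *key) {
    (void)privdata;
    size_t n = strlen(key) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, key, n);
    return copy;
}

/* Non-zero means "equal", matching how Redis interprets keyCompare. */
static int strCompare(void *privdata, const void *a, const void *b) {
    (void)privdata;
    return strcmp(a, b) == 0;
}

static void strFree(void *privdata, void *key) {
    (void)privdata;
    free(key);
}

/* Hypothetical type: duplicated string keys; values kept as-is (no dup/free). */
static dictType stringKeyType = {
    strHash, strDup, NULL, strCompare, strFree, NULL
};
```

Leaving valDup and valDestructor as NULL means the dictionary neither copies nor frees values. The caller stays responsible for them.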

The ht attribute is an array of two hash tables. Normally the dictionary uses only ht[0]; ht[1] is used only while ht[0] is being rehashed.

The rehashidx attribute records the progress of the current rehash; if no rehash is in progress, its value is -1. Rehash is described in detail later.

Figure 3 shows a dictionary that is not being rehashed:

Figure 3 - A dictionary that is not being rehashed

2 Insertion algorithm

When a new key-value pair is added to the dictionary, Redis first computes the key's hash value and, from it, the index. It then places the hash table node containing the new pair at that index of the hash table array. The algorithm is as follows:

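// Compute the key's hash value with the hash function set on the dictionary
hash = dict->type->hashFunction(key);
// Compute the index from the hash table's sizemask and the hash value;
// x is 0 or 1 depending on the situation (ht[0] or ht[1])
index = hash & dict->ht[x].sizemask;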

Figure 4 - An empty dictionary

To add the key-value pair [k0, v0] to the dictionary in Figure 4, the program first executes:

hash = dict->type->hashFunction(k0);
index = hash & dict->ht[0].sizemask; // 8 & 3 = 0

With a hash value of 8 for k0, the calculation places [k0, v0] at index 0 of the hash table array, as shown in Figure 5:
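The index computation, including the choice between ht[0] and ht[1], can be sketched as follows. The sketch assumes that new keys go into ht[1] while a rehash is in progress, as the rehash section later in this article describes. table_t and key_index are hypothetical simplifications, not Redis's own types or functions:

```c
/* Hypothetical, trimmed-down stand-in for dictht: only what indexing needs. */
typedef struct {
    unsigned long size;
    unsigned long sizemask; /* always size - 1 */
} table_t;

/* Hypothetical helper: pick the table a new key goes into (ht[1] while a
 * rehash is in progress, otherwise ht[0]) and return the bucket index. */
static unsigned long key_index(unsigned int hash, const table_t ht[2],
                               long rehashidx, int *table_out) {
    int table = (rehashidx != -1) ? 1 : 0; /* -1 means "not rehashing" */
    *table_out = table;
    return hash & ht[table].sizemask;
}
```

With the hash value 8 from the example and a size-4 ht[0] (sizemask 3), key_index returns index 0 in ht[0].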

Figure 5 - The dictionary after adding k0-v0

2.1 Key collisions

When two or more keys are assigned to the same index of the hash table array, we say that these keys have collided.

Redis hash tables resolve key collisions with separate chaining. Each hash table node has a next pointer, so multiple nodes can form a singly linked list. Nodes assigned to the same index are linked into one singly linked list through their next pointers.

For example, suppose we want to add the key-value pair [k2, v2] to the hash table shown in Figure 6, and the computed index of k2 is 2. This collides with k1, so the program uses the next pointer to link the nodes of k1 and k2 together, as shown in Figure 7.

Figure 6 - A hash table containing two key-value pairs

Figure 7 - Resolving the k1/k2 collision with a linked list
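Linking a colliding node into a bucket is a constant-time pointer update. Redis's dict.c adds a new entry at the head of the chain rather than the tail, so an insert never has to walk the list. The sketch below follows that approach. It uses a simplified dictEntry that keeps only the pointer member of the value union, and bucket_add is a hypothetical helper:

```c
#include <stdlib.h>

/* Simplified node: only the pointer member of dictEntry's value union. */
typedef struct dictEntry {
    void *key;
    void *val;
    struct dictEntry *next;
} dictEntry;

/* Hypothetical helper: prepend a new node to bucket `index`, so keys that
 * collide on that index end up in one singly linked list. */
static dictEntry *bucket_add(dictEntry **table, unsigned long index,
                             void *key, void *val) {
    dictEntry *entry = malloc(sizeof(*entry));
    if (entry == NULL)
        return NULL;
    entry->key = key;
    entry->val = val;
    entry->next = table[index]; /* old head (e.g. k1) becomes the 2nd node */
    table[index] = entry;
    return entry;
}
```

With head insertion, adding k2 after k1 leaves k2 first in the chain at index 2, followed by k1.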

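3 Rehash and progressive rehash

As operations are performed on the dictionary, the number of key-value pairs stored in the hash table gradually grows or shrinks. The load factor should stay within a reasonable range. The program therefore expands or shrinks the hash table when it holds too many or too few key-value pairs. This expansion or shrinking process is called rehash.

The load factor is calculated with the following formula: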

// load factor = number of nodes stored in the hash table / size of the hash table
load_factor = ht[0].used / ht[0].size;
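With unsigned integer fields, used / size in C is integer division and truncates: 3 / 4 evaluates to 0. The following is a sketch that computes the load factor as a floating-point ratio instead. load_factor is a hypothetical helper:

```c
/* Hypothetical helper: load factor as a real number.
 * The casts avoid the truncation of unsigned integer division. */
static double load_factor(unsigned long used, unsigned long size) {
    if (size == 0)
        return 0.0; /* an unallocated table has no meaningful load */
    return (double)used / (double)size;
}
```

For example, a table of size 4 holding 4 nodes has a load factor of 1.0, and one holding 3 nodes has 0.75.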

3.1 Expanding and shrinking the hash table

Expansion

The source code for expanding the hash table is as follows:

if (d->ht[0].used >= d->ht[0].size &&
    (dict_can_resize ||
     d->ht[0].used/d->ht[0].size > dict_force_resize_ratio))
{
    return dictExpand(d, d->ht[0].used*2);
}

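The program automatically starts expanding the hash table when all of the following conditions are met:

  • The dictionary is not currently being rehashed;
  • The number of nodes stored in the hash table is greater than or equal to the size of the hash table;
  • dict_can_resize is 1, or the load factor is greater than dict_force_resize_ratio (5 by default).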
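The expansion decision can be written as a single function. This sketch folds in the "not rehashing" condition from the list above. As an assumption about surrounding Redis code not shown in the snippet, it also treats an empty table (size 0) as needing the initial size of 4. expand_target is a hypothetical name. dict_can_resize and dict_force_resize_ratio mirror the variables in the snippet:

```c
#include <stdbool.h>

#define DICT_HT_INITIAL_SIZE 4

static int dict_can_resize = 1;
static unsigned int dict_force_resize_ratio = 5;

/* Hypothetical helper: returns the size to pass to dictExpand,
 * or 0 when no expansion is needed. */
static unsigned long expand_target(bool rehashing, unsigned long used,
                                   unsigned long size) {
    if (rehashing)
        return 0;                    /* never start a second rehash */
    if (size == 0)
        return DICT_HT_INITIAL_SIZE; /* assumed: first allocation */
    if (used >= size &&
        (dict_can_resize || used / size > dict_force_resize_ratio))
        return used * 2;             /* same argument as dictExpand above */
    return 0;
}
```

When dict_can_resize is 0, a table with 8 nodes in 4 slots (ratio 2) is left alone, while 24 nodes in 4 slots (ratio 6) still forces an expansion.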

Shrinking

The source code for shrinking the hash table is as follows:

int htNeedsResize(dict *dict) {
    long long size, used;
    size = dictSlots(dict); // combined size of the two hash tables in ht[2]
    used = dictSize(dict);  // combined number of nodes stored in both hash tables
    // DICT_HT_INITIAL_SIZE defaults to 4, HASHTABLE_MIN_FILL defaults to 10.
    return (size > DICT_HT_INITIAL_SIZE &&
            (used*100/size < HASHTABLE_MIN_FILL));
}
void tryResizeHashTables(int dbid) {
    if (htNeedsResize(server.db[dbid].dict))
        dictResize(server.db[dbid].dict);
    if (htNeedsResize(server.db[dbid].expires))
        dictResize(server.db[dbid].expires);
}

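Two conditions trigger a shrink. First, the combined size of the two hash tables in ht[] must be greater than DICT_HT_INITIAL_SIZE (4 by default). Second, the ratio of stored nodes to total size must be below HASHTABLE_MIN_FILL percent (10 by default, i.e. 10%).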
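The following is a standalone version of the shrink test, with the two macros defined using the defaults stated above. needs_shrink is a hypothetical name. It takes the summed size and used counts directly instead of a dict pointer:

```c
#define DICT_HT_INITIAL_SIZE 4
#define HASHTABLE_MIN_FILL 10 /* minimum fill, in percent */

/* Hypothetical standalone variant of htNeedsResize: size and used are the
 * totals across ht[0] and ht[1], as dictSlots/dictSize return them. */
static int needs_shrink(long long size, long long used) {
    return size > DICT_HT_INITIAL_SIZE &&
           (used * 100 / size < HASHTABLE_MIN_FILL);
}
```

For example, 5 nodes in 64 slots gives 5 * 100 / 64 = 7 (integer division), which is below 10, so the table would shrink. With 7 nodes the result is 10, and no shrink happens.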

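3.2 Rehash

Both expanding and shrinking a hash table are done by performing a rehash. The rehash steps are as follows:

  1. Allocate space for the dictionary's ht[1] hash table. Its size depends on the operation being performed and on the number of key-value pairs ht[0] currently holds:
    1. For an expansion, the size of ht[1] is the first 2^n greater than or equal to ht[0].used * 2.
    2. For a shrink, the size of ht[1] is the first 2^n greater than or equal to ht[0].used.
  2. Rehash all key-value pairs stored in ht[0] onto ht[1]. Rehashing means recomputing each key's hash value and index, then moving the pair to the corresponding position in the ht[1] hash table.
  3. Once every key-value pair in ht[0] has been migrated to ht[1], ht[0] is empty. Free ht[0], set ht[1] as ht[0], and create a new blank hash table in ht[1] to prepare for the next rehash.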
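The sizing rule in step 1 can be sketched with a helper that returns the first power of two greater than or equal to a given value. Two details are assumptions modeled on Redis's _dictNextPower: the minimum of 4 (DICT_HT_INITIAL_SIZE) and the overflow guard. next_power, expand_size and shrink_size are hypothetical names:

```c
#include <limits.h>

#define DICT_HT_INITIAL_SIZE 4

/* First power of two >= size, never below DICT_HT_INITIAL_SIZE. */
static unsigned long next_power(unsigned long size) {
    unsigned long i = DICT_HT_INITIAL_SIZE;
    if (size >= LONG_MAX)
        return LONG_MAX + 1LU; /* cap instead of overflowing */
    while (i < size)
        i *= 2;
    return i;
}

/* Expansion: first 2^n >= ht[0].used * 2. */
static unsigned long expand_size(unsigned long used) { return next_power(used * 2); }

/* Shrinking: first 2^n >= ht[0].used. */
static unsigned long shrink_size(unsigned long used) { return next_power(used); }
```

For the four-pair dictionary in the example that follows, expand_size(4) returns 8.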

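Example:

Figure 8 - A dictionary about to be rehashed
Suppose the program is going to expand ht[0] of the dictionary in Figure 8. It performs the following steps:
1) ht[0].used is currently 4, and 4 * 2 = 8. Since 2^3 = 8 is the first power of two greater than or equal to 8, the program sets the size of ht[1] to 8. Figure 9 shows the dictionary after space has been allocated for ht[1].

Figure 9 - The dictionary after space is allocated for ht[1]

2) Rehash the four key-value pairs in ht[0] to ht[1], as shown in Figure 10.

Figure 10 - All key-value pairs in ht[0] migrated to ht[1]

3) Free ht[0], set ht[1] as ht[0], and allocate a blank hash table for ht[1], as shown in Figure 11:

Figure 11 - The dictionary after the rehash completes

At this point the expansion is complete: the program has changed the hash table size from 4 to 8.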

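3.3 Progressive rehash

Redis does not complete a rehash all at once in a single, centralized pass. Instead it completes it gradually over many steps, which is why it is called progressive rehash.

The reason for the progressive approach is easy to see. When a hash table holds a large number of key-value pairs, rehashing all of them into ht[1] in one go could leave the server doing nothing but rehashing for a while, unable to serve requests.

Therefore, to keep rehash from hurting server performance, Redis rehashes the key-value pairs in ht[0] into ht[1] gradually, over many steps.

Progressive rehash relies on the index counter rehashidx. The detailed steps are:

  1. Allocate space for ht[1], so that the dictionary holds both ht[0] and ht[1].
  2. Maintain an index counter rehashidx in the dictionary and set it to 0, indicating that the rehash has started.
  3. During the rehash, each CRUD operation on the dictionary does extra work besides the requested operation. The program moves all key-value pairs at index rehashidx of ht[0] to ht[1]. Once that index has been rehashed, the program increments rehashidx by one.
  4. As operations on the dictionary continue, at some point all key-value pairs in ht[0] will have been rehashed to ht[1]. The program then sets rehashidx to -1 to indicate that the rehash is complete.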
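The four steps above can be sketched as a function that migrates one bucket per call. This is a simplified model, not Redis's dictRehash, and it makes three assumptions. Keys are unsigned integers that serve as their own hash. Each call handles exactly one bucket, empty ones included. The struct and function names are hypothetical:

```c
#include <stdlib.h>

/* Simplified model: the key is an unsigned integer that is its own hash. */
typedef struct entry {
    unsigned long key;
    struct entry *next;
} entry;

typedef struct {
    entry **table;
    unsigned long size, sizemask, used;
} dictht;

typedef struct {
    dictht ht[2];
    long rehashidx; /* -1 means no rehash in progress */
} dict;

/* Hypothetical helper: migrate the bucket at ht[0].table[rehashidx] to ht[1]
 * and advance rehashidx. Returns 1 while work remains, 0 once done. */
static int rehash_step(dict *d) {
    if (d->rehashidx == -1)
        return 0; /* not rehashing */
    if (d->ht[0].used > 0) {
        /* Buckets below rehashidx are already empty and nothing new is added
         * to ht[0], so a non-empty bucket exists at or after rehashidx and
         * the index stays in bounds. */
        entry *e = d->ht[0].table[d->rehashidx];
        while (e != NULL) {                 /* move the whole chain */
            entry *next = e->next;
            unsigned long idx = e->key & d->ht[1].sizemask;
            e->next = d->ht[1].table[idx];  /* prepend into ht[1] */
            d->ht[1].table[idx] = e;
            d->ht[0].used--;
            d->ht[1].used++;
            e = next;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }
    if (d->ht[0].used == 0) {               /* ht[0] is empty: finish up */
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];                /* ht[1] becomes ht[0] */
        d->ht[1] = (dictht){NULL, 0, 0, 0}; /* blank ht[1] for next time */
        d->rehashidx = -1;
        return 0;
    }
    return 1;
}
```

If each CRUD operation calls rehash_step once, the rehash finishes after at most ht[0].size calls, with rehashidx back at -1.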

Progressive rehash takes a divide-and-conquer approach: it spreads the work of rehashing across the CRUD operations on the dictionary, avoiding the problems a centralized rehash would cause.

In addition, while the dictionary is being rehashed, operations such as delete, find, and update are performed on both hash tables. For example, to find a key in the dictionary, the program first looks in ht[0], and if the key is not found there, it looks in ht[1].

Note that during the rehash, new key-value pairs are saved only to ht[1], and nothing is added to ht[0]. This guarantees that the number of key-value pairs in ht[0] only decreases, so ht[0] eventually becomes an empty table as the rehash proceeds.
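Lookup and insertion during a rehash can be sketched on a simplified model where keys are unsigned integers that serve as their own hash. dict_find checks ht[0] first and falls back to ht[1] only while rehashing. dict_add puts new keys into ht[1] whenever rehashidx is not -1. The names and types are hypothetical. Real Redis additionally performs one rehash step inside these operations, which this sketch leaves out:

```c
#include <stdlib.h>

/* Simplified model: the key is an unsigned integer that is its own hash. */
typedef struct entry {
    unsigned long key;
    struct entry *next;
} entry;

typedef struct {
    entry **table;
    unsigned long size, sizemask, used;
} dictht;

typedef struct {
    dictht ht[2];
    long rehashidx; /* -1 means no rehash in progress */
} dict;

/* Hypothetical helper: search ht[0] first; only during a rehash fall back to ht[1]. */
static entry *dict_find(dict *d, unsigned long key) {
    for (int t = 0; t <= 1; t++) {
        dictht *ht = &d->ht[t];
        if (ht->size > 0) {
            for (entry *e = ht->table[key & ht->sizemask]; e != NULL; e = e->next)
                if (e->key == key)
                    return e;
        }
        if (d->rehashidx == -1)
            break; /* ht[1] only holds data while rehashing */
    }
    return NULL;
}

/* Hypothetical helper: insert a new key, into ht[1] while rehashing so that
 * ht[0] only ever shrinks. Returns NULL if the key exists or on failure. */
static entry *dict_add(dict *d, unsigned long key) {
    if (dict_find(d, key) != NULL)
        return NULL;
    dictht *ht = &d->ht[d->rehashidx != -1 ? 1 : 0];
    if (ht->size == 0)
        return NULL; /* table not allocated in this simplified model */
    entry *e = malloc(sizeof(*e));
    if (e == NULL)
        return NULL;
    e->key = key;
    unsigned long idx = key & ht->sizemask;
    e->next = ht->table[idx]; /* prepend to the bucket's chain */
    ht->table[idx] = e;
    ht->used++;
    return e;
}
```

Because dict_add never touches ht[0] during a rehash, ht[0].used can only go down until the rehash finishes.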

Figures 12 to 17 show a complete progressive rehash:

1) A dictionary not yet being rehashed

Figure 12 - A dictionary that is not being rehashed

2) Rehash the key-value pairs at index 0

Figure 13 - Rehashing the key-value pairs at index 0

3) Rehash the key-value pairs at index 1

Figure 14 - Rehashing the key-value pairs at index 1

4) Rehash the key-value pairs at index 2

Figure 15 - Rehashing the key-value pairs at index 2

5) Rehash the key-value pairs at index 3

Figure 16 - Rehashing the key-value pairs at index 3

6) Rehash complete

Figure 17 - Rehash complete

Summary

  1. Dictionaries are widely used to implement Redis features, including the database and hash keys.
  2. Redis dictionaries use hash tables as their underlying implementation. Each dictionary has two hash tables: one for normal use, and one used only during rehash.
  3. Hash tables resolve key collisions with separate chaining: keys assigned to the same index are linked into a singly linked list.
  4. Expanding or shrinking a hash table is done through a progressive rehash.


Source: www.cnblogs.com/BeiGuo-FengGuang/p/11301070.html