Redis Underlying Data Structures: The Dictionary

Compared with arrays and linked lists, a dictionary is a higher-level data structure. Like a Chinese dictionary, in which you can look up a character uniquely by its pinyin or radical, a dictionary in a program stores mappings called key-value pairs, and many key-value pairs together make up the dictionary.

There are many ways to implement a dictionary. Java's HashMap, for example, uses the key's hash value to spread key-value pairs evenly across an array. When a hash collision occurs, the colliding entries are chained into a singly linked list, and once a chain grows beyond eight nodes it is converted into a red-black tree.

So how does Redis implement it? Let's take a look.

1. Dictionary structure definitions

Redis defines its dictionary-related structures in dict.h. The dict structure represents a dictionary:

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    long rehashidx; 
    unsigned long iterators;
} dict;

The type field points to a dictType structure, which defines several polymorphic functions:

typedef struct dictType {
    uint64_t (*hashFunction)(const void *key);
    void *(*keyDup)(void *privdata, const void *key);
    void *(*valDup)(void *privdata, const void *obj);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    void (*keyDestructor)(void *privdata, void *key);
    void (*valDestructor)(void *privdata, void *obj);
} dictType;

hashFunction is a pointer to the hash function. When we store data in a dictionary, for example with the SET command, the key is first passed to this function to obtain a well-distributed hash value, and only then is the data actually stored. If your dictionary needs a different hashing scheme, you only have to supply your own hash function in the dictType used to initialize the dictionary.

keyDup copies a key, valDup copies a value, keyCompare compares two keys, keyDestructor destroys a key, and valDestructor destroys a value. Together they form a polymorphic interface: the concrete implementations are supplied by the user.
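As a rough sketch of how these callbacks fit together, the following defines a dictType for NUL-terminated C string keys. The struct is copied from the listing above so that the snippet compiles without the Redis headers. The names exampleStringHash, exampleStringCompare and exampleStringDictType are hypothetical, and the FNV-1a hash is only a stand-in, not the hash function Redis itself uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copied from the article so this sketch compiles without dict.h. */
typedef struct dictType {
    uint64_t (*hashFunction)(const void *key);
    void *(*keyDup)(void *privdata, const void *key);
    void *(*valDup)(void *privdata, const void *obj);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    void (*keyDestructor)(void *privdata, void *key);
    void (*valDestructor)(void *privdata, void *obj);
} dictType;

/* Hypothetical hash: 64-bit FNV-1a over a NUL-terminated string. */
static uint64_t exampleStringHash(const void *key) {
    const unsigned char *p = key;
    uint64_t h = 0xcbf29ce484222325ULL;     /* FNV offset basis */
    assert(key != NULL);
    while (*p) {
        h ^= *p++;
        h *= 0x100000001b3ULL;              /* FNV prime */
    }
    return h;
}

/* In this sketch, returns nonzero when the two keys are equal. */
static int exampleStringCompare(void *privdata, const void *key1, const void *key2) {
    (void)privdata;                         /* unused here */
    return strcmp(key1, key2) == 0;
}

/* Callbacks left NULL: this sketch neither copies nor frees keys and values. */
static dictType exampleStringDictType = {
    exampleStringHash, NULL, NULL, exampleStringCompare, NULL, NULL
};
```

A dictionary initialized with such a type would call hashFunction to place a key and keyCompare to tell colliding keys apart.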

Back in the dict structure, privdata is a pointer to extra data attached to the dictionary. ht is an array of two dictht structures; dictht is the hash table structure, which we will look at in a moment. rehashidx records the index of the bucket currently being migrated during a rehash. iterators records the number of iterators currently running on the dictionary (as dictNext shows later, only safe iterators are counted); we will come back to it.

dictht is the hash table structure:

typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

table points to an array of dictEntry pointers, and each dictEntry represents one key-value pair. Why the double pointer?

In the simplest case, each dictEntry holds one key-value pair, and we can reach every pair by walking the table pointer. But what happens when a key's hash maps to a slot that an earlier node already occupies, which is what we usually call a hash collision?

Redis, like most dictionary implementations, resolves this by chaining the colliding nodes into a linked list that hangs off that slot.

Nodes on the same list necessarily map to the same bucket index, which is exactly why they are chained together. That is the logical picture. In the code, the structure has two levels: the first level is an array of pointers, each of which points to a dictEntry at the second level, and every dictEntry carries a next pointer so that all colliding nodes can be linked together when a hash collision occurs.
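The chaining described above can be sketched with a toy table. The toyEntry type and the toyInsert and toyFind helpers are hypothetical names used only for illustration; the head insertion follows the same pattern as the `de->next = ...; table[h] = de;` lines in dictRehash later in this article.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct toyEntry {
    const char *key;
    int val;
    struct toyEntry *next;   /* next node in the same bucket */
} toyEntry;

/* Push a node onto the front of bucket idx. */
static void toyInsert(toyEntry **table, size_t idx, toyEntry *e) {
    assert(table != NULL && e != NULL);
    e->next = table[idx];    /* old head (possibly NULL) becomes the successor */
    table[idx] = e;          /* new node becomes the head of the chain */
}

/* Walk the chain in bucket idx looking for key; returns NULL if absent. */
static toyEntry *toyFind(toyEntry **table, size_t idx, const char *key) {
    for (toyEntry *e = table[idx]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0) return e;
    return NULL;
}
```

Two keys that land in the same bucket simply end up on the same chain, with the most recently inserted node at the head.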

In addition, the size field of dictht gives the number of buckets the hash table can address, i.e. the length of the first-level array. sizemask is always equal to size - 1; it is a mask used to compute a node's bucket index in the array. used records the number of nodes already stored in the whole hash table.
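A minimal sketch of how such a mask yields a bucket index, assuming (as is the case for Redis hash tables, though the article does not state it) that size is always a power of two. The helper name bucketIndex is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bucket index for a hash value in a table of `size` buckets.
 * Assumes size is a power of two, so size - 1 is an all-ones bit mask
 * and the AND is equivalent to hash % size. */
static unsigned long bucketIndex(uint64_t hash, unsigned long size) {
    unsigned long sizemask = size - 1;
    assert(size != 0 && (size & sizemask) == 0);   /* power of two */
    return (unsigned long)(hash & sizemask);
}
```

For example, with size 4 the mask is binary 11, so a hash of 13 (binary 1101) lands in bucket 1.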

The ht field of dict is an array of just two elements. Normally we use the table ht[0]; ht[1] is used during incremental rehash, when all the nodes of ht[0] are gradually migrated to it.

Finally, let's look at dictEntry, the key-value pair structure:

typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;
} dictEntry;

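key is a void pointer, which means a Redis key can be any type of object. v is a union: it may be a pointer, a uint64_t, an int64_t, or a double. In practice, a different member is used depending on the kind of value being stored.

The next pointer points to another dictEntry and is used to link the next key-value node when a hash collision occurs.

Those are the main structure types behind the Redis dictionary, wrapped in three layers from the inside out: dict describes a dictionary, the dictht inside it describes a hash table, and the dictEntry inside that describes a key-value pair. We will cover iterators separately below.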

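2. Incremental rehash and data migration

Rehashing in Redis differs slightly from Java and other hash table implementations. Because Redis is single-threaded, it does not need a lot of concurrency code to keep the data consistent. But single-threaded processing also means that rehashing everything in one go would be very slow and would block clients for too long. So what exactly does Redis do?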

int dictRehash(dict *d, int n) {
    int empty_visits = n*10; /* Max number of empty buckets to visit. */
    if (!dictIsRehashing(d)) return 0;

    while(n-- && d->ht[0].used != 0) {
        dictEntry *de, *nextde;

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        while(d->ht[0].table[d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        de = d->ht[0].table[d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        while(de) {
            uint64_t h;

            nextde = de->next;
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    /* Check if we already rehashed the whole table... */
    if (d->ht[0].used == 0) {
        zfree(d->ht[0].table);
        d->ht[0] = d->ht[1];
        _dictReset(&d->ht[1]);
        d->rehashidx = -1;
        return 0;
    }

    /* More to rehash... */
    return 1;
}

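rehashidx defaults to -1, which means the dictionary is not currently rehashing. Otherwise its value is the index of the bucket currently being migrated.

Newer versions of dictRehash take an extra parameter n. It is the number of buckets to migrate in one call, and it also caps the number of empty buckets visited in one call at n*10. What does that mean? Consider an example.

Suppose a dictionary whose buckets at index 2 and 3 are empty, that is, they hold no key-value nodes. Normally one rehash step migrates a single bucket. But if the previous step migrated bucket 1, the next step moves on to the following bucket, and if that one is empty it keeps going until it finds a non-empty bucket. In the extreme case where only the last bucket holds nodes, a single rehash step would have to walk every bucket, which is linear in the number of buckets and would keep clients waiting too long. This is why the function limits the number of empty buckets visited per call to n*10 (the empty_visits variable in the code above).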
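The empty-bucket budget can be isolated in a small sketch. skipEmptyBuckets is a hypothetical helper, not a Redis function; it mirrors only the inner while loop of dictRehash and adds an explicit bounds check, since this toy version has no used counter to rely on.

```c
#include <assert.h>
#include <stddef.h>

/* Advance *idx past empty buckets, visiting at most n*10 of them.
 * Returns 1 if *idx now refers to a non-empty bucket, 0 if the budget
 * ran out (or the table ended) and the caller should try again later. */
static int skipEmptyBuckets(void *const *table, unsigned long size,
                            unsigned long *idx, int n) {
    int empty_visits = n * 10;             /* same budget as dictRehash */
    assert(table != NULL && idx != NULL && n > 0);
    while (*idx < size && table[*idx] == NULL) {
        (*idx)++;
        if (--empty_visits == 0) return 0; /* budget spent; resume next call */
    }
    return *idx < size;
}
```

With 32 buckets of which only the last is occupied and n = 1, the first three calls each give up after ten empty buckets, and the fourth call reaches bucket 31. The long scan is spread over several short steps instead of blocking in one.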

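3. Dictionary iterators

An iterator is a tool for walking over all the nodes in a dictionary. There are two kinds: safe iterators and unsafe iterators. With a safe iterator you are allowed to modify the dictionary while iterating, that is, to add, delete, or update key-value nodes. With an unsafe iterator no node in the dictionary may be modified.

The dictIterator structure is defined as follows: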

typedef struct dictIterator {
    dict *d;
    long index;
    int table, safe;
    dictEntry *entry, *nextEntry;
    /* unsafe iterator fingerprint for misuse detection. */
    long long fingerprint;
} dictIterator;

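The d field points to the dictionary being iterated. index records the bucket index the iteration has reached. table is 0 or 1 and indicates which of the dictionary's two hash tables is being iterated. safe marks the iterator as safe or unsafe. entry is the node currently being iterated, and nextEntry holds entry's next pointer, so that the following nodes are not lost if the current node is deleted. fingerprint stores a value that dictFingerprint computes from the dictionary's basic state; the slightest change to that state changes the fingerprint, which is how unsafe iterators are checked.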
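Why the successor has to be saved before the current node is handed out can be shown with a toy chain. The toyNode type and the pushFront and freeChain helpers are hypothetical; freeChain plays the role of a caller that deletes every node it is given.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toyNode { int val; struct toyNode *next; } toyNode;

/* Allocate a node and put it in front of head; returns the new head. */
static toyNode *pushFront(toyNode *head, int val) {
    toyNode *n = malloc(sizeof(*n));
    assert(n != NULL);
    n->val = val;
    n->next = head;
    return n;
}

/* Free every node in a chain and return how many were freed. Saving the
 * successor before free() mirrors what nextEntry does for the iterator. */
static int freeChain(toyNode *head) {
    int freed = 0;
    toyNode *entry = head, *nextEntry;
    while (entry != NULL) {
        nextEntry = entry->next;  /* remember the successor first */
        free(entry);              /* entry->next must not be read after this */
        entry = nextEntry;
        freed++;
    }
    return freed;
}
```

If the loop read entry->next after freeing entry, it would touch released memory; keeping nextEntry aside avoids that.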

Obtaining an unsafe iterator (safe is set to 0):

dictIterator *dictGetIterator(dict *d)
{
    dictIterator *iter = zmalloc(sizeof(*iter));

    iter->d = d;
    iter->table = 0;
    iter->index = -1;
    iter->safe = 0;
    iter->entry = NULL;
    iter->nextEntry = NULL;
    return iter;
}

Obtaining a safe iterator (safe is set to 1):

dictIterator *dictGetSafeIterator(dict *d) {
    dictIterator *i = dictGetIterator(d);

    i->safe = 1;
    return i;
}

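Now let's look at the iterator's core function, dictNext, which returns the next node in the dictionary.

dictEntry *dictNext(dictIterator *iter)
{
    while (1) {
        // entry is NULL on the first call, or after the end of a bucket's chain
        if (iter->entry == NULL) {
            // Get the hash table currently being iterated
            dictht *ht = &iter->d->ht[iter->table];
            if (iter->index == -1 && iter->table == 0) {
                if (iter->safe)
                    // Safe iterator: increment the dict's iterators field to pause rehashing
                    iter->d->iterators++;
                else
                    // Unsafe iterator: compute and save the fingerprint
                    iter->fingerprint = dictFingerprint(iter->d);
            }
            // Advance to the next bucket (bucket 0 on the first call)
            iter->index++;
            // index >= size means the last bucket of this table is done
            if (iter->index >= (long) ht->size) {
                if (dictIsRehashing(iter->d) && iter->table == 0) {
                    // The dict is rehashing and ht[0] has been fully traversed,
                    // so continue with ht[1]
                    iter->table++;
                    iter->index = 0;
                    ht = &iter->d->ht[1];
                } else {
                    // Otherwise the iteration really is complete
                    break;
                }
            }
            // Fetch the head node of the bucket at index
            iter->entry = ht->table[iter->index];
        } else {
            // entry is not NULL: move on to its successor
            iter->entry = iter->nextEntry;
        }
        // entry is now the candidate next node (NULL if the bucket or chain is exhausted)
        if (iter->entry) {
            // Save the successor in nextEntry, in case the caller deletes the returned entry
            iter->nextEntry = iter->entry->next;
            return iter->entry;
        }
    }
    return NULL;
}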

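Most of the logic is covered by the comments. The whole function is an infinite loop. If entry is NULL, either the iterator is just starting or it has run off the end of a bucket's chain. Either way we enter the if branch, which checks whether the whole dictionary has been iterated and, if not, moves on to the next bucket.

If the dictionary is not rehashing, incrementing the iterators field stops subsequent operations on the dictionary from performing rehash steps. If a rehash is already in progress, that is no problem either: once ht[0] has been iterated, the iterator goes on to the nodes that had already been migrated to ht[1] before it started. Because a safe iterator increments iterators, later operations do not migrate any more nodes, so no data is iterated twice.

Once iteration is finished, the iterator must be released. The corresponding function in Redis is: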
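The effect of the iterators counter can be sketched as a simple gate. The toyDict type and toyRehashStep function are hypothetical stand-ins; the sketch assumes that the rehash step triggered by ordinary operations is skipped whenever iterators is nonzero, and it replaces the real bucket migration with a counter.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toyDict {
    long rehashidx;            /* -1 when not rehashing */
    unsigned long iterators;   /* number of safe iterators running */
    int migrated;              /* buckets migrated so far (toy counter) */
} toyDict;

/* One rehash step as triggered by an ordinary add/find/delete.
 * Returns 1 if a bucket was migrated, 0 if the step was skipped. */
static int toyRehashStep(toyDict *d) {
    assert(d != NULL);
    if (d->rehashidx == -1) return 0;   /* not rehashing at all */
    if (d->iterators != 0) return 0;    /* paused by a safe iterator */
    d->migrated++;                      /* stand-in for moving one bucket */
    d->rehashidx++;
    return 1;
}
```

While a safe iterator holds the counter above zero, every step is a no-op, so nodes stay where they are and the iterator cannot meet the same node in both tables.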

void dictReleaseIterator(dictIterator *iter)
{
    if (!(iter->index == -1 && iter->table == 0)) {
        if (iter->safe)
            iter->d->iterators--;
        else
            assert(iter->fingerprint == dictFingerprint(iter->d));
    }
    zfree(iter);
}

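For a safe iterator, iterators is decremented. For an unsafe iterator, the fingerprint is recomputed and compared, via assert, with the one computed when the iterator first started working. If they differ, you called something that modified the dictionary while using an unsafe iterator, and the program reports the failed assertion and exits.

That covers the basic principles and usage of Redis's safe and unsafe iterators. Neither of them, after all, allows rehashing to proceed while the traversal is under way. Redis also has a more advanced form of traversal, which we call scan, that allows iterating while a rehash is in progress. We will analyze its source code in a follow-up article, so stay tuned!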

All the code and example material used in these articles is uploaded to my personal GitHub:
https://github.com/SingleYam/overview_java

WeChat official account: YangAM