Redis Source Code Reading (4): the rehash process in the time event (timeEvent)

When the data in a hash table of a Redis database reaches certain trigger conditions, Redis rebuilds (rehashes) the table.

Before triggering a rehash, the server also checks whether a persistence operation is in progress, i.e. whether a persistence child process exists; if one does, the rehash does not run. The child process shares the parent's memory pages through copy-on-write, so the hash table data held by the two processes is only identical at the instant the child is forked. If the parent rehashed while the child was saving, it would move entries across a large number of shared pages and force them all to be copied, wasting memory; the check for this appears later in databasesCron.

To balance performance, Redis performs two kinds of rehashing, lazy and active, which run side by side until the rehash completes. Let's look at each in turn:

1. Resize: the trigger condition for shrinking the hash table

The table shrinks when its fill percentage, used*100/size, drops below REDIS_HT_MINFILL (defined as 10, i.e. a 10% fill ratio) and the table is larger than DICT_HT_INITIAL_SIZE; this triggers dictResize:
int htNeedsResize(dict *dict) {
    long long size, used;

    size = dictSlots(dict);
    used = dictSize(dict);
    return (size && used && size > DICT_HT_INITIAL_SIZE &&
            (used*100/size < REDIS_HT_MINFILL));
}

/* If the percentage of used slots in the HT reaches REDIS_HT_MINFILL
 * we resize the hash table to save memory */
void tryResizeHashTables(int dbid) {
    if (htNeedsResize(server.db[dbid].dict))
        dictResize(server.db[dbid].dict);
    if (htNeedsResize(server.db[dbid].expires))
        dictResize(server.db[dbid].expires);
}
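
For reference, dictResize itself (from the dict.c of the same Redis era; quoted here for context, it was not in the original post) shrinks the table to the smallest size that still holds every element, and refuses to run while resizing is globally disabled or a rehash is already in progress:

/* Resize the table to the minimal size that contains all the elements,
 * but with the invariant of a USED/BUCKETS ratio near to <= 1 */
int dictResize(dict *d)
{
    int minimal;

    if (!dict_can_resize || dictIsRehashing(d)) return DICT_ERR;
    minimal = d->ht[0].used;
    if (minimal < DICT_HT_INITIAL_SIZE)
        minimal = DICT_HT_INITIAL_SIZE;
    return dictExpand(d, minimal);
}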

2. The trigger condition for expanding the hash table

The table expands when the number of elements reaches the number of slots (used >= size), and either resizing is currently allowed (dict_can_resize) or the used/size ratio exceeds dict_force_resize_ratio (5).
The table is expanded by doubling:

/* Expand the hash table if needed */
static int _dictExpandIfNeeded(dict *d)
{
    /* Incremental rehashing already in progress. Return. */
    if (dictIsRehashing(d)) return DICT_OK;

    /* If the hash table is empty expand it to the initial size. */
    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);

    /* If we reached the 1:1 ratio, and we are allowed to resize the hash
     * table (global setting) or we should avoid it but the ratio between
     * elements/buckets is over the "safe" threshold, we resize doubling
     * the number of buckets. */
     // Expand when used >= size, and either resizing is allowed globally
     // (dict_can_resize) or used/size exceeds dict_force_resize_ratio
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio))
    {
        return dictExpand(d, ((d->ht[0].size > d->ht[0].used) ?
                                    d->ht[0].size : d->ht[0].used)*2);
    }
    return DICT_OK;
}

As the code above shows, both shrinking and expanding end up calling int dictExpand(dict *d, size_t size):

int dictExpand(dict *d, size_t size)
{
    dictht n; /* the new hash table */
    size_t realsize = _dictNextPower(size);

    /* the size is invalid if it is smaller than the number of
     * elements already inside the hash table */
    if (dictIsRehashing(d) || d->ht[0].used > size)
        return DICT_ERR;

    /* Allocate the new hash table and initialize all pointers to NULL */
    n.size = realsize;
    n.sizemask = realsize-1;
    n.table = zcalloc(realsize*sizeof(dictEntry*));
    n.used = (size_t) 0;

    /* Is this the first initialization? If so it's not really a rehashing
     * we just set the first hash table so that it can accept keys. */
     // If the server has just started and the table is still empty,
     // simply create hash table 0 directly
    if (d->ht[0].table == NULL) {
        d->ht[0] = n;
        return DICT_OK;
    }

    /* Prepare a second hash table for incremental rehashing */
   // Create hash table 1; setting rehashidx to 0 below marks the start
   // of incremental rehashing
    d->ht[1] = n;
    d->rehashidx = 0;

    return DICT_OK;
}
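
dictExpand rounds the requested size up to the next power of two through _dictNextPower, which in the dict.c of this era reads as follows (quoted for context, not part of the original post):

/* Our hash table capability is a power of two */
static unsigned long _dictNextPower(unsigned long size)
{
    unsigned long i = DICT_HT_INITIAL_SIZE;

    if (size >= LONG_MAX) return LONG_MAX;
    while(1) {
        if (i >= size)
            return i;
        i *= 2;
    }
}

Keeping the size a power of two is what makes sizemask = size-1 work: a bucket index is computed simply as hash & sizemask.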

Every dict structure contains two hash tables, ht[0] and ht[1]. Before the trigger condition is met only table 0 is in use, and all newly set data is stored there. Once the trigger fires, table 1 is created with the new size and d->rehashidx is set to a value other than -1, which means the migration of data has begun: from then on, newly added entries go into table 1, while the old entries are moved over by the lazy rehash and active rehash processes described next.
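
As a reminder of the layout involved, here are the two structures, abridged from the dict.h of the era this post quotes (not part of the original post; field types vary slightly across versions):

typedef struct dictht {
    dictEntry **table;      /* array of buckets (chains of dictEntry) */
    unsigned long size;     /* number of buckets, always a power of two */
    unsigned long sizemask; /* size-1, masks a hash into a bucket index */
    unsigned long used;     /* number of stored elements */
} dictht;

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];   /* ht[0] is the live table, ht[1] the rehash target */
    int rehashidx;  /* -1 when no rehash is in progress */
    int iterators;  /* number of safe iterators currently running */
} dict;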

3. Lazy rehash (one bucket rehashed per client request)

This is another of Redis's performance considerations: with a large dataset, migrating all of the old data at once would block new client requests for the whole duration of the transfer and cause significant latency. Lazy rehashing instead checks on every client request whether d->rehashidx indicates a rehash in progress; if it does, only a single bucket of the hash table is rehashed. The function that performs this is _dictRehashStep(d):

static void _dictRehashStep(dict *d) {
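    /* Rehash a single step, but only if no safe iterators are bound to
     * the dict; moving entries while an iteration is in progress could
     * make it miss or revisit elements */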
    if (d->iterators == 0) dictRehash(d,1);
}
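
_dictRehashStep is invoked from the ordinary dict entry points (dictFind, dictAddRaw, dictGenericDelete, dictGetRandomKey) whenever a rehash is in progress. As an illustration, here is dictFind from the dict.c of the same era (quoted for context, not part of the original post); note both the step call and the fact that a lookup must probe both tables while a rehash is underway:

dictEntry *dictFind(dict *d, const void *key)
{
    dictEntry *he;
    unsigned int h, idx, table;

    if (d->ht[0].size == 0) return NULL; /* We don't have a table at all */
    if (dictIsRehashing(d)) _dictRehashStep(d);
    h = dictHashKey(d, key);
    /* During a rehash an entry may live in either table, so check both */
    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        while(he) {
            if (dictCompareKeys(d, key, he->key))
                return he;
            he = he->next;
        }
        if (!dictIsRehashing(d)) return NULL;
    }
    return NULL;
}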

Note that the second argument to dictRehash is 1, i.e. only a single bucket of the table is rehashed:

int dictRehash(dict *d, int n) {
    if (!dictIsRehashing(d)) return 0;

    while(n--) {
        dictEntry *de, *nextde;

        // First check whether the rehash has already finished; if so, free
        // table 0, replace it with table 1, and mark rehashing as done
        if (d->ht[0].used == 0) {
            zfree(d->ht[0].table);
            d->ht[0] = d->ht[1];
            _dictReset(&d->ht[1]);
            d->rehashidx = -1;
            return 0;
        }

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */

         // Find the next non-empty bucket
        assert(d->ht[0].size > (unsigned)d->rehashidx);
        while(d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;
        de = d->ht[0].table[d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        while(de) {
            unsigned int h;

            nextde = de->next;
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }
    return 1;
}

4. Active rehashing (100 buckets at a time, capped at about 1 ms of CPU time)

Active rehashing happens in the time event. The time event handler is serverCron(), and Redis registers only a single time event (see the snippet after this list). This function mainly does the following work:
(1) Collect expired keys; like rehashing, collection is split into active and lazy modes.
(2) Software watchdog (I haven't understood this part yet; I'll come back to it later).
(3) Update some statistics.
(4) Rehash the hash tables, as introduced above.
(5) Trigger BGSAVE / AOF rewrite and handle terminated child processes. BGSAVE and AOF are the data persistence mechanisms; I'll write a separate article on each later.
(6) Handle timeouts for the various kinds of clients.
(7) Replication reconnection, which should be related to replication and cluster operation; I'll look at Redis clustering separately later.
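
The single time event mentioned above is registered in initServer() (redis.c). In the source of this era the registration looks roughly like this (quoted for context, not part of the original post); serverCron keeps itself scheduled by returning the interval, in milliseconds, until its next run:

    /* Create the serverCron() time event, that's our main way to process
     * background operations. */
    if (aeCreateTimeEvent(server.el, 1, serverCron, NULL, NULL) == AE_ERR) {
        redisPanic("Can't create the serverCron time event.");
        exit(1);
    }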

The active rehashing work itself happens mainly in this function:

void databasesCron(void) {
    /* Expire keys by random sampling. Not required for slaves
     * as master will synthesize DELs for us. */
    if (server.active_expire_enabled && server.masterhost == NULL)
        activeExpireCycle();

    /* Perform hash tables rehashing if needed, but only if there are no
     * other processes saving the DB on disk. Otherwise rehashing is bad
     * as will cause a lot of copy-on-write of memory pages. */
    if (server.rdb_child_pid == -1 && server.aof_child_pid == -1) {
        /* We use global counters so if we stop the computation at a given
         * DB we'll be able to start from the successive in the next
         * cron loop iteration. */
        static unsigned int resize_db = 0;
        static unsigned int rehash_db = 0;
        unsigned int dbs_per_call = REDIS_DBCRON_DBS_PER_CALL;
        unsigned int j;

        /* Don't test more DBs than we have. */
        if (dbs_per_call > (unsigned)server.dbnum) dbs_per_call = server.dbnum;

        /* Resize */
        // Shrink hash tables that have become sparse
        for (j = 0; j < dbs_per_call; j++) {
            tryResizeHashTables(resize_db % server.dbnum);
            resize_db++;
        }

        /* Rehash */
        // Incrementally rehash the tables
        if (server.activerehashing) {
            for (j = 0; j < dbs_per_call; j++) {
            // Again for performance, the CPU time given to the rehash
            // is capped at 1 millisecond per call
                int work_done = incrementallyRehash(rehash_db % server.dbnum);
                rehash_db++;
                if (work_done) {
                    /* If the function did some work, stop here, we'll do
                     * more at the next cron loop. */
                    break;
                }
            }
        }
    }
}
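
incrementallyRehash is the glue between databasesCron and the dict layer. In the redis.c of this era it reads as follows (quoted for context, not part of the original post); it spends at most one millisecond on the given database's main dict, and only if that one is not rehashing, on its expires dict:

/* Use 1 millisecond of CPU time at each call of this function to perform
 * some incremental rehashing on the given database */
int incrementallyRehash(int dbid) {
    /* Keys dictionary */
    if (dictIsRehashing(server.db[dbid].dict)) {
        dictRehashMilliseconds(server.db[dbid].dict,1);
        return 1; /* already used our millisecond for this loop... */
    }
    /* Expires */
    if (dictIsRehashing(server.db[dbid].expires)) {
        dictRehashMilliseconds(server.db[dbid].expires,1);
        return 1; /* already used our millisecond for this loop... */
    }
    return 0;
}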

Again for performance, the rehash operation is limited to roughly 1 ms of CPU time, as shown below: each call to dictRehash moves 100 buckets, and if the elapsed time inside the while loop exceeds ms (1 ms here), the loop exits immediately so the server can move on to other work.

/* Rehash for an amount of time between ms milliseconds and ms+1 milliseconds */
int dictRehashMilliseconds(dict *d, int ms) {
    long long start = timeInMilliseconds();
    int rehashes = 0;

    while(dictRehash(d,100)) {
        rehashes += 100;
        if (timeInMilliseconds()-start > ms) break;
    }
    return rehashes;
}
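
The timeInMilliseconds helper used above is a thin wrapper around gettimeofday; in the dict.c of this era it is simply (quoted for context):

long long timeInMilliseconds(void) {
    struct timeval tv;

    gettimeofday(&tv,NULL);
    return (((long long)tv.tv_sec)*1000)+(tv.tv_usec/1000);
}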

5. Summary

This article walked through Redis's hash table resize and rehash process. To balance performance, Redis splits rehashing into two modes, lazy and active, which run side by side until the rehash completes.

The next article will cover the collection and handling of expired keys.
