[Redis study notes (2)] the dictionary structure in detail

This article is published by the official account [Developing Pigeon]. Welcome to follow!



One. Dictionary

(1) Overview

        A dictionary is an abstract data structure for storing key-value pairs: each key is unique and is associated with a value. Since the C language has no built-in dictionary, Redis implements its own and uses it widely. For example, the Redis database itself is built on top of a dictionary, and a dictionary is also one of the underlying implementations of hash keys.

(2) Implementation of the dictionary

        Redis's dictionary uses a hash table as the underlying implementation. There can be multiple hash table nodes in a hash table, and each hash table node stores a key-value pair in the dictionary.

1. Hash table

        The hash table used by the Redis dictionary is defined by the dictht structure:

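typedef struct dictht {
    dictEntry **table;      /* hash table array; each slot points to a chain of nodes */
    unsigned long size;     /* size of the table array */
    unsigned long sizemask; /* always size - 1; used to compute indexes */
    unsigned long used;     /* number of nodes currently stored */
} dictht;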

        The table attribute is an array; each element is a pointer to a dictEntry structure, that is, a hash table node holding one key-value pair.

        The size attribute records the size of the hash table, that is, the length of the table array.

        The used attribute records the number of nodes currently stored in the hash table.

        The sizemask attribute always equals size - 1. Together with a key's hash value, it determines the index of the table array at which the key is placed, as explained later.
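To make the relationship between size and sizemask concrete, here is a minimal sketch that allocates an empty dictht. The helper names dictht_init and dictht_free are hypothetical (they are not Redis functions), the dictEntry shown is trimmed down so the snippet compiles on its own, and the sketch assumes, as Redis does, that size is always a power of two.

```c
#include <assert.h>
#include <stdlib.h>

/* Trimmed-down node so this sketch compiles on its own. */
typedef struct dictEntry {
    void *key;
    void *val;
    struct dictEntry *next;
} dictEntry;

typedef struct dictht {
    dictEntry **table;      /* array of bucket heads */
    unsigned long size;     /* length of table, a power of two */
    unsigned long sizemask; /* always size - 1 */
    unsigned long used;     /* number of stored nodes */
} dictht;

/* Hypothetical helper: allocate an empty table with `size` buckets.
 * Returns 0 on success, -1 if allocation fails. */
int dictht_init(dictht *ht, unsigned long size) {
    /* size must be a non-zero power of two so that size - 1 is all ones in binary */
    assert(size != 0 && (size & (size - 1)) == 0);
    ht->table = calloc(size, sizeof(dictEntry *)); /* every bucket starts as NULL */
    if (ht->table == NULL) return -1;
    ht->size = size;
    ht->sizemask = size - 1;
    ht->used = 0;
    return 0;
}

/* Hypothetical helper: release the bucket array and reset the counters. */
void dictht_free(dictht *ht) {
    free(ht->table);
    ht->table = NULL;
    ht->size = ht->sizemask = ht->used = 0;
}
```

For example, dictht_init(&ht, 4) yields a table with size 4, sizemask 3 (binary 11) and used 0, with every slot of table set to NULL.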

2. Hash table node

        The hash table node is represented by the dictEntry structure, which holds a key-value pair.

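typedef struct dictEntry {
    void *key;                /* the key */
    union {
        void *val;            /* value stored as a pointer */
        uint64_t u64;         /* or as an unsigned 64-bit integer */
        int64_t s64;          /* or as a signed 64-bit integer */
    } v;
    struct dictEntry *next;   /* next node in the same bucket (collision chain) */
} dictEntry;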

        The key attribute holds the key of the key-value pair.

        The v attribute holds the value of the key-value pair. It is a union, so the value can be a pointer (val), an unsigned 64-bit integer (u64), or a signed 64-bit integer (s64).

        The next attribute points to another hash table node. It links key-value pairs whose keys map to the same index into a chain, which is how key conflicts are resolved.
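A small sketch of how the union and the next pointer are used. The helpers demo_chain and chain_length are hypothetical and exist only for illustration, and the two nodes are assumed to have landed in the same bucket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
    } v;
    struct dictEntry *next; /* next node in the same bucket */
} dictEntry;

/* Hypothetical helper: count the nodes in one chain by following next. */
size_t chain_length(const dictEntry *head) {
    size_t n = 0;
    for (const dictEntry *e = head; e != NULL; e = e->next) n++;
    return n;
}

/* Hypothetical demo: link two nodes assumed to share one bucket.
 * One value is stored as a pointer, the other inline as an integer. */
dictEntry *demo_chain(dictEntry *a, dictEntry *b, char *str) {
    assert(a != NULL && b != NULL && a != b);
    b->key = "counter";
    b->v.s64 = -42;   /* integer stored directly in the node, no allocation */
    b->next = NULL;   /* b is the tail of the chain */
    a->key = "name";
    a->v.val = str;   /* value stored as a pointer */
    a->next = b;      /* a is the head and links to b */
    return a;
}
```

Storing small integers directly in u64 or s64 saves a separate allocation for the value, while val covers every other value type.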

3. Dictionaries

        The dictionary is implemented by the dict structure:

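typedef struct dict {
    dictType *type;   /* type-specific functions */
    void *privdata;   /* optional arguments passed to those functions */
    dictht ht[2];     /* two hash tables; ht[1] is used only during rehash */
    int rehashidx;    /* rehash progress; -1 when no rehash is in progress */
} dict;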

        The type and privdata attributes support key-value pairs of different types and make polymorphic dictionaries possible: type points to a dictType structure holding a set of type-specific functions (such as the hash function), and privdata holds optional arguments passed to those functions.

        The ht attribute is an array of two dictht hash tables. Normally the dictionary uses only ht[0]; ht[1] is used only while ht[0] is being rehashed.

        The rehashidx attribute records the current progress of the rehash; when no rehash is in progress, its value is -1.
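The sketch below shows a dict whose rehashidx starts at -1. The dictType here is a trimmed-down stand-in that keeps only a hash callback (the real one has more callbacks, and exact signatures differ between Redis versions); dict_init is a hypothetical helper, while the dictIsRehashing check mirrors the macro of the same name in Redis's dict.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct dictEntry dictEntry; /* node layout omitted in this sketch */

typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

/* Trimmed-down stand-in for dictType: only the hash callback is kept. */
typedef struct dictType {
    uint64_t (*hashFunction)(const void *key);
} dictType;

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    int rehashidx; /* -1 when no rehash is in progress */
} dict;

/* Same test as the dictIsRehashing macro in Redis. */
#define dictIsRehashing(d) ((d)->rehashidx != -1)

/* Hypothetical helper: an empty dictionary that is not rehashing. */
void dict_init(dict *d, dictType *type, void *privdata) {
    assert(d != NULL);
    memset(d, 0, sizeof(*d)); /* both tables start empty (NULL table, size 0) */
    d->type = type;
    d->privdata = privdata;
    d->rehashidx = -1;        /* no rehash yet */
}
```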


(3) Hash algorithm

        When a new key-value pair is added to the dictionary, Redis first computes a hash value and an index value from the key, then places the hash table node containing the new key-value pair at that index of the hash table array.

        Redis computes the hash value with dict->type->hashFunction(key). It then computes the index from the hash value and the sizemask attribute of the hash table:

index = hash & dict->ht[x].sizemask;

        ht[x] is either ht[0] or ht[1]. When no rehash is in progress, nodes go into ht[0]; during a rehash, new nodes are placed in ht[1], while nodes that have not yet been migrated remain in ht[0].

        The computed index is the position in the hash table array where the node is placed; for example, if the index is 0 and no rehash is in progress, the node goes into ht[0].table[0]. Redis uses the MurmurHash2 algorithm to compute hash values (in older versions; Redis 4.0 and later use SipHash).
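The two steps, hashing and masking, can be sketched as follows. The hash function is the classic 32-bit MurmurHash2 by Austin Appleby, adapted to use memcpy to avoid unaligned reads; bucket_index is a hypothetical helper, and the seed value 5381 in the usage example is an assumption. Because size is a power of two, hash & sizemask gives the same result as hash % size while avoiding a division.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Classic 32-bit MurmurHash2; memcpy avoids unaligned 4-byte reads. */
uint32_t murmurhash2(const void *key, size_t len, uint32_t seed) {
    assert(key != NULL || len == 0);
    const uint32_t m = 0x5bd1e995;
    const int r = 24;
    const unsigned char *data = key;
    uint32_t h = seed ^ (uint32_t)len;

    while (len >= 4) {          /* mix 4 bytes at a time */
        uint32_t k;
        memcpy(&k, data, 4);
        k *= m; k ^= k >> r; k *= m;
        h *= m; h ^= k;
        data += 4; len -= 4;
    }
    switch (len) {              /* mix the 1-3 trailing bytes */
    case 3: h ^= (uint32_t)data[2] << 16; /* fall through */
    case 2: h ^= (uint32_t)data[1] << 8;  /* fall through */
    case 1: h ^= data[0]; h *= m;
    }
    h ^= h >> 13; h *= m; h ^= h >> 15; /* final avalanche */
    return h;
}

/* index = hash & sizemask, where sizemask = size - 1 */
unsigned long bucket_index(uint32_t hash, unsigned long sizemask) {
    return hash & sizemask;
}
```

For a table of size 4, bucket_index(murmurhash2(key, strlen(key), 5381), 3) always lands in 0..3 and equals the hash modulo 4.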


(4) Key conflict problem

        When two or more keys are assigned to the same index of the hash table array, the keys are said to conflict. The Redis hash table resolves key conflicts with separate chaining: the conflicting hash table nodes are linked into a singly linked list through their next pointers. Because a list of dictEntry nodes has no pointer to its tail, a new node is inserted at the head of the list for performance, which takes O(1) time regardless of the list's length.
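Head insertion and chain lookup can be sketched as below. The node is simplified (a C-string key and a long value instead of void pointers and the union), and bucket_push_front and bucket_find are hypothetical names rather than Redis functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified node: C-string key and long value instead of void * and the union. */
typedef struct dictEntry {
    const char *key;
    long val;
    struct dictEntry *next;
} dictEntry;

/* Head insertion: O(1) no matter how long the chain is, since no tail pointer is kept. */
void bucket_push_front(dictEntry **table, unsigned long index, dictEntry *e) {
    assert(e != NULL);
    e->next = table[index]; /* new node points at the old head (possibly NULL) */
    table[index] = e;       /* and becomes the new head */
}

/* Walk one chain comparing keys; returns NULL if the key is absent. */
dictEntry *bucket_find(dictEntry **table, unsigned long index, const char *key) {
    for (dictEntry *e = table[index]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0) return e;
    return NULL;
}
```

Pushing k0 and then k1 into bucket 2 leaves k1 at the head with k1.next pointing at k0, so the most recently inserted node is reached first.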

(5) Rehash

        When the hash table stores too many or too few key-value pairs, the program expands or shrinks it accordingly. This is done by rehashing, in the following steps:

        1. Allocate space for the dictionary's ht[1] hash table. Its size depends on the operation being performed and on the number of key-value pairs currently in ht[0] (ht[0].used). For an expansion, the size of ht[1] is the smallest 2^n greater than or equal to ht[0].used * 2; for a shrink, it is the smallest 2^n greater than or equal to ht[0].used.

        2. Rehash all key-value pairs in ht[0] into ht[1]: recompute each key's hash value and index, and place the pair at the corresponding position in ht[1].

        3. After all key-value pairs have been migrated, release ht[0], make ht[1] the new ht[0], and create a new blank hash table at ht[1] in preparation for the next rehash.
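Step 1's sizing rule can be sketched as follows. The minimum size of 4 is an assumption that mirrors Redis's DICT_HT_INITIAL_SIZE, and next_power and rehash_target_size are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: mirrors Redis's DICT_HT_INITIAL_SIZE (4). */
#define INITIAL_SIZE 4UL

/* Smallest power of two >= n, but never below INITIAL_SIZE. */
unsigned long next_power(unsigned long n) {
    /* Guard: beyond this, no power of two fits in unsigned long. */
    assert(n <= (ULONG_MAX >> 1) + 1);
    unsigned long size = INITIAL_SIZE;
    while (size < n) size *= 2;
    return size;
}

/* Size of ht[1]: expansion targets 2 * ht[0].used, shrinking targets ht[0].used. */
unsigned long rehash_target_size(unsigned long used, int expanding) {
    return next_power(expanding ? used * 2 : used);
}
```

For instance, with ht[0].used = 5, an expansion allocates 16 slots (the first power of two >= 10), while a shrink allocates 8 (the first power of two >= 5).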

        The load factor of the hash table = the number of nodes stored in the hash table / the size of the hash table

        When the load factor of the hash table is less than 0.1, the shrink operation is automatically performed.
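A sketch of the load factor check. The shrink threshold of 0.1 comes from the text above; the expansion thresholds (a load factor of 1 normally, or 5 while a BGSAVE or BGREWRITEAOF child process is running) are an assumption based on the commonly described Redis behavior and are not stated in this article. check_resize is a hypothetical helper.

```c
#include <assert.h>

typedef enum { RESIZE_NONE, RESIZE_EXPAND, RESIZE_SHRINK } resize_action;

/* load factor = number of stored nodes / size of the table array */
double load_factor(unsigned long used, unsigned long size) {
    assert(size > 0);
    return (double)used / (double)size;
}

/* Hypothetical decision helper. The 0.1 shrink threshold is from the text;
 * the expansion thresholds (1, or 5 while a BGSAVE/BGREWRITEAOF child runs)
 * are an assumption. */
resize_action check_resize(unsigned long used, unsigned long size, int child_running) {
    double lf = load_factor(used, size);
    double expand_at = child_running ? 5.0 : 1.0;
    if (lf >= expand_at) return RESIZE_EXPAND;
    if (lf < 0.1) return RESIZE_SHRINK;
    return RESIZE_NONE;
}
```

A table of size 16 holding a single node has a load factor of 0.0625 and would be shrunk; with two nodes (0.125) it is left alone.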

        The rehash is not done all at once but gradually, over many steps, because for a large hash table moving every key in one go would take too much computation and could block the server for a long time. Instead, the work is spread across the dictionary's add, delete, update, and lookup operations. During a progressive rehash, new key-value pairs are added only to ht[1], while deletions, updates, and lookups are performed on both hash tables. The progressive rehash steps are as follows:

        1. Allocate space for ht[1], so that the dictionary holds both ht[0] and ht[1].

        2. Maintain an index counter variable rehashidx in the dictionary and set it to 0, indicating that the rehash has started.

        3. Each time an add, delete, update, or lookup is performed on the dictionary, in addition to the requested operation, the program rehashes all key-value pairs in bucket rehashidx of ht[0] into ht[1], then increments rehashidx by one.

        4. Once all key-value pairs in ht[0] have been migrated to ht[1], rehashidx is set to -1, indicating that the rehash is complete.
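The progressive rehash steps can be sketched as a small self-contained model. This is a simplification, not Redis's code: keys are represented only by precomputed hash values, each operation migrates exactly one non-empty bucket, and all names (pdict, rehash_step, pdict_add, pdict_find) are hypothetical. Redis's own implementation also limits how many empty buckets it visits in one step, which this sketch omits.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified node: the key is represented by its precomputed hash. */
typedef struct entry {
    unsigned long hash;
    long val;
    struct entry *next;
} entry;

typedef struct table {
    entry **buckets;
    unsigned long size, sizemask, used;
} table;

typedef struct pdict {
    table ht[2];
    long rehashidx; /* -1 when no rehash is in progress */
} pdict;

int table_init(table *t, unsigned long size) {
    t->buckets = calloc(size, sizeof(entry *));
    if (t->buckets == NULL) return -1;
    t->size = size;
    t->sizemask = size - 1;
    t->used = 0;
    return 0;
}

/* Head-insert e into t at index hash & sizemask. */
void table_put(table *t, entry *e) {
    unsigned long idx = e->hash & t->sizemask;
    e->next = t->buckets[idx];
    t->buckets[idx] = e;
    t->used++;
}

/* Steps 1-2: allocate ht[1] and set rehashidx to 0. */
int start_rehash(pdict *d, unsigned long new_size) {
    assert(d->rehashidx == -1);
    if (table_init(&d->ht[1], new_size) != 0) return -1;
    d->rehashidx = 0;
    return 0;
}

/* Step 3: move every node in bucket ht[0][rehashidx] to ht[1], then advance.
 * Step 4: once ht[0] is empty, ht[1] becomes ht[0] and rehashidx is -1. */
void rehash_step(pdict *d) {
    if (d->rehashidx == -1) return;
    table *src = &d->ht[0], *dst = &d->ht[1];
    if (src->used > 0) {
        /* Buckets before rehashidx are already empty, so a non-empty one lies ahead. */
        while (src->buckets[d->rehashidx] == NULL) d->rehashidx++;
        entry *e = src->buckets[d->rehashidx];
        while (e != NULL) {
            entry *next = e->next; /* save before table_put overwrites it */
            table_put(dst, e);     /* index recomputed with ht[1]'s sizemask */
            src->used--;
            e = next;
        }
        src->buckets[d->rehashidx] = NULL;
        d->rehashidx++;
    }
    if (src->used == 0) {
        free(src->buckets);
        *src = *dst;                  /* ht[1] becomes ht[0] */
        memset(dst, 0, sizeof(*dst)); /* fresh, empty ht[1] */
        d->rehashidx = -1;
    }
}

/* Adds go only to ht[1] while rehashing; each call also performs one rehash step. */
void pdict_add(pdict *d, entry *e) {
    rehash_step(d);
    table_put(d->rehashidx != -1 ? &d->ht[1] : &d->ht[0], e);
}

/* Lookups check ht[0], then ht[1] if a rehash is still in progress. */
entry *pdict_find(pdict *d, unsigned long hash) {
    rehash_step(d);
    for (int i = 0; i < 2; i++) {
        table *t = &d->ht[i];
        if (t->size > 0) {
            for (entry *e = t->buckets[hash & t->sizemask]; e != NULL; e = e->next)
                if (e->hash == hash) return e;
        }
        if (d->rehashidx == -1) break;
    }
    return NULL;
}
```

Note how pdict_add places new entries in ht[1] once a rehash has started, and how pdict_find only falls back to ht[1] while rehashidx is not -1.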


Origin blog.csdn.net/Mrwxxxx/article/details/113874602