Redis data structures

Reference: Geek Time Redis (Yafeng)


SDS

SDS (Simple Dynamic String), the underlying string structure:

// flags values; the suffixes (5/8/16/32/64) indicate the bit width used for len/alloc
#define SDS_TYPE_5 0
#define SDS_TYPE_8 1
#define SDS_TYPE_16 2
#define SDS_TYPE_32 3
#define SDS_TYPE_64 4
struct __attribute__ ((__packed__)) sdshdr8 {
	/* bytes of string already stored in buf, excluding the terminator */
	uint8_t len;
	/* total bytes allocated for buf, excluding the terminator */
	uint8_t alloc;
	/* SDS header type, controls the header size */
	unsigned char flags;
	char buf[];
};


For example, execute a command

set name liyong # Redis creates two SDS here, one for the key and one for the value

Redis SDS expansion mechanism
If we append a string to an SDS and the buffer is too small, new memory is allocated first:
• If the new string is shorter than 1M, the new space is twice the extended string length + 1;
• If the new string is longer than 1M, the new space is the extended string length + 1M + 1. This is called memory pre-allocation.
Note: given the rule above, when storing strings directly in Redis in production it is best to keep them under 1M, otherwise space may be wasted.

IntSet

IntSet is one implementation of the Set type in Redis. It is based on an integer array and is variable-length and ordered.
IntSet structure

/* 2-byte integer, like Java's short */
#define INTSET_ENC_INT16 (sizeof(int16_t))
/* 4-byte integer, like Java's int */
#define INTSET_ENC_INT32 (sizeof(int32_t))
/* 8-byte integer, like Java's long */
#define INTSET_ENC_INT64 (sizeof(int64_t))
typedef struct intset {
	/* encoding: 16-, 32-, or 64-bit integers */
	uint32_t encoding;
	/* number of elements */
	uint32_t length;
	/* integer array holding the set's data */
	int8_t contents[];
} intset;


To make lookups easy, Redis keeps all the integers of an intset in ascending order in the contents array. For example, if we store 1, 2, 3, 4, the 16-bit encoding is enough. If we then insert 50000, a signed 16-bit integer can no longer hold the value, so the set is upgraded: the array is expanded and the existing elements are copied over in reverse order, starting from the last one, after which every element occupies 32 bits. A value that forces an upgrade is necessarily out of the old range, so if it is negative it is inserted at the front, and if it is positive, at the end; a value within the current range is inserted at the position found by binary search.

DICT

We know that Redis is a key-value database that supports fast adds, deletes, updates, and queries by key. The mapping between keys and values is implemented through Dict.
Dict consists of three parts: dictht (analogous to Java's HashMap table), dictEntry (like Map.Entry), and dict itself.

// Each dictht is one hash table: an array of slots, with entries chained off each slot
typedef struct dictht {
	// entry array; the array stores pointers to entries
	dictEntry **table;
	// hash table size
	unsigned long size;
	// hash table size mask, always size - 1; h % size == h & (size - 1)
	// holds only when size is a power of 2 -- the same reason the classic
	// interview question asks why HashMap always doubles on resize
	unsigned long sizemask;
	// number of entries
	unsigned long used;
} dictht;
// The node that actually stores the data
typedef struct dictEntry {
	void *key; // key
	union {
		// the value is one of four types
		void *val;
		uint64_t u64;
		int64_t s64;
		double d;
	} v; // value
	// pointer to the next entry
	struct dictEntry *next;
} dictEntry;


// This structure supports expansion and rehashing
typedef struct dict {
	dictType *type; // dict type, providing different hash functions
	void *privdata; // private data, used for special hash computations
	dictht ht[2]; // a dict holds two hash tables: one holds the current data, the other is normally empty and used during rehash
	long rehashidx; // rehash progress, -1 means no rehash in progress
	int16_t pauserehash; // 1 pauses rehash, 0 continues
} dict;

When we add a key-value pair to a Dict, Redis first computes the hash value h from the key, then uses h & sizemask to find the array index where the element will be stored.
The hash table in Dict is an array combined with singly linked lists. When the collection holds many elements, hash conflicts inevitably increase, and overly long chains degrade performance. Dict therefore checks the load factor (LoadFactor = used/size) on every insert, where used is the number of entries (the total number of data nodes) and size is the hash table size. Expansion is triggered when either of the following holds:
• the LoadFactor of the hash table is >= 1 and the server is not running a child process such as SAVE or REWRITEAOF;
• the LoadFactor of the hash table is > 5.
Expansion logic

static int _dictExpandIfNeeded(dict *d){
	// if a rehash is already in progress, return OK
	if (dictIsRehashing(d)) return DICT_OK;
	// if the hash table is empty, initialize it with size 4
	if (d->ht[0].size == 0) return dictExpand(d, 4);
	// when the load factor (used/size) reaches 1 and no child process
	// such as rewriteaof is running, or the load factor exceeds 5,
	// call dictExpand, i.e. grow the table
	if (d->ht[0].used >= d->ht[0].size &&
	    (dict_can_resize || d->ht[0].used / d->ht[0].size > 5)) {
		// the new size is used + 1; the lower layer rounds it up to
		// the first power of 2 >= used + 1
		return dictExpand(d, d->ht[0].used + 1);
	}
	return DICT_OK;
}

Besides expansion, Dict also checks the load factor every time an element is deleted. When LoadFactor < 0.1, the hash table is shrunk:

// Check whether shrinking is needed; called after a delete succeeds
int htNeedsResize(dict *dict) {
	unsigned long size, used;
	// hash table size
	size = dictSlots(dict);
	// number of entries
	used = dictSize(dict);
	// shrink when the load factor drops below 0.1 (and size > 4)
	return (size > 4 && (used * 100 / size < 10));
}
// Shrink logic
int dictResize(dict *d){
	unsigned long minimal;
	// if a save, rewriteaof, or rehash is in progress, return an error
	if (!dict_can_resize || dictIsRehashing(d))
		return DICT_ERR;
	// fetch used, i.e. the entry count
	minimal = d->ht[0].used;
	// if used is below 4, reset it to 4
	if (minimal < 4) minimal = 4;
	// resize to minimal -- actually to the first power of 2 >= minimal
	return dictExpand(d, minimal);
}

Whether the table grows or shrinks, a new hash table is created, so its size and sizemask change; and since key lookup depends on sizemask, the index of every key must be recomputed and the key inserted into the new table. This process is called rehash, and it proceeds as follows:
1 Compute the realsize of the new table, i.e. the first power of 2 greater than or equal to dict.ht[0].used.
2 Allocate memory for realsize, create a new dictht, and assign it to dict.ht[1].
3 Set rehashidx = 0 to mark the start of the rehash.
4 Rehash each dictEntry of dict.ht[0] into dict.ht[1]: on every add, query, update, or delete, check whether dict.rehashidx is greater than -1; if so, move the whole entry chain at ht[0].table[rehashidx] into dict.ht[1] (each step moves only the entries in that one slot, not the whole table) and increment rehashidx, until all data of dict.ht[0] has been rehashed into dict.ht[1]. During the move each entry obeys one rule: it either stays at its original index or moves to index oldsize + index.
5 Assign dict.ht[1] to dict.ht[0] and reinitialize dict.ht[1] as an empty hash table.
6 Set rehashidx back to -1 to mark the rehash as complete.
7 While a rehash is in progress, new writes go directly into ht[1]; queries, updates, and deletes search dict.ht[0] first and then dict.ht[1]. This ensures the data in ht[0] only shrinks and eventually becomes empty as the rehash proceeds.

ZipList

In a production environment, store at most a few thousand elements in a ZipList; performance degrades as the data grows.
ZipList is a special double-ended linked list made up of a series of specially encoded, contiguous memory blocks. Push/pop can be performed at either end, and these operations are O(1).
Unlike an ordinary linked list, an Entry in a ZipList does not store pointers to the previous and next nodes, since two pointers would cost 16 bytes and waste memory. Instead, each entry has the following layout:
previous_entry_length: the byte size of the previous node, occupying 1 or 5 bytes. If the previous node is shorter than 254 bytes, 1 byte stores the length. If it is 254 bytes or longer, 5 bytes are used: the first byte is the marker 0xFE and the following 4 bytes hold the real length.
encoding: records the data type of content (string or integer) and its byte length, occupying 1, 2, or 5 bytes. If the encoding begins with "00", "01", or "10", content is a string; if it begins with "11", content is an integer and the encoding occupies 1 byte.
content: holds the node's data, which can be a string or an integer.

The chains of consecutive space extensions a ZipList can trigger in this special case are called cascade updates. Both inserts and deletes can cause them. They stem from the 254-byte threshold: once an entry grows past it, the next entry's previous_entry_length must grow from 1 to 5 bytes, which may push that entry past the threshold in turn, so every entry after the current one may need updating. This is another reason not to store large amounts of data in a ZipList.

QuickList

Question 1: Although ZipList saves memory, the memory it requests must be one contiguous block. If it occupies a lot of memory, allocating it becomes very inefficient. What can we do?
Answer: Limit the length and entry size of ZipList.
Question 2: But what should we do if we want to store a large amount of data that exceeds the optimal upper limit of ZipList?
Answer: You can create multiple ZipLists to store data in fragments.
Question 3: After the data is split, it is relatively scattered and inconvenient to manage and search. How to establish a relationship between these multiple ZipLists?
Answer: Use QuickList, which is a double-ended linked list, but each node in the linked list is a ZipList.
QuickList structure

typedef struct quicklist{
	quicklistNode *head;
	quicklistNode *tail;
	// total number of entries across all ziplists
	unsigned long count;
	// total number of ziplists
	unsigned long len;
	// per-ziplist entry limit, default -2
	int fill : 16;
	// number of uncompressed nodes at each end, default 0 (no compression)
	unsigned int compress : 16;
} quicklist;
typedef struct quicklistNode
{
	struct quicklistNode *prev;
	struct quicklistNode *next;
	// pointer to this node's ZipList
	unsigned char *zl;
	// byte size of this node's ZipList
	unsigned int sz;
	// number of entries in this node's ZipList
	unsigned int count : 16;
	// encoding: 1 = plain ZipList, 2 = compressed
	unsigned int encoding : 2;
	// whether the node was decompressed; 1 means it must be recompressed later
	unsigned int recompress : 1;
} quicklistNode;

In order to avoid too many entries in each ZipList in QuickList, Redis provides a configuration item: list-max-ziplist-size to limit.

  • If the value is positive, it represents the maximum number of entries allowed in ZipList.
  • If the value is negative, it represents the maximum memory size of ZipList, which can be divided into 5 situations:
    ① -1: The memory usage of each ZipList cannot exceed 4kb
    ② -2: The memory usage of each ZipList cannot exceed 8kb
    ③ -3: The memory usage of each ZipList cannot exceed 16kb
    ④ -4: The memory usage of each ZipList cannot exceed 32kb
    ⑤ -5: The memory usage of each ZipList cannot exceed 64kb

Besides controlling the size of each ZipList, QuickList can also compress the ZipLists of its nodes, controlled by the configuration item list-compress-depth. Because linked lists are mostly accessed near the head and tail, the end nodes are left uncompressed; this parameter sets how many nodes at each end stay uncompressed:
0: special value meaning no compression.
1: the first and the last node of the QuickList are not compressed.
2: the first two and the last two nodes of the QuickList are not compressed; the middle nodes are compressed.
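For example, the two options could be set in redis.conf like this (-2 is the default size limit described above; note that newer Redis versions rename these to listpack-based names such as list-max-listpack-size, so check your version's configuration reference):

```conf
list-max-ziplist-size -2
list-compress-depth 1
```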

SkipList

SkipList is first a linked list, but it has several differences compared with traditional linked lists:
• Elements are stored in ascending order
• Nodes may contain multiple pointers with different pointer spans

typedef struct zskiplist {
	// head and tail node pointers
	struct zskiplistNode *header, *tail;
	// number of nodes
	unsigned long length;
	// highest index level in use, 1 by default
	int level;
} zskiplist;

typedef struct zskiplistNode {
	// the value stored in the node
	sds ele;
	// node score, used for sorting and lookup
	double score;
	// pointer to the previous node
	struct zskiplistNode *backward;
	struct zskiplistLevel {
		// pointer to the next node at this level
		struct zskiplistNode *forward;
		// index span
		unsigned long span;
	} level[]; // multi-level index array
} zskiplistNode;


Redis object

Redis objects are a further encapsulation on top of these structures; under the hood they are still implemented with the data structures described above.


Origin blog.csdn.net/qq_43259860/article/details/135033670