Redis String: SDS

This post summarizes how Redis wraps native C strings in SDS.

SDS structure

structure definition

SDS stands for Simple Dynamic String. It is Redis's wrapper around native C strings, and its structure is defined as follows:

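// sds is a type alias for char *, pointing at the buf string inside an sdshdr header
typedef char *sds;

// Structure Redis uses to store a string object
struct sdshdr {
	int len;    // length of the used part of buf
	int free;   // length of the unused, still available part of buf
	char buf[]; // sds byte array; C99 allows the last struct member to be an array with no length, and no memory is allocated for it automatically
};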
  • len: a signed int occupying 4 bytes, so it can describe at most 2^31 - 1 bytes, just under 2^31 B = 2^21 KB = 2^11 MB = 2 GB. Redis, however, limits a single string value to 512 MB. Reference: maximum-value-size-in-redis

  • buf: a flexible array member declared without a length, which is an incomplete type in the C99 standard. Although the struct's fields are laid out contiguously in memory, the flexible array's storage is not counted in the size of the struct:

    printf("%zu\n", sizeof(struct sdshdr)); // 8 
    

memory layout

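Assuming the string "Redis" is stored, the memory layout is: a 4-byte len (5), a 4-byte free (0), then buf holding the bytes 'R' 'e' 'd' 'i' 's' followed by the terminating '\0'. The sds pointer itself points at the start of buf.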

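The program below checks the size of each memory segment on 64-bit Linux. It assumes zmalloc stores the allocation size in a size_t prefix in front of each block, which is what zmalloc does when built without HAVE_MALLOC_SIZE (for example with libc malloc in older Redis versions):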

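#include <stdio.h>
#include "sds.h"
#include "zmalloc.h"

int main(void) {
	sds s = sdsnew("Redis");
	// Walk backwards from buf: free, then len, then zmalloc's size prefix
	int *free_field = (int *)(s - sizeof(int));
	int *len_field = (int *)((char *)free_field - sizeof(int));
	// The prefix only exists when zmalloc is built without HAVE_MALLOC_SIZE
	size_t *prefix_size = (size_t *)((char *)len_field - sizeof(size_t));
	printf("used_memory: %zu\n", zmalloc_used_memory());
	printf("prefix_size: %zu\nlen: %d\nfree: %d\n", *prefix_size, *len_field, *free_field);
	sdsfree(s);
	return 0;
}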

SDS API implementation

Source code: sds.c. Several important APIs are implemented as follows:

sdslen

sdslen returns the string length in O(1). It steps back from the sds pointer by the header size (two ints, 8 bytes) to reach the sdshdr and reads len directly:

static inline size_t sdslen(const sds s) {
	struct sdshdr *sh = (void *) (s - (sizeof(struct sdshdr))); // sds - 8
	return sh->len;
}

sdsnew

sdsnew delegates to sdsnewlen. Note that zmalloc allocates the header and the buf flexible array in a single block of sizeof(struct sdshdr) + initlen + 1 bytes, and buf is then filled with memcpy, which copies the string efficiently:

// Create a new sds from a C string
sds sdsnew(const char *init) {
	size_t initlen = (init == NULL) ? 0 : strlen(init);
	return sdsnewlen(init, initlen);
}

// Create an sds from the string init and its length initlen
// Returns the address of buf (the sds) on success, NULL on failure
sds sdsnewlen(const void *init, size_t initlen) {
	struct sdshdr *sh;
	if (init) {
		sh = zmalloc(sizeof(struct sdshdr) + initlen + 1); // content will be copied in, so no zeroing; +1 reserves room for '\0'
	} else {
		sh = zcalloc(sizeof(struct sdshdr) + initlen + 1); // no initial content: zero the whole SDS
	}
	if (sh == NULL) return NULL;

	sh->len = initlen;
	sh->free = 0; // a new sds reserves no free space
	if (initlen && init)
		memcpy(sh->buf, init, initlen); // copy the string init into buf

	sh->buf[initlen] = '\0'; // terminate with '\0'
	return (char *) sh->buf; // the buf part is the sds
}

sdsclear

sdsclear performs a lazy release: it turns the SDS into an empty string without freeing any memory, and the unreleased space is kept for the next allocation:

void sdsclear(sds s) {
	struct sdshdr *sh = (void *) (s - (sizeof(struct sdshdr)));
	sh->free += sh->len; // all space becomes free
	sh->len = 0;
	sh->buf[0] = '\0'; // truncate buf manually
}

sdsMakeRoomFor

Redis's memory preallocation strategy depends on the new length newlen = len + addlen, compared against SDS_MAX_PREALLOC (1 MB):

  • newlen in [0, 1 MB): the allocation doubles, to 2 × newlen
  • newlen in [1 MB, ∞): only 1 MB extra is added each time, to newlen + 1 MB
// Grow the sds so it can hold addlen more bytes, preallocating extra memory
sds sdsMakeRoomFor(sds s, size_t addlen) {
	struct sdshdr *sh, *newsh;
	size_t free = sdsavail(s);
	size_t len, newlen;

	if (free >= addlen) return s; // enough free space (e.g. kept by sdsclear's lazy release): no need to grow

	len = sdslen(s);
	sh = (void *) (s - (sizeof(struct sdshdr)));

	newlen = (len + addlen); // the new length ignores free: just enough, as at creation

	// Preallocation policy: below 1 MB the new length doubles; from 1 MB on, only 1 MB is added each time
	if (newlen < SDS_MAX_PREALLOC)
		newlen *= 2;
	else
		newlen += SDS_MAX_PREALLOC;

	// Reallocate
	newsh = zrealloc(sh, sizeof(struct sdshdr) + newlen + 1);
	if (newsh == NULL) return NULL;

	newsh->free = newlen - len; // update free but not len
	return newsh->buf;
}

sdscat

sdscatlen appends a string to the end of the SDS. It calls sdsMakeRoomFor to preallocate space and uses memcpy to copy the bytes efficiently:

// Append the string t of length len to the sds
sds sdscatlen(sds s, const void *t, size_t len) {
	struct sdshdr *sh;
	size_t curlen = sdslen(s);
	s = sdsMakeRoomFor(s, len);
	if (s == NULL) return NULL;

	sh = (void *) (s - (sizeof(struct sdshdr)));
	memcpy(s + curlen, t, len);  // copy the contents of t after the existing string

	sh->len = curlen + len;
	sh->free = sh->free - len;
	s[curlen + len] = '\0';

	return s;
}

// Append a C string to the sds
sds sdscat(sds s, const char *t) {
	return sdscatlen(s, t, strlen(t));
}

The similar sdscpy function copies a string into the SDS, overwriting its previous contents.

sdsfree

sdsfree releases the whole SDS allocation, header and buf together:

void sdsfree(sds s) {
	if (s == NULL) return;
	zfree(s - sizeof(struct sdshdr)); // step back to the header in the same way
}

Advantages of SDS

Based on the API implementations above, SDS has four advantages over native C strings:

O(1) complexity to get string length

A C string is a character array whose last element is '\0', so getting its length requires an O(N) scan from the beginning to the end.

SDS records the string length in the len field and keeps it up to date in every operation that adds or removes content, so sdslen can simply read that field.

avoid buffer overflow

If the destination dst of a C string operation such as strcpy has not been allocated enough memory, the program may crash or be exposed to buffer overflow attacks.

SDS checks the available space before modifying a string and grows the buffer when it is insufficient, which prevents overflow at the source.

binary safety

C uses '\0' as the string terminator, so binary data that contains null bytes, such as images, cannot be stored safely.

The SDS API uses len to decide where a string ends, so whatever bytes are stored can be read back unchanged, which makes it safe for binary data. SDS still ends buf with '\0', so the rich set of functions in string.h can be reused directly.

Memory preallocation and lazy release

Every time a C string grows or shrinks, its memory must be reallocated with realloc.

SDS grows its buffer by doubling it, or by adding 1 MB once the new length reaches 1 MB, and it does not release memory when a string is cleared, keeping it for later use. This reduces the number of memory reallocations needed for N consecutive growth operations from exactly N to at most N.

Summary

Redis wraps native C strings as SDS and implements APIs for the upper layers, such as getting the length, copying, comparison, and memory preallocation. The implementations copy memory directly into buf (for example with memcpy), which keeps them efficient.

Because of the wrapping, SDS adds some work compared to native strings, such as computing the header address from the sds pointer, but the cost of its APIs has not become a performance bottleneck for Redis. The design is quite elegant.


Origin blog.csdn.net/qq_24694139/article/details/131718828