This post summarizes how Redis wraps native C strings in SDS.
SDS structure
Structure definition
SDS stands for Simple Dynamic String, Redis's wrapper around the native C string. The structure is defined as follows:
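// sds is a type alias for char *; it points to the buf string that follows the sdshdr header
typedef char *sds;
// Structure Redis uses to store a string object
struct sdshdr {
    int len;    // number of bytes used in buf
    int free;   // number of unused bytes remaining in buf
    char buf[]; // sds byte array; C99 lets the last struct member be an array with no length, and no memory is allocated for it automatically
};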
- len: a signed int that occupies 4 bytes, so it can describe at most 2^31 B = 2^21 KB = 2^11 MB = 2 GB of data. Redis, however, limits a single string value to 512 MB. Reference: maximum-value-size-in-redis
- buf: a flexible array member declared without a length, which is an incomplete type in the C99 standard. Although the struct's fields are contiguous in memory, the flexible array's space is not counted in the size of the struct:
printf("%zu\n", sizeof(struct sdshdr)); // 8
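Memory layout
Assuming the string "Redis" is stored, its memory consists of, in order: the size prefix written by zmalloc (a size_t), len, free, and buf, which holds "Redis" followed by '\0'.
The following code verifies the length of each memory segment on 64-bit Linux: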
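int main(void) {
    char *sds = sdsnew("Redis");
    // Step backwards from buf: free, then len, then the size prefix written by zmalloc
    char *free = sds - sizeof(int);
    char *len = free - sizeof(int);
    char *prefixSize = len - sizeof(size_t);
    printf("used_memory: %zu\n", zmalloc_used_memory());
    // Cast before dereferencing so the whole int / size_t is read, not just its first byte
    printf("prefix_size: %zu\nlen: %d\nfree: %d\n", *(size_t *) prefixSize, *(int *) len, *(int *) free);
    return 0;
}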
SDS API implementation
Source code: sds.c. Several important API implementations follow:
sdslen
Returns the string length in O(1). It moves the sds pointer back by the size of two ints to reach the header, reads the len field, and returns it:
static inline size_t sdslen(const sds s) {
    struct sdshdr *sh = (void *) (s - (sizeof(struct sdshdr))); // sds - 8
    return sh->len;
}
sdsnew
Note that zmalloc allocates the header and buf in a single block but does not initialize the memory of the buf flexible array. Its contents are filled in by memcpy, which copies the string efficiently:
// Create a new sds
sds sdsnew(const char *init) {
    size_t initlen = (init == NULL) ? 0 : strlen(init);
    return sdsnewlen(init, initlen);
}
// Create an sds from the string init and its length initlen
// Returns the sds (a pointer to buf inside the sdshdr) on success, NULL on failure
sds sdsnewlen(const void *init, size_t initlen) {
    struct sdshdr *sh;
    if (init) {
        sh = zmalloc(sizeof(struct sdshdr) + initlen + 1); // content given: leave memory uninitialized; +1 reserves room for '\0'
    } else {
        sh = zcalloc(sizeof(struct sdshdr) + initlen + 1); // no content: zero-initialize the whole SDS
    }
    if (sh == NULL) return NULL;
    sh->len = initlen;
    sh->free = 0; // a new sds reserves no free space
    if (initlen && init)
        memcpy(sh->buf, init, initlen); // copy the string init into buf
    sh->buf[initlen] = '\0'; // terminate with \0
    return (char *) sh->buf; // the buf part is the sds
}
sdsclear
Lazy deletion: resets the SDS to an empty string without releasing any memory, so the space stays available for later use:
void sdsclear(sds s) {
    struct sdshdr *sh = (void *) (s - (sizeof(struct sdshdr)));
    sh->free += sh->len; // everything becomes free space
    sh->len = 0;
    sh->buf[0] = '\0'; // truncate buf manually
}
sdsMakeRoomFor
Redis's memory preallocation strategy depends on the new length of the string (current length plus the bytes being added):
- [0, 1 MB): the new length is doubled
- [1 MB, ∞): the new length grows by only 1 MB each time
// Grow the sds so that it has room for addlen more bytes, preallocating extra memory
sds sdsMakeRoomFor(sds s, size_t addlen) {
    struct sdshdr *sh, *newsh;
    size_t free = sdsavail(s);
    size_t len, newlen;
    if (free >= addlen) return s; // the free space (for example, memory kept by sdsclear) is enough, no need to grow
    len = sdslen(s);
    sh = (void *) (s - (sizeof(struct sdshdr)));
    newlen = (len + addlen); // the new length ignores free; as at creation, it is exactly what is needed
    // Preallocation policy: below 1 MB the new length is doubled; at 1 MB or more it grows by only 1 MB
    if (newlen < SDS_MAX_PREALLOC)
        newlen *= 2;
    else
        newlen += SDS_MAX_PREALLOC;
    // Reallocate
    newsh = zrealloc(sh, sizeof(struct sdshdr) + newlen + 1);
    if (newsh == NULL) return NULL;
    newsh->free = newlen - len; // update free but leave len unchanged
    return newsh->buf;
}
sdscat
Appends a string to the end of an SDS. It uses memcpy to copy memory efficiently and sdsMakeRoomFor to preallocate space:
// Append the string t of length len to the sds
sds sdscatlen(sds s, const void *t, size_t len) {
    struct sdshdr *sh;
    size_t curlen = sdslen(s);
    s = sdsMakeRoomFor(s, len);
    if (s == NULL) return NULL;
    sh = (void *) (s - (sizeof(struct sdshdr)));
    memcpy(s + curlen, t, len); // copy the contents of t to the end of the string
    sh->len = curlen + len;
    sh->free = sh->free - len;
    s[curlen + len] = '\0';
    return s;
}
// Append a C string to the sds
sds sdscat(sds s, const char *t) {
    return sdscatlen(s, t, strlen(t));
}
Note that the similar sdscpy function copies a string into the SDS by overwriting its contents instead of appending.
sdsfree
Releases the entire memory block of the SDS:
void sdsfree(sds s) {
    if (s == NULL) return;
    zfree(s - sizeof(struct sdshdr)); // again, step back to the start of the header
}
Advantages of SDS
Based on the API implementations above, SDS has four advantages over native C strings:
O(1) string length
A C string is a character array whose last element is \0, so obtaining its length requires an O(N) traversal from start to end.
The SDS structure records the string length in the len field and keeps it up to date across append and delete operations, so the length can be read directly from that field with sdslen instead of being computed with strlen.
Avoiding buffer overflow
With a C string operation such as strcpy, if the destination dst does not have enough memory allocated, the program may crash or become the target of a buffer overflow attack.
SDS checks the available space before modifying a string and expands the buffer first if it is too small, which removes the root cause of the overflow.
Binary safety
C uses '\0' as the string terminator, so data that contains null bytes, such as an image, cannot be stored intact.
The SDS API uses len to define the string boundary, so whatever is stored can be read back unchanged, which makes it safe for binary data. SDS still appends '\0' as a terminator, so many of the library functions in string.h can be reused directly.
Memory preallocation and lazy release
With a plain C string, every growth or shrink requires a realloc call to reallocate memory.
SDS grows the buffer by doubling it or by adding 1 MB at a time, and it keeps the memory when a string is cleared so that it can be reused later. As a result, N string operations trigger at most N memory reallocations instead of exactly N.
Summary
Redis wraps native C strings in SDS and implements APIs for length lookup, copying, comparison, and memory preallocation for the upper layers to use. As the API implementations show, operations such as copying memory directly into buf are very efficient.
Because of this encapsulation, SDS adds a layer of pointer arithmetic compared with native strings, but the cost of its API has not become a performance bottleneck for Redis, and the design is quite elegant.