Redis source code analysis: SDS (simple dynamic string)

Author: Pushy
Link: http://pushy.site/2019/12/21/redis-sds/
Disclaimer: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 3.0. Please credit the source when reposting!

1. What is SDS

As we all know, of the five data structures in Redis, the simplest is the string:

redis> set msg "Hello World"

Redis does not represent strings directly as conventional C strings. Instead, it builds its own abstract data type called the simple dynamic string (SDS).

When Redis executes the command above, it creates a key-value pair in the server's database:

  • The key is an SDS holding "msg";
  • The value is an SDS holding "Hello World".

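Let us look at the definition of SDS. In the Redis source directory, the header file sds.h defines the SDS structure (this is the layout used up to Redis 3.0; Redis 3.2 replaced it with several header types such as sdshdr8 and sdshdr16):

struct sdshdr {
    // number of bytes currently used in buf (i.e. the string length)
    unsigned int len;
    // number of unused bytes remaining in buf
    unsigned int free;
    // byte array holding the string data
    char buf[];
};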

As can be seen, SDS uses the len and free attributes to describe the current state of the byte array buf. This plays an important role in expansion and other operations, and it also lets SDS obtain the string length in O(1) time (a plain C string does not record its own length, so computing it requires traversing the whole string).
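To make the O(1) length lookup concrete, here is a minimal sketch. It assumes a simplified copy of the sdshdr layout above; the names mini_sdshdr, mini_sds, mini_sdsnewlen, mini_sdslen, c_strlen and mini_sdsfree are hypothetical and are not Redis APIs. Like Redis, it hands out a pointer to buf, so the result can still be used as an ordinary C string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified copy of struct sdshdr, for illustration only. */
struct mini_sdshdr {
    unsigned int len;   /* bytes used in buf */
    unsigned int free;  /* bytes allocated but unused in buf */
    char buf[];         /* string data, always '\0'-terminated */
};

typedef char *mini_sds;  /* like sds: a pointer to buf, not to the header */

/* Create a string from `len` bytes of `init`; returns a pointer to buf. */
mini_sds mini_sdsnewlen(const void *init, size_t len) {
    struct mini_sdshdr *sh = malloc(sizeof(struct mini_sdshdr) + len + 1);
    if (sh == NULL) return NULL;
    sh->len = (unsigned int)len;
    sh->free = 0;
    if (init != NULL && len > 0) memcpy(sh->buf, init, len);
    sh->buf[len] = '\0';
    return sh->buf;
}

/* O(1): read the stored length from the header in front of buf. */
size_t mini_sdslen(const mini_sds s) {
    const struct mini_sdshdr *sh = (const void *)(s - sizeof(struct mini_sdshdr));
    return sh->len;
}

/* O(n), for comparison: what strlen has to do with a plain C string. */
size_t c_strlen(const char *p) {
    size_t n = 0;
    while (p[n] != '\0') n++;
    return n;
}

void mini_sdsfree(mini_sds s) {
    if (s != NULL) free(s - sizeof(struct mini_sdshdr));
}
```

Here mini_sdslen reads a single field no matter how long the string is, while c_strlen must visit every byte up to the terminating '\0'.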

So why did Redis implement its own string data structure? Let's study it step by step!

2. Advantages of SDS

Preventing buffer overflow

Besides making length lookup expensive, the fact that a C string does not record its own length causes another problem: it can easily lead to a buffer overflow. For example, use C's built-in strcat function to append the string motto to the end of the string s1:

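#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Deliberately WRONG: strcat writes past the end of s1's allocation (undefined behavior)
void wrong_strcat() {
    char *s1, *s2;

    s1 = malloc(6 * sizeof(char));  // 5 characters + terminating '\0'
    strcpy(s1, "Hello");
    s2 = malloc(6 * sizeof(char));
    strcpy(s2, "World");

    char *motto = " To be or not to be, this is a question.";
    s1 = strcat(s1, motto);  // s1 has room for only 6 bytes: buffer overflow

    printf("s1 = %s \n", s1);
    printf("s2 = %s \n", s2);
}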

// s1 = Hello To be or not to be, this is a question. 
// s2 = s a question. 

But the output is unexpected: we only wanted to modify s1, yet s2 has been modified as well. This is because strcat assumes the caller has already allocated enough memory for s1 to hold the contents of motto. Once this assumption does not hold, a buffer overflow occurs.

Using a debugger, we can see that s1 starts at memory address 94458843619936 (decimal) and s2 starts at 94458843619968, i.e. the two blocks are adjacent in memory and only 32 bytes apart:

(Figure: wrong_strcat.png)

So once the string appended to s1 with strcat is longer than the gap between the addresses of s1 and s2, the value of s2 is overwritten. The correct approach is to resize s1's memory before calling strcat, so that s2 is not modified:

void correct_strcat() {
    char *s1, *s2;

    s1 = malloc(6 * sizeof(char));  // 5 characters + terminating '\0'
    strcpy(s1, "Hello");
    s2 = malloc(6 * sizeof(char));
    strcpy(s2, "World");

    char *motto = " To be or not to be, this is a question.";
    // Grow s1 to hold its current contents + motto + the terminating '\0'
    char *tmp = realloc(s1, (strlen(s1) + strlen(motto) + 1) * sizeof(char));
    if (tmp == NULL) { free(s1); free(s2); return; }  // allocation failed
    s1 = strcat(tmp, motto);

    printf("s1 = %s \n", s1);
    printf("s2 = %s \n", s2);

    free(s1);
    free(s2);
}

// s1 = Hello To be or not to be, this is a question. 
// s2 = World 

As can be seen, after the expansion s1 starts at memory address 94806242149024 (decimal), while s2 starts at 94806242148992. realloc has moved s1 into a new block located after s2 that is large enough to hold the concatenated string, so s2 is no longer overwritten:

(Figure: correct_strcat.png)

Unlike C strings, SDS's space allocation policy completely eliminates the possibility of buffer overflow; the implementation lives in sds.c. Reading the source shows why: before an SDS is modified, sdsMakeRoomFor is called to check whether the free space meets the requirement (i.e. free >= addlen). If it does not, Redis first expands the SDS to the required size and only then performs the actual concatenation, so an overflow cannot occur:

// Similar to strcat in C's string.h: appends a C string to an sds
sds sdscat(sds s, const char *t) {
    return sdscatlen(s, t, strlen(t));
}

sds sdscatlen(sds s, const char *t, size_t len) {
    struct sdshdr *sh;
    size_t curlen = sdslen(s);  // current len of the sds

    s = sdsMakeRoomFor(s, len);
    if (s == NULL) return NULL;
    // Get the sdshdr header back from the sds pointer (explained below)
    sh = (void *) (s - sizeof(struct sdshdr));
    // Copy t into the memory starting at s + curlen
    memcpy(s + curlen, t, len);
    sh->len = curlen + len;     // length after concat = old length + len
    sh->free = sh->free - len;  // free after concat = old free - len
    s[curlen + len] = '\0';     // terminated by '\0', just like a C string
    return s;
}

// Ensure there is enough room for the appended C string, and also allocate
// extra unused space; this is what rules out buffer overflow
sds sdsMakeRoomFor(sds s, size_t addlen) {
    struct sdshdr *sh, *newsh;
    size_t free = sdsavail(s);  // current free space
    size_t len, newlen;

    if (free >= addlen) {
        /* Enough free space for the appended string: return as-is.
           Otherwise fall through and grow the buf byte array */
        return s;
    }
    len = sdslen(s);  // bytes currently in use
    sh = (void *) (s - (sizeof(struct sdshdr)));
    newlen = (len + addlen);  // length needed after the append

    if (newlen < SDS_MAX_PREALLOC)
        newlen *= 2;
    else
        newlen += SDS_MAX_PREALLOC;
    newsh = realloc(sh, sizeof(struct sdshdr) + newlen + 1);
    if (newsh == NULL) return NULL; // allocation failed

    /* New free space = new capacity - current length
       (sdscatlen then subtracts the appended length) */
    newsh->free = newlen - len;
    return newsh->buf;
}
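The following self-contained sketch condenses the two functions above into something that compiles without the Redis source tree. It is an illustration, not Redis code: the names mini_sdsnew, mini_makeroom, mini_catlen, mini_hdr, mini_sdsfree and MINI_MAX_PREALLOC are hypothetical, and it uses plain realloc (Redis itself goes through its zrealloc wrapper). It replays the strcat scenario from earlier: because the room check always runs before the copy, appending motto cannot spill into a neighbouring allocation.

```c
#include <stdlib.h>
#include <string.h>

#define MINI_MAX_PREALLOC (1024 * 1024)  /* mirrors SDS_MAX_PREALLOC (1 MB) */

/* Hypothetical simplified header with the same fields as struct sdshdr. */
struct mini_sdshdr {
    unsigned int len;
    unsigned int free;
    char buf[];
};
typedef char *mini_sds;

static struct mini_sdshdr *mini_hdr(mini_sds s) {
    return (void *)(s - sizeof(struct mini_sdshdr));
}

mini_sds mini_sdsnew(const char *init) {
    size_t len = strlen(init);
    struct mini_sdshdr *sh = malloc(sizeof *sh + len + 1);
    if (sh == NULL) return NULL;
    sh->len = (unsigned int)len;
    sh->free = 0;
    memcpy(sh->buf, init, len + 1);  /* copies the '\0' too */
    return sh->buf;
}

/* Make sure at least addlen free bytes exist; grow with preallocation if not. */
mini_sds mini_makeroom(mini_sds s, size_t addlen) {
    struct mini_sdshdr *sh = mini_hdr(s);
    if (sh->free >= addlen) return s;  /* enough room already */
    size_t len = sh->len;
    size_t newlen = len + addlen;
    if (newlen < MINI_MAX_PREALLOC) newlen *= 2;
    else newlen += MINI_MAX_PREALLOC;
    struct mini_sdshdr *newsh = realloc(sh, sizeof *newsh + newlen + 1);
    if (newsh == NULL) return NULL;  /* old string is left untouched */
    newsh->free = (unsigned int)(newlen - len);
    return newsh->buf;
}

/* Append len bytes of t: room check first, copy second. */
mini_sds mini_catlen(mini_sds s, const void *t, size_t len) {
    size_t curlen = mini_hdr(s)->len;
    s = mini_makeroom(s, len);
    if (s == NULL) return NULL;
    struct mini_sdshdr *sh = mini_hdr(s);
    memcpy(s + curlen, t, len);
    sh->len = (unsigned int)(curlen + len);
    sh->free -= (unsigned int)len;
    s[curlen + len] = '\0';
    return s;
}

void mini_sdsfree(mini_sds s) {
    if (s != NULL) free(mini_hdr(s));
}
```

With s1 = "Hello" (len 5, free 0) and the 40-byte motto, mini_makeroom grows the payload capacity to 2 × 45 = 90 bytes, so after the append len is 45 and free is 45, and s2 still reads "World".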

By the way, when I first read sh = (void *) (s - sizeof(struct sdshdr)); in the source, I was completely baffled. If it puzzles you too, see the article "Redis (1): explaining struct sdshdr *sh = (void *) (s - (sizeof(struct sdshdr)))".
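The trick works because an sds value is a char * pointing at buf rather than at the header, and buf is a flexible array member placed right after len and free, so stepping back sizeof(struct sdshdr) bytes lands on the header. The sketch below checks this on a hypothetical copy of the header (struct sdshdr_demo and header_roundtrip_ok are illustrative names, not Redis code). One caveat: the C standard allows trailing padding, so offsetof(..., buf) == sizeof(...) is a property of this particular layout (two unsigned ints followed by char[]) on common ABIs, not a general guarantee.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical copy of the pre-3.2 sdshdr layout. */
struct sdshdr_demo {
    unsigned int len;
    unsigned int free;
    char buf[];
};

/* Allocate a header plus a small buf, hand out buf the way the sds API does,
 * then recover the header with the same arithmetic as sds.c.
 * Returns 1 if the recovered pointer is the original header. */
int header_roundtrip_ok(void) {
    struct sdshdr_demo *sh = malloc(sizeof *sh + 6);
    if (sh == NULL) return 0;
    sh->len = 5;
    sh->free = 0;
    char *s = sh->buf;  /* an "sds" is just this char pointer */
    struct sdshdr_demo *back = (void *)(s - sizeof(struct sdshdr_demo));
    int ok = (back == sh) && (back->len == 5);
    free(sh);
    return ok;
}
```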

Reducing memory reallocations caused by string modification

For a C string containing N characters, the underlying implementation is always a contiguous array of N + 1 bytes. Because of this, every time the string is modified, the program must reallocate the memory of the array that holds it:

  • For an append (concatenation): the underlying array must be enlarged first, otherwise a buffer overflow occurs (as mentioned earlier);
  • For a truncation: the unused memory must be released, otherwise a memory leak occurs.

Redis is a database whose data is accessed and modified frequently, so reallocating memory on every string modification would noticeably hurt performance; this is why SDS is necessary. In an SDS, the length of buf is not necessarily the number of characters + 1: the array may contain unused bytes, whose count is recorded by the free attribute. Using this unused space, SDS implements the following two optimization strategies:

Ⅰ. Space preallocation

Space preallocation optimizes SDS growth: when an SDS is modified and needs to be expanded, Redis allocates not only the space the modification requires but also additional unused space.

In the sdsMakeRoomFor function shown earlier, you can see two strategies for how much additional unused space is allocated:

  • If the new length is less than SDS_MAX_PREALLOC (defined in sds.h as 1024 * 1024, i.e. 1 MB): the required length is doubled, so after the append the len and free attributes are equal;
  • If the new length is greater than or equal to SDS_MAX_PREALLOC: an extra SDS_MAX_PREALLOC bytes (1 MB) of free space is allocated.

sds sdsMakeRoomFor(sds s, size_t addlen) {
    ...
    if (newlen < SDS_MAX_PREALLOC)
        newlen *= 2;                 // below 1 MB: double the required length
    else
        newlen += SDS_MAX_PREALLOC;  // 1 MB or more: add an extra 1 MB
    newsh = realloc(sh, sizeof(struct sdshdr) + newlen + 1);
    if (newsh == NULL) return NULL;
    newsh->free = newlen - len;
    return newsh->buf;
}

With space preallocation, Redis reduces the number of memory reallocations needed when a string is grown repeatedly.
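To get a feel for the savings, the following sketch only simulates capacities instead of allocating memory. It counts how many reallocations are needed to append n chunks, once with exact-fit growth (what a plain C string needs) and once with the doubling / +1 MB rule from sdsMakeRoomFor. The function count_reallocs and the MAX_PREALLOC macro are hypothetical names, and the trailing '\0' byte is ignored because it affects both strategies equally.

```c
#include <stddef.h>

#define MAX_PREALLOC (1024 * 1024)  /* same value as SDS_MAX_PREALLOC in sds.h */

/* Count reallocations when appending n chunks of `chunk` bytes each.
 * prealloc == 0: grow to exactly the needed size (naive C string).
 * prealloc != 0: grow like sdsMakeRoomFor (double below 1 MB, else +1 MB). */
size_t count_reallocs(size_t n, size_t chunk, int prealloc) {
    size_t len = 0, cap = 0, reallocs = 0;
    for (size_t i = 0; i < n; i++) {
        if (cap - len < chunk) {          /* not enough free space */
            size_t newlen = len + chunk;
            if (prealloc) {
                if (newlen < MAX_PREALLOC) newlen *= 2;
                else newlen += MAX_PREALLOC;
            }
            cap = newlen;
            reallocs++;
        }
        len += chunk;                     /* the append itself */
    }
    return reallocs;
}
```

Appending 1,000 single bytes takes 1,000 reallocations with exact-fit growth but only 9 with the preallocation rule, because the capacity goes 2, 6, 14, 30, 62, 126, 254, 510, 1022.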

Ⅱ. Lazy space release

Lazy space release optimizes operations that shorten an SDS: when the string stored in an SDS needs to be shortened, Redis does not immediately reallocate memory to reclaim the bytes freed by the shortening. Instead, it records them in the free attribute and keeps them for future use.

For example, in the code below, after sdstrim completes, the 22 bytes that are no longer needed are not released immediately but recorded in free. When sdscat is then executed, those 22 free bytes are more than enough for the 11-byte C string being appended, so no memory reallocation is needed to expand the buffer.

#include "src/sds.h"

int main() {
    sds s;
    // sds{len = 32, free = 0, buf = "AA...AA.a.aa.aHelloWorld     :::"}
    s = sdsnew("AA...AA.a.aa.aHelloWorld     :::");
    // sdstrim removes leading/trailing characters found in "Aa. :"
    // sds{len = 10, free = 22, buf = "HelloWorld"}: the 22 bytes are kept, not freed
    s = sdstrim(s, "Aa. :");
    // sds{len = 21, free = 11, buf = "HelloWorld! I'm Redis"}: no reallocation needed
    s = sdscat(s, "! I'm Redis");
    sdsfree(s);
    return 0;
}

With lazy space release, SDS avoids the memory reallocation that shortening a string would otherwise require, and the retained space can serve future growth. At the same time, SDS also provides an API (sdsRemoveFreeSpace) that actually releases an SDS's unused space when needed.
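Below is a minimal, self-contained sketch of both halves of this strategy: a lazy trim that only moves bytes and updates len/free, and an explicit shrink that calls realloc to give the unused space back (the role sdsRemoveFreeSpace plays in Redis). All names here (mini_sdsnew, mini_trim, mini_remove_free, mini_hdr, mini_sdsfree) are hypothetical; mini_trim follows the same strip-both-ends-with-strchr idea as the sdstrim call in the example above without claiming to be identical to it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical mini header with the same fields as struct sdshdr. */
struct mini_sdshdr {
    unsigned int len;
    unsigned int free;
    char buf[];
};
typedef char *mini_sds;

static struct mini_sdshdr *mini_hdr(mini_sds s) {
    return (void *)(s - sizeof(struct mini_sdshdr));
}

mini_sds mini_sdsnew(const char *init) {
    size_t len = strlen(init);
    struct mini_sdshdr *sh = malloc(sizeof *sh + len + 1);
    if (sh == NULL) return NULL;
    sh->len = (unsigned int)len;
    sh->free = 0;
    memcpy(sh->buf, init, len + 1);
    return sh->buf;
}

/* Lazy release: strip characters in cset from both ends but keep the
 * allocation; the removed bytes are only recorded in free. */
mini_sds mini_trim(mini_sds s, const char *cset) {
    struct mini_sdshdr *sh = mini_hdr(s);
    if (sh->len == 0) return s;
    char *sp = s, *end = s + sh->len - 1, *ep = end;
    while (sp <= end && strchr(cset, *sp)) sp++;
    while (ep > sp && strchr(cset, *ep)) ep--;
    size_t newlen = (sp > ep) ? 0 : (size_t)(ep - sp) + 1;
    if (sp != s) memmove(s, sp, newlen);
    s[newlen] = '\0';
    sh->free += sh->len - (unsigned int)newlen;  /* no realloc here */
    sh->len = (unsigned int)newlen;
    return s;
}

/* Explicit release: shrink the allocation so that free becomes 0. */
mini_sds mini_remove_free(mini_sds s) {
    struct mini_sdshdr *sh = mini_hdr(s);
    struct mini_sdshdr *newsh = realloc(sh, sizeof *newsh + sh->len + 1);
    if (newsh == NULL) return NULL;  /* the old string is still valid */
    newsh->free = 0;
    return newsh->buf;
}

void mini_sdsfree(mini_sds s) {
    if (s != NULL) free(mini_hdr(s));
}
```

Run on the string from the example above, mini_trim leaves len = 10 and free = 22, and a later mini_remove_free brings it to len = 10, free = 0.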

Binary safety

Apart from having to conform to some encoding, a C string cannot contain the null character (\0) anywhere except at its end; otherwise that character would be mistaken for the end of the string. These restrictions mean that C strings cannot store binary data such as images or audio.

Redis, however, can store binary data, because SDS uses the len attribute, rather than the null character, to determine where the string ends.
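A minimal sketch of the difference, assuming a hypothetical length-tracked view struct lenbuf that stands in for an SDS (it is not a Redis type): strcmp stops at the first '\0', so two different binary buffers can compare as equal, whereas comparing with memcmp over the stored length does not have that problem. Redis's own sdscmp likewise compares with memcmp using the stored lengths.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-tracked view standing in for an SDS: the length is
 * stored explicitly, so '\0' bytes inside the data are ordinary bytes. */
struct lenbuf {
    size_t len;
    const char *bytes;
};

/* Binary-safe equality: compare lengths, then every byte with memcmp. */
int lenbuf_equal(struct lenbuf a, struct lenbuf b) {
    return a.len == b.len && memcmp(a.bytes, b.bytes, a.len) == 0;
}

/* Build a view over a char ARRAY (not a pointer), excluding the '\0'
 * that the compiler appends to string literals. */
#define LENBUF_FROM_ARRAY(arr) ((struct lenbuf){ sizeof(arr) - 1, (arr) })
```

With a = "ab\0cd" and b = "ab\0xy", strcmp(a, b) returns 0 because both look like "ab", while lenbuf_equal reports them as different because it compares all 5 bytes.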

Partial compatibility with C string functions

The SDS byte array resembles a C string; for example, it is also terminated by \0 (although SDS does not rely on this marker to find the end of the string). This allows SDS to reuse some functions from the C string libraries, such as those declared in <string.h> and the POSIX <strings.h>:

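#include <stdio.h>
#include <strings.h>   // strcasecmp (POSIX)
#include "src/sds.h"

int main() {
    sds s = sdsnew("Cat");
    // Case-insensitive comparison: negative, because "cat" sorts before "dog"
    int ret = strcasecmp(s, "Dog");
    printf("%d\n", ret);
    sdsfree(s);
    return 0;
}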

3. Summary

After reading the SDS implementation in Redis, I finally understand that Redis is fast not only because of epoll and its network I/O model, but also because of its simple, well-optimized underlying data structures.

The elegance of SDS lies in using the len and free attributes to coordinate growing and shrinking the byte array, which gives it clear performance advantages over plain C strings. What's the word for that? It rocks!


Source: www.cnblogs.com/Pushy/p/12081020.html