Redis - Simple Dynamic String (SDS)

1. Redis defines its own string type, SDS
    Instead of using C's traditional string representation (a null-terminated array of characters) directly, Redis builds its own abstract type called Simple Dynamic String (SDS).
2. Simple dynamic string SDS
   SDS Definition:
struct sdshdr {
    // length of occupied space in buf, excluding trailing '\0'
    int len;
    // length of free space left in buf
    int free;
    // Data space, an array of type char
    char buf[];
};

3. The difference between SDS and C strings

1) The complexity of getting the length of the string

    A C string is stored as a plain character array with no length field, so obtaining its length means scanning the array until the terminating '\0' is found, and the time complexity is O(N);
    In SDS, the len field of the structure stores the length, so it can be read directly, and the time complexity is O(1).
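
The difference can be sketched in a few lines of C. This is a minimal illustration that reuses the sdshdr layout from section 2; the names demo_sdsnew, demo_cstring_len, demo_sdslen and demo_sdsfree are hypothetical helpers written for this example and are not the Redis API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as the sdshdr definition in section 2. */
struct sdshdr {
    int len;    /* bytes used in buf, excluding the trailing '\0' */
    int free;   /* unused bytes left in buf */
    char buf[]; /* string data */
};

/* Hypothetical constructor: copy a C string into a new header + buffer. */
char *demo_sdsnew(const char *init) {
    assert(init != NULL);
    size_t initlen = strlen(init);
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + initlen + 1);
    if (sh == NULL) return NULL;
    sh->len = (int)initlen;
    sh->free = 0;
    memcpy(sh->buf, init, initlen + 1); /* copies the '\0' as well */
    return sh->buf;
}

/* O(N): scan for the terminator, which is what strlen() has to do. */
size_t demo_cstring_len(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') n++;
    return n;
}

/* O(1): step back from buf to the header and read len. */
size_t demo_sdslen(const char *s) {
    const struct sdshdr *sh = (const struct sdshdr *)(s - sizeof(struct sdshdr));
    return (size_t)sh->len;
}

/* Release the header and the buffer together. */
void demo_sdsfree(char *s) {
    if (s != NULL) free(s - sizeof(struct sdshdr));
}
```

For a string created with demo_sdsnew("hello"), both functions return 5, but only demo_cstring_len has to visit every byte to get there.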
2) Buffer overflow
    A C string does not record the size of its array, so the caller must allocate enough space in advance. Writing past the allocated space (for example, appending to an array that is already full) causes a buffer overflow. To avoid this, the caller has to check whether the array has enough room first and, if not, manually allocate a larger array, copy the data, and release the original array;
    With SDS, the caller only needs to append data. The SDS API itself checks whether the buffer has enough room and, if not, reallocates memory (copying the data and releasing the original buffer as needed) before modifying the string, so the caller does not have to do this work (somewhat similar to the way vector grows in the C++ STL).
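
A minimal sketch of an append operation that does the check for the caller might look like this. The names demo_sdsempty, demo_sdscat and demo_sdsfree are hypothetical, and the growth step is deliberately simplified: it grows to exactly the required size instead of applying the pre-allocation policy described in the next item.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sdshdr {
    int len;    /* bytes used in buf, excluding the trailing '\0' */
    int free;   /* unused bytes left in buf */
    char buf[];
};

/* Hypothetical constructor for an empty string with no spare room. */
char *demo_sdsempty(void) {
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + 1);
    if (sh == NULL) return NULL;
    sh->len = 0;
    sh->free = 0;
    sh->buf[0] = '\0';
    return sh->buf;
}

/* Append t to s, growing the buffer first when free is too small.
 * Returns the (possibly moved) string, or NULL if allocation fails,
 * in which case the original string is left untouched. */
char *demo_sdscat(char *s, const char *t) {
    assert(s != NULL && t != NULL);
    struct sdshdr *sh = (struct sdshdr *)(s - sizeof(struct sdshdr));
    size_t addlen = strlen(t);

    if ((size_t)sh->free < addlen) {
        /* Simplification: grow to exactly the size that is needed. */
        size_t newlen = (size_t)sh->len + addlen;
        struct sdshdr *newsh = realloc(sh, sizeof(struct sdshdr) + newlen + 1);
        if (newsh == NULL) return NULL;
        newsh->free = (int)(newlen - (size_t)newsh->len);
        sh = newsh;
    }
    memcpy(sh->buf + sh->len, t, addlen + 1); /* includes the '\0' */
    sh->len += (int)addlen;
    sh->free -= (int)addlen;
    return sh->buf;
}

void demo_sdsfree(char *s) {
    if (s != NULL) free(s - sizeof(struct sdshdr));
}
```

Because the buffer may move during reallocation, the caller has to keep the returned pointer, as in s = demo_sdscat(s, "Redis");.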
3) Reduce the number of memory reallocations when modifying strings
Flaws of C strings:
    When growing a string, the program must first enlarge the underlying array through memory reallocation, otherwise a buffer overflow may occur;
    When shortening a string, the program must release the space the string no longer uses through memory reallocation, otherwise a memory leak may occur.
SDS space pre-allocation: when an SDS has to grow, the amount of extra free space depends on whether len will be less than 1 MB after the modification:
    If len will be less than 1 MB after the modification, free is set to the same size as len. For example, if len is 13 bytes after the modification, free is also 13 bytes, and the actual size of buf becomes 13 bytes + 13 bytes + 1 byte = 27 bytes;
    If len will be greater than or equal to 1 MB after the modification, free is set to 1 MB. For example, if len is 30 MB after the modification, free is 1 MB, and the actual size of buf becomes 30 MB + 1 MB + 1 byte;
    Before a modification, SDS first checks whether the existing free space is enough. If it is, the space is used directly; otherwise memory is reallocated.
SDS lazy space release:
    When an SDS is shortened, the program does not immediately reallocate memory to reclaim the extra bytes. Instead, it records their number in the free field and keeps them for future use;
    SDS also provides a corresponding API that truly releases the unused space of an SDS when needed, so there is no need to worry about memory being wasted by the lazy space release strategy.
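
The two pre-allocation cases can be written as a small pure function. This is only a sketch of the arithmetic: demo_prealloc_capacity and demo_buf_bytes are hypothetical names, and DEMO_MAX_PREALLOC stands in for the 1 MB threshold (SDS_MAX_PREALLOC in the Redis source shown in section 4).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 1 MB threshold, mirroring SDS_MAX_PREALLOC. */
#define DEMO_MAX_PREALLOC ((size_t)1024 * 1024)

/* Capacity (len + free) chosen when a string of length len
 * has to grow by addlen bytes. */
size_t demo_prealloc_capacity(size_t len, size_t addlen) {
    assert(addlen <= SIZE_MAX - len);    /* len + addlen must not wrap */
    size_t newlen = len + addlen;        /* len after the modification */
    if (newlen < DEMO_MAX_PREALLOC)
        return newlen * 2;               /* free == len */
    return newlen + DEMO_MAX_PREALLOC;   /* free == 1 MB */
}

/* Total bytes occupied by buf: the capacity plus one byte for '\0'. */
size_t demo_buf_bytes(size_t len, size_t addlen) {
    return demo_prealloc_capacity(len, addlen) + 1;
}
```

With the numbers from the text, growing a 5-byte string by 8 bytes gives len = 13, a capacity of 26 and a buf of 27 bytes; growing a 29 MB string by 1 MB gives a capacity of 31 MB and a buf of 31 MB + 1 byte.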
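
Lazy release and the explicit release step can be sketched as follows. The helper names (demo_sdsnew, demo_sdsshorten, demo_sds_release_free) are hypothetical and only illustrate the bookkeeping on len and free; they are not the functions Redis itself exposes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sdshdr {
    int len;    /* bytes used in buf, excluding the trailing '\0' */
    int free;   /* unused bytes left in buf */
    char buf[];
};

/* Hypothetical constructor: copy a C string into a new header + buffer. */
char *demo_sdsnew(const char *init) {
    assert(init != NULL);
    size_t initlen = strlen(init);
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + initlen + 1);
    if (sh == NULL) return NULL;
    sh->len = (int)initlen;
    sh->free = 0;
    memcpy(sh->buf, init, initlen + 1);
    return sh->buf;
}

/* Lazy release: keep only the first newlen bytes. No reallocation
 * happens; the dropped bytes are simply recorded in free. */
void demo_sdsshorten(char *s, size_t newlen) {
    struct sdshdr *sh = (struct sdshdr *)(s - sizeof(struct sdshdr));
    assert(newlen <= (size_t)sh->len);
    sh->free += sh->len - (int)newlen;
    sh->len = (int)newlen;
    sh->buf[newlen] = '\0';
}

/* Explicit release: shrink the allocation so that free becomes 0.
 * Returns the (possibly moved) string, or NULL if realloc fails. */
char *demo_sds_release_free(char *s) {
    struct sdshdr *sh = (struct sdshdr *)(s - sizeof(struct sdshdr));
    struct sdshdr *newsh = realloc(sh, sizeof(struct sdshdr) + (size_t)sh->len + 1);
    if (newsh == NULL) return NULL;
    newsh->free = 0;
    return newsh->buf;
}
```

Shortening "hello world" to 5 bytes leaves len = 5 and free = 6, so a later append of up to 6 bytes needs no reallocation; calling demo_sds_release_free afterwards hands those 6 bytes back to the allocator.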
4) Binary Safe
    A C string cannot contain '\0' anywhere except at its end, because the first '\0' encountered is treated as the end of the string, so it can only hold text data. SDS uses the len field rather than '\0' to determine where the string ends, so it can hold text or arbitrary binary data.
5) Compatibility with some C string functions
    SDS still appends '\0' to the end of the data, mainly so that an SDS holding text data can reuse some of the C string functions instead of reimplementing them.
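
A short sketch makes the difference visible with data that contains an embedded '\0'. The helpers demo_sdsnewlen and demo_sdslen are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct sdshdr {
    int len;    /* bytes used in buf, excluding the trailing '\0' */
    int free;   /* unused bytes left in buf */
    char buf[];
};

/* Hypothetical constructor that takes an explicit length, so the
 * data may contain '\0' bytes anywhere. */
char *demo_sdsnewlen(const void *init, size_t initlen) {
    assert(init != NULL);
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + initlen + 1);
    if (sh == NULL) return NULL;
    sh->len = (int)initlen;
    sh->free = 0;
    memcpy(sh->buf, init, initlen);
    sh->buf[initlen] = '\0'; /* terminator is still appended */
    return sh->buf;
}

/* The length comes from the header, not from searching for '\0'. */
size_t demo_sdslen(const char *s) {
    const struct sdshdr *sh = (const struct sdshdr *)(s - sizeof(struct sdshdr));
    return (size_t)sh->len;
}
```

For s = demo_sdsnewlen("Redis\0Cluster", 13), strlen(s) stops at the embedded '\0' and reports 5, while demo_sdslen(s) reports all 13 bytes, and memcmp over those 13 bytes still sees the full data.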
4. Expansion part of the source code (specific expansion mechanism):
/* Enlarge the free space at the end of the sds string so that the caller
 * is sure that after calling this function can overwrite up to addlen
 * bytes after the end of the string, plus one more byte for nul term.
 *
 * Note: this does not change the *length* of the sds string as returned
 * by sdslen(), but only the free buffer space we have. */
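/*
 * Extend the length of buf in the sds so that, after this function returns,
 * buf has at least addlen + 1 bytes of free space
 * (the extra 1 byte is reserved for the '\0').
 *
 * Return value
 *  sds: the extended sds on success,
 *       NULL on failure
 *
 * Complexity
 *  T = O(N)
 */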
sds sdsMakeRoomFor(sds s, size_t addlen) {

    struct sdshdr *sh, *newsh;

    // Get the current free space of s
    size_t free = sdsavail(s);

    size_t len, newlen;

    // s already has enough free space, so no expansion is needed; return it as is
    if (free >= addlen) return s;

    // Get the length of the space currently used by s
    len = sdslen(s);
    sh = (void*) (s-(sizeof(struct sdshdr)));

    // Minimum length s needs
    newlen = (len+addlen);

    // Decide how much space to allocate for s based on the new length
    if (newlen < SDS_MAX_PREALLOC)
        // If the new length is less than SDS_MAX_PREALLOC,
        // allocate twice the required length
        newlen *= 2;
    else
        // Otherwise, allocate the required length plus SDS_MAX_PREALLOC
        newlen += SDS_MAX_PREALLOC;
    // T = O(N)
    newsh = zrealloc(sh, sizeof(struct sdshdr)+newlen+1);

    // Insufficient memory, allocation failed, return
    if (newsh == NULL) return NULL;

    // Update the free length of sds
    newsh->free = newlen - len;

    // return sds
    return newsh->buf;
}
Reference book: "Redis Design and Implementation"

