[Redis] Some miscellaneous notes about the Redis data structure Simple Dynamic String (SDS)

Recommended articles that cover the SDS data structure in more detail:

1. Simple dynamic string - Redis design and implementation (redisbook.readthedocs.io)

2. In-depth understanding of simple dynamic strings of Redis - itbsl - Blog Garden (cnblogs.com)

3. Detailed Explanation of Redis Internal Data Structure (2)——sds - Tie Lei's Personal Blog (zhangtielei.com)

1. The structure and implementation of SDS

sds is often described as an abstract data structure. Concretely, its implementation consists of the following two parts:

typedef char *sds;

struct sdshdr {

    // number of bytes of buf currently in use
    int len;

    // number of unused bytes remaining in buf
    int free;

    // where the string data is actually stored
    char buf[];
};

Here, the type sds is an alias for char *, and the sdshdr structure holds three attributes: len, free, and buf.

As an example, the following is a newly created sdshdr structure that holds the string hello world:

struct sdshdr {
    len = 11;
    free = 0;
    buf = "hello world\0";  // the actual length of buf is len + 1
};

Through the len attribute, sdshdr can compute the string length with a complexity of θ(1).

On the other hand, by allocating some extra space for buf and using free to keep track of the size of the unused space, sdshdr can greatly reduce the number of memory reallocations required by append operations, as discussed under the SDS memory optimization strategy later in this article.

Of course, sds also places a requirement on the correctness of its operations: every function that handles sdshdr must update the len and free attributes correctly, otherwise bugs will result.
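
The relationship between the sds pointer and its header can be sketched in a few lines of C. This is a minimal illustration built on the two-field sdshdr shown above, not the actual Redis source: the demo_ function names are hypothetical, and the sketch assumes the sds pointer always points at buf, so the header sits immediately before it in memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char *sds;

struct sdshdr {
    int len;    /* number of bytes of buf in use */
    int free;   /* number of unused bytes remaining in buf */
    char buf[]; /* string data, followed by a terminating '\0' */
};

/* Hypothetical helper: build an sds from a C string with no spare space. */
static sds demo_sdsnew(const char *init) {
    size_t n = strlen(init);
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + n + 1);
    if (sh == NULL) return NULL;
    sh->len = (int)n;
    sh->free = 0;
    memcpy(sh->buf, init, n + 1); /* copies the trailing '\0' as well */
    return sh->buf;               /* callers only ever see buf */
}

/* Step back from buf to the header and read len: no scan of the data. */
static size_t demo_sdslen(const sds s) {
    assert(s != NULL);
    const struct sdshdr *sh =
        (const struct sdshdr *)(s - sizeof(struct sdshdr));
    return (size_t)sh->len;
}

/* Release the whole allocation, which starts at the header, not at buf. */
static void demo_sdsfree(sds s) {
    if (s != NULL) free(s - sizeof(struct sdshdr));
}
```

Because the returned pointer is an ordinary char * that ends in \0, it can still be passed to C functions such as printf or strcmp, while demo_sdslen answers length queries without walking the string.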

2. The string object

Redis is a key-value database (key-value DB). The values of the database can be objects of various types such as strings, sets, and lists, while the keys of the database are always string objects. Every string object that holds a string value contains an sds value.

Note:

"A string object containing a string value" may sound strange at first, but in Redis a string object can hold not only a string value but also a value of type long. For clarity, it is worth emphasizing: when a string object holds a string, it contains an sds value; otherwise, it holds a value of type long.

For example, the following command creates a new key-value pair whose key and value are both string objects, each containing an sds value:

127.0.0.1:6379> set school "HeFeiUniversity"
OK
127.0.0.1:6379> get school
"HeFeiUniversity"
127.0.0.1:6379>

The following command also creates a key-value pair, but its key is a string object and its value is a set object:

127.0.0.1:6379> sadd nosql "MongoDB" "Redis" "Neo4j"
(integer) 3
127.0.0.1:6379> smembers nosql
1) "Neo4j"
2) "Redis"
3) "MongoDB"
127.0.0.1:6379>

3. The difference between Redis string and C string

In the C language, a string can be represented by a char array terminated with \0.

For example, hello world can be expressed in C as "hello world\0".

This simple string representation can meet the requirements in most cases, but it cannot efficiently support the two operations of length calculation and append:

  • Calculating the string length (strlen(s)) has a complexity of θ(N) every time.
  • Appending to a string N times requires N memory reallocations (realloc) on that string.

Inside Redis, appending to strings and computing their length are very common operations, and the APPEND and STRLEN commands map directly onto them. These two simple operations should not become a performance bottleneck.

In addition to C strings, Redis also needs to handle plain byte arrays, server protocol data, and so on. For convenience, the Redis string representation should therefore be binary safe: the program should make no assumptions about the data stored in a string, which may be a C string ending with \0, a plain byte array, or data in some other format.

For these two reasons, Redis uses the sds type to replace the C language's default string representation: sds can efficiently implement appending and length calculation, and is binary safe at the same time.
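
The binary-safety point can be illustrated with a small C sketch. It does not use the Redis sds type; demo_bytes is a hypothetical length-plus-pointer pair introduced here only to contrast an explicitly recorded length with what strlen reports when the data contains an embedded \0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying view of some bytes (not the Redis type). */
struct demo_bytes {
    size_t len;       /* number of payload bytes, recorded explicitly */
    const char *data; /* payload, which may contain '\0' anywhere */
};

/* Wrap a buffer together with its known length. */
static struct demo_bytes demo_bytes_from(const char *data, size_t len) {
    assert(data != NULL);
    struct demo_bytes b = { len, data };
    return b;
}

/* What a plain C string function sees: everything up to the first '\0'. */
static size_t demo_c_length(const char *data) {
    return strlen(data);
}
```

For the five bytes a, b, \0, c, d, the recorded length stays 5 while demo_c_length stops at 2, which is why a representation that relies on the \0 terminator cannot hold arbitrary binary data.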

Unlike C strings, because SDS records the length of the SDS itself in the len attribute, the complexity of obtaining the length of an SDS is O(1).

By using SDS instead of C strings, Redis reduces the complexity of obtaining the string length from O(N) to O(1), which ensures that obtaining the string length does not become a performance bottleneck for Redis. Even if the STRLEN command is executed repeatedly on a very long string, it has no noticeable impact on system performance, because the complexity of STRLEN is only O(1).

Advantages of SDS over traditional C strings:

C string | SDS
Getting the length of a string is O(N) | Getting the length of a string is O(1)
String manipulation functions are unsafe and may cause buffer overflows | String manipulation APIs are safe and avoid buffer overflows
Modifying the string length N times necessarily requires N memory reallocations | Modifying the string length N times requires at most N memory reallocations
Can only store text data | Can store text as well as binary data such as pictures, audio, video, and compressed files

4. SDS memory optimization strategy

SDS adopts a space pre-allocation strategy and a lazy space release strategy to reduce the number of memory reallocations.

The space pre-allocation strategy means that every time SDS expands the space, the program not only allocates the required space, but also allocates additional unused space to reduce the number of memory reallocations. The additional allocated unused space depends on the value of the len attribute of the SDS after space expansion.

  • If the value of the len attribute is less than 1 MB, then the allocated unused space free is the same size as the value of the len attribute.
  • If the value of the len attribute is greater than or equal to 1 MB, then the allocated unused space free is fixed at 1 MB.
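
The two rules above amount to a short calculation, sketched here in C. The function and macro names are hypothetical, and the sketch only computes sizes as the rules describe them; it is not the allocation code from Redis itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PREALLOC_LIMIT ((size_t)1024 * 1024) /* the 1 MB threshold */

/* Given len after the expansion, return the unused space (free) to reserve. */
static size_t demo_free_after_grow(size_t newlen) {
    return newlen < DEMO_PREALLOC_LIMIT ? newlen : DEMO_PREALLOC_LIMIT;
}

/* Total bytes needed for buf: used space + free space + terminating '\0'. */
static size_t demo_buf_size(size_t newlen) {
    assert(newlen <= SIZE_MAX - DEMO_PREALLOC_LIMIT - 1); /* no overflow */
    return newlen + demo_free_after_grow(newlen) + 1;
}
```

For example, if an append leaves len at 13 bytes, free is also 13 and buf occupies 13 + 13 + 1 = 27 bytes. If len ends up at 30 MB, free is capped at 1 MB, so buf occupies 30 MB + 1 MB + 1 byte.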

SDS uses a lazy strategy for releasing space. When an SDS string is shortened, the surplus space is not released immediately; instead it is added to free, which reduces the number of memory reallocations if the SDS is expanded again later. To actually release the unused space of an SDS, call the sdsRemoveFreeSpace() function.
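
Lazy release can be sketched as pure bookkeeping on the header. The following C fragment is an illustration under the two-field sdshdr layout described earlier, with hypothetical demo_ helpers: shortening moves bytes from len to free without calling realloc, and a later append can check whether it fits in the space that was kept.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sdshdr {
    int len;    /* bytes of buf in use */
    int free;   /* unused bytes remaining in buf */
    char buf[]; /* string data plus terminating '\0' */
};

/* Hypothetical helper: allocate a header holding a copy of init. */
static struct sdshdr *demo_new(const char *init) {
    size_t n = strlen(init);
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + n + 1);
    if (sh == NULL) return NULL;
    sh->len = (int)n;
    sh->free = 0;
    memcpy(sh->buf, init, n + 1);
    return sh;
}

/* Keep only the first keep bytes; the allocation itself is left untouched. */
static void demo_shrink(struct sdshdr *sh, int keep) {
    assert(sh != NULL && keep >= 0 && keep <= sh->len);
    sh->free += sh->len - keep; /* surplus is remembered, not released */
    sh->len = keep;
    sh->buf[keep] = '\0';
}

/* Nonzero if addlen more bytes fit in the retained space without realloc. */
static int demo_fits_without_realloc(const struct sdshdr *sh, size_t addlen) {
    return addlen <= (size_t)sh->free;
}
```

Shortening "hello world" to "hello" leaves len at 5 and free at 6, so a following append of up to 6 bytes needs no reallocation.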

5. API of SDS module

The sds module provides the following APIs based on the sds type and the sdshdr structure:

Function | Effect | Algorithmic complexity
sdsnewlen | Create an sds of a specified length, accepting a C string as the initial value | O(N)
sdsempty | Create an sds containing only the empty string "" | O(1)
sdsnew | Given a C string, create a corresponding sds | O(N)
sdsdup | Copy the given sds | O(N)
sdsfree | Release the given sds | O(N)
sdsupdatelen | Update the free and len of the sdshdr structure corresponding to the given sds | O(N)
sdsclear | Clear the contents of the given sds, initializing it to "" | O(1)
sdsMakeRoomFor | Extend the buf of the sdshdr structure corresponding to the sds | O(N)
sdsRemoveFreeSpace | Release the extra space in buf without changing the contents of buf | O(N)
sdsAllocSize | Calculate the total amount of memory used by the buf of the given sds | O(1)
sdsIncrLen | Expand or trim the right end of the buf of the sds | O(1)
sdsgrowzero | Extend the buf of the given sds to the specified length, filling the empty part with \0 | O(N)
sdscatlen | Expand the sds by the given length and append a C string to the end of the sds | O(N)
sdscat | Append a C string to the end of the sds | O(N)
sdscatsds | Append one sds to the end of another sds | O(N)
sdscpylen | Copy part of a C string into an sds, expanding the sds if necessary | O(N)
sdscpy | Copy a C string into an sds | O(N)

This article is for learning reference only!

Origin blog.csdn.net/m0_47015897/article/details/130141038