[Redis] Some miscellaneous notes about the Redis data structure "Simple Dynamic String (SDS)"
Recommended articles that cover the SDS data structure in more detail:
1. Simple dynamic string - Redis design and implementation (redisbook.readthedocs.io)
2. In-depth understanding of simple dynamic strings of Redis - itbsl - Blog Garden (cnblogs.com)
1. The structure and implementation of SDS
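SDS is often described as an abstract data structure. Concretely, its implementation consists of the following two parts. This is the layout with int len and free fields used before Redis 3.2; later versions split the header into several size-specific variants, but the ideas below still apply: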
```c
typedef char *sds;

struct sdshdr {
    // length of the used part of buf
    int len;
    // length of the remaining free space in buf
    int free;
    // where the string data is actually stored
    char buf[];
};
```
Here, the type sds is an alias for char *, and the sdshdr structure holds three attributes: len, free, and buf.
As an example, here is a newly created sdshdr structure holding the string hello world:
```
struct sdshdr {
    len = 11;
    free = 0;
    buf = "hello world\0"; // the actual length of buf is len + 1
};
```
Thanks to the len attribute, sdshdr can compute the string length in θ(1) time.
On the other hand, by allocating some extra space for buf and using free to track the size of the unused space, sdshdr greatly reduces the number of memory reallocations needed by append operations, as discussed in detail in section 4.
Of course, this design also places a requirement on the implementation: every function that operates on sdshdr must update the len and free attributes correctly, otherwise bugs will result.
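The following is a minimal C sketch of the old-style layout above, written only to illustrate it. The my_sds* helper names are hypothetical stand-ins loosely modeled on Redis's sdsnewlen, sdslen, sdsavail and sdsfree, not the real implementations. It shows why the length lookup is θ(1): an sds value points at buf, so the header sits just before it and len can be read directly.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef char *sds;

/* Old-style header, as in the structure above. */
struct sdshdr {
    int len;    /* bytes used in buf */
    int free;   /* bytes still available in buf */
    char buf[]; /* string data, always followed by a '\0' */
};

/* Hypothetical helper: recover the header from an sds (which points at buf). */
static struct sdshdr *sds_header(const sds s) {
    return (struct sdshdr *)(s - offsetof(struct sdshdr, buf));
}

/* Create an sds holding initlen bytes copied from init (may contain '\0'). */
sds my_sdsnewlen(const void *init, size_t initlen) {
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + initlen + 1);
    if (sh == NULL) return NULL;
    sh->len = (int)initlen;
    sh->free = 0;
    if (initlen > 0 && init != NULL) memcpy(sh->buf, init, initlen);
    else if (initlen > 0) memset(sh->buf, 0, initlen);
    sh->buf[initlen] = '\0';  /* keep it usable as a C string */
    return sh->buf;           /* callers see a plain char * */
}

/* θ(1): read the stored length instead of scanning for '\0'. */
size_t my_sdslen(const sds s) { return (size_t)sds_header(s)->len; }

/* θ(1): unused bytes available for future appends. */
size_t my_sdsavail(const sds s) { return (size_t)sds_header(s)->free; }

/* Free the whole allocation, header included. */
void my_sdsfree(sds s) {
    if (s != NULL) free(sds_header(s));
}
```

Because my_sdsnewlen returns a pointer to buf, which is always followed by '\0', the result can also be passed to ordinary C functions such as printf or strcmp.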
2. The string object
Redis is a key-value database (key-value DB). The values in the database can be objects of various types, such as strings, sets, and lists, while the keys are always string objects. Every string object that holds a string value contains an sds value.
Note:
"A string object containing a string value" may sound strange at first, but in Redis a string object can hold not only a string value but also a value of type long. To be precise: when a string object holds a string, it contains an sds value; otherwise, it holds a long value.
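To make the note concrete, here is a hypothetical C sketch of a string object that stores either a long or a byte string. The names string_object, ENC_LONG and ENC_STRING are invented for illustration; Redis's real object structure (robj) and its integer detection differ in detail, and a plain char * stands in for sds here.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified encodings (not Redis's real robj). */
enum str_encoding { ENC_LONG, ENC_STRING };

struct string_object {
    enum str_encoding encoding;
    union {
        long num;   /* used when encoding == ENC_LONG */
        char *str;  /* used when encoding == ENC_STRING (stands in for sds) */
    } v;
};

/* Store the value as a long if the whole input is a decimal integer that
   fits in a long; otherwise keep a heap copy of the bytes.
   Note: strtol also accepts leading whitespace and a sign; Redis is stricter. */
int string_object_init(struct string_object *o, const char *value) {
    char *end = NULL;
    errno = 0;
    long n = strtol(value, &end, 10);
    if (*value != '\0' && *end == '\0' && errno == 0) {
        o->encoding = ENC_LONG;
        o->v.num = n;
        return 0;
    }
    size_t len = strlen(value);
    o->v.str = malloc(len + 1);
    if (o->v.str == NULL) return -1;
    memcpy(o->v.str, value, len + 1);
    o->encoding = ENC_STRING;
    return 0;
}

/* Only the string encoding owns heap memory. */
void string_object_release(struct string_object *o) {
    if (o->encoding == ENC_STRING) free(o->v.str);
}
```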
For example, the following command creates a new key-value pair whose key and value are string objects, and they both contain an sds value:
```
127.0.0.1:6379> set school "HeFeiUniversity"
OK
127.0.0.1:6379> get school
"HeFeiUniversity"
127.0.0.1:6379>
```
The following command also creates a key-value pair, but while its key is a string object, its value is a set object:
```
127.0.0.1:6379> sadd nosql "MongoDB" "Redis" "Neo4j"
(integer) 3
127.0.0.1:6379> smembers nosql
1) "Neo4j"
2) "Redis"
3) "MongoDB"
127.0.0.1:6379>
```
3. Differences between Redis strings and C strings
In C, a string is represented as a char array terminated by \0. For example, hello world is represented in C as "hello world\0".
This simple representation is adequate in most cases, but it cannot efficiently support two operations, length calculation and appending:
- Each length calculation (strlen(s)) has complexity θ(N).
- Appending to a string N times requires N memory reallocations (realloc).
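As a point of comparison, here is a sketch of appending to a plain C string with exact-size reallocation. naive_append is a hypothetical helper, not a standard or Redis function: every call scans both strings with strlen and calls realloc once, so N appends cost N reallocations.

```c
#include <stdlib.h>
#include <string.h>

/* Counts how many times naive_append has called realloc successfully. */
static size_t realloc_calls = 0;

/* Hypothetical helper: append t to the heap-allocated C string s,
   growing it to the exact size. Both strlen calls are θ(N). */
char *naive_append(char *s, const char *t) {
    size_t a = strlen(s);
    size_t b = strlen(t);
    char *p = realloc(s, a + b + 1);  /* one reallocation per append */
    if (p == NULL) return NULL;       /* on failure, s is still valid */
    realloc_calls++;
    memcpy(p + a, t, b + 1);          /* copy t including its '\0' */
    return p;
}

size_t naive_realloc_calls(void) { return realloc_calls; }
```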
Inside Redis, appending to strings and calculating their length are very common operations, and the APPEND and STRLEN commands map directly onto them. Such simple operations should not become a performance bottleneck.
In addition to C strings, Redis also needs to handle plain byte arrays, server protocol data, and so on, so its string representation should also be binary safe: the program should make no assumptions about the data a string holds, which can be a \0-terminated C string, a plain byte array, or data in any other format.
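The sketch below illustrates binary safety with a hypothetical length-tracked buffer (struct lenbuf, not part of Redis). Because the length is stored rather than found by searching for \0, embedded zero bytes are kept, while a C-string view of the same bytes stops at the first \0.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-tracked buffer: the length is stored, not inferred. */
struct lenbuf {
    size_t len;
    char *buf;
};

/* Copy exactly len bytes, embedded '\0' bytes included. */
struct lenbuf lenbuf_new(const void *data, size_t len) {
    struct lenbuf b = { 0, malloc(len + 1) };
    if (b.buf != NULL) {
        memcpy(b.buf, data, len);
        b.buf[len] = '\0';  /* trailing terminator, as sds also keeps */
        b.len = len;
    }
    return b;
}
```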
For these two reasons, Redis uses the sds type to replace the C language's default string representation: sds can efficiently implement appending and length calculation, and is binary safe at the same time.
Unlike C strings, because SDS records the length of the SDS itself in the len attribute, the complexity of obtaining the length of an SDS is O(1).
By using SDS instead of C strings, Redis reduces the complexity of obtaining the string length from O(N) to O(1), which ensures that this operation will not become a performance bottleneck for Redis. Even if the STRLEN command is executed repeatedly on a very long string, each call costs only O(1), because it reads the stored length instead of scanning the string.
Advantages of SDS over traditional C strings☆☆☆:
C string | SDS |
---|---|
The complexity of getting the length of a string is O(N) | The complexity of getting the length of a string is O(1) |
String-manipulation functions are unsafe and may cause buffer overflows | The SDS API manipulates strings safely and avoids buffer overflows |
Changing the length of a string N times necessarily requires N memory reallocations | Changing the length of a string N times requires at most N memory reallocations |
Can only store text data | Can store binary data as well as text, such as pictures, audio, video, and compressed files |
4. SDS memory optimization strategy
SDS uses a space pre-allocation strategy and a lazy space release strategy to reduce the number of memory reallocations.
The space pre-allocation strategy means that every time SDS expands the space, the program not only allocates the required space, but also allocates additional unused space to reduce the number of memory reallocations. The additional allocated unused space depends on the value of the len attribute of the SDS after space expansion.
- If len is less than 1 MB, the program allocates as much unused space as len, so free equals len.
- If len is 1 MB or more, the program allocates a fixed 1 MB of unused space, so free is 1 MB.
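The two rules above can be sketched in C as follows. struct growbuf and its functions are hypothetical simplifications that loosely follow the behavior described for sdsMakeRoomFor rather than reproducing Redis's code. Space is checked (and grown if needed) before any bytes are copied, which is also how an SDS-style API avoids the buffer overflows mentioned in the table earlier.

```c
#include <stdlib.h>
#include <string.h>

#define PREALLOC_LIMIT ((size_t)1024 * 1024)  /* the 1 MB threshold */

/* Hypothetical sds-like buffer used only for this illustration. */
struct growbuf {
    size_t len;       /* bytes in use */
    size_t free;      /* unused bytes after buf[len] */
    char *buf;        /* len + free + 1 bytes allocated */
    size_t reallocs;  /* how many times realloc was called */
};

/* Pre-allocation rule: after growing to newlen, free is newlen below 1 MB,
   otherwise a fixed 1 MB. */
static size_t prealloc_free(size_t newlen) {
    return newlen < PREALLOC_LIMIT ? newlen : PREALLOC_LIMIT;
}

/* Ensure addlen more bytes fit; reallocate only when free is too small. */
static int growbuf_make_room(struct growbuf *b, size_t addlen) {
    if (b->free >= addlen) return 0;  /* enough spare space: no realloc */
    size_t newlen = b->len + addlen;
    size_t newfree = prealloc_free(newlen);
    char *p = realloc(b->buf, newlen + newfree + 1);
    if (p == NULL) return -1;
    b->buf = p;
    b->free = newlen + newfree - b->len;
    b->reallocs++;
    return 0;
}

/* Append n bytes: make room first, then copy, then update len and free. */
int growbuf_append(struct growbuf *b, const void *data, size_t n) {
    if (growbuf_make_room(b, n) != 0) return -1;
    memcpy(b->buf + b->len, data, n);
    b->len += n;
    b->free -= n;
    b->buf[b->len] = '\0';
    return 0;
}
```

With one-byte appends starting from an empty buffer, this sketch reallocates only when free runs out, so 1000 appends trigger 9 reallocations instead of 1000; a single 2 MB append leaves exactly 1 MB free.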
For releasing space, SDS uses a lazy space release strategy: when an SDS string is shortened, the space no longer in use is not released immediately but is added to free, which reduces the number of memory reallocations if the SDS grows again later. To release the unused space of an SDS explicitly, call the sdsRemoveFreeSpace() function.
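Lazy release can be sketched like this, again with a hypothetical structure (struct lazybuf) rather than Redis's own code: truncation only moves bytes into free, and a separate function, analogous in spirit to sdsRemoveFreeSpace(), gives the memory back with realloc.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sds-like buffer used only for this illustration. */
struct lazybuf {
    size_t len;
    size_t free;
    char *buf;   /* len + free + 1 bytes allocated */
};

/* Shorten the string to newlen bytes without giving memory back:
   the dropped bytes simply move into free. */
void lazybuf_truncate(struct lazybuf *b, size_t newlen) {
    if (newlen >= b->len) return;
    b->free += b->len - newlen;
    b->len = newlen;
    b->buf[newlen] = '\0';
}

/* Explicitly release unused space: reallocate to exactly len + 1 bytes. */
int lazybuf_remove_free_space(struct lazybuf *b) {
    char *p = realloc(b->buf, b->len + 1);
    if (p == NULL) return -1;
    b->buf = p;
    b->free = 0;
    return 0;
}
```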
5. The SDS module API
The sds module provides the following API functions built on the sds type and the sdshdr structure:
Function | Effect | Algorithmic complexity |
---|---|---|
sdsnewlen | Create an sds of a specified length, using a C string as the initial value | O(N) |
sdsempty | Create an sds containing the empty string "" | O(1) |
sdsnew | Create an sds corresponding to a given C string | O(N) |
sdsdup | Copy a given sds | O(N) |
sdsfree | Release a given sds | O(N) |
sdsupdatelen | Update the free and len fields of the sdshdr structure corresponding to a given sds | O(N) |
sdsclear | Clear the contents of an sds, resetting it to "" | O(1) |
sdsMakeRoomFor | Extend the buf of the sdshdr structure corresponding to a given sds | O(N) |
sdsRemoveFreeSpace | Release the extra space in buf without changing the contents of buf | O(N) |
sdsAllocSize | Calculate the total amount of memory used by the buf of a given sds | O(1) |
sdsIncrLen | Extend or trim the right end of the buf of an sds | O(1) |
sdsgrowzero | Extend the buf of a given sds to a specified length, filling the new part with \0 | O(N) |
sdscatlen | Extend an sds by a given length and append a C string to its end | O(N) |
sdscat | Append a C string to the end of an sds | O(N) |
sdscatsds | Append one sds to the end of another sds | O(N) |
sdscpylen | Copy part of a C string into an sds, extending the sds if necessary | O(N) |
sdscpy | Copy a C string into an sds | O(N) |
This article is for learning reference only!