The internal encodings of Redis's five data types (Part 1: String)




Overview


We usually use Redis purely as users: we read and write key-value pairs without thinking much about what happens underneath, and it is very convenient. But how is that data actually stored and encoded behind the scenes? A clear answer to this question helps us use Redis more efficiently. Starting with this article, we will walk through the Redis source code and discuss the internal encodings of Redis's five data types one by one.

  • Experimental environment: Redis 4.0.10

Note: This article was first published on my public account, CodeSheep.




Overview of the internal encodings of Redis data types


For the five commonly used Redis data types (String, Hash, List, Set, Sorted Set), each type has at least two internal encodings. The choice of encoding is completely transparent to the user: Redis adaptively selects a more efficient internal encoding depending on the data being stored, such as its size.

To view the internal encoding of a key, use the OBJECT ENCODING keyname command, for example:

127.0.0.1:6379> set foo bar
OK
127.0.0.1:6379> object encoding foo
"embstr"

Internally, every Redis value is stored in a C struct called redisObject, defined as follows:

[Image: source code of the redisObject struct]

Its fields are:

  • type: the data type of the value: String, List, Set, ZSet, or Hash

  • encoding: the internal encoding of the value. In the Redis source code, the possible values are currently:

#define OBJ_ENCODING_RAW 0        /* Raw representation */
#define OBJ_ENCODING_INT 1        /* Encoded as integer */
#define OBJ_ENCODING_HT 2         /* Encoded as hash table */
#define OBJ_ENCODING_ZIPMAP 3     /* Encoded as zipmap */
#define OBJ_ENCODING_LINKEDLIST 4 /* No longer used: old list encoding. */
#define OBJ_ENCODING_ZIPLIST 5    /* Encoded as ziplist */
#define OBJ_ENCODING_INTSET 6     /* Encoded as intset */
#define OBJ_ENCODING_SKIPLIST 7   /* Encoded as skiplist */
#define OBJ_ENCODING_EMBSTR 8     /* Embedded sds string encoding */
#define OBJ_ENCODING_QUICKLIST 9  /* Encoded as linked list of ziplists */

  • refcount: the reference count of the value object; a single value object can be referenced from more than one key
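
To make these fields concrete, here is a minimal sketch of the struct together with a lookup from encoding constant to the name that OBJECT ENCODING reports. The struct layout, the LRU_BITS value, and the returned names are reproduced from memory of the Redis 4.0 source, so treat them as assumptions and check them against your own source tree. demo_encoding_name is a hypothetical helper written for this article, not a Redis function.

```c
#include <assert.h>
#include <string.h>

#define LRU_BITS 24                 /* assumed value, as in Redis 4.0 */

#define OBJ_ENCODING_RAW 0
#define OBJ_ENCODING_INT 1
#define OBJ_ENCODING_HT 2
#define OBJ_ENCODING_ZIPLIST 5
#define OBJ_ENCODING_INTSET 6
#define OBJ_ENCODING_SKIPLIST 7
#define OBJ_ENCODING_EMBSTR 8
#define OBJ_ENCODING_QUICKLIST 9

/* Assumed layout of the value object. */
typedef struct redisObject {
    unsigned type:4;        /* String, List, Set, ZSet or Hash */
    unsigned encoding:4;    /* one of the OBJ_ENCODING_* constants */
    unsigned lru:LRU_BITS;  /* eviction bookkeeping */
    int refcount;           /* number of references to this object */
    void *ptr;              /* the data, or the integer itself for INT */
} robj;

/* Hypothetical helper: map an encoding constant to a readable name. */
static const char *demo_encoding_name(int encoding) {
    switch (encoding) {
    case OBJ_ENCODING_RAW:       return "raw";
    case OBJ_ENCODING_INT:       return "int";
    case OBJ_ENCODING_HT:        return "hashtable";
    case OBJ_ENCODING_QUICKLIST: return "quicklist";
    case OBJ_ENCODING_ZIPLIST:   return "ziplist";
    case OBJ_ENCODING_INTSET:    return "intset";
    case OBJ_ENCODING_SKIPLIST:  return "skiplist";
    case OBJ_ENCODING_EMBSTR:    return "embstr";
    default:                     return "unknown";
    }
}
```

Because type and encoding are 4-bit fields, each can hold at most 16 distinct values, which is enough for the constants listed above.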

In this article, we start with the internal encodings of the most basic Redis type, String.




The internal encoding of the String type


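The string is the most basic data type in Redis. A string object can use one of three encodings, int, raw, or embstr:

  • int encoding: holds a 64-bit signed integer, stored as a C long

  • embstr encoding: holds a string of at most 44 bytes

  • raw encoding: holds a string longer than 44 bytes

Let's run an experiment to see this in practice:

[Image: redis-cli session showing the encoding reported for each kind of value]

In other words, Redis picks a different encoding depending on the value the user supplies, and all of this is completely transparent to the user.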
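
The selection rule can be sketched as a small classifier. This is a simplified model, not Redis's code: demo_is_long and demo_string_encoding are hypothetical helpers, the sketch works on null-terminated C strings (Redis values are binary-safe), and the round-trip check is my own shortcut for rejecting inputs such as "007" or " 12" that are not the canonical decimal form of a long.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_EMBSTR_SIZE_LIMIT 44   /* the 44-byte threshold */

/* Return 1 if s is exactly the canonical decimal form of a long. */
static int demo_is_long(const char *s) {
    char *end;
    char buf[32];
    long v;

    if (*s == '\0' || strlen(s) > 20) return 0;
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno == ERANGE || *end != '\0') return 0;
    /* Print the number back and compare, so "007" or "+5" are rejected. */
    snprintf(buf, sizeof(buf), "%ld", v);
    return strcmp(buf, s) == 0;
}

/* Hypothetical helper: name the encoding this model would choose. */
static const char *demo_string_encoding(const char *s) {
    if (demo_is_long(s)) return "int";
    if (strlen(s) <= DEMO_EMBSTR_SIZE_LIMIT) return "embstr";
    return "raw";
}
```

In this model, demo_string_encoding returns "int" for 123, "embstr" for abc, and "raw" for a string of 45 bytes or more.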

Redis stores strings in a structure called SDS (Simple Dynamic String). The source code defines five SDS header structs:

struct __attribute__ ((__packed__)) sdshdr5 {
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 {
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 {
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 {
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};

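As you can see, apart from the integer types of their fields, the structs mean essentially the same thing (sdshdr5 is the exception: it has no len or alloc fields and packs the length into flags):

  • len: the length of the string (the number of bytes actually in use)

  • alloc: the allocated capacity, excluding the header and the null terminator

  • flags: flag bits; the low three bits indicate the header type, and the remaining five bits are unused

  • buf: the character array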
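
The way these fields work together can be sketched for the sdshdr8 case: the header and the buffer live in one allocation, callers hold a pointer to buf, and the flags byte just before buf tells the code which header to look for. The demo_sds8_* functions below are hypothetical helpers, and the SDS_TYPE_8 and SDS_TYPE_MASK values are quoted from memory of sds.h, so treat them as assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SDS_TYPE_8    1   /* assumed value */
#define SDS_TYPE_MASK 7   /* low three bits of flags */

struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len;          /* used */
    uint8_t alloc;        /* excluding the header and null terminator */
    unsigned char flags;  /* 3 lsb of type, 5 unused bits */
    char buf[];
};

/* Allocate header and buffer together; return a pointer to buf. */
static char *demo_sds8_new(const char *init) {
    size_t len = strlen(init);
    struct sdshdr8 *sh;

    if (len > UINT8_MAX) return NULL;   /* does not fit an 8-bit len */
    sh = malloc(sizeof(*sh) + len + 1);
    if (sh == NULL) return NULL;
    sh->len = (uint8_t)len;
    sh->alloc = (uint8_t)len;
    sh->flags = SDS_TYPE_8;
    memcpy(sh->buf, init, len + 1);     /* copies the terminator too */
    return sh->buf;
}

/* Read the length in O(1) by stepping back from buf to the header. */
static size_t demo_sds8_len(const char *s) {
    unsigned char flags = (unsigned char)s[-1];
    const struct sdshdr8 *sh;

    assert((flags & SDS_TYPE_MASK) == SDS_TYPE_8);
    sh = (const struct sdshdr8 *)(s - sizeof(struct sdshdr8));
    return sh->len;
}

static void demo_sds8_free(char *s) {
    if (s != NULL) free(s - sizeof(struct sdshdr8));
}
```

Since the returned pointer is an ordinary null-terminated char pointer, it can be passed to functions such as strcmp, while the length stays available without scanning the string.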

With these basic data structures in mind, let's look at how Redis actually stores the data in each of the three cases from the example above:

  • set foo 123

  • set foo abc

  • set foo abcdefghijklmnopqrstuvwxyzabcdeffasdffsdaadsx




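The INT encoding format


Example command: set foo 123

When the content of a string value can be represented as a 64-bit signed integer, Redis converts the value to a long for storage. This corresponds to the OBJ_ENCODING_INT encoding.

The in-memory layout of an OBJ_ENCODING_INT value can be pictured as follows:

[Image: memory layout of an OBJ_ENCODING_INT value]

In addition, at startup Redis pre-creates 10000 redisObject instances holding the integers 0 to 9999 as shared objects. If the value passed to set is an integer in the range 0 to 9999, the key can point directly at the shared object and no new object has to be created, so the value takes no extra memory.

So when we run the following commands:

set key1 100
set key2 100

key1 and key2 both reference the same shared redisObject that Redis created in advance, like this:

[Image: key1 and key2 pointing at the same shared redisObject]

With the source code in front of us, nothing stays hidden. Let's compare against the source below to understand the process:

[Image: Redis source code for the integer encoding path]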
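
The shared-object idea can be modeled in a few lines. This is a sketch under stated assumptions rather than the Redis implementation: demo_int_obj, demo_create_shared_integers, demo_int_object, and demo_release are hypothetical names, the object is reduced to a value and a counter, and the reference counting is deliberately simplified.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_SHARED_INTEGERS 10000   /* shared objects for 0..9999 */

typedef struct demo_int_obj {
    long value;
    int refcount;
    int shared;      /* 1 if the object lives in the shared table */
} demo_int_obj;

static demo_int_obj demo_shared_integers[DEMO_SHARED_INTEGERS];

/* Done once at startup: pre-create one object per small integer. */
static void demo_create_shared_integers(void) {
    long i;
    for (i = 0; i < DEMO_SHARED_INTEGERS; i++) {
        demo_shared_integers[i].value = i;
        demo_shared_integers[i].refcount = 1;
        demo_shared_integers[i].shared = 1;
    }
}

/* Return the shared object for small values, a fresh one otherwise. */
static demo_int_obj *demo_int_object(long value) {
    demo_int_obj *o;

    if (value >= 0 && value < DEMO_SHARED_INTEGERS) {
        o = &demo_shared_integers[value];
        o->refcount++;
        return o;
    }
    o = malloc(sizeof(*o));
    if (o == NULL) return NULL;
    o->value = value;
    o->refcount = 1;
    o->shared = 0;
    return o;
}

/* Drop one reference; only non-shared objects are ever freed. */
static void demo_release(demo_int_obj *o) {
    if (o == NULL) return;
    if (o->shared) { o->refcount--; return; }
    if (--o->refcount == 0) free(o);
}
```

Calling demo_int_object(100) twice returns the same pointer both times, which mirrors key1 and key2 referencing one object, while two calls with 10000 return two separately allocated objects.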




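The EMBSTR encoding format


Example command: set foo abc

When storing a string of at most 44 bytes, Redis uses the OBJ_ENCODING_EMBSTR encoding. Don't take my word for it; let's look at the source:

[Image: Redis source code that chooses between the EMBSTR and RAW encodings]

The code above shows that for strings of at most 44 bytes, Redis uses OBJ_ENCODING_EMBSTR. As the name suggests, EMBSTR means embedded string. In terms of memory layout, the string's sds struct and its redisObject are allocated in a single contiguous block of memory, as if the sds string were embedded in the redisObject. The following code shows this clearly:

[Image: Redis source code that allocates the redisObject and the sds string together]

So for the value set by set foo abc, the memory layout looks like this:

[Image: memory layout of an OBJ_ENCODING_EMBSTR value]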
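
Where does the number 44 come from? As far as I recall, a comment in the Redis source explains that the limit is chosen so that the largest EMBSTR object still fits in a 64-byte allocator arena. The arithmetic can be checked with a short sketch; demo_robj and demo_sdshdr8 are stand-ins for the real structs, and the 16-byte object size assumes a typical 64-bit build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_EMBSTR_SIZE_LIMIT 44

/* Stand-in for sdshdr8: 3 header bytes when packed. */
struct __attribute__ ((__packed__)) demo_sdshdr8 {
    uint8_t len;
    uint8_t alloc;
    unsigned char flags;
    char buf[];
};

/* Stand-in for redisObject: 16 bytes on a typical 64-bit build. */
typedef struct demo_robj {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:24;
    int refcount;
    void *ptr;
} demo_robj;

/* Object header + sds header + string bytes + null terminator. */
static size_t demo_embstr_alloc_size(size_t len) {
    return sizeof(demo_robj) + sizeof(struct demo_sdshdr8) + len + 1;
}
```

With a 16-byte object header, demo_embstr_alloc_size(44) is 16 + 3 + 44 + 1 = 64 bytes, and a 45-byte string would need 65.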




The RAW encoding format


Example command: set foo abcdefghijklmnopqrstuvwxyzabcdeffasdffsdaadsx

As in the example command, when the string value is longer than 44 bytes, Redis uses the OBJ_ENCODING_RAW encoding for the value. Unlike the OBJ_ENCODING_EMBSTR encoding above, the memory of the sds string and the memory of the redisObject that points to it are no longer contiguous. Taking the command above as an example, the memory layout of its value is as follows:

[Image: memory layout of an OBJ_ENCODING_RAW value]
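
The difference between the two layouts comes down to one allocation versus two. The sketch below models both paths; it is not the Redis code. demo_create_string and demo_free_string are hypothetical helpers, the structs are stand-ins, and only the 8-bit sds header is handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_EMBSTR_SIZE_LIMIT 44
enum { DEMO_ENC_RAW = 0, DEMO_ENC_EMBSTR = 8 };

struct __attribute__ ((__packed__)) demo_sdshdr8 {
    uint8_t len;
    uint8_t alloc;
    unsigned char flags;
    char buf[];
};

typedef struct demo_robj {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:24;
    int refcount;
    void *ptr;
} demo_robj;

static demo_robj *demo_create_string(const char *s, size_t len) {
    demo_robj *o;
    struct demo_sdshdr8 *sh;

    if (len > UINT8_MAX) return NULL;   /* sketch: 8-bit header only */
    if (len <= DEMO_EMBSTR_SIZE_LIMIT) {
        /* EMBSTR: one block holds the object and the sds string. */
        o = malloc(sizeof(*o) + sizeof(*sh) + len + 1);
        if (o == NULL) return NULL;
        sh = (struct demo_sdshdr8 *)(o + 1);
        o->encoding = DEMO_ENC_EMBSTR;
    } else {
        /* RAW: the object and the sds string are separate blocks. */
        o = malloc(sizeof(*o));
        sh = malloc(sizeof(*sh) + len + 1);
        if (o == NULL || sh == NULL) { free(o); free(sh); return NULL; }
        o->encoding = DEMO_ENC_RAW;
    }
    o->type = 0;
    o->lru = 0;
    o->refcount = 1;
    sh->len = (uint8_t)len;
    sh->alloc = (uint8_t)len;
    sh->flags = 1;                      /* 8-bit header type */
    memcpy(sh->buf, s, len);
    sh->buf[len] = '\0';
    o->ptr = sh->buf;
    return o;
}

static void demo_free_string(demo_robj *o) {
    if (o == NULL) return;
    if (o->encoding == DEMO_ENC_RAW)
        free((char *)o->ptr - sizeof(struct demo_sdshdr8));
    free(o);
}
```

For an EMBSTR object, ptr points just past the object header and the 3-byte sds header inside the same block, so one free releases everything. A RAW object needs two frees, and reaching its data means following a pointer into a separately allocated block.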

That wraps up the internal encodings of the most basic data type, String. Not too hard to follow, is it?

Next, we will continue with the internal encodings of the Hash data type in Redis.




Postscript


My knowledge is limited, so if you find mistakes or inaccuracies, corrections are welcome. Let's learn and exchange ideas together!

  • My personal blog: www.codesheep.cn



Origin blog.51cto.com/15127562/2663987