Overview
We usually work with Redis at the user level: we read and write key-value pairs without a second thought, and it is very convenient. But do you know how that data is stored and encoded behind the scenes? Understanding this helps us use Redis more efficiently. Starting with this article, we will go through the Redis source code and look at the internal encoding of each of Redis's five data types in turn.
Experimental environment: Redis 4.0.10
Note: This article was first published on my public account, CodeSheep.
Overview of the internal encoding of Redis data types
Redis has five commonly used data types (String, Hash, List, Set, and Sorted Set), and a data type generally has at least two internal encodings (lists are the exception since Redis 3.2, where they always use quicklist). The choice of encoding is completely transparent to the user: Redis adaptively picks the more efficient internal encoding according to the amount of data.
To view the internal encoding of a key's value, use the OBJECT ENCODING keyname command, for example:
127.0.0.1:6379> set foo bar
OK
127.0.0.1:6379> object encoding foo
"embstr"
Internally, Redis stores every value in a C structure named redisObject.
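The listing itself is missing from this copy. As a rough reconstruction based on server.h in Redis 4.0, and not a verbatim quote, the structure looks roughly like this. The comments and the check_robj_layout helper are added here for illustration and are not part of Redis.

```c
#include <assert.h>

#define LRU_BITS 24

typedef struct redisObject {
    unsigned type:4;        /* data type: string, list, set, zset, hash */
    unsigned encoding:4;    /* internal encoding, one of OBJ_ENCODING_* */
    unsigned lru:LRU_BITS;  /* LRU time or LFU data, used for eviction */
    int refcount;           /* reference count */
    void *ptr;              /* points at the data (or holds it, for int) */
} robj;

/* Illustrative check: the three bit-fields share one 32-bit word, so the
 * header is two ints plus a pointer (16 bytes on a typical 64-bit build). */
void check_robj_layout(void) {
    assert(sizeof(robj) == 2 * sizeof(int) + sizeof(void *));
}
```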
Its fields are as follows:
type: the data type of the value, one of String, List, Set, ZSet, or Hash.
encoding: the internal encoding of the value. In the Redis source, the possible values are currently:
#define OBJ_ENCODING_RAW 0        /* Raw representation */
#define OBJ_ENCODING_INT 1        /* Encoded as integer */
#define OBJ_ENCODING_HT 2         /* Encoded as hash table */
#define OBJ_ENCODING_ZIPMAP 3     /* Encoded as zipmap */
#define OBJ_ENCODING_LINKEDLIST 4 /* No longer used: old list encoding. */
#define OBJ_ENCODING_ZIPLIST 5    /* Encoded as ziplist */
#define OBJ_ENCODING_INTSET 6     /* Encoded as intset */
#define OBJ_ENCODING_SKIPLIST 7   /* Encoded as skiplist */
#define OBJ_ENCODING_EMBSTR 8     /* Embedded sds string encoding */
#define OBJ_ENCODING_QUICKLIST 9  /* Encoded as linked list of ziplists */
refcount: the number of references to the value, which allows one value object to be referenced by multiple keys.
In this article, we start with the internal encoding of String, the most basic Redis type.
The internal encoding of the String type
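Strings are the most basic data type in Redis. A string object can use one of three encodings, int, raw, or embstr, described below:
int encoding: holds a 64-bit signed integer (a C long)
embstr encoding: holds strings of 44 bytes or fewer
raw encoding: holds strings longer than 44 bytes
A quick experiment with OBJECT ENCODING confirms this: after set foo 123 the encoding is reported as int, after set foo abc it is embstr, and after set foo abcdefghijklmnopqrstuvwxyzabcdeffasdffsdaadsx (a 45-byte value) it is raw.
In other words, Redis uses a different internal encoding depending on the value the user supplies, and all of this is completely transparent to the user.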
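The selection rule can be sketched in a few lines of C. This is a simplified illustration, not the Redis implementation: pick_string_encoding and is_canonical_int64 are hypothetical names, and the integer check uses a parse-and-print round trip where Redis uses its own parser (the real logic lives in createStringObject and tryObjectEncoding in object.c).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define EMBSTR_SIZE_LIMIT 44  /* mirrors OBJ_ENCODING_EMBSTR_SIZE_LIMIT */

/* Returns 1 if s is the canonical decimal form of a signed 64-bit integer
 * (no leading zeros, no '+', no spaces), checked by a parse/print round trip. */
int is_canonical_int64(const char *s) {
    char buf[32];
    size_t len = strlen(s);
    if (len == 0 || len > 20) return 0;  /* an int64 needs at most 20 characters */
    long long v = strtoll(s, NULL, 10);
    snprintf(buf, sizeof(buf), "%lld", v);
    return strcmp(buf, s) == 0;
}

/* Hypothetical helper: the encoding name one would expect OBJECT ENCODING
 * to report after a plain SET of this value. */
const char *pick_string_encoding(const char *s) {
    assert(s != NULL);
    if (is_canonical_int64(s)) return "int";
    if (strlen(s) <= EMBSTR_SIZE_LIMIT) return "embstr";
    return "raw";
}
```

Note that a value such as 0123 is not a canonical integer, so under this rule it is stored as a short string rather than as an int.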
Redis stores strings in a structure called SDS ("simple dynamic string"). The code defines five SDS header structures:
struct __attribute__ ((__packed__)) sdshdr5 {
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 {
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 {
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 {
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
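As you can see, apart from the integer widths of the fields, the structures are nearly identical in meaning (sdshdr5 is the exception: it has no len or alloc and packs the string length into the upper five bits of flags):
len: the length of the string (the bytes actually in use)
alloc: the size of the allocated buffer, excluding the header and the null terminator
flags: flag bits; the low three bits give the header type and the remaining five are unused
buf: the character array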
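To make the role of flags concrete, here is a small sketch of how a header type can be chosen from the string length and read back from an sds pointer. It is simplified from sdsReqType in sds.h; the function names sds_req_type and sds_type_of are made up for this example, while the SDS_TYPE_* values match those in sds.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header type codes stored in the low 3 bits of flags. */
#define SDS_TYPE_5  0
#define SDS_TYPE_8  1
#define SDS_TYPE_16 2
#define SDS_TYPE_32 3
#define SDS_TYPE_64 4
#define SDS_TYPE_MASK 7

/* Smallest header type whose length field can hold string_size. */
int sds_req_type(size_t string_size) {
    if (string_size < (1u << 5))  return SDS_TYPE_5;
    if (string_size < (1u << 8))  return SDS_TYPE_8;
    if (string_size < (1u << 16)) return SDS_TYPE_16;
    if ((uint64_t)string_size < (1ull << 32)) return SDS_TYPE_32;
    return SDS_TYPE_64;
}

/* An sds pointer points at buf, so the flags byte sits immediately
 * before it; mask off the low 3 bits to recover the header type. */
int sds_type_of(const char *s) {
    unsigned char flags = (unsigned char)s[-1];
    return flags & SDS_TYPE_MASK;
}
```

Because every header ends with flags directly before buf, the same one-byte lookup works no matter which of the five headers is in use.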
With these basic data structures in mind, let's look at how Redis actually stores the data in the three cases from the example above:
set foo 123
set foo abc
set foo abcdefghijklmnopqrstuvwxyzabcdeffasdffsdaadsx
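INT encoding
Example command: set foo 123
When the content of a string value can be represented as a 64-bit signed integer, Redis converts the value to a long for storage, which corresponds to the OBJ_ENCODING_INT encoding.
The memory layout of OBJ_ENCODING_INT can be pictured as follows: the ptr field of the redisObject holds the integer itself rather than pointing to a separate sds buffer.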
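In addition, at startup Redis creates 10000 redisObject instances holding the integers 0 to 9999 to serve as shared objects. If the string value being set is an integer in the range 0 to 9999, the key can point directly at the shared object, so no new object has to be created and the value takes up no extra space.
So when the following commands are executed:
set key1 100
set key2 100
key1 and key2 both reference the same shared redisObject that Redis built in advance.
The source code leaves no secrets: in object.c, tryObjectEncoding checks whether the string parses as a long and, if the value falls in the shared range, returns the shared object instead of creating a new one.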
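The shared-object lookup can be sketched as follows. This is a stripped-down model, not Redis code: int_obj stands in for redisObject, all function names are hypothetical, and the real check in tryObjectEncoding also skips sharing under certain maxmemory eviction policies, which is left out here.

```c
#include <assert.h>
#include <stdlib.h>

#define SHARED_INTEGERS 10000   /* mirrors OBJ_SHARED_INTEGERS */

typedef struct int_obj {        /* hypothetical stand-in for redisObject */
    long value;
} int_obj;

static int_obj shared_integers[SHARED_INTEGERS];

/* Build the pool once, as Redis does at startup. */
void init_shared_integers(void) {
    for (long i = 0; i < SHARED_INTEGERS; i++) shared_integers[i].value = i;
}

/* Returns 1 if obj is one of the pooled objects and must not be freed. */
int is_shared(const int_obj *obj) {
    return obj->value >= 0 && obj->value < SHARED_INTEGERS &&
           obj == &shared_integers[obj->value];
}

/* Values in [0, SHARED_INTEGERS) reuse the pooled object; anything else
 * gets a fresh allocation that the caller releases later. */
int_obj *get_int_object(long value) {
    if (value >= 0 && value < SHARED_INTEGERS) return &shared_integers[value];
    int_obj *o = malloc(sizeof(*o));
    assert(o != NULL);
    o->value = value;
    return o;
}

void release_int_object(int_obj *o) {
    if (!is_shared(o)) free(o);
}
```

With this model, two lookups of 100 return the same pointer, while two lookups of 10000 return two distinct allocations, which is the boundary behavior described above.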
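EMBSTR encoding
Example command: set foo abc
Redis uses the OBJ_ENCODING_EMBSTR encoding for strings of 44 bytes or fewer. Don't take my word for it, the source says so: in object.c, createStringObject compares the length against OBJ_ENCODING_EMBSTR_SIZE_LIMIT (defined as 44) and calls createEmbeddedStringObject when the length does not exceed it, or createRawStringObject otherwise.
As the name suggests, EMBSTR means embedded string. In terms of memory layout, the sds structure of the string and its redisObject are allocated in one contiguous block of memory, as if the sds string were embedded in the redisObject: createEmbeddedStringObject makes a single allocation large enough for the redisObject, an sdshdr8 header, the string bytes, and a terminating null byte.
So for the value set by set foo abc, the memory layout is a single block: the redisObject, followed directly by the sdshdr8 header (len = 3, alloc = 3) and then the bytes abc plus the null terminator, with the ptr field pointing at buf inside that same block.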
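The single-allocation layout can be sketched like this. It is modeled on createEmbeddedStringObject but is not the Redis code: it uses plain malloc, a simplified robj, and the hypothetical name create_embedded_string.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct robj {           /* simplified redisObject */
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:24;
    int refcount;
    void *ptr;
} robj;

struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len;
    uint8_t alloc;
    unsigned char flags;
    char buf[];
};

#define ENCODING_EMBSTR 8       /* value of OBJ_ENCODING_EMBSTR */
#define SDS_TYPE_8 1

/* One malloc holds the object header, the sds header and the bytes.
 * Sketch only: the caller must free() the result, and len is capped at 44. */
robj *create_embedded_string(const char *s, size_t len) {
    assert(s != NULL && len <= 44);
    robj *o = malloc(sizeof(robj) + sizeof(struct sdshdr8) + len + 1);
    if (o == NULL) return NULL;
    struct sdshdr8 *sh = (struct sdshdr8 *)(o + 1);  /* right after the robj */
    o->type = 0;                 /* string */
    o->encoding = ENCODING_EMBSTR;
    o->lru = 0;
    o->refcount = 1;
    o->ptr = sh->buf;            /* points into the same block */
    sh->len = (uint8_t)len;
    sh->alloc = (uint8_t)len;
    sh->flags = SDS_TYPE_8;
    memcpy(sh->buf, s, len);
    sh->buf[len] = '\0';
    return o;
}
```

The arithmetic also explains the limit: on a typical 64-bit build the redisObject is 16 bytes and the sdshdr8 header is 3, so 16 + 3 + 44 + 1 = 64 bytes, which the Redis source notes is chosen to fit jemalloc's 64-byte allocation class.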
RAW encoding
Example command: set foo abcdefghijklmnopqrstuvwxyzabcdeffasdffsdaadsx
As in the example command, when the string value is longer than 44 bytes, Redis switches the internal encoding of the value to OBJ_ENCODING_RAW. The difference from OBJ_ENCODING_EMBSTR is that the memory of the sds string and the memory of the redisObject that refers to it are no longer contiguous. For the command above, the redisObject sits in one allocation and its ptr field points to an sds string in a separate allocation.
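For contrast with the embedded case, a raw string can be modeled with two allocations. Again this is only a sketch under simplifying assumptions: create_raw_string and free_raw_string are hypothetical names, and a plain C buffer stands in for the sds string (the sds header is omitted for brevity).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct robj {           /* simplified redisObject */
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:24;
    int refcount;
    void *ptr;
} robj;

#define ENCODING_RAW 0          /* value of OBJ_ENCODING_RAW */

/* Two allocations: one for the object header, one for the string data. */
robj *create_raw_string(const char *s, size_t len) {
    assert(s != NULL);
    robj *o = malloc(sizeof(robj));
    char *buf = malloc(len + 1);
    if (o == NULL || buf == NULL) { free(o); free(buf); return NULL; }
    memcpy(buf, s, len);
    buf[len] = '\0';
    o->type = 0;                /* string */
    o->encoding = ENCODING_RAW;
    o->lru = 0;
    o->refcount = 1;
    o->ptr = buf;               /* points at the separate allocation */
    return o;
}

/* Both blocks have to be released separately. */
void free_raw_string(robj *o) {
    if (o == NULL) return;
    free(o->ptr);
    free(o);
}
```

The cost of the raw layout is visible here: creating or freeing the value takes two allocator calls instead of one, and the header and the data may end up far apart in memory.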
That concludes the internal encoding of String, the most basic data type. As you can see, it is quite easy to follow.
Next, we will continue by analyzing the internal encoding of the Hash data type in Redis.
Postscript
My knowledge is limited, so if you find mistakes or inaccuracies, corrections are welcome. Let's learn and exchange ideas together!
My personal blog: www.codesheep.cn
If you are interested, you may also want to read the author's articles on containerization and microservices:
Use the K8S technology stack to create a personal private cloud series of articles
Use TICK to build a Docker container visual monitoring center
Explain Nginx server configuration from a configuration list
Docker container visualization monitoring center construction
Use ELK to build Docker containerized application log center