The boundary between the embstr and raw encodings in Redis

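Section 8.2 (String Objects) of Redis Design and Implementation (《Redis设计与实现》) says that a string object uses the embstr encoding when the string is no longer than 32 bytes, and the raw encoding when it is longer than 32 bytes.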

Verification: in practice, a string of at most 44 bytes is stored as embstr, and anything longer than 44 bytes is stored as raw.

How the source creates a string object

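In the Redis 3.0 source, the creation logic compares the length against REDIS_ENCODING_EMBSTR_SIZE_LIMIT, which is 39: a string of at most 39 bytes is created as embstr, and anything longer is created as raw.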

#define REDIS_ENCODING_EMBSTR_SIZE_LIMIT 39
robj *createStringObject(char *ptr, size_t len) {
    if (len <= REDIS_ENCODING_EMBSTR_SIZE_LIMIT)
        return createEmbeddedStringObject(ptr,len);
    else
        return createRawStringObject(ptr,len);
}

/* Create an embstr object: robj, sds header and string bytes in one allocation. */
robj *createEmbeddedStringObject(char *ptr, size_t len) {
    robj *o = zmalloc(sizeof(robj)+sizeof(struct sdshdr)+len+1);
    struct sdshdr *sh = (void*)(o+1);

    o->type = REDIS_STRING;
    o->encoding = REDIS_ENCODING_EMBSTR;
    o->ptr = sh+1;
    o->refcount = 1;
    o->lru = LRU_CLOCK();

    sh->len = len;
    sh->free = 0;
    if (ptr) {
        memcpy(sh->buf,ptr,len);
        sh->buf[len] = '\0';
    } else {
        memset(sh->buf,0,len+1);
    }
    return o;
}
/* Create a raw object: createRawStringObject() calls createObject() with an sds allocated separately. */
robj *createObject(int type, void *ptr) {
    robj *o = zmalloc(sizeof(*o));
    o->type = type;
    o->encoding = REDIS_ENCODING_RAW;
    o->ptr = ptr;
    o->refcount = 1;

    /* Set the LRU to the current lruclock (minutes resolution). */
    o->lru = LRU_CLOCK();
    return o;
}
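
The comparison in createStringObject is inclusive: a string whose length equals the limit is still embstr. The following is a minimal sketch of that rule only. The names pick_encoding, ENC_EMBSTR and ENC_RAW are hypothetical and do not exist in Redis, and the limit is passed in as a parameter so that the same function can be tried with 39 or 44.

```c
#include <stddef.h>

/* Hypothetical names for illustration; these are not Redis identifiers. */
enum str_encoding { ENC_EMBSTR, ENC_RAW };

/* Same shape as the check in createStringObject: len <= limit picks embstr,
 * so the limit itself still counts as "short". */
enum str_encoding pick_encoding(size_t len, size_t limit) {
    return len <= limit ? ENC_EMBSTR : ENC_RAW;
}
```

With a limit of 39, pick_encoding(39, 39) gives ENC_EMBSTR and pick_encoding(40, 39) gives ENC_RAW; with a limit of 44 the switch happens between 44 and 45.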

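Redis uses the jemalloc memory allocator, which performs considerably better than glibc's malloc and also wastes less memory. Put simply, jemalloc hands out memory in size classes such as 8, 16, 32 and 64 bytes, and an embstr object is sized so that the whole object fits in a single 64-byte allocation. Of those 64 bytes, 16 are taken by the redisObject itself.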

typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:REDIS_LRU_BITS; /* lru time (relative to server.lruclock) */
    int refcount;
    void *ptr;
} robj;

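In sdshdr, the len and free fields take 8 bytes together, and the terminating '\0' takes one more byte. That leaves buf with at most 64 - 16 - 8 - 1 = 39 bytes, which is where the default of 39 comes from.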

struct sdshdr {
    unsigned int len;
    unsigned int free;
    char buf[];
};
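
The arithmetic above can be checked with sizeof. This is a rough sketch, assuming a 64-bit platform (8-byte pointers, 4-byte int) and REDIS_LRU_BITS equal to 24. mock_robj and mock_sdshdr are stand-ins that copy the layouts quoted above, and embstr_payload_limit and old_embstr_limit are hypothetical helper names, not Redis functions.

```c
#include <stddef.h>

/* Stand-in for redisObject; REDIS_LRU_BITS is assumed to be 24. */
struct mock_robj {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:24;
    int refcount;
    void *ptr;
};

/* Stand-in for the old sds header with two unsigned int fields. */
struct mock_sdshdr {
    unsigned int len;
    unsigned int free;
    char buf[];
};

/* Bytes left for string data in one allocation size class after the
 * object header, the sds header and the trailing '\0' are subtracted. */
size_t embstr_payload_limit(size_t size_class, size_t robj_size,
                            size_t sds_header_size) {
    return size_class - robj_size - sds_header_size - 1;
}

/* The limit for a 64-byte size class, using the stand-in layouts above. */
size_t old_embstr_limit(void) {
    return embstr_payload_limit(64, sizeof(struct mock_robj),
                                sizeof(struct mock_sdshdr));
}
```

On a typical 64-bit build, sizeof(struct mock_robj) is 16 and sizeof(struct mock_sdshdr) is 8, so old_embstr_limit() matches 64 - 16 - 8 - 1. On a 32-bit build the pointer is smaller and the result would differ.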

So where does the 44 seen in the verification above come from? Comparing the 3.0 branch with 5.0 and 6.0 shows that this limit has changed.

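One commit in the Redis git history introduced a series of memory optimizations. The motivation was sdshdr itself: its len and free fields record the length and the unused space of the sds, but that design is coarse. An unsigned int can represent a very large range, so for a very short sds most of that space is wasted (two unsigned int fields, 8 bytes). The commit replaced the single sdshdr with sdshdr8, sdshdr16, sdshdr32 and sdshdr64, in which the unsigned int fields become uint8_t, uint16_t and so on, plus a one-byte flags field. This makes small sds values much cheaper. For an embstr, the 8-byte header becomes a uint8_t len, a uint8_t alloc and an unsigned char flags, 3 bytes in total. The header shrinks from 8 bytes to 3, and the 5 bytes saved are added to the original 39, giving 44 bytes.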

struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 {
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 {
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 {
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
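
The jump from 39 to 44 can be reproduced with the same subtraction using the 3-byte sdshdr8 header. This is a minimal sketch, assuming a 64-byte size class, a 16-byte redisObject, and a compiler that accepts the GCC/Clang packed attribute. mock_sdshdr8 copies the layout quoted above, and new_embstr_limit is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in copying the sdshdr8 layout: three one-byte fields, then data. */
struct __attribute__ ((__packed__)) mock_sdshdr8 {
    uint8_t len;
    uint8_t alloc;
    unsigned char flags;
    char buf[];
};

/* Bytes left for string data once the object header, the 3-byte sds
 * header and the trailing '\0' are subtracted from the size class. */
size_t new_embstr_limit(size_t size_class, size_t robj_size) {
    return size_class - robj_size - sizeof(struct mock_sdshdr8) - 1;
}
```

Calling new_embstr_limit(64, 16) evaluates 64 - 16 - 3 - 1, which is the 44 observed earlier.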

Advantages of embstr

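1. Both embstr and raw represent a string object with a redisObject and an sdshdr. The raw encoding makes two separate allocations, one for the redisObject and one for the sdshdr, and the two blocks are not necessarily adjacent in memory. The embstr encoding allocates a single contiguous block that holds both.

2. Allocating one contiguous block brings these benefits:

Freeing an embstr takes a single free call, while freeing a raw string takes two.
An embstr is faster to read, because the object header and the string data sit next to each other in memory.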
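
The difference between one allocation and two can be sketched with plain malloc. This is a simplified illustration, not Redis code: toy_obj and the toy_* functions are hypothetical names, and the sds header is left out so that only the allocation pattern is visible.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for redisObject. */
struct toy_obj {
    int refcount;
    char *ptr;
};

/* embstr style: the header and the bytes share one malloc'd block. */
struct toy_obj *toy_new_embedded(const char *s, size_t len) {
    struct toy_obj *o = malloc(sizeof(*o) + len + 1);
    if (!o) return NULL;
    o->refcount = 1;
    o->ptr = (char *)(o + 1);      /* data starts right after the header */
    memcpy(o->ptr, s, len);
    o->ptr[len] = '\0';
    return o;
}

/* raw style: the header and the bytes come from two separate mallocs. */
struct toy_obj *toy_new_raw(const char *s, size_t len) {
    struct toy_obj *o = malloc(sizeof(*o));
    if (!o) return NULL;
    o->ptr = malloc(len + 1);
    if (!o->ptr) { free(o); return NULL; }
    o->refcount = 1;
    memcpy(o->ptr, s, len);
    o->ptr[len] = '\0';
    return o;
}

/* One free call releases an embedded object. */
void toy_free_embedded(struct toy_obj *o) { free(o); }

/* A raw object needs two free calls: the data first, then the header. */
void toy_free_raw(struct toy_obj *o) {
    if (o) { free(o->ptr); free(o); }
}
```

In the embedded variant, o->ptr always equals (char *)(o + 1), so reading the header and then the data touches adjacent memory. In the raw variant, the data block can end up anywhere on the heap.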

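Why does Redis use the embstr encoding for strings of at most 39 bytes and raw for longer ones?

Redis's embstr and raw encodings no longer split at 39 bytes!

Redis Design and Implementation (《Redis设计与实现》)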