Redis源码剖析之压缩列表

以下涉及到的源码均为redis5.0-rc3版本的代码【点击查看官方源码】

文章目录

压缩列表（ziplist）
数据结构

压缩列表
压缩列表节点
prvlen
encoding

压缩列表存储样例
API

entry的定义
删除操作
插入操作
空间扩展

压缩列表（ziplist）

压缩列表是一个特殊编码的列表，为节约内存而开发。
压缩列表可以保存字符串和整型值，其中整数存放时仍是整数，而不会变成字符。
压缩列表允许push和pop操作在时间复杂度为O(1)的情况下完成。但是每个操作都可能需要重新分配ziplist使用的内存，所以实际的复杂性受ziplist使用内存量及其变动的影响。
压缩列表在redis中是列表键和哈希键的底层实现之一。
由于压缩列表本就为节约内存而开发，且其改动的操作都可能会重新分配entry的内存，导致操作的时间复杂度增加，所以为了性能上不受影响，其适用于存储小整数值和较短的字符串。

数据结构

压缩列表

压缩列表的数据结构是在ziplist.c文件中注释说明中这样描述的：

<zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>

其中各部分的含义如下：

zlbytes
值类型为uint32_t，占用4个字节。是一个无符号整型值，保存着整个压缩列表占用的字节数（包含自己占用的4个字节）。此值的作用在于当对ziplist的内存进行重新分配的时候直接使用该值，而无需进行遍历整个ziplist；
zltail
值类型为uint32_t，占用4个字节。列表最后一个节点的地址偏移量，这样就可以使得进行pop操作的时候无需遍历整个ziplist；
zllen
值类型为uint16_t，占用2个字节。记录整个ziplist中节点的总数，当其大小大于2^{16-2的时候，zllen=2}16-1，此时，便需要遍历整个ziplist才能确定节点的数量；
entry
ziplist的节点
zlend
值类型为uint8_t，占用1个字节。值为255，表示ziplist结束的位置，当一个字节被设置值为255时即表示后续不会再有节点；

下面用图来举例说明：
这里写图片描述

压缩列表节点

压缩列表节点的组成一般情况下如下所示：

 <prevlen> <encoding> <entry-data>

然而，如果节点存储的是小整型时，节点就不会有这个数据，因为真正的值可以用encoding的值表示，然后节点的组成又会如下所示：

<prevlen> <encoding>

由上可见，每个entry节点都会有前缀信息，且其前缀都由两方面信息组成：prvlen、encoding。

prvlen

代表前一个节点占用的字节长度，方便ziplist从后往前遍历。如p1代表压缩列表地址，p2代表最后一个节点地址，p3代表倒数第二个节点地址，由此可见：

    p2 = p1 + zltail
    p3 = p2 - p2->prvlen
    p(pre) = p3 - p3->prvlen

如果长度小于254字节，prvlen则占用1个字节。

    <prevlen from 0 to 253> <encoding> <entry>

如果长度大于254字节，prvlen则占用5个字节长度，第一个字节代值为254（FE），后面的四个字节用于保存前一个节点的长度。

    0xFE <4 bytes unsigned little endian prevlen> <encoding> <entry>

encoding

表示当前节点的编码，可以是integer或者string。
当其节点存储的是一个string时，encoding的第一个字节的前2个bit（00、01、10）代表了ecoding的类型，其他的bit则表示string的长度。
当encoding的第一个字节的前2个bit为11时，代表entry节点存储的是整数，且后续的2个bit则可以用来表示integer的值（即节点存储小整数的时候）

下面直接贴身源码中的例子，来举例当类型不同时的encoding的样子。

//---------------字符串类型---------------//
 * |00pppppp| - 1 byte
 *      String value with length less than or equal to 63 bytes (6 bits).
 *      "pppppp" represents the unsigned 6 bit length.
 * |01pppppp|qqqqqqqq| - 2 bytes
 *      String value with length less than or equal to 16383 bytes (14 bits).
 *      IMPORTANT: The 14 bit number is stored in big endian.
 * |10000000|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt| - 5 bytes
 *      String value with length greater than or equal to 16384 bytes.
 *      Only the 4 bytes following the first byte represents the length
 *      up to 32^2-1. The 6 lower bits of the first byte are not used and
 *      are set to zero.
 *      IMPORTANT: The 32 bit number is stored in big endian.
//-----------------整数类型------------//
 * |11000000| - 3 bytes
 *      Integer encoded as int16_t (2 bytes).
 * |11010000| - 5 bytes
 *      Integer encoded as int32_t (4 bytes).
 * |11100000| - 9 bytes
 *      Integer encoded as int64_t (8 bytes).
 * |11110000| - 4 bytes
 *      Integer encoded as 24 bit signed (3 bytes).
 * |11111110| - 2 bytes
 *      Integer encoded as 8 bit signed (1 byte).
 * |1111xxxx| - (with xxxx between 0000 and 1101) immediate 4 bit integer.
 *      Unsigned integer from 0 to 12. The encoded value is actually from
 *      1 to 13 because 0000 and 1111 can not be used, so 1 should be
 *      subtracted from the encoded 4 bit value to obtain the right value.
 * |11111111| - End of ziplist special entry.

压缩列表存储样例

这里用源码中的两个例子来具体表示压缩列表的存储情况，如下所示：

//-----------------整数类型------------//
 * The following is a ziplist containing the two elements representing
 * the strings "2" and "5". It is composed of 15 bytes, that we visually
 * split into sections:
 *
 *  [0f 00 00 00] [0c 00 00 00] [02 00] [00 f3] [02 f6] [ff]
 *        |             |          |       |       |     |
 *     zlbytes        zltail    entries   "2"     "5"   end
 *
 * The first 4 bytes represent the number 15, that is the number of bytes
 * the whole ziplist is composed of. The second 4 bytes are the offset
 * at which the last ziplist entry is found, that is 12, in fact the
 * last entry, that is "5", is at offset 12 inside the ziplist.
 * The next 16 bit integer represents the number of elements inside the
 * ziplist, its value is 2 since there are just two elements inside.
 * Finally "00 f3" is the first entry representing the number 2. It is
 * composed of the previous entry length, which is zero because this is
 * our first entry, and the byte F3 which corresponds to the encoding
 * |1111xxxx| with xxxx between 0001 and 1101. We need to remove the "F"
 * higher order bits 1111, and subtract 1 from the "3", so the entry value
 * is "2". The next entry has a prevlen of 02, since the first entry is
 * composed of exactly two bytes. The entry itself, F6, is encoded exactly
 * like the first entry, and 6-1 = 5, so the value of the entry is 5.
 * Finally the special entry FF signals the end of the ziplist.
 *
//-----------------字符串类型------------//
 * Adding another element to the above string with the value "Hello World"
 * allows us to show how the ziplist encodes small strings. We'll just show
 * the hex dump of the entry itself. Imagine the bytes as following the
 * entry that stores "5" in the ziplist above:
 *
 * [02] [0b] [48 65 6c 6c 6f 20 57 6f 72 6c 64]
 *
 * The first byte, 02, is the length of the previous entry. The next
 * byte represents the encoding in the pattern |00pppppp| that means
 * that the entry is a string of length <pppppp>, so 0B means that
 * an 11 bytes string follows. From the third byte (48) to the last (64)
 * there are just the ASCII characters for "Hello World".

API

在ziplist.h头文件中给出了如下api：

//创建压缩列表
unsigned char *ziplistNew(void);
//合并两个压缩列表，将second追加在first后面
unsigned char *ziplistMerge(unsigned char **first, unsigned char **second);
//新加节点
unsigned char *ziplistPush(unsigned char *zl, unsigned char *s, unsigned int slen, int where);
//返回给定索引的节点
unsigned char *ziplistIndex(unsigned char *zl, int index);
//下一个节点
unsigned char *ziplistNext(unsigned char *zl, unsigned char *p);
//前一个节点
unsigned char *ziplistPrev(unsigned char *zl, unsigned char *p);
//获取节点中的值
unsigned int ziplistGet(unsigned char *p, unsigned char **sval, unsigned int *slen, long long *lval);
//插入节点
unsigned char *ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen);
//删除节点
unsigned char *ziplistDelete(unsigned char *zl, unsigned char **p);
//删除多个节点
unsigned char *ziplistDeleteRange(unsigned char *zl, int index, unsigned int num);
//比较节点值大小
unsigned int ziplistCompare(unsigned char *p, unsigned char *s, unsigned int slen);
//查找
unsigned char *ziplistFind(unsigned char *p, unsigned char *vstr, unsigned int vlen, unsigned int skip);
//获取节点个数
unsigned int ziplistLen(unsigned char *zl);
//压缩列表暂用总字节数
size_t ziplistBlobLen(unsigned char *zl);
//格式化打印压缩列表信息
void ziplistRepr(unsigned char *zl);

下面将只对节点的定义，以及节点的插入和删除操作的源码做介绍，其余的若想要了解可去浏览源码。而为什么要单独只介绍删除和插入两个操作呢？因为其涉及到压缩列表的一个影响性能的重要操作——空间扩展

entry的定义

typedef struct zlentry {
    //prevrawlensize占用的字节情况
    unsigned int prevrawlensize;
    //前一个节点的长度
    unsigned int prevrawlen; 
    //节点的长度（string有1、2、5字节情况，integer只有单字节的情况）
    unsigned int lensize; 
    //节点占用字节数
    unsigned int len; 
    //prevrawlensize + lensize
    unsigned int headersize;
    //编码类型：ZIP_STR_* or ZIP_INT_*
    unsigned char encoding;      
    //指向节点起始的指针
    unsigned char *p;
} zlentry;

删除操作

//删除元素为num的节点
unsigned char *__ziplistDelete(unsigned char *zl, unsigned char *p, unsigned int num) {
    unsigned int i, totlen, deleted = 0;
    size_t offset;
    int nextdiff = 0;
    zlentry first, tail;

    zipEntry(p, &first);
    for (i = 0; p[0] != ZIP_END && i < num; i++) {
        p += zipRawEntryLength(p);
        deleted++;
    }

    //计算需要删除的空间
    totlen = p-first.p; 
    if (totlen > 0) {
        if (p[0] != ZIP_END) {
            nextdiff = zipPrevLenByteDiff(p,first.prevrawlen);

            p -= nextdiff;
            zipStorePrevEntryLength(p,first.prevrawlen);

            ZIPLIST_TAIL_OFFSET(zl) =
                intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))-totlen);

            zipEntry(p, &tail);
            if (p[tail.headersize+tail.len] != ZIP_END) {
                ZIPLIST_TAIL_OFFSET(zl) =
                   intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
            }
            //移动后面的节点
            memmove(first.p,p,
                intrev32ifbe(ZIPLIST_BYTES(zl))-(p-zl)-1);
        } else {
            //尾节点删除，不需要进行空间扩展
            ZIPLIST_TAIL_OFFSET(zl) =
                intrev32ifbe((first.p-zl)-first.prevrawlen);
        }

        //重新计算尾节点的偏移量和计算节点总字节数
        offset = first.p-zl;
        zl = ziplistResize(zl, intrev32ifbe(ZIPLIST_BYTES(zl))-totlen+nextdiff);
        ZIPLIST_INCR_LENGTH(zl,-deleted);
        p = zl+offset;

        if (nextdiff != 0)
            //进行空间扩展更新
            zl = __ziplistCascadeUpdate(zl,p);
    }
    return zl;
}

插入操作

//插入节点p
unsigned char *__ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) {
    size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), reqlen;
    unsigned int prevlensize, prevlen = 0;
    size_t offset;
    int nextdiff = 0;
    unsigned char encoding = 0;
    long long value = 123456789;
    zlentry tail;

    //查找要插入节点的前一个节点
    if (p[0] != ZIP_END) {
        ZIP_DECODE_PREVLEN(p, prevlensize, prevlen);
    } else {
        unsigned char *ptail = ZIPLIST_ENTRY_TAIL(zl);
        if (ptail[0] != ZIP_END) {
            prevlen = zipRawEntryLength(ptail);
        }
    }

    //判断entry是否要encoding
    if (zipTryEncoding(s,slen,&value,&encoding)) {
        //设置int类型
        reqlen = zipIntSize(encoding);
    } else {
        reqlen = slen;
    }
    
    reqlen += zipStorePrevEntryLength(NULL,prevlen);
    reqlen += zipStoreEntryEncoding(NULL,encoding,slen);

    //当插入的节点不是在最后的时候，我们需要确保下一个节点有足够的内存来保存当前节点的长度
    int forcelarge = 0;
    nextdiff = (p[0] != ZIP_END) ? zipPrevLenByteDiff(p,reqlen) : 0;
    if (nextdiff == -4 && reqlen < 4) {
        nextdiff = 0;
        forcelarge = 1;
    }

    /* Store offset because a realloc may change the address of zl. */
    offset = p-zl;
    zl = ziplistResize(zl,curlen+reqlen+nextdiff);
    p = zl+offset;

    //当不是插入到最后的时候
    if (p[0] != ZIP_END) {
        /* Subtract one because of the ZIP_END bytes */
        memmove(p+reqlen,p-nextdiff,curlen-offset-1+nextdiff);

        /* Encode this entry's raw length in the next entry. */
        if (forcelarge)
            zipStorePrevEntryLengthLarge(p+reqlen,reqlen);
        else
            zipStorePrevEntryLength(p+reqlen,reqlen);

        /* Update offset for tail */
        ZIPLIST_TAIL_OFFSET(zl) =
            intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+reqlen);

        //当要插入位置后面有多个节点时
        zipEntry(p+reqlen, &tail);
        if (p[reqlen+tail.headersize+tail.len] != ZIP_END) {
            ZIPLIST_TAIL_OFFSET(zl) =
                intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
        }
    } else {
        //当是插入到最后的时候
        ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(p-zl);
    }

    //判断是否进行空间扩展
    if (nextdiff != 0) {
        offset = p-zl;
        zl = __ziplistCascadeUpdate(zl,p+reqlen);
        p = zl+offset;
    }

    //写入节点
    p += zipStorePrevEntryLength(p,prevlen);
    p += zipStoreEntryEncoding(p,encoding,slen);
    if (ZIP_IS_STR(encoding)) {
        memcpy(p,s,slen);
    } else {
        zipSaveInteger(p,value,encoding);
    }
    ZIPLIST_INCR_LENGTH(zl,1);
    return zl;
}

空间扩展

/* When an entry is inserted, we need to set the prevlen field of the next
 * entry to equal the length of the inserted entry. It can occur that this
 * length cannot be encoded in 1 byte and the next entry needs to be grow
 * a bit larger to hold the 5-byte encoded prevlen. This can be done for free,
 * because this only happens when an entry is already being inserted (which
 * causes a realloc and memmove). However, encoding the prevlen may require
 * that this entry is grown as well. This effect may cascade throughout
 * the ziplist when there are consecutive entries with a size close to
 * ZIP_BIG_PREVLEN, so we need to check that the prevlen can be encoded in
 * every consecutive entry.
 *
 * Note that this effect can also happen in reverse, where the bytes required
 * to encode the prevlen field can shrink. This effect is deliberately ignored,
 * because it can cause a "flapping" effect where a chain prevlen fields is
 * first grown and then shrunk again after consecutive inserts. Rather, the
 * field is allowed to stay larger than necessary, because a large prevlen
 * field implies the ziplist is holding large entries anyway.
 *
 * The pointer "p" points to the first entry that does NOT need to be
 * updated, i.e. consecutive fields MAY need an update. */
 
unsigned char *__ziplistCascadeUpdate(unsigned char *zl, unsigned char *p) {
    size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), rawlen, rawlensize;
    size_t offset, noffset, extra;
    unsigned char *np;
    zlentry cur, next;

    while (p[0] != ZIP_END) {
        zipEntry(p, &cur);
        rawlen = cur.headersize + cur.len;
        rawlensize = zipStorePrevEntryLength(NULL,rawlen);

        //当以及时最后一个节点的时候，结束操作
        if (p[rawlen] == ZIP_END) break;
        zipEntry(p+rawlen, &next);

        //当prvlen内存不需要改变的时候，结束操作
        if (next.prevrawlen == rawlen) break;

        if (next.prevrawlensize < rawlensize) {
            offset = p-zl;
            extra = rawlensize-next.prevrawlensize;
            zl = ziplistResize(zl,curlen+extra);
            p = zl+offset;

            np = p+rawlen;
            noffset = np-zl;

            if ((zl+intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))) != np) {
                ZIPLIST_TAIL_OFFSET(zl) =
                    intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+extra);
            }

            memmove(np+rawlensize,
                np+next.prevrawlensize,
                curlen-noffset-next.prevrawlensize-1);
            zipStorePrevEntryLength(np,rawlen);

            p += rawlen;
            curlen += extra;
        } else {
            if (next.prevrawlensize > rawlensize) {
               zipStorePrevEntryLengthLarge(p+rawlen,rawlen);
            } else {
                zipStorePrevEntryLength(p+rawlen,rawlen);
            }
            break;
        }
    }
    return zl;
}