Redis 6.0 source code reading notes (4)-String data type source code analysis

1. Storage structure

As covered in the introduction to the Redis String object, Redis stores strings in one of three encodings. Their memory layouts are as follows:

  • OBJ_ENCODING_INT: If the saved string is at most 20 characters long and can be parsed as an integer of type long, the integer value itself is stored directly in the ptr field of redisObject

  • OBJ_ENCODING_EMBSTR: Strings of at most 44 bytes (OBJ_ENCODING_EMBSTR_SIZE_LIMIT) are stored as simple dynamic strings (SDS), but the redisObject header and the SDS object are allocated together in one contiguous block of memory

  • OBJ_ENCODING_RAW: The string is stored as a simple dynamic string (SDS). The redisObject header and the SDS object are generally two separate, non-contiguous blocks of memory
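
The three cases above can be summarized as a small decision function. The sketch below is illustrative only and is not Redis code: `sketch_encoding`, `parses_as_long()` and `pick_encoding()` are hypothetical names, and the `strtol`/`snprintf` round-trip check merely stands in for Redis's own parser (`string2l`).

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the three OBJ_ENCODING_* cases. */
enum sketch_encoding { SKETCH_INT, SKETCH_EMBSTR, SKETCH_RAW };

#define SKETCH_EMBSTR_LIMIT 44 /* value the text gives for OBJ_ENCODING_EMBSTR_SIZE_LIMIT */

/* Accept only canonical decimal integers that fit in a long:
 * no leading spaces, no '+', no leading zeros. */
static int parses_as_long(const char *s, size_t len, long *out) {
    char buf[32];
    char *end;
    long v;

    if (len == 0 || len > 20) return 0;
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || (size_t)(end - s) != len) return 0;
    /* Round trip: printing the value must reproduce the input exactly. */
    snprintf(buf, sizeof(buf), "%ld", v);
    if (strlen(buf) != len || memcmp(buf, s, len) != 0) return 0;
    *out = v;
    return 1;
}

/* Pick an encoding for a NUL-terminated input string. */
static enum sketch_encoding pick_encoding(const char *s) {
    size_t len = strlen(s);
    long v;

    if (parses_as_long(s, len, &v)) return SKETCH_INT;
    if (len <= SKETCH_EMBSTR_LIMIT) return SKETCH_EMBSTR;
    return SKETCH_RAW;
}
```

Under these assumptions, `pick_encoding("12345")` selects the integer case, `pick_encoding("hello")` the embedded case, and any non-numeric string longer than 44 bytes the raw case.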

2. Data storage source code analysis

2.1 Data storage process

  1. From Redis 6.0 source code reading notes (1), on Redis server startup and command execution, we already know that a SET command sent by a client to save a string is handled by t_string.c#setCommand(). Its implementation is shown below.

    Two key functions are called in this method. This section focuses mainly on tryObjectEncoding():

    1. tryObjectEncoding() tries to encode the string object sent by the client in a more compact form to save memory
    2. setGenericCommand() saves the key-value pair to the database
    void setCommand(client *c) {
     ......
    
     c->argv[2] = tryObjectEncoding(c->argv[2]);
     setGenericCommand(c,flags,c->argv[1],c->argv[2],expire,unit,NULL,NULL);
    }
    
  2. The logic of object.c#tryObjectEncoding() is straightforward. It performs the following steps:

    1. If the string is at most 20 characters long and can be parsed as a long, the value is stored as an integer. Small non-negative values reuse a shared integer object where the maxmemory policy allows it; otherwise the integer is stored directly in the pointer field, in the form robj->ptr = (void*) value
    2. Otherwise, if the string length is less than or equal to OBJ_ENCODING_EMBSTR_SIZE_LIMIT and the object is still raw encoded, createEmbeddedStringObject() is called to convert it to embstr encoding
    3. If neither encoding applies, trimStringObjectIfNeeded() is called to try to release the unused free space at the end of the SDS string
    robj *tryObjectEncoding(robj *o) {
     long value;
     sds s = o->ptr;
     size_t len;
    
     /* Make sure this is a string object, the only type we encode
      * in this function. Other types use encoded memory efficient
      * representations but are handled by the commands implementing
      * the type. */
     serverAssertWithInfo(NULL,o,o->type == OBJ_STRING);
    
     /* We try some specialized encoding only for objects that are
      * RAW or EMBSTR encoded, in other words objects that are still
      * in represented by an actually array of chars. */
     if (!sdsEncodedObject(o)) return o;
    
     /* It's not safe to encode shared objects: shared objects can be shared
      * everywhere in the "object space" of Redis and may end in places where
      * they are not handled. We handle them only as values in the keyspace. */
      if (o->refcount > 1) return o;
    
     /* Check if we can represent this string as a long integer.
      * Note that we are sure that a string larger than 20 chars is not
      * representable as a 32 nor 64 bit integer. */
     len = sdslen(s);
     if (len <= 20 && string2l(s,len,&value)) {
         /* This object is encodable as a long. Try to use a shared object.
          * Note that we avoid using shared integers when maxmemory is used
          * because every object needs to have a private LRU field for the LRU
          * algorithm to work well. */
         if ((server.maxmemory == 0 ||
             !(server.maxmemory_policy & MAXMEMORY_FLAG_NO_SHARED_INTEGERS)) &&
             value >= 0 &&
             value < OBJ_SHARED_INTEGERS)
         {
             decrRefCount(o);
             incrRefCount(shared.integers[value]);
             return shared.integers[value];
         } else {
             if (o->encoding == OBJ_ENCODING_RAW) {
                 sdsfree(o->ptr);
                 o->encoding = OBJ_ENCODING_INT;
                 o->ptr = (void*) value;
                 return o;
             } else if (o->encoding == OBJ_ENCODING_EMBSTR) {
                 decrRefCount(o);
                 return createStringObjectFromLongLongForValue(value);
             }
         }
     }
    
     /* If the string is small and is still RAW encoded,
      * try the EMBSTR encoding which is more efficient.
      * In this representation the object and the SDS string are allocated
      * in the same chunk of memory to save space and cache misses. */
     if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT) {
         robj *emb;
    
         if (o->encoding == OBJ_ENCODING_EMBSTR) return o;
         emb = createEmbeddedStringObject(s,sdslen(s));
         decrRefCount(o);
         return emb;
     }
    
     /* We can't encode the object...
      *
      * Do the last try, and at least optimize the SDS string inside
      * the string object to require little space, in case there
      * is more than 10% of free space at the end of the SDS string.
      *
      * We do that only for relatively large strings as this branch
      * is only entered if the length of the string is greater than
      * OBJ_ENCODING_EMBSTR_SIZE_LIMIT. */
     trimStringObjectIfNeeded(o);
    
     /* Return the original object. */
     return o;
    }
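
Two numeric details in the listing above can be checked in isolation: why 20 characters is enough for any 64-bit integer, and when a shared integer object may be reused. The sketch below is not Redis code. `decimal_width()` and `can_share_integer()` are hypothetical helpers, and the value 10000 for `OBJ_SHARED_INTEGERS` is an assumption that does not appear in the listing itself.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_SHARED_INTEGERS 10000 /* assumed value of OBJ_SHARED_INTEGERS */

/* Number of characters needed to print v in decimal, sign included. */
static size_t decimal_width(long long v) {
    char buf[32];
    int n = snprintf(buf, sizeof(buf), "%lld", v);
    return n < 0 ? 0 : (size_t)n;
}

/* Mirrors the condition that guards shared.integers[] in the listing.
 * no_shared_policy is nonzero when maxmemory is set and the eviction
 * policy needs a private LRU/LFU field in every object. */
static int can_share_integer(long long value, int no_shared_policy) {
    return !no_shared_policy &&
           value >= 0 &&
           value < SKETCH_SHARED_INTEGERS;
}
```

The widest 64-bit value, `LLONG_MIN`, prints as 19 digits plus a minus sign, which is where the `len <= 20` bound comes from.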
    
  3. object.c#createEmbeddedStringObject(), which implements the embstr encoding, is also simple. Its main steps are:

    1. Call zmalloc() to allocate memory. A single call requests space for the redisObject, for sdshdr8 (one of the SDS header structures), and for the string itself. This is why the embstr encoding allocates memory only once, and why the redisObject header is contiguous with the SDS object
    2. Point the ptr field of the redisObject at the buf array that immediately follows the sdshdr8 header
    3. Fill in the sdshdr8 fields: len (the string length), alloc (the capacity of the character array), flags (the header type), and buf (the character array that actually holds the string)
    robj *createEmbeddedStringObject(const char *ptr, size_t len) {
     robj *o = zmalloc(sizeof(robj)+sizeof(struct sdshdr8)+len+1);
     struct sdshdr8 *sh = (void*)(o+1);
    
     o->type = OBJ_STRING;
     o->encoding = OBJ_ENCODING_EMBSTR;
     o->ptr = sh+1;
     o->refcount = 1;
     if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
         o->lru = (LFUGetTimeInMinutes()<<8) | LFU_INIT_VAL;
     } else {
         o->lru = LRU_CLOCK();
     }
    
     sh->len = len;
     sh->alloc = len;
     sh->flags = SDS_TYPE_8;
     if (ptr == SDS_NOINIT)
         sh->buf[len] = '\0';
     else if (ptr) {
         memcpy(sh->buf,ptr,len);
         sh->buf[len] = '\0';
     } else {
         memset(sh->buf,0,len+1);
     }
     return o;
    }
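
The single-allocation layout can be reproduced outside Redis. The sketch below is a simplified illustration under stated assumptions: `sketch_obj` and `sketch_hdr8` are hypothetical stand-ins for redisObject and sdshdr8, plain `malloc` replaces zmalloc, the flag value 1 is assumed to mean "type 8", and the packed attribute requires GCC or Clang.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for redisObject. */
struct sketch_obj {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:24;
    int refcount;
    void *ptr;
};

/* Hypothetical stand-in for sdshdr8: 3 header bytes, then the characters. */
struct __attribute__((__packed__)) sketch_hdr8 {
    uint8_t len;
    uint8_t alloc;
    unsigned char flags;
    char buf[];
};

/* One allocation holds the object header, the string header and the
 * characters. Returns NULL on failure or if len does not fit in 8 bits. */
static struct sketch_obj *sketch_embed(const char *src, size_t len) {
    struct sketch_obj *o;
    struct sketch_hdr8 *sh;

    if (len > 255) return NULL;
    o = malloc(sizeof(*o) + sizeof(*sh) + len + 1);
    if (o == NULL) return NULL;

    memset(o, 0, sizeof(*o));
    sh = (void *)(o + 1);   /* string header starts right after the object */
    o->refcount = 1;
    o->ptr = sh->buf;       /* same address as sh+1 in the listing above */

    sh->len = (uint8_t)len;
    sh->alloc = (uint8_t)len;
    sh->flags = 1;          /* assumed "type 8" marker */
    memcpy(sh->buf, src, len);
    sh->buf[len] = '\0';
    return o;
}
```

Because everything lives in one block, a single `free(o)` releases the object and its string together. On a typical 64-bit build the sketch's object header is 16 bytes and the packed string header is 3 bytes, so a 44-byte string plus its terminator adds up to 64 bytes.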
    
  4. A raw encoded string is created by object.c#createRawStringObject(). It involves two memory allocations: sds.c#sdsnewlen() allocates memory for the SDS object, and object.c#createObject() allocates memory for the redisObject

    robj *createRawStringObject(const char *ptr, size_t len) {
     return createObject(OBJ_STRING, sdsnewlen(ptr,len));
    }
    
  5. The capacity check in t_string.c#checkStringLength() shows that the maximum length of a string is 512 MB. An error is returned if this value is exceeded

    static int checkStringLength(client *c, long long size) {
     if (size > 512*1024*1024) {
         addReplyError(c,"string exceeds maximum allowed size (512MB)");
         return C_ERR;
     }
     return C_OK;
    }
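
The check compares a total size against the limit, so a caller that grows an existing string would presumably add the current length and the appended length before calling it. A minimal sketch of that arithmetic, assuming a hypothetical helper `fits_string_limit()` that is not part of Redis:

```c
#include <stddef.h>

#define SKETCH_MAX_STRING (512LL * 1024 * 1024) /* 536870912 bytes */

/* Returns 1 if a string of cur_len bytes may grow by add_len bytes
 * without exceeding the limit, 0 otherwise. */
static int fits_string_limit(size_t cur_len, size_t add_len) {
    long long total = (long long)cur_len + (long long)add_len;
    return total <= SKETCH_MAX_STRING;
}
```

As in the listing, a size of exactly 512 MB is still accepted, because the comparison there is a strict `>`.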
    

2.2 Simple dynamic string SDS

2.2.1 SDS structure

SDS (simple dynamic string) is the structure Redis uses to store strings. It is still a character array at heart, but unlike a C string it does not rely on '\0' to find the end of the string (a terminator is still appended, for compatibility with C string functions)

A traditional C string is terminated by a zero byte: reading stops at the first '\0', and any characters after it are ignored, so the string cannot hold binary data that contains zero bytes. In addition, the only way to obtain the length is to walk the string until the terminator is found, which takes O(n) time and is relatively inefficient
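
The difference is easy to demonstrate with data that contains an embedded zero byte. The sketch below uses a hypothetical length-prefixed type, `lp_string`, that only imitates the idea of a stored length; it is not the real SDS layout.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-prefixed string; not the real SDS layout. */
struct lp_string {
    size_t len;    /* bytes in use, known without scanning */
    char buf[16];  /* fixed capacity keeps the sketch short */
};

/* Copy up to 15 bytes, including any embedded '\0', and record the length. */
static struct lp_string lp_from_bytes(const char *src, size_t len) {
    struct lp_string s;

    memset(&s, 0, sizeof(s));
    if (len > sizeof(s.buf) - 1) len = sizeof(s.buf) - 1;
    memcpy(s.buf, src, len);
    s.buf[len] = '\0';
    s.len = len;
    return s;
}
```

For the five bytes `"ab\0cd"`, `strlen()` on the buffer stops at the embedded zero, while the stored `len` field still reports all five bytes and can be read in constant time.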

The SDS structures are defined in sds.h, as shown below. Because SDS determines the end of the string from the len attribute in its header, it can return the string length in constant time and append data quickly

sds.h defines five header types. The purpose is to provide headers of different sizes for strings of different lengths, in order to save memory. Take sdshdr8 as an example: its len attribute is of type uint8_t and occupies 1 byte, so the longest string it can describe is 255 bytes (2^8 - 1). A header mainly contains the following attributes:

  1. len: the actual length of the string, excluding the null terminator
  2. alloc: the capacity of the buf array, excluding the header and the null terminator
  3. flags: identifies the header type (in its 3 least significant bits)
  4. buf: the character array that actually stores the characters
/* Note: sdshdr5 is never used, we just access the flags byte directly.
 * However is here to document the layout of type 5 SDS strings. */
struct __attribute__ ((__packed__)) sdshdr5 {
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 {
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 {
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 {
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
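
Given these five headers, choosing one comes down to finding the smallest len field that can hold the string length. The sketch below shows one way to express that choice and to read the type back from the flags byte. The names `sk_req_type()` and `sk_type_of()` and the numeric type codes 0 to 4 are assumptions made for illustration; the thresholds follow from the comments and field widths in the structs above.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed numeric codes for the five header types. */
enum { SK_TYPE_5 = 0, SK_TYPE_8 = 1, SK_TYPE_16 = 2, SK_TYPE_32 = 3, SK_TYPE_64 = 4 };

#define SK_TYPE_MASK 7 /* the 3 least significant bits of flags */

/* Smallest header type whose length field can represent len. */
static int sk_req_type(uint64_t len) {
    if (len < (1ULL << 5)) return SK_TYPE_5;   /* 5 bits inside flags */
    if (len < (1ULL << 8)) return SK_TYPE_8;   /* uint8_t len */
    if (len < (1ULL << 16)) return SK_TYPE_16; /* uint16_t len */
    if (len < (1ULL << 32)) return SK_TYPE_32; /* uint32_t len */
    return SK_TYPE_64;
}

/* The flags byte sits immediately before buf, so the type can be
 * recovered from a pointer to the characters alone. */
static int sk_type_of(const char *buf) {
    return ((const unsigned char *)buf)[-1] & SK_TYPE_MASK;
}
```

Storing the type in the byte just before buf is what lets code holding only the character pointer locate the right header, which is how expressions such as `s[-1] & SDS_TYPE_MASK` in the Redis sources work.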

2.2.2 SDS capacity adjustment
  1. SDS is expanded by sds.c#sdsMakeRoomFor(). While the required new length (the current length plus the appended length) is below SDS_MAX_PREALLOC (1024*1024, that is 1 MB, defined in sds.h), the allocation is doubled, which reserves 100% spare space. Once the new length reaches SDS_MAX_PREALLOC, each expansion reserves only SDS_MAX_PREALLOC of extra spare space, so that doubling does not waste too much memory. The source code is as follows:

    sds sdsMakeRoomFor(sds s, size_t addlen) {
     void *sh, *newsh;
     size_t avail = sdsavail(s);
     size_t len, newlen;
     char type, oldtype = s[-1] & SDS_TYPE_MASK;
     int hdrlen;
    
     /* Return ASAP if there is enough space left. */
     if (avail >= addlen) return s;
    
     len = sdslen(s);
     sh = (char*)s-sdsHdrSize(oldtype);
     newlen = (len+addlen);
     if (newlen < SDS_MAX_PREALLOC)
         newlen *= 2;
     else
         newlen += SDS_MAX_PREALLOC;
    
     type = sdsReqType(newlen);
    
     /* Don't use type 5: the user is appending to the string and type 5 is
      * not able to remember empty space, so sdsMakeRoomFor() must be called
      * at every appending operation. */
     if (type == SDS_TYPE_5) type = SDS_TYPE_8;
    
     hdrlen = sdsHdrSize(type);
     if (oldtype==type) {
         newsh = s_realloc(sh, hdrlen+newlen+1);
         if (newsh == NULL) return NULL;
         s = (char*)newsh+hdrlen;
     } else {
         /* Since the header size changes, need to move the string forward,
          * and can't use realloc */
         newsh = s_malloc(hdrlen+newlen+1);
         if (newsh == NULL) return NULL;
         memcpy((char*)newsh+hdrlen, s, len+1);
         s_free(sh);
         s = (char*)newsh+hdrlen;
         s[-1] = type;
         sdssetlen(s, len);
     }
     sdssetalloc(s, newlen);
     return s;
    }
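
The growth rule in the listing above can be isolated into a pure function for experimentation. This is a sketch, not Redis code: `sk_grow_target()` is a hypothetical name, and it reproduces only the size calculation, not the reallocation.

```c
#include <stddef.h>

#define SK_MAX_PREALLOC ((size_t)1024 * 1024) /* value the text gives for SDS_MAX_PREALLOC */

/* Capacity requested when a string of len bytes must grow by addlen bytes. */
static size_t sk_grow_target(size_t len, size_t addlen) {
    size_t newlen = len + addlen;

    if (newlen < SK_MAX_PREALLOC)
        newlen *= 2;               /* small strings: double */
    else
        newlen += SK_MAX_PREALLOC; /* large strings: add 1 MB */
    return newlen;
}
```

For example, appending 5 bytes to a 10-byte string requests 30 bytes, while appending 1 byte to a 2 MB string requests 3 MB plus 1 byte. Note that the comparison uses the length after the append, not the current length.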
    
  2. SDS strings are cleared by sds.c#sdsclear(). As the source shows, it does not release the memory actually occupied, which reflects a lazy strategy. It performs the following operations:

    1. Reset the len attribute in the SDS header to 0
    2. Write the terminator at the start of the buf array, which lazily deletes the content of buf
    /* Modify an sds string in-place to make it empty (zero length).
    * However all the existing buffer is not discarded but set as free space
    * so that next append operations will not require allocations up to the
    * number of bytes previously available. */
    void sdsclear(sds s) {
     sdssetlen(s, 0);
     s[0] = '\0';
    }
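
The effect of this lazy clear can be seen on a simplified buffer. In the sketch below, `sk_buf`, `sk_clear()` and `sk_avail()` are hypothetical and use a fixed-size array instead of the real SDS headers; the point is only that the capacity survives the clear.

```c
#include <stddef.h>

/* Hypothetical buffer with a used length and a capacity. */
struct sk_buf {
    size_t len;     /* bytes in use */
    size_t alloc;   /* capacity, excluding the terminator */
    char data[32];
};

/* Lazy clear: forget the content but keep the capacity. */
static void sk_clear(struct sk_buf *b) {
    b->len = 0;
    b->data[0] = '\0';
}

/* Free space available for later appends. */
static size_t sk_avail(const struct sk_buf *b) {
    return b->alloc - b->len;
}
```

After `sk_clear()`, the whole capacity counts as free space, so later appends up to that size need no new allocation. The old bytes past the first position are left in place rather than erased.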
    

Origin blog.csdn.net/weixin_45505313/article/details/108292168