Redis High-Availability, High-Performance Cache Application Series, Part 1: Data Types, Underlying Structures, and Principles

Overview

This article introduces the principles and execution flow of the Redis cache, explains why its single-threaded processing model is efficient, and walks through the Redis data types together with their underlying structures and principles. Understanding these is very helpful when using Redis.

Low-level execution model

redis_01.png

A client request first reaches the Linux kernel. Redis and the kernel communicate through epoll, a non-blocking I/O multiplexing mechanism, and the I/O events of incoming requests are placed in epoll's ready queue in order. Redis operates on memory, and memory operations are much faster than I/O operations, so a single thread that takes requests from the epoll queue and processes them one by one is efficient: no time is lost to locking or to context switches between threads.

The "single thread" of Redis refers to the worker thread that executes commands. Redis also has other threads, for tasks such as persistence and asynchronous deletion.
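The loop described above can be sketched in a few lines of C. This is only an illustration under assumptions, not code from Redis: it is Linux-only, the helper names `watch_readable` and `serve_ready_once` are hypothetical, and "processing a request" is reduced to reading the bytes that arrived.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd with the epoll instance for read readiness. Returns 0 on success. */
int watch_readable(int epfd, int fd)
{
    struct epoll_event ev;
    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Wait once, then serve every ready descriptor in turn on the calling thread.
 * Returns the total number of bytes read into out, or -1 on error. */
long serve_ready_once(int epfd, char *out, size_t cap, int timeout_ms)
{
    assert(out != NULL);
    struct epoll_event ready[16];
    size_t total = 0;

    int n = epoll_wait(epfd, ready, 16, timeout_ms);
    if (n < 0) return -1;

    for (int i = 0; i < n && total < cap; i++) {   /* one request at a time */
        if (!(ready[i].events & EPOLLIN)) continue;
        ssize_t got = read(ready[i].data.fd, out + total, cap - total);
        if (got < 0) return -1;
        total += (size_t)got;
    }
    return (long)total;
}
```

A real server would call the second function in an endless loop and dispatch each request to a command handler; the point here is only that one thread handles the ready descriptors sequentially.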

Redis basic structure

The core of Redis mainly consists of redisServer and redisObject. When Redis starts, it first initializes the redisServer structure. Keys are mapped to dictEntry nodes through dict, and the concrete type and value are stored in a redisObject for unified management. If anything is unclear, refer to the book Redis Design and Implementation.

redis_02.png

redisServer

1. The following is the definition of the redisServer structure, in the file server.h.

struct redisServer {
    /* General */
    pid_t pid;                  /* Main process pid. */
    pthread_t main_thread_id;   /* Main thread id */
    char *configfile;           /* Absolute config file path, or NULL */

    // ...
    redisDb *db;                /* Array of databases */
    dict *commands;             /* Command table */
    dict *orig_commands;        /* Command table before command renaming. */
    aeEventLoop *el;            /* Event loop */
    // ...
};

redisDb *db: stores the database information. 16 databases are initialized by default, which can be changed through the parameter ``. dict is the core structure in redisDb: all key-value pairs are stored in this dict, which is called the key space.

Understanding dict is very important. A Redis database is essentially one large dict: every key, whatever the type of its value, is looked up through it.

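typedef struct redisDb {
    dict *dict;                 /* The key space of this database */
    dict *expires;              /* Expiry time of keys that have a timeout set */
    dict *blocking_keys;        /* Keys with clients waiting for data (BLPOP) */
    dict *blocking_keys_unblock_on_nokey;  /* Same, but unblocked when the key is deleted */
    dict *ready_keys;           /* Blocked keys that received a PUSH */
    dict *watched_keys;         /* WATCHED keys for MULTI/EXEC */
    // ...
} redisDb;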

Next, let's look at the dict structure. It keeps two hash tables so that it can rehash progressively, and each table is an array of buckets that holds the actual dictEntry nodes.

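struct dict {
    dictType *type;
    dictEntry **ht_table[2];    /* Two hash tables: [1] is only used while rehashing */
    unsigned long ht_used[2];   /* Number of entries in each table */
    long rehashidx;             /* Rehashing is not in progress if rehashidx == -1 */
    int16_t pauserehash;        /* If > 0, rehashing is paused */
    signed char ht_size_exp[2]; /* Exponent of the table size (size = 1 << exp) */
    void *metadata[];
};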

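struct dictEntry {
    void *key;                  /* The key itself */
    union {
        void *val;              /* Usually a pointer to a redisObject */
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;     /* Next entry in the same hash bucket */
    void *metadata[];
};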

dictEntry *next: when several keys hash to the same bucket, their entries form a linked list, and next points to the following dictEntry. Within the dictEntry structure, key stores the actual key and val points to a redisObject structure.
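To make the two-table layout and the next chain concrete, here is a minimal sketch of a chained hash table that rehashes one bucket per write. It is an assumption-laden toy, not the Redis implementation: all `toy_` names are hypothetical, the hash function is djb2 chosen for brevity, keys are not copied, duplicate keys are not checked, and allocation failures are ignored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct entry {
    const char *key;            /* not copied: the caller keeps it alive */
    int val;
    struct entry *next;         /* chain of entries that share a bucket */
} entry;

typedef struct toydict {
    entry **table[2];           /* [0] current table, [1] rehash target */
    unsigned long size[2];      /* bucket counts, powers of two (or 0) */
    unsigned long used[2];
    long rehashidx;             /* -1 when no rehash is in progress */
} toydict;

unsigned long toy_hash(const char *s)
{
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

void toy_init(toydict *d, unsigned long size)
{
    assert(size > 0 && (size & (size - 1)) == 0);
    memset(d, 0, sizeof *d);
    d->table[0] = calloc(size, sizeof(entry *));
    d->size[0] = size;
    d->rehashidx = -1;
}

/* Move one non-empty bucket from table 0 to table 1; finish when drained. */
void toy_rehash_step(toydict *d)
{
    if (d->rehashidx < 0) return;
    while ((unsigned long)d->rehashidx < d->size[0] &&
           d->table[0][d->rehashidx] == NULL)
        d->rehashidx++;
    if ((unsigned long)d->rehashidx < d->size[0]) {
        entry *e = d->table[0][d->rehashidx];
        while (e) {
            entry *next = e->next;
            unsigned long i = toy_hash(e->key) & (d->size[1] - 1);
            e->next = d->table[1][i];
            d->table[1][i] = e;
            d->used[0]--;
            d->used[1]++;
            e = next;
        }
        d->table[0][d->rehashidx++] = NULL;
    }
    if (d->used[0] == 0) {      /* table 1 becomes the only table */
        free(d->table[0]);
        d->table[0] = d->table[1]; d->size[0] = d->size[1]; d->used[0] = d->used[1];
        d->table[1] = NULL; d->size[1] = 0; d->used[1] = 0;
        d->rehashidx = -1;
    }
}

void toy_add(toydict *d, const char *key, int val)
{
    toy_rehash_step(d);         /* every write does a little rehash work */
    if (d->rehashidx < 0 && d->used[0] >= d->size[0]) {
        d->size[1] = d->size[0] * 2;
        d->table[1] = calloc(d->size[1], sizeof(entry *));
        d->rehashidx = 0;
    }
    int t = d->rehashidx >= 0 ? 1 : 0;   /* new keys go to the new table */
    unsigned long i = toy_hash(key) & (d->size[t] - 1);
    entry *e = malloc(sizeof *e);
    e->key = key;
    e->val = val;
    e->next = d->table[t][i];   /* prepend to the bucket's chain */
    d->table[t][i] = e;
    d->used[t]++;
}

entry *toy_find(toydict *d, const char *key)
{
    for (int t = 0; t <= 1; t++) {       /* during a rehash, check both tables */
        if (d->size[t] == 0) continue;
        unsigned long i = toy_hash(key) & (d->size[t] - 1);
        for (entry *e = d->table[t][i]; e; e = e->next)
            if (strcmp(e->key, key) == 0) return e;
    }
    return NULL;
}
```

Spreading the move over many operations is what keeps any single command from paying for a full table copy, which matters when one thread serves every client.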

redisObject

The following describes the fields of the redisObject structure and what each one is for:

  • type: the external data type (string, hash, list, set, zset), which is what the TYPE command reports.
  • encoding: the internal representation actually used to store the data. Which one is chosen depends on the corresponding thresholds in the configuration file; for example, a list is stored as a ziplist or a quicklist depending on the stored values.
  • lru: access information used by the cache eviction mechanism.
  • refcount: reference count, mainly used for memory reclamation.
  • ptr: pointer to the memory that holds the actual data.

struct redisObject {
    unsigned type:4;        /* External data type */
    unsigned encoding:4;    /* Internal encoding */
    unsigned lru:LRU_BITS;  /* Access information used for eviction */
    int refcount;           /* Reference count */
    void *ptr;              /* Pointer to the actual data */
};
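As an illustration of how refcount drives memory reclamation, the following sketch creates a small object, shares it, and frees it when the last holder lets go. The `toyobj` type and the `toy_` functions are hypothetical names made up for this example, and the lru field is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { TOY_STRING = 0, TOY_LIST = 1 };       /* external type */
enum { TOY_ENC_RAW = 0, TOY_ENC_INT = 1 };   /* internal encoding */

typedef struct toyobj {
    unsigned type:4;
    unsigned encoding:4;
    int refcount;
    void *ptr;
} toyobj;

/* Create a string object holding a private copy of s, with one reference. */
toyobj *toy_create_string(const char *s)
{
    toyobj *o = malloc(sizeof *o);
    if (!o) return NULL;
    size_t n = strlen(s) + 1;
    o->ptr = malloc(n);
    if (!o->ptr) { free(o); return NULL; }
    memcpy(o->ptr, s, n);
    o->type = TOY_STRING;
    o->encoding = TOY_ENC_RAW;
    o->refcount = 1;
    return o;
}

void toy_incr_ref(toyobj *o) { o->refcount++; }

/* Drop one reference. Returns 1 if the object was freed, 0 otherwise. */
int toy_decr_ref(toyobj *o)
{
    assert(o->refcount > 0);
    if (--o->refcount > 0) return 0;
    free(o->ptr);
    free(o);
    return 1;
}
```

After `toy_create_string` followed by one `toy_incr_ref`, the first `toy_decr_ref` only lowers the count; the second one releases both the data and the object.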

Redis data structure

String

Redis does not use plain C strings directly. Instead it wraps them in a simple dynamic string type, SDS, and a string value is stored internally with one of three encodings: int, embstr, or raw.

127.0.0.1:6379> set number 100
OK
127.0.0.1:6379> object encoding number
"int"

127.0.0.1:6379> set name stark张宇
OK
127.0.0.1:6379> object encoding name
"embstr"

127.0.0.1:6379> set logstr "stark张宇: a demo of the Redis raw encoding, which needs more than 44 bytes"
OK
127.0.0.1:6379> object encoding logstr
"raw"

Which encoding is used depends on the stored value and its size in bytes. If the value is an integer, it is stored directly as int. Otherwise, if it is less than or equal to 44 bytes it is stored as embstr, and if it is greater than 44 bytes it is stored as raw.

The difference between embstr and raw: with embstr, the object header and the string are stored in one contiguous block of memory, so a single allocation and a single memory access are enough. With raw, the header and the string live in two separate blocks, so two allocations are needed and reading the value takes an extra pointer dereference.
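The rule above can be written as a small decision function. This is a sketch under assumptions rather than the Redis code: `string_encoding` and `looks_like_long` are hypothetical names, the value is taken as a NUL-terminated C string, and "integer" is interpreted as a canonical base-10 number that fits in a long.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define EMBSTR_LIMIT 44   /* the threshold quoted in the text */

/* Returns 1 if s (of length len) is a canonical base-10 integer fitting a long. */
int looks_like_long(const char *s, size_t len)
{
    if (len == 0 || len > 20) return 0;
    size_t i = (s[0] == '-') ? 1 : 0;
    if (i == len) return 0;                   /* a lone "-" */
    if (s[i] == '0' && len > 1) return 0;     /* "007" and "-0" stay text */
    for (size_t j = i; j < len; j++)
        if (s[j] < '0' || s[j] > '9') return 0;
    errno = 0;
    (void)strtol(s, NULL, 10);
    return errno != ERANGE;                   /* reject values that overflow */
}

/* Returns the encoding name that OBJECT ENCODING would be expected to show. */
const char *string_encoding(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    if (looks_like_long(s, len)) return "int";
    return len <= EMBSTR_LIMIT ? "embstr" : "raw";
}
```

With this sketch, `string_encoding("100")` yields "int", a short name yields "embstr", and any non-numeric value of 45 bytes or more yields "raw".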
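The two layouts can be mimicked with ordinary allocations. The sketch below is only an analogy, with a hypothetical `strobj` type standing in for the object header: the embstr-style constructor makes one allocation and points into it, while the raw-style constructor makes two.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct strobj {
    size_t len;
    char *ptr;      /* where the characters live */
} strobj;

/* embstr style: header and characters share one contiguous allocation. */
strobj *make_embstr(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    strobj *o = malloc(sizeof *o + len + 1);
    if (!o) return NULL;
    o->len = len;
    o->ptr = (char *)(o + 1);           /* right behind the header */
    memcpy(o->ptr, s, len + 1);
    return o;
}

/* raw style: header and characters are allocated separately. */
strobj *make_raw(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    strobj *o = malloc(sizeof *o);
    if (!o) return NULL;
    o->ptr = malloc(len + 1);
    if (!o->ptr) { free(o); return NULL; }
    o->len = len;
    memcpy(o->ptr, s, len + 1);
    return o;
}

void free_embstr(strobj *o) { free(o); }                /* one free */
void free_raw(strobj *o) { free(o->ptr); free(o); }     /* two frees */
```

The single-block layout is friendlier to the CPU cache, but growing the string in place would mean reallocating the whole object, which is one reason to keep long or frequently modified strings in the two-block layout.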

Why did Redis design the SDS structure to store strings?

1. Efficiency: the length of a string is needed very often. With a C string, the whole string has to be traversed to find its length, which is O(n). SDS stores the length in its len field, so reading it is O(1). In high-concurrency scenarios, repeatedly traversing strings would become a performance bottleneck.

2. Preventing buffer overflow: a C string does not record its own length, so appending to it can write past the end of its buffer. SDS checks the available space before a modification and grows the buffer when needed. Because the length is stored rather than derived from a terminating '\0', SDS can also hold binary data safely.

3. Pre-allocation of memory: when a modification makes the string grow, SDS allocates not only the space the string needs but also some extra, reserved space. The next modification uses this unused space first, before any new allocation is made.

4. Lazy release of space: when a modification makes the string shorter, the freed space is not returned immediately, so that it is available if the string grows again.

Rules for pre-allocation: when the new length is less than 1 MB, an equal amount of extra space is reserved (the allocation is twice the new length). When the new length is 1 MB or more, 1 MB of extra space is reserved.
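The length field and the pre-allocation rule fit in a short sketch. It assumes a simplified layout with a hypothetical `toysds` struct (real SDS keeps the header directly in front of the buffer) and only implements appending.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_PREALLOC (1024 * 1024)   /* the 1 MB threshold from the text */

typedef struct toysds {
    size_t len;     /* bytes in use: reading the length is O(1) */
    size_t alloc;   /* bytes available in buf, terminator not counted */
    char *buf;
} toysds;

/* How many bytes to allocate for a string that will be newlen bytes long. */
size_t toy_alloc_for(size_t newlen)
{
    if (newlen < TOY_MAX_PREALLOC)
        return newlen * 2;                /* reserve as much again as is used */
    return newlen + TOY_MAX_PREALLOC;     /* reserve is capped at 1 MB */
}

/* Append n bytes from src. Returns 0 on success, -1 if memory runs out. */
int toy_sds_cat(toysds *s, const char *src, size_t n)
{
    assert(s != NULL && (src != NULL || n == 0));
    size_t newlen = s->len + n;
    if (s->buf == NULL || newlen > s->alloc) {   /* reserve is used up first */
        size_t newalloc = toy_alloc_for(newlen);
        char *p = realloc(s->buf, newalloc + 1);
        if (!p) return -1;
        s->buf = p;
        s->alloc = newalloc;
    }
    if (n > 0) memcpy(s->buf + s->len, src, n);  /* length-based, so binary safe */
    s->len = newlen;
    s->buf[s->len] = '\0';
    return 0;
}
```

Starting from an empty `toysds`, appending "hello" allocates 10 bytes, appending " world" grows the buffer to 22 bytes, and a further one-byte append needs no allocation at all because it fits in the reserve.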

Hash

The bottom layer of a Redis Hash is a dict. When the number of fields is small and every value is short, a ziplist is used; once the data grows beyond that, the hashtable structure is used to store it. (Redis 7.0 and later use listpack in place of ziplist, so OBJECT ENCODING reports listpack there.)

127.0.0.1:6379> hgetall user1
1) "name"
2) "stark"
3) "age"
4) "33"
5) "sex"
6) "1"
127.0.0.1:6379> object encoding user1
"ziplist"

127.0.0.1:6379> hset user1 mark "stark张宇: a demo of the hashtable encoding, once a value is large enough the hash becomes a hashtable"
(integer) 1
127.0.0.1:6379> object encoding user1
"hashtable"

ziplist

The components of the ziplist structure in detail:

  • zlbytes: 32-bit unsigned integer giving the number of bytes occupied by the entire ziplist, including the 4 bytes of zlbytes itself. With this field the ziplist can be resized without traversing the whole list to work out its size, trading space for time.
  • zltail: 32-bit unsigned integer giving the offset of the last entry in the list, which makes pop operations at the tail convenient.
  • zllen: 16 bits, giving the number of entries stored in the ziplist.
  • entry: variable length; there may be more than one.
  • zlend: 8 bits, marking the end of the ziplist. Its value is fixed at 255.

Each entry consists of 3 parts: the size of the previous entry, the encoding (type and length) of the current entry, and the actual data, which is a string or a number.

Advantages and disadvantages of ziplist:
Advantages: because it occupies one contiguous block of memory, memory utilization is high and access is efficient.
Disadvantages: updates are inefficient. Inserting or deleting an element means growing or shrinking the memory block and moving the data that follows, so frequent changes are costly.
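The layout can be made concrete by writing it into a byte buffer. The sketch below uses hypothetical `toy_ziplist_` helpers and deliberately covers only the simplest case: strings of at most 63 bytes, whose encoding fits in one byte, and previous-entry sizes below 254, which fit in one byte. Header fields are written in little-endian order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ZL_HEADER_SIZE 10   /* zlbytes (4) + zltail (4) + zllen (2) */
#define ZL_END 0xFF

static void put_le32(unsigned char *p, uint32_t v)
{
    for (int i = 0; i < 4; i++) p[i] = (unsigned char)(v >> (8 * i));
}
static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
static void put_le16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)v;
    p[1] = (unsigned char)(v >> 8);
}
static uint16_t get_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Write an empty ziplist into zl. Returns its size in bytes (11). */
size_t toy_ziplist_new(unsigned char *zl, size_t cap)
{
    assert(cap >= ZL_HEADER_SIZE + 1);
    put_le32(zl, ZL_HEADER_SIZE + 1);    /* zlbytes: header plus end marker */
    put_le32(zl + 4, ZL_HEADER_SIZE);    /* zltail: offset of the last entry */
    put_le16(zl + 8, 0);                 /* zllen */
    zl[ZL_HEADER_SIZE] = ZL_END;         /* zlend */
    return ZL_HEADER_SIZE + 1;
}

/* Append a short string at the tail. Returns 0 on success, -1 if unsupported. */
int toy_ziplist_push(unsigned char *zl, size_t cap, const char *s, size_t n)
{
    uint32_t bytes = get_le32(zl), tail = get_le32(zl + 4);
    uint16_t count = get_le16(zl + 8);
    uint32_t at = bytes - 1;                     /* overwrite the old zlend */
    uint32_t prevlen = count ? at - tail : 0;    /* size of the current last entry */
    if (n > 63 || prevlen >= 254 || (size_t)bytes + 2 + n > cap) return -1;
    zl[at] = (unsigned char)prevlen;             /* part 1: previous entry size */
    zl[at + 1] = (unsigned char)n;               /* part 2: string with 6-bit length */
    memcpy(zl + at + 2, s, n);                   /* part 3: the data itself */
    zl[at + 2 + n] = ZL_END;
    put_le32(zl, bytes + 2 + (uint32_t)n);
    put_le32(zl + 4, at);
    put_le16(zl + 8, (uint16_t)(count + 1));
    return 0;
}
```

Because every entry records the size of the one before it, the list can be walked backwards from zltail without any pointers, which is where the memory savings come from.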

List

A Redis list is an ordered data structure. At the bottom layer it is stored as either a ziplist or a quicklist.

Ziplist has already been mentioned in the Hash type, so I won't make redundant descriptions here.

Advantages and disadvantages of the quicklist structure:

Advantages: because it is a doubly linked list, updates are relatively efficient, and inserting and deleting elements is very convenient. Reaching an element by position is O(n), while operations on the first and last elements are O(1).
Disadvantages: the links add memory overhead.
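The complexity figures above can be read off a plain doubly linked list. The sketch below is a simplification with hypothetical names: it stores one int per node and leaves out what distinguishes a quicklist from an ordinary linked list.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node {
    int value;
    struct node *prev, *next;
} node;

typedef struct dlist {
    node *head, *tail;
    size_t len;
} dlist;

/* O(1): only the tail pointer and one neighbour are touched. */
int dlist_push_tail(dlist *l, int value)
{
    assert(l != NULL);
    node *n = malloc(sizeof *n);
    if (!n) return -1;
    n->value = value;
    n->next = NULL;
    n->prev = l->tail;
    if (l->tail) l->tail->next = n; else l->head = n;
    l->tail = n;
    l->len++;
    return 0;
}

/* O(1): the mirror image of dlist_push_tail. */
int dlist_push_head(dlist *l, int value)
{
    assert(l != NULL);
    node *n = malloc(sizeof *n);
    if (!n) return -1;
    n->value = value;
    n->prev = NULL;
    n->next = l->head;
    if (l->head) l->head->prev = n; else l->tail = n;
    l->head = n;
    l->len++;
    return 0;
}

/* O(n): reaching position i means following i links from the head. */
node *dlist_index(const dlist *l, size_t i)
{
    node *n = l->head;
    while (n && i--) n = n->next;
    return n;
}
```

The two pointers per node are the memory overhead mentioned above: for small values they can take more space than the data itself.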

Set

The Set in Redis is an unordered, automatically deduplicated data type. Its underlying layer is a dict. When all members are integers and the number of members does not exceed set-max-intset-entries in the configuration file, an intset is used; otherwise a dict is used.
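The choice between the two encodings can be sketched as a single function. The names `set_encoding` and `is_integer` are hypothetical, members are passed as C strings, and the limit is a parameter so that no particular configuration value is assumed; the integer check is a simplification that accepts anything strtoll can parse in full.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef enum { SET_ENC_INTSET, SET_ENC_HASHTABLE } set_enc;

/* Returns 1 if s is a base-10 integer that fits in a long long. */
int is_integer(const char *s)
{
    if (!(*s == '-' || (*s >= '0' && *s <= '9'))) return 0;
    char *end;
    errno = 0;
    (void)strtoll(s, &end, 10);
    return errno == 0 && end != s && *end == '\0';
}

/* Decide how a set with these members would be stored. */
set_enc set_encoding(const char **members, size_t count, size_t max_intset_entries)
{
    assert(members != NULL || count == 0);
    if (count > max_intset_entries) return SET_ENC_HASHTABLE;
    for (size_t i = 0; i < count; i++)
        if (!is_integer(members[i])) return SET_ENC_HASHTABLE;   /* one non-integer is enough */
    return SET_ENC_INTSET;
}
```

A single non-integer member, or one member too many, is enough to tip the whole set over to the hashtable encoding.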

127.0.0.1:6379> sadd gather 100 200 300
(integer) 3
127.0.0.1:6379> object encoding gather
"intset"
127.0.0.1:6379> sadd gather stark
(integer) 1
127.0.0.1:6379> object encoding gather
"hashtable"

Zset

The Zset in Redis is an ordered, automatically deduplicated data type. Its bottom layer is implemented with a dictionary (dict) and a skip list (skiplist). When there is little data, the ziplist structure is used instead.

The thresholds for using ziplist can be configured through zset-max-ziplist-entries and zset-max-ziplist-value in the configuration file.

Frequently asked question: why does the Redis Zset use a skip list rather than a red-black tree or another binary tree?

1. Range queries on red-black trees and binary trees are awkward, and large sorted sets are used mostly for range queries. On a skip list a range query is very convenient because the bottom level is a linked list: find the first element, then walk forward. On a red-black tree or binary tree, a range query is relatively more complicated.

2. A skip list is much simpler to implement than a red-black tree.
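To show why range queries suit a skip list, here is a minimal sketch with hypothetical `sk_` names. It stores only scores, uses a fixed maximum of 4 levels, and takes the height of each node as a parameter so that the example is deterministic; a real skip list picks the height at random.

```c
#include <assert.h>
#include <stdlib.h>

#define SK_MAXLVL 4

typedef struct sknode {
    double score;
    struct sknode *next[SK_MAXLVL];   /* next[0] is the plain sorted list */
} sknode;

typedef struct sklist {
    sknode head;                      /* sentinel, holds no score */
} sklist;

void sk_init(sklist *l)
{
    l->head.score = 0;
    for (int i = 0; i < SK_MAXLVL; i++) l->head.next[i] = NULL;
}

/* Insert score with the given height (1..SK_MAXLVL). Returns 0 on success. */
int sk_insert(sklist *l, double score, int height)
{
    assert(height >= 1 && height <= SK_MAXLVL);
    sknode *update[SK_MAXLVL], *x = &l->head;
    for (int i = SK_MAXLVL - 1; i >= 0; i--) {      /* find the predecessor per level */
        while (x->next[i] && x->next[i]->score < score) x = x->next[i];
        update[i] = x;
    }
    sknode *n = calloc(1, sizeof *n);
    if (!n) return -1;
    n->score = score;
    for (int i = 0; i < height; i++) {              /* splice into the lower levels */
        n->next[i] = update[i]->next[i];
        update[i]->next[i] = n;
    }
    return 0;
}

/* Copy the scores in [min, max] into out (at most cap). Returns how many. */
size_t sk_range(const sklist *l, double min, double max, double *out, size_t cap)
{
    const sknode *x = &l->head;
    for (int i = SK_MAXLVL - 1; i >= 0; i--)        /* descend to just before min */
        while (x->next[i] && x->next[i]->score < min) x = x->next[i];
    size_t n = 0;
    for (x = x->next[0]; x && x->score <= max && n < cap; x = x->next[0])
        out[n++] = x->score;                        /* then walk the bottom list */
    return n;
}
```

Once the first element of the range is found, the rest of the query is a straight walk along next[0], with no tree traversal or backtracking involved.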

Origin blog.csdn.net/xuezhiwu001/article/details/130090223