Redis source code learning

Learning http://blog.csdn.net/men_wen/article/details/75668345

Simple dynamic string: string

http://blog.csdn.net/men_wen/article/details/69396550

sds structure:

struct sdshdr {
    int len;        // number of bytes currently used in buf
    int free;       // number of unused bytes remaining in buf
    char buf[];     // data space allocated when the sds is initialized; a flexible array member
};
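A minimal sketch of how a header with this layout might be allocated and used. The helper names sketch_sds_new and sketch_sds_len are hypothetical and are not taken from the Redis source; the point is only that the length is stored in len, so the data may contain '\0' bytes and the length can be read without scanning.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as the sdshdr structure shown above. */
struct sdshdr {
    int len;      /* bytes currently used in buf */
    int free;     /* unused bytes remaining in buf */
    char buf[];   /* flexible array member holding the data */
};

/* Hypothetical helper: allocate header and data in one block and copy
 * initlen bytes from init (which may contain '\0' bytes). */
struct sdshdr *sketch_sds_new(const void *init, size_t initlen) {
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + initlen + 1);
    if (sh == NULL) return NULL;
    sh->len = (int)initlen;
    sh->free = 0;
    if (init != NULL) memcpy(sh->buf, init, initlen);
    else memset(sh->buf, 0, initlen);
    sh->buf[initlen] = '\0';   /* trailing terminator so buf still works with C string functions */
    return sh;
}

/* Hypothetical helper: O(1) length lookup, no scan for a terminator. */
size_t sketch_sds_len(const struct sdshdr *sh) {
    assert(sh != NULL);
    return (size_t)sh->len;
}
```

With the five bytes "ab\0cd" as input, sketch_sds_len reports 5 while strlen on buf stops at the embedded '\0' and reports 2.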

Advantages:

1. Binary safety: because the length is tracked in len rather than by a terminator, an sds can store data that contains '\0' (the terminator of C's built-in strings), such as binary images or video.

2、

Linked list structure: list (doubly linked list)

http://blog.csdn.net/men_wen/article/details/69215222

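  • linked list node
typedef struct listNode {
    struct listNode *prev;  // predecessor node; NULL if this node is the head of the list
    struct listNode *next;  // successor node; NULL if this node is the tail of the list
    void *value;            // generic pointer; can hold any kind of data
} listNode;
  • list header
typedef struct list {
    listNode *head;     // pointer to the head node of the list
    listNode *tail;     // pointer to the tail node of the list

    // the following three function pointers act like member functions of a class
    void *(*dup)(void *ptr);    // copies the value stored in a list node
    void (*free)(void *ptr);    // frees the value stored in a list node
    int (*match)(void *ptr, void *key); // compares the value stored in a list node with another input value
    unsigned long len;      // list length counter
} list;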

 

The list header manages the linked list's metadata:

1. head and tail pointers: operations on the head node and the tail node of the list are O(1).

2. len, the list length counter: getting the number of nodes in the list is O(1).

3. dup, free and match pointers: these provide polymorphism. Each listNode stores its value through a generic void * pointer, and the list header uses the dup, free and match pointers to apply type-specific behavior to whatever kind of object the list stores.
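The three points above can be sketched as follows. This is a reduced illustration, not the Redis implementation: only the free callback is kept, and the sketch_list_* function names are hypothetical.

```c
#include <stdlib.h>

typedef struct listNode {
    struct listNode *prev;
    struct listNode *next;
    void *value;
} listNode;

typedef struct list {
    listNode *head;
    listNode *tail;
    void (*free)(void *ptr);   /* optional destructor for stored values */
    unsigned long len;
} list;

/* Hypothetical helper: create an empty list with an optional value destructor. */
list *sketch_list_create(void (*free_fn)(void *)) {
    list *l = malloc(sizeof(*l));
    if (l == NULL) return NULL;
    l->head = l->tail = NULL;
    l->free = free_fn;
    l->len = 0;
    return l;
}

/* O(1) append: the tail pointer avoids walking the list. */
int sketch_list_push_tail(list *l, void *value) {
    listNode *n = malloc(sizeof(*n));
    if (n == NULL) return -1;
    n->value = value;
    n->next = NULL;
    n->prev = l->tail;
    if (l->tail) l->tail->next = n;
    else l->head = n;
    l->tail = n;
    l->len++;                  /* keeps the length lookup O(1) */
    return 0;
}

/* Release every node; the free callback decides how values are destroyed. */
void sketch_list_release(list *l) {
    listNode *n = l->head;
    while (n) {
        listNode *next = n->next;
        if (l->free) l->free(n->value);
        free(n);
        n = next;
    }
    free(l);
}
```

Passing the standard free function as the destructor suits values allocated with malloc; passing NULL leaves ownership of the values with the caller.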

  • linked list iterator

Redis dictionary structure: hash, associative array, map

Hash table structure

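typedef struct dictht { // hash table
    dictEntry **table;      // address of an array whose elements are pointers to dictEntry hash table nodes
    unsigned long size;     // size of the table; the initial size is 4
    unsigned long sizemask; // maps a hash value to an index in table; always equal to (size-1)
    unsigned long used;     // number of nodes (key-value pairs) currently in the hash table
} dictht;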

Hash table node structure

typedef struct dictEntry {
    void *key;                  // key
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;                        // value
    struct dictEntry *next;     // points to the next hash node; used to resolve hash key collisions
} dictEntry;
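A minimal sketch of how sizemask and the next pointer might work together. The sketch* types, the function names and the djb2 string hash are assumptions made for illustration; Redis uses its own hash functions, and this sketch neither resizes the table nor checks for duplicate keys.

```c
#include <stdlib.h>
#include <string.h>

typedef struct sketchEntry {
    const char *key;
    long val;
    struct sketchEntry *next;   /* chain of entries that landed in the same slot */
} sketchEntry;

typedef struct sketchTable {
    sketchEntry **table;
    unsigned long size;         /* must be a power of two */
    unsigned long sizemask;     /* always size - 1 */
    unsigned long used;
} sketchTable;

/* Stand-in string hash (djb2), used only for this sketch. */
static unsigned long sketch_hash(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

int sketch_table_init(sketchTable *t, unsigned long size) {
    t->table = calloc(size, sizeof(*t->table));
    if (t->table == NULL) return -1;
    t->size = size;
    t->sizemask = size - 1;
    t->used = 0;
    return 0;
}

/* Insert at the head of the slot's chain; colliding keys share a slot. */
int sketch_table_add(sketchTable *t, const char *key, long val) {
    unsigned long idx = sketch_hash(key) & t->sizemask;
    sketchEntry *e = malloc(sizeof(*e));
    if (e == NULL) return -1;
    e->key = key;
    e->val = val;
    e->next = t->table[idx];
    t->table[idx] = e;
    t->used++;
    return 0;
}

/* Walk the chain in the key's slot until the key matches. */
sketchEntry *sketch_table_find(const sketchTable *t, const char *key) {
    unsigned long idx = sketch_hash(key) & t->sizemask;
    for (sketchEntry *e = t->table[idx]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0) return e;
    return NULL;
}
```

Because size is a power of two, hash & sizemask gives the same result as hash % size but avoids a division.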

dictionary structure

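typedef struct dict {
    dictType *type;     // points to a dictType structure whose custom functions let keys and values hold data of any type
    void *privdata;     // private data, passed as an argument to the functions in dictType
    dictht ht[2];       // two hash tables
    long rehashidx;     // rehash marker; rehashidx == -1 means no rehash is in progress
    int iterators;      // number of iterators currently running
} dict;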

The dictType type holds pointers to the functions that operate on the different types of keys and values in the dictionary.

typedef struct dictType {
    unsigned int (*hashFunction)(const void *key);      // computes the hash value
    void *(*keyDup)(void *privdata, const void *key);   // copies a key
    void *(*valDup)(void *privdata, const void *obj);   // copies a value
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);  // compares two keys
    void (*keyDestructor)(void *privdata, void *key);   // destructor for a key
    void (*valDestructor)(void *privdata, void *obj);   // destructor for a value
} dictType;

hash algorithm

Hash function for integer keys

unsigned int dictIntHashFunction(unsigned int key)      // hash function for int keys
{
    key += ~(key << 15);       // ~ is bitwise NOT, ^ is bitwise XOR
    key ^=  (key >> 10);
    key +=  (key << 3);
    key ^=  (key >> 6);
    key += ~(key << 11);
    key ^=  (key >> 16);
    return key;
}

When a dictionary is used as the underlying implementation of the database or of a hash key, Redis uses the MurmurHash2 algorithm, which can produce 32-bit or 64-bit hash values.

conflict&rehash

Redis skip list (skiplist): sorted set

A skip list supports node lookup in O(log N) time on average and O(N) in the worst case. In most cases its efficiency is comparable to that of a balanced tree.

Redis integer set (intset)

An integer set (intset) is one of the underlying implementations of set keys. The other implementation is a hash table whose values are empty. A hash table adds, deletes and looks up set elements in O(1) time, but when the elements are all integers and there are only a few of them, storing them in a hash table wastes memory. The intset type exists to save memory in that case.
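A minimal sketch of the idea, assuming the set is kept as one sorted array with no per-element pointers and is searched by binary search. The sketchIntset type and function names are hypothetical, and fixed 64-bit elements are a simplification of this sketch; the real intset picks a narrower element width when the values allow it.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct sketchIntset {
    uint32_t length;
    int64_t *contents;   /* sorted ascending, no duplicates */
} sketchIntset;

/* Binary search. Returns 1 if value is present; *pos receives its index,
 * or the index where it would have to be inserted. */
static int sketch_intset_search(const sketchIntset *is, int64_t value, uint32_t *pos) {
    uint32_t lo = 0, hi = is->length;
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (is->contents[mid] < value) lo = mid + 1;
        else hi = mid;
    }
    *pos = lo;
    return lo < is->length && is->contents[lo] == value;
}

int sketch_intset_contains(const sketchIntset *is, int64_t value) {
    uint32_t pos;
    return sketch_intset_search(is, value, &pos);
}

/* Returns 1 if added, 0 if already present, -1 on allocation failure. */
int sketch_intset_add(sketchIntset *is, int64_t value) {
    uint32_t pos;
    if (sketch_intset_search(is, value, &pos)) return 0;
    int64_t *p = realloc(is->contents, ((size_t)is->length + 1) * sizeof(*p));
    if (p == NULL) return -1;
    is->contents = p;
    /* shift the tail right by one slot to keep the array sorted */
    memmove(p + pos + 1, p + pos, (is->length - pos) * sizeof(*p));
    p[pos] = value;
    is->length++;
    return 1;
}
```

The trade-off is visible in the code: lookups cost O(log N) and inserts O(N) instead of the hash table's O(1), in exchange for a compact array.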

Redis compressed list (ziplist)

A ziplist is one of the underlying implementations of hash keys. It is a specially encoded doubly linked list and, like the integer set (intset), it is designed to use memory efficiently. When the stored values are small integers or short strings, Redis uses a ziplist as the implementation of the hash key.

Since Redis 3.2, quicklist rather than ziplist is the underlying implementation of list keys.

Redis quicklist

127.0.0.1:6379> RPUSH list 1 2 5 1000 "redis" "quicklist"
(integer)
127.0.0.1:6379> OBJECT ENCODING list
"quicklist"

 

Redis Object System

The robj object structure serves these purposes:

  • It provides a single representation for the 5 different object types.
  • Different encodings suit different scenarios, so each object type can be backed by more than one data structure.
  • It supports reference counting, which enables object sharing.
  • It records each object's access time, which makes it easy to pick objects for deletion.
#define LRU_BITS 24
#define LRU_CLOCK_MAX ((1<<LRU_BITS)-1) /* Max value of obj->lru */
#define LRU_CLOCK_RESOLUTION 1000 /* LRU clock resolution in ms */

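typedef struct redisObject {
    // data type of the object; 4 bits, 5 types in total
    unsigned type:4;
    // encoding of the object; 4 bits, 10 encodings in total
    unsigned encoding:4;

    // least recently used
    // LRU time, computed by the LRU algorithm relative to server.lruclock
    unsigned lru:LRU_BITS; /* lru time (relative to server.lruclock) */

    // reference count
    int refcount;

    // pointer to the underlying data implementation
    void *ptr;
} robj;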

// the 5 values of type:
/* Object types */
#define OBJ_STRING 0    // string object
#define OBJ_LIST 1      // list object
#define OBJ_SET 2       // set object
#define OBJ_ZSET 3      // sorted set object
#define OBJ_HASH 4      // hash object

/* Objects encoding. Some kind of objects like Strings and Hashes can be
 * internally represented in multiple ways. The 'encoding' field of the object
 * is set to one of this fields for this object. */
// the 10 values of encoding
#define OBJ_ENCODING_RAW 0     /* Raw representation */     // raw representation; the string object is a simple dynamic string
#define OBJ_ENCODING_INT 1     /* Encoded as integer */         // integer of type long
#define OBJ_ENCODING_HT 2      /* Encoded as hash table */      // dictionary
#define OBJ_ENCODING_ZIPMAP 3  /* Encoded as zipmap */          // no longer used
#define OBJ_ENCODING_LINKEDLIST 4 /* Encoded as regular linked list */  // doubly linked list, no longer used
#define OBJ_ENCODING_ZIPLIST 5 /* Encoded as ziplist */         // compressed list
#define OBJ_ENCODING_INTSET 6  /* Encoded as intset */          // integer set
#define OBJ_ENCODING_SKIPLIST 7  /* Encoded as skiplist */      // skip list plus dictionary
#define OBJ_ENCODING_EMBSTR 8  /* Embedded sds string encoding */   // simple dynamic string with embstr encoding
#define OBJ_ENCODING_QUICKLIST 9 /* Encoded as linked list of ziplists */   // doubly linked list of ziplists, i.e. a quicklist
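A minimal sketch of how the refcount field can support object sharing. The sketchObject type and the sketch_* functions are hypothetical illustrations, not the Redis functions, and the sketch assumes ptr was allocated with malloc.

```c
#include <stdlib.h>

#define LRU_BITS 24

/* Same field layout as the robj structure shown above. */
typedef struct sketchObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS;
    int refcount;
    void *ptr;
} sketchObject;

/* Hypothetical helper: a new object starts with one owner. */
sketchObject *sketch_create_object(unsigned type, void *ptr) {
    sketchObject *o = malloc(sizeof(*o));
    if (o == NULL) return NULL;
    o->type = type;
    o->encoding = 0;     /* raw encoding */
    o->lru = 0;
    o->refcount = 1;
    o->ptr = ptr;
    return o;
}

/* Each additional holder of the object takes a reference. */
void sketch_incr_refcount(sketchObject *o) {
    o->refcount++;
}

/* Drop one reference; free the payload and the object when the last
 * holder lets go. Returns 1 if the object was freed, 0 otherwise. */
int sketch_decr_refcount(sketchObject *o) {
    if (--o->refcount == 0) {
        free(o->ptr);
        free(o);
        return 1;
    }
    return 0;
}
```

Note that type, encoding and lru together occupy 4 + 4 + 24 = 32 bits, so the three bit-fields fit into a single unsigned int.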

 

Implementation of Redis string keys (t_string)

Redis list type command implementation (t_list)

1 BLPOP key1 [key2] timeout: removes and returns the first element of the list. If the list is empty, the client blocks until the timeout expires or an element becomes available to pop.
2 BRPOP key1 [key2] timeout: removes and returns the last element of the list. If the list is empty, the client blocks until the timeout expires or an element becomes available to pop.

Redis database

Redis is a key-value database server. It stores all key-value pairs in the dict member of the redisDb structure.

  • Each key in this dictionary is a database key, and every key is a string object.

  • Each value in this dictionary is a database value, which can be a string object, a list object, a hash object, a set object, or a sorted set object.

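typedef struct redisDb {
    // key-value dictionary; holds every key-value pair in the database
    dict *dict;                 /* The keyspace for this DB */
    // expires dictionary; holds the keys that have an expiry set, with their expiry times
    dict *expires;              /* Timeout of keys with a timeout set */
    // holds every key that is blocking clients, together with the blocked clients
    dict *blocking_keys;        /*Keys with clients waiting for data (BLPOP) */
    // holds the blocked keys that have received data; the value is NULL
    dict *ready_keys;           /* Blocked keys that received a PUSH */
    // transaction module; holds the keys monitored by the WATCH command
    dict *watched_keys;         /* WATCHED keys for MULTI/EXEC CAS */
    // when memory runs short, Redis reclaims the space of some keys using the LRU algorithm;
    // eviction_pool is an array of length 16 holding the keys that may be reclaimed
    // the keys in eviction_pool are sorted by idle time in ascending order; the key with the longest idle time is reclaimed each time
    struct evictionPoolEntry *eviction_pool;    /* Eviction pool of keys */
    // database ID
    int id;                     /* Database ID */
    // average expiry time of the keys
    long long avg_ttl;          /* Average TTL, just for stats */
} redisDb;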

typedef struct client {
    redisDb *db;            /* Pointer to currently SELECTed DB. */
} client;

struct redisServer {
    redisDb *db;
    int dbnum;                      /* Total number of configured DBs */
};
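The two structures above suggest how switching databases could work: the server owns an array of dbnum databases and each client holds a pointer into it. A minimal sketch under that assumption follows; the sketch* types and sketch_select_db are hypothetical names introduced here.

```c
#include <stddef.h>

/* Reduced stand-ins for redisDb, client and redisServer. */
typedef struct sketchDb { int id; } sketchDb;
typedef struct sketchClient { sketchDb *db; } sketchClient;
typedef struct sketchServer { sketchDb *db; int dbnum; } sketchServer;

/* Point the client at database number id.
 * Returns 0 on success, -1 if id is out of range (client unchanged). */
int sketch_select_db(sketchServer *srv, sketchClient *c, int id) {
    if (id < 0 || id >= srv->dbnum) return -1;
    c->db = &srv->db[id];
    return 0;
}
```

Switching databases in this model costs one bounds check and one pointer assignment, and no data is copied.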

Whenever the database looks up a value object by key name, it goes through either the read operation lookupKeyRead() or the write operation lookupKeyWrite(), and the two behave somewhat differently.

Redis notification feature: implementation and practice

Clients can use the publish/subscribe (pub/sub) feature to receive events that change the Redis dataset in some way.

Redis pub/sub uses a fire-and-forget strategy: a client that subscribed to an event and then disconnects loses every event published to it while it was disconnected.

Notification types:

  • key-space notification
  • key-event notification

Abstraction of Redis input and output (rio)

rio is Redis's abstraction over IO operations. It can target different input and output devices, such as buffer IO, file IO and socket IO.

A rio object provides the following four methods:

  • read: read operation
  • write: write operation
  • tell: report the current read/write offset
  • flush: flush the buffer

The rio structure contains a union, which holds one of three different objects:

  • Buffer IO (in-memory buffer)
  • Standard IO (stdio file pointer)
  • File descriptor set
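A minimal sketch of the shape such an abstraction might take: a struct of four function pointers plus a union for backend state, with only an in-memory buffer backend filled in. The sketchRio type, its field names, and the convention that read and write return 1 on success and 0 on failure are assumptions of this sketch.

```c
#include <stddef.h>
#include <string.h>

typedef struct sketchRio sketchRio;
struct sketchRio {
    size_t (*read)(sketchRio *r, void *buf, size_t len);
    size_t (*write)(sketchRio *r, const void *buf, size_t len);
    long (*tell)(sketchRio *r);
    int (*flush)(sketchRio *r);
    /* Backend-specific state; a file or socket backend would add more members. */
    union {
        struct { char *ptr; size_t cap; size_t pos; } buffer;
    } io;
};

static size_t sketch_buf_write(sketchRio *r, const void *buf, size_t len) {
    if (len > r->io.buffer.cap - r->io.buffer.pos) return 0;   /* would overflow */
    memcpy(r->io.buffer.ptr + r->io.buffer.pos, buf, len);
    r->io.buffer.pos += len;
    return 1;
}

static size_t sketch_buf_read(sketchRio *r, void *buf, size_t len) {
    if (len > r->io.buffer.cap - r->io.buffer.pos) return 0;   /* not enough data */
    memcpy(buf, r->io.buffer.ptr + r->io.buffer.pos, len);
    r->io.buffer.pos += len;
    return 1;
}

static long sketch_buf_tell(sketchRio *r) {
    return (long)r->io.buffer.pos;
}

static int sketch_buf_flush(sketchRio *r) {
    (void)r;      /* nothing to flush for memory */
    return 1;
}

/* Bind the buffer backend: callers only ever use the four function pointers. */
void sketch_rio_init_buffer(sketchRio *r, char *mem, size_t cap) {
    r->read = sketch_buf_read;
    r->write = sketch_buf_write;
    r->tell = sketch_buf_tell;
    r->flush = sketch_buf_flush;
    r->io.buffer.ptr = mem;
    r->io.buffer.cap = cap;
    r->io.buffer.pos = 0;
}
```

Code that serializes data through r->write does not need to know which backend is behind the pointer, which is the point of the abstraction.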

The difference between standard IO and file IO:

    Standard IO: works on a file stream fp, is buffered, and is provided by library functions

    File IO: works on a file descriptor fd (a small, non-negative integer), is unbuffered, and is provided by system calls

    Standard IO is built on top of file IO

b1: Standard IO:

     stdin (keyboard) / stdout (screen) / stderr (screen)

     fgetc reads one character at a time; fputc writes one character at a time

     fgets reads one line at a time; fputs writes one line at a time

     getchar reads one character from stdin

     fprintf writes formatted output to a file stream

     sprintf writes formatted output to a string buffer

     fopen opens a file, creates a file stream, and associates it with the file

     fclose closes the file stream

     fread reads directly from the file stream

     fwrite writes directly to the file stream

     errno holds the error code of the most recent failure (a number)

     strerror converts that error number into a string describing the error

     perror prints the reason for the error (commonly used)

     fflush forces the buffer to be flushed

     stdout is line-buffered when attached to a terminal: it is flushed on \n, but not on \r

     stderr is unbuffered

     fseek moves the current read/write position

     ftell reports the current read/write position

     freopen redirects the file stream

b2: File IO:

     File descriptor allocation rule: the lowest non-negative integer not currently in use. 0 is standard input (stdin), 1 is standard output (stdout), 2 is standard error (stderr)

     open opens a file and associates it with a file descriptor

     close closes the file

     read reads from the file, in bytes

     write writes to the file

     lseek sets the read/write position

b3: file descriptors and the file structure: several descriptors can point to the same file structure

    A member variable of the file structure points to the file's inode (itself a structure)

    The file structure has a reference count member, refc, that records how many file descriptors point to it

b4: dup copies a file descriptor

    dup duplicates a file descriptor

    Afterwards several file descriptors point to the same file structure
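A small sketch that checks the claim above on a POSIX system: after dup, both descriptors refer to the same file structure, so they share one file offset. The function name and the scratch file path under /tmp are assumptions of this sketch.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if a write through the original descriptor moves the offset
 * seen through the duplicate, 0 if it does not, -1 on error. */
int sketch_dup_shares_offset(void) {
    char path[64];
    snprintf(path, sizeof(path), "/tmp/dup_sketch_%ld", (long)getpid());

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    unlink(path);                 /* the file disappears once both descriptors are closed */

    int copy = dup(fd);           /* second descriptor, same file structure */
    if (copy < 0) {
        close(fd);
        return -1;
    }

    int shared = -1;
    if (write(fd, "abc", 3) == 3) {
        /* Query the offset through the duplicate without moving it. */
        shared = (lseek(copy, 0, SEEK_CUR) == 3);
    }

    close(copy);
    close(fd);
    return shared;
}
```

Opening the same path twice with open would instead create two independent file structures, each with its own offset.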

D: file types (7 kinds): everything is a file

     ls -l: shows the file type

     bcd-lsp (b: block device, c: character device, d: directory, -: regular file, l: symbolic link, s: socket, p: pipe)

Redis RDB persistence mechanism

Redis is an in-memory database: it keeps its data in memory, so if the server process exits, the database state in the server is lost. To solve this problem, Redis provides two persistence mechanisms: RDB and AOF. This article mainly analyzes the RDB persistence process.

RDB persistence saves a point-in-time snapshot of the current process's data to disk, which guards against accidental data loss.

AOF persistence records every write command in a separate log and re-executes the commands in the AOF file on restart to restore the data.

Because Redis executes commands in a single thread, appending directly to the AOF file on disk for every write would make write performance depend entirely on the load of the disk. Redis therefore writes each command into a buffer first and later synchronizes the buffer contents to disk with a file sync operation, which keeps performance high.

Redis event handling implementation

Redis wraps the common I/O multiplexing libraries: select, epoll, evport and kqueue. At compile time it picks the highest-performing one available on the target system as its multiplexing implementation. Because all of the wrappers implement the same interface names, the underlying implementation of the Redis multiplexer is interchangeable.
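A minimal sketch of compile-time backend selection. The platform macros tested here and the SKETCH_MUX and sketch_mux_name names are assumptions of this sketch; Redis's build uses its own feature macros and includes a different source file per backend rather than a string.

```c
/* Pick one backend name at compile time; every branch defines the same
 * macro, so the rest of the program is written once. */
#if defined(__sun)
#define SKETCH_MUX "evport"
#elif defined(__linux__)
#define SKETCH_MUX "epoll"
#elif defined(__APPLE__) || defined(__FreeBSD__) || defined(__OpenBSD__) || defined(__NetBSD__)
#define SKETCH_MUX "kqueue"
#else
#define SKETCH_MUX "select"   /* portable fallback */
#endif

/* Same function name regardless of which backend was chosen. */
const char *sketch_mux_name(void) {
    return SKETCH_MUX;
}
```

The callers see only sketch_mux_name, in the same way that the rest of Redis sees one set of interface names whichever library was compiled in.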

INFO server 

 

 

Redis replication (replicate) implementation

1. Introduction to Replication

To avoid a single point of failure in the database, Redis deploys multiple copies of the data on other nodes. Replication gives Redis high availability and redundant backups of the data, and helps ensure the reliability of both data and service.

2. Implementation of replication

 

 

high availability

Redis Sentinel Implementation

Raft algorithm:

http://thesecretlivesofdata.com/raft/

Redis Cluster cluster scaling principle

Redis failover process and principle

 

 

Redis optimization methods

Redis uses Hash storage to save memory

1. Use Redis's String structure as a key-value store:

SET media:1155315 939   
GET media:1155315   
> 939

Stored this way, 1,000,000 entries use about 70MB of memory, and 300,000,000 photos would use about 21GB.

If the key is stored as a pure number, experiments show that memory usage drops to 50MB, for a total of 15GB.

2. Use the Hash structure. The approach is to segment the data and store each segment in one Hash. When a Hash holds fewer than a certain number of elements, Redis stores it in a compressed form, which saves a lot of memory; the String structure above has no such optimization. That threshold is controlled by the hash-zipmap-max-entries parameter in the configuration file. In the developers' experiments, performance was good with hash-zipmap-max-entries set to 1000; beyond 1000, the CPU cost of the HSET command becomes very high.
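A minimal sketch of the segmentation step, assuming each Hash holds 1000 consecutive ids so that it stays within the threshold mentioned above. The "mediabucket:" key prefix, the bucket size constant and the function name are assumptions introduced for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed segment size: 1000 ids per Hash. */
#define SKETCH_BUCKET_SIZE 1000

/* Split a numeric id into a Hash key ("mediabucket:<id / 1000>") and a
 * field name ("<id % 1000>"). Returns 0 on success, -1 if a buffer is too small. */
int sketch_bucket_key(unsigned long id,
                      char *key, size_t keylen,
                      char *field, size_t fieldlen) {
    int n = snprintf(key, keylen, "mediabucket:%lu", id / SKETCH_BUCKET_SIZE);
    if (n < 0 || (size_t)n >= keylen) return -1;
    n = snprintf(field, fieldlen, "%lu", id % SKETCH_BUCKET_SIZE);
    if (n < 0 || (size_t)n >= fieldlen) return -1;
    return 0;
}
```

For the id 1155315 used in the String example, this yields the key mediabucket:1155 and the field 315, so the value 939 could be stored with a command such as HSET mediabucket:1155 315 939.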

 

 

 

 

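#include <stdio.h>

struct str {
    int len;
    char s[0];      /* zero-length array (GNU extension), the older form of a flexible array member */
};

struct foo {
    struct str *a;
};

int main(int argc, char** argv) {
    struct foo f = {0};     /* f.a is a null pointer */
    /* s is an array, so f.a->s only computes an address (f.a plus the offset of s)
     * and reads no memory. With typical compilers that address is non-zero, so the
     * test passes even though f.a is NULL. Formally this is undefined behavior. */
    if (f.a->s) {
        /* printf reads through the invalid address, so the crash typically happens here */
        printf(f.a->s);
    }
    return 0;
}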

 

 

 
