How Redis's data structures and object system are designed

Redis is an open source key-value storage system. It uses six low-level data structures to build an object system containing string objects, list objects, hash objects, set objects, and sorted set objects. This article uses 12 diagrams to explain these data structures and how the object system is implemented on top of them.

This article covers the following:

  • First, the six basic data structures: simple dynamic strings, linked lists, dictionaries, skip lists, integer sets, and compressed lists.

  • Second, the string object (String), list object (List), hash object (Hash), set object (Set), and sorted set object (ZSet) in Redis's object system.

  • Finally, Redis's key space and the implementation of key expiration (expire).

Data structures

1. Simple dynamic strings

Redis uses simple dynamic strings (SDS) to represent string values. The following figure shows an SDS structure holding the value "Redis":

  • len: the actual length of the string (excluding the NULL terminator).

  • alloc: the maximum capacity of the string (excluding the extra byte reserved for the terminator).

  • flags: always occupies one byte. The lowest 3 bits indicate the header type.

  • buf: the character array.

The SDS structure reduces the number of memory reallocations caused by modifying a string. It relies on two mechanisms: memory pre-allocation and lazy space release.

When an SDS is modified and has to grow, Redis allocates not only the space the modification requires but also additional unused space.

  • If the length of the SDS (the value of the len attribute) is less than 1MB after the modification, Redis pre-allocates unused space equal in size to len.

  • If the length of the SDS is 1MB or more after the modification, Redis pre-allocates 1MB of unused space.

For example, if len is 20 bytes after the modification, which is less than 1MB, Redis pre-allocates another 20 bytes, and the actual length of the buf array (excluding the final terminator byte) becomes 20 + 20 = 40 bytes. Once len reaches 1MB or more, only 1MB of extra space is allocated.
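The pre-allocation rule above can be written as a small function. This is a minimal sketch: the function name sds_alloc_after_growth is hypothetical, and the constant only mirrors the 1MB threshold described in the text.

```c
#include <stddef.h>

/* 1MB threshold described in the text. */
#define SDS_MAX_PREALLOC (1024 * 1024)

/* Hypothetical helper: capacity of buf (excluding the terminator byte)
 * after the string has grown to new_len bytes. */
size_t sds_alloc_after_growth(size_t new_len) {
    if (new_len < SDS_MAX_PREALLOC)
        return new_len * 2;              /* reserve as much free space as len */
    return new_len + SDS_MAX_PREALLOC;   /* reserve a flat 1MB of free space */
}
```

With a new length of 20 bytes the function returns 40, matching the example above.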

Similarly, when an SDS shortens the string it stores, it does not release the freed bytes immediately but keeps them for later use.

2. Linked list

Linked lists are widely used in Redis. For example, one of the underlying implementations of list objects is a linked list. Besides list objects, features such as publish/subscribe, the slow query log, and monitors also use linked lists.

The Redis linked list is a doubly linked list; the schematic diagram is shown above. Linked lists are among the most common data structures, so the details are omitted here.

The dup, free, and match members of the Redis linked list structure are the type-specific functions that make the list polymorphic:

  • The dup function copies the value stored in a node, which allows a deep copy of the list.

  • The free function releases the value stored in a node.

  • The match function checks whether the value stored in a node equals a given input value.
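As an illustration of how these function pointers make one list type work for any value type, here is a minimal sketch. The struct layout follows the members named above; list_search and match_cstring are hypothetical names introduced for this example.

```c
#include <stddef.h>
#include <string.h>

typedef struct listNode {
    struct listNode *prev, *next;
    void *value;
} listNode;

typedef struct list {
    listNode *head, *tail;
    void *(*dup)(void *ptr);             /* copies a node value */
    void (*free)(void *ptr);             /* releases a node value */
    int (*match)(void *ptr, void *key);  /* nonzero when value equals key */
    unsigned long len;
} list;

/* Hypothetical search: returns the first node whose value matches key.
 * Without a match function it falls back to comparing pointers. */
listNode *list_search(list *l, void *key) {
    for (listNode *n = l->head; n != NULL; n = n->next) {
        int equal = l->match ? l->match(n->value, key) : (n->value == key);
        if (equal) return n;
    }
    return NULL;
}

/* Example match function for lists that store C strings. */
int match_cstring(void *ptr, void *key) {
    return strcmp((const char *)ptr, (const char *)key) == 0;
}
```

The same list_search works for integers, structs, or strings; only the match function installed on the list changes.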

3. Dictionary

Dictionaries are widely used to implement Redis features, including the key space and hash objects. The schematic diagram is shown below.

Older versions of Redis use the MurmurHash2 algorithm to compute the hash value of a key (Redis 4.0 and later use SipHash), and resolve key collisions with separate chaining: key-value pairs that map to the same index are linked into a singly linked list.
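Separate chaining can be sketched as follows. This is an assumption-laden illustration, not the Redis dict: the table has a fixed size, never rehashes, and uses the simple djb2 hash as a stand-in for the real hash function. All names (table, table_set, table_get) are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 4  /* deliberately tiny so that keys collide */

typedef struct entry {
    const char *key;
    const char *val;
    struct entry *next;  /* next entry in the same bucket */
} entry;

typedef struct { entry *buckets[TABLE_SIZE]; } table;

/* Stand-in hash (djb2); not the hash function Redis uses. */
static unsigned long hash_key(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Inserts or overwrites a pair. A new entry becomes the head of its
 * bucket's chain. Returns 0 on success, -1 if allocation fails. */
int table_set(table *t, const char *key, const char *val) {
    unsigned long idx = hash_key(key) % TABLE_SIZE;
    for (entry *e = t->buckets[idx]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0) { e->val = val; return 0; }
    entry *e = malloc(sizeof(*e));
    if (e == NULL) return -1;
    e->key = key;
    e->val = val;
    e->next = t->buckets[idx];
    t->buckets[idx] = e;
    return 0;
}

/* Walks the bucket's chain and returns the value, or NULL if absent. */
const char *table_get(const table *t, const char *key) {
    unsigned long idx = hash_key(key) % TABLE_SIZE;
    for (const entry *e = t->buckets[idx]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0) return e->val;
    return NULL;
}
```

Inserting more keys than there are buckets forces at least two keys into the same chain, yet every key remains retrievable.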

4. Skip list

Redis uses skip lists as one of the underlying implementations of sorted set objects. A skip list stores elements in order in a multi-level linked list. Its efficiency is comparable to that of a balanced tree: search, delete, and insert all complete in expected logarithmic time, and the implementation is much simpler and more intuitive than that of a balanced tree.

The schematic diagram of the skip list is shown in the above figure. This section only outlines the core idea and does not go into detail.

As shown in the schematic diagram, zskiplistNode is a node of the skip list: ele is the stored element, score is its score, nodes are ordered by score, and the level array is what makes the list multi-level.

The level arrays of different nodes differ in size. Each entry of a level array holds a forward pointer to the next node at that level and a span value. The span is the number of nodes the forward pointer advances, that is, the distance in positions between the two nodes; summing the spans along a search path gives the rank of the node that is found. Pointers at higher levels generally have larger spans, and pointers at lower levels have smaller ones.

The level array is like a ruler with several scales. When measuring a length, first use the coarse scale to narrow down the range, then switch to finer and finer scales to close in on the exact value.

When querying an element in the skip list, start from the top level at the head of the list, which the header pointer of zskiplist leads to. For example, when querying the o2 element in the skip list above, the walk begins at the o1 node.

Start at its level[3], whose span is 2: the forward pointer skips two positions ahead to the o3 node, whose score of 3.0 is greater than o2's score of 2.0. The o2 node must therefore lie between o1 and o3, so the search switches to a finer scale: following the pointer of level[1] reaches the o2 node.
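The search procedure can be sketched as below. It is a simplified illustration under stated assumptions: every node has a fixed number of levels (MAX_LEVEL), scores are unique, and the list begins with a header node that stores no element. The names skipNode and skiplist_find are hypothetical.

```c
#include <stddef.h>

#define MAX_LEVEL 4  /* small fixed height, an assumption of this sketch */

typedef struct skipNode {
    const char *ele;
    double score;
    struct {
        struct skipNode *forward;  /* next node at this level */
        unsigned long span;        /* positions advanced by forward */
    } level[MAX_LEVEL];
} skipNode;

/* Finds the node with the given score, starting from the header node.
 * On success stores the node's 1-based rank in *rank (if rank is not NULL). */
skipNode *skiplist_find(skipNode *header, double score, unsigned long *rank) {
    skipNode *x = header;
    unsigned long traversed = 0;
    for (int i = MAX_LEVEL - 1; i >= 0; i--) {
        /* Advance while the next node is still before the target;
         * otherwise drop down to the next finer level. */
        while (x->level[i].forward != NULL &&
               x->level[i].forward->score < score) {
            traversed += x->level[i].span;
            x = x->level[i].forward;
        }
    }
    skipNode *next = x->level[0].forward;
    if (next == NULL || next->score != score) return NULL;
    if (rank != NULL) *rank = traversed + x->level[0].span;
    return next;
}
```

The accumulated spans are what allow a rank to be reported without walking the bottom level node by node.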

5. Set of integers

The integer set (intset) is one of the underlying implementations of set objects. When a set contains only integer elements and the number of elements is small, Redis uses an integer set as the underlying implementation of the set object.

As shown in the figure above, the encoding attribute of the integer set indicates the element type: int16_t, int32_t, or int64_t. Each element is an item in the contents array; the items are sorted from smallest to largest, and the array contains no duplicates. The length attribute is the number of elements in the integer set.
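Because the contents array is sorted and free of duplicates, membership can be tested with a binary search, and the element type can be picked from the value's range. The sketch below illustrates both ideas; it stores every element as int64_t for simplicity, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest element width in bytes (2, 4, or 8) that can hold v. */
int intset_required_width(int64_t v) {
    if (v < INT32_MIN || v > INT32_MAX) return 8;
    if (v < INT16_MIN || v > INT16_MAX) return 4;
    return 2;
}

/* Binary search over a sorted, duplicate-free array.
 * Returns 1 if value is present and stores its index in *pos;
 * otherwise returns 0 and stores the index where it would be inserted. */
int intset_search(const int64_t *contents, size_t length,
                  int64_t value, size_t *pos) {
    size_t lo = 0, hi = length;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (contents[mid] < value) lo = mid + 1;
        else hi = mid;
    }
    *pos = lo;
    return lo < length && contents[lo] == value;
}
```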

6. Compressed list

The compressed list (ziplist) is one of the underlying implementations of list objects and hash objects. When certain conditions are met, both list objects and hash objects use a compressed list as their underlying implementation.

The compressed list was developed by Redis to save memory. It is a sequential data structure made up of a series of specially encoded entries stored in one contiguous block of memory. Its attributes are:

  • zlbytes: 4 bytes long; records the total number of bytes occupied by the compressed list.

  • zltail: 4 bytes long; records the number of bytes from the start address of the compressed list to its tail node. This attribute makes it possible to locate the tail node directly.

  • zllen: 2 bytes long; records the number of nodes. When the value is less than UINT16_MAX (65535), it is the total number of nodes; otherwise the whole list must be traversed to count them.

  • zlend: 1 byte long; a special value (0xFF) that marks the end of the compressed list.

Each node (entry) in the middle consists of three parts:

  • previous_entry_length: the length of the previous node in the compressed list. Subtracting it from the current node's address gives the starting address of the previous node.

  • encoding: the type and length of the data stored in the node.

  • content: the node value, which can be a byte array or an integer.
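The pointer arithmetic on previous_entry_length can be sketched as follows. The sketch assumes the field takes 1 byte when the previous entry is shorter than 254 bytes, and otherwise 5 bytes: a 0xFE marker followed by a 4-byte little-endian length. Function names are hypothetical.

```c
#include <stdint.h>

/* Decodes the previous_entry_length field at the start of entry p.
 * Returns the previous entry's length and stores the size of the
 * field itself (1 or 5 bytes) in *field_size. */
uint32_t prev_entry_length(const unsigned char *p, unsigned *field_size) {
    if (p[0] < 0xFE) {
        *field_size = 1;
        return p[0];
    }
    *field_size = 5;
    /* 4-byte little-endian length after the 0xFE marker */
    return (uint32_t)p[1] | ((uint32_t)p[2] << 8) |
           ((uint32_t)p[3] << 16) | ((uint32_t)p[4] << 24);
}

/* Steps back from entry p to the start of the previous entry. */
const unsigned char *prev_entry(const unsigned char *p) {
    unsigned field_size;
    return p - prev_entry_length(p, &field_size);
}
```

Starting from the tail node located through zltail, repeated calls to prev_entry walk the list from back to front.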

Objects

The six low-level data structures were introduced above. Redis does not use these data structures directly to implement the key-value database. Instead, it builds an object system on top of them. This system includes five types of objects: string objects, list objects, hash objects, set objects, and sorted set objects. Each object uses at least one of the underlying data structures described above.

Redis determines which data structure the object uses based on different usage scenarios and content sizes, thereby optimizing the efficiency and memory footprint of the object in different scenarios.

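The redisObject structure is defined as follows.

typedef struct redisObject {
    unsigned type:4;        /* object type */
    unsigned encoding:4;    /* underlying data structure in use */
    unsigned lru:LRU_BITS;  /* access information used for eviction */
    int refcount;           /* reference count */
    void *ptr;              /* pointer to the underlying data structure */
} robj;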

Here type is the object type: REDIS_STRING, REDIS_LIST, REDIS_HASH, REDIS_SET, or REDIS_ZSET.

encoding identifies the data structure the object uses; the complete set of values is as follows.

1. String object

We first look at the implementation of string objects, as shown in the following figure.

If a string object holds a string value longer than 32 bytes, the value is stored as an SDS and the object encoding is set to raw, as shown in the upper part of the figure. If the string is 32 bytes or shorter, the string object uses the embstr encoding. (The exact limit depends on the Redis version; Redis 3.2 and later use 44 bytes.)

The embstr encoding is an optimized encoding for short strings. It is composed of the same parts as the raw encoding: both use a redisObject structure and an sdshdr structure to store the string, as shown in the lower part of the above figure.

The difference is that raw encoding performs two memory allocations to create the two structures separately, while embstr performs a single allocation of one contiguous block that holds both structures.

Because embstr needs only one allocation and keeps everything in one contiguous block, it makes better use of the CPU cache. However, an embstr string is read-only and cannot be modified in place. When a string is appended to an embstr-encoded object, Redis first converts the object to raw encoding and then performs the append.
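The single-allocation idea behind embstr can be sketched as below. This is an illustration under stated assumptions: miniObj and miniSds are simplified stand-ins for redisObject and an SDS header with one-byte length fields, and create_embstr is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for redisObject and an SDS header. */
typedef struct {
    unsigned type:4;
    unsigned encoding:4;
    int refcount;
    void *ptr;
} miniObj;

typedef struct {
    unsigned char len;
    unsigned char alloc;
    unsigned char flags;
    char buf[];
} miniSds;

/* Allocates the object, the string header, and the bytes in ONE block.
 * Returns NULL if len does not fit the one-byte header or malloc fails. */
miniObj *create_embstr(const char *s, size_t len) {
    if (len > 255) return NULL;
    miniObj *o = malloc(sizeof(miniObj) + sizeof(miniSds) + len + 1);
    if (o == NULL) return NULL;
    miniSds *sh = (miniSds *)(o + 1);  /* header sits right after the object */
    sh->len = (unsigned char)len;
    sh->alloc = (unsigned char)len;
    sh->flags = 0;
    memcpy(sh->buf, s, len);
    sh->buf[len] = '\0';
    o->type = 0;
    o->encoding = 0;
    o->refcount = 1;
    o->ptr = sh->buf;                  /* points into the same block */
    return o;
}
```

A single free(o) releases the object and its string together, whereas a raw-encoded object would need the string buffer to be released separately.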

2. List objects

The encoding of a list object can be ziplist or linkedlist (Redis 3.2 and later replace both with quicklist). The schematic diagram is shown below.

A list object uses the ziplist encoding when it meets both of the following conditions:

  • Every string element stored in the list object is shorter than 64 bytes.

  • The list object stores fewer than 512 elements.

A list object that does not meet both conditions uses the linkedlist encoding, or is converted to it.
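The two conditions translate directly into a check like the following sketch. The function and constant names are hypothetical, and elements are treated as NUL-terminated C strings for simplicity.

```c
#include <stddef.h>
#include <string.h>

#define LIST_MAX_ZIPLIST_VALUE 64     /* element length limit from the text */
#define LIST_MAX_ZIPLIST_ENTRIES 512  /* element count limit from the text */

typedef enum { LIST_ENC_ZIPLIST, LIST_ENC_LINKEDLIST } listEncoding;

/* Picks the encoding for a list holding the given elements. */
listEncoding choose_list_encoding(const char *const *elems, size_t count) {
    if (count >= LIST_MAX_ZIPLIST_ENTRIES) return LIST_ENC_LINKEDLIST;
    for (size_t i = 0; i < count; i++)
        if (strlen(elems[i]) >= LIST_MAX_ZIPLIST_VALUE)
            return LIST_ENC_LINKEDLIST;
    return LIST_ENC_ZIPLIST;
}
```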

3. Hash object

The encoding of a hash object can be ziplist or dict (the hashtable encoding). The schematic diagram is shown below.

When a hash object uses a compressed list as its underlying implementation, each key-value pair is stored as two adjacent nodes: the node holding the key comes first and the node holding the value comes second. As shown in the upper part of the figure below, the hash has two key-value pairs, name: Tom and age: 25.

A hash object uses the ziplist encoding when it meets both of the following conditions:

  • The keys and values of all key-value pairs stored in the hash object are shorter than 64 bytes.

  • The hash object stores fewer than 512 key-value pairs.

A hash object that does not meet both conditions uses the dict encoding, or is converted to it.

4. Set objects

The encoding of a set object can be intset or dict.

An intset-encoded set object uses an integer set as its underlying implementation, and all elements are stored in the integer set.

With the dict encoding, each key of the dictionary is a string object holding one set element, and every value of the dictionary is set to NULL, as shown below.

A set object uses the intset encoding when it meets both of the following conditions:

  • All elements stored in the set object are integer values.

  • The set object stores no more than 512 elements.

Otherwise it uses the dict encoding.
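The check for the set encoding differs from the list case because it looks at the type of each element rather than its length. The sketch below approximates "is an integer value" with strtoll from the C standard library, which may accept or reject slightly different strings than Redis does; all names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define SET_MAX_INTSET_ENTRIES 512  /* element count limit from the text */

typedef enum { SET_ENC_INTSET, SET_ENC_DICT } setEncoding;

/* Returns 1 if s is a base-10 integer that fits in a long long. */
static int is_integer(const char *s) {
    char *end;
    /* Reject leading whitespace or '+', which strtoll would accept. */
    if (!(*s == '-' || (*s >= '0' && *s <= '9'))) return 0;
    errno = 0;
    (void)strtoll(s, &end, 10);
    return errno == 0 && end != s && *end == '\0';
}

/* Picks the encoding for a set holding the given elements. */
setEncoding choose_set_encoding(const char *const *elems, size_t count) {
    if (count > SET_MAX_INTSET_ENTRIES) return SET_ENC_DICT;
    for (size_t i = 0; i < count; i++)
        if (!is_integer(elems[i])) return SET_ENC_DICT;
    return SET_ENC_INTSET;
}
```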

5. Sorted set objects

The encoding of a sorted set can be ziplist or skiplist.

When a sorted set uses the ziplist encoding, each element is represented by two adjacent compressed list nodes. The first node holds the member, and the second node holds its score, which is the value used for ordering.

The elements in the compressed list are sorted by score from smallest to largest, as shown in the upper part of the figure below.

A sorted set with the skiplist encoding uses the zset structure as its underlying implementation. A zset structure contains both a dictionary and a skip list.

The skip list stores all elements ordered by score from smallest to largest. Each skip list node holds one element, and the node's score value is the score of that element. The dictionary maps members to scores: its keys are the members of the set and its values are their scores, so the score of a given member can be looked up in O(1) time, as shown below.

The skip list and the dictionary share the same member objects, so keeping both structures does not duplicate the members in memory.

A sorted set object uses the ziplist encoding when it meets both of the following conditions:

  • The sorted set stores fewer than 128 elements.

  • Every member stored in the sorted set is shorter than 64 bytes.

Otherwise, it uses the skiplist encoding.

6. Database key space

A Redis server has multiple databases, and each database has its own independent key space. Each database uses a dict to store all of its key-value pairs.

The keys of the key space are the keys of the database. Each key is a string object, and each value is one of a string object, a list object, a hash object, a set object, or a sorted set object.

In addition to the key space, Redis uses a second dict to store the expiration times of keys. Its keys are keys from the key space and its values are their expiration times, as shown in the above figure.

With the expires dictionary, Redis can determine directly whether a key has expired. First, check whether the key exists in the expires dictionary. If it does, compare the key's expiration time with the current server timestamp; the key has expired if the current time is later.
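The expiry check can be sketched as follows. To keep the example short, a plain array stands in for the expires dictionary, and timestamps are assumed to be in milliseconds; expireEntry and key_is_expired are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* One entry of the stand-in expires table. */
typedef struct {
    const char *key;
    long long expire_at_ms;  /* absolute expiration time (assumed unit: ms) */
} expireEntry;

/* Returns 1 if key has an expiration time earlier than now_ms.
 * Keys that are absent from the table never expire. */
int key_is_expired(const expireEntry *expires, size_t n,
                   const char *key, long long now_ms) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(expires[i].key, key) == 0)
            return now_ms > expires[i].expire_at_ms;
    return 0;
}
```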

Original author: road technologies Zhang dog eggs

Reprinted from: WeChat public account

Original link: mp.weixin.qq.com/s/gQnuynv6X ...



Origin juejin.im/post/5e9c56e4f265da47b35c7d5f