redis01 - The essence of high-performance Redis

Some time ago the company ran into a Redis cluster failure that took down most of our applications. The cluster was deployed on k8s and ELK had not been set up for container log collection; on top of that, I did not understand Redis's cluster mechanism, so I never found the root cause of the crash. I therefore went looking for good articles on how Redis clusters work, and found a blog post on cnblogs that I think explains it well and is worth recommending to everyone. Original address: https://www.cnblogs.com/wzh2010/p/15886799.html

1. redis high performance analysis

Redis achieves its very high execution efficiency mainly along the following dimensions:

  1. Storage model: based on memory, not disk
  2. Data structures: efficient data structures chosen for different business scenarios
  • Dynamic string (REDIS_STRING): integer (REDIS_ENCODING_INT), string (REDIS_ENCODING_RAW)
  • Double-ended linked list (REDIS_ENCODING_LINKEDLIST)
  • Compressed list (REDIS_ENCODING_ZIPLIST)
  • Skip list (REDIS_ENCODING_SKIPLIST)
  • Hash table (REDIS_HASH)
  • Integer set (REDIS_ENCODING_INTSET)
  3. Threading model: Redis's network I/O and key-value command reads and writes are executed by a single thread, avoiding unnecessary context switches and lock contention.
  4. I/O model: based on the I/O multiplexing, non-blocking I/O model
  5. Reasonable data encoding: choose a suitable encoding based on the actual data type

1.1. Memory-based implementation

Redis's read and write operations are all implemented in memory. Compared with other persistent storage (such as MySQL, File, etc., data is persisted on disk), the performance is much higher. Because when we operate data, we need to read the data into the memory through IO operations first, which increases the work cost.
[Figure: memory hierarchy access latencies]

  • Register: 0.3 ns
  • L1 cache: 0.9 ns
  • L2 cache: 2.8 ns
  • L3 cache: 12.9 ns
  • Main memory: 120 ns
  • Local secondary storage (SSD): 50~150 us
  • Remote secondary storage: 30 ms
    This may not feel intuitive, so compare L1 with an SSD: if an L1 access took 1 second, an SSD access would take roughly 15 to 45 hours.
    Because the memory controller is integrated inside the CPU, the CPU controls memory directly, giving the best possible bandwidth for communication.
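To make that scale-up concrete, here is the arithmetic as a small Python sketch (the latency figures are the ones quoted above):

```python
# Scale an L1-cache access to 1 second and see what an SSD access
# becomes at the same scale, using the latencies listed above.
L1_NS = 0.9                  # L1 cache latency, nanoseconds
SSD_RANGE_US = (50, 150)     # SSD latency range, microseconds

for ssd_us in SSD_RANGE_US:
    ratio = ssd_us * 1_000 / L1_NS     # both sides in nanoseconds
    hours = ratio / 3600               # 1 s of L1 -> this many hours of SSD
    print(f"SSD {ssd_us} us is ~{ratio:,.0f}x L1, i.e. ~{hours:.0f} h")
```

Running it gives roughly 15 hours for the 50 us end of the range and 46 hours for the 150 us end, matching the "15 to 45 hours" comparison above.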

1.2. Efficient data structures based on different business scenarios

In Redis cache, there are five main commonly used data types, as follows:

  1. String/REDIS_STRING: Suitable for caching, counting, shared Session, IP statistics, distributed locks, etc.
  2. List/REDIS_LIST: linked list; message queues, stacks, ordered lists of objects (such as the like-order and comment-order lists in WeChat Moments).
  3. Hash table/REDIS_HASH: Shopping cart information, user information, Hash type (key, field, value) storage objects, etc.
  4. Set/REDIS_SET: unordered collection of unique values: friends, followers, fans, people of interest, etc.
  5. Ordered set/REDIS_ZSET: access rankings, likes rankings, number of fans rankings, etc.

These five data types are supported by one or more data structures, and there are 7 underlying data structures. The relationship is as follows:
[Figure: mapping from the five data types to the 7 underlying data structures]

1.2.1.SDS simple dynamic string

Redis represents strings with the simple dynamic string (SDS) type. The encodings used by the string type are integer (REDIS_ENCODING_INT) and string (REDIS_ENCODING_RAW). Take a string as an example: for a regular C string such as "Brand", getting its length requires traversing from the beginning until the terminating null character '\0' is met.

The comparison chart between C string structure and SDS string structure is as follows:
[Figure: comparison of the C string layout and the SDS layout]

  • The value of the free attribute is 0, indicating that this SDS does not allocate any unused space.
  • The value of the len attribute is 5, which means that this SDS stores a 5-byte string.
  • buf is a char array that stores the actual string. The first five bytes hold the characters 'B', 'r', 'a', 'n', and 'd', and the last byte holds the null character '\0', which marks the end.

Note: SDS follows the convention of C strings and ends with a null character. The 1 byte that holds the null character is not counted in the len attribute of SDS.

Compared with C strings, SDS has the following advantages:

  1. Getting the string length is O(1).
    A C string does not record its own length; obtaining it requires traversing and counting the entire string, which is O(N).
    An SDS string records and maintains its own len attribute, so obtaining its length is O(1).

  2. Buffer overflows are avoided.
    C strings do not record their length. If two C strings s1 and s2 sit adjacent in memory and strcat is called on s1 without enough space allocated in advance, the modified data of s1 may overflow into the space occupied by s2 (a buffer overflow).
    SDS eliminates this problem: because it records its length, the API checks before every modification whether the SDS has enough space, and automatically expands it if not.

  3. Space pre-allocation reduces the number of memory reallocations during modification.
    When an SDS is grown, the program allocates not only the space the SDS needs but also extra unused space. This lets Redis reduce the number of memory reallocations needed for consecutive string-growth operations.

  • If the modified length len is less than 1MB, unused space equal to len is allocated: free = len.
  • If the modified length len is greater than or equal to 1MB, 1MB of unused space is allocated: free = 1MB.
  4. Lazy space release.
    When an SDS is shortened, the freed bytes are not immediately returned via memory reallocation; instead they are recorded in the free attribute, ready for a future append operation, which reduces allocation steps and optimizes for likely future growth.
    SDS also provides an API to manually release unused space, so the lazy-release strategy does not have to waste memory.

  5. Binary safety.
    C strings must conform to some encoding: apart from the trailing null character, no '\0' may appear inside the string, which limits what can be stored. Redis, however, may store not only textual String data but also binary data.
    Binary data is not a regular string and may contain special bytes such as '\0'. In C a '\0' marks the end of the string, but SDS does not rely on it: the end is determined by the len attribute.

  6. Compatibility with some C string functions.
    Although SDS is binary safe, it still keeps the null-terminated convention of C strings, so many C string functions can be reused without rewriting.
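The behaviours above (O(1) length, space pre-allocation, lazy release, binary safety) can be sketched in a few lines of Python. This is a toy model for illustration, not Redis's actual sds.c:

```python
# A toy Python sketch of SDS behaviour: O(1) length via the len field,
# pre-allocation on growth, lazy release on shortening, binary-safe values.
class SDS:
    MAX_PREALLOC = 1024 * 1024  # the 1 MB pre-allocation cap described above

    def __init__(self, data: bytes = b""):
        self.len = len(data)
        self.free = 0
        self.buf = bytearray(data) + b"\x00"   # still null-terminated

    def append(self, data: bytes):
        needed = self.len + len(data)
        if self.free < len(data):              # not enough spare room: grow
            prealloc = min(needed, self.MAX_PREALLOC)
            new_cap = needed + prealloc        # required space + unused space
            buf = bytearray(new_cap + 1)
            buf[:self.len] = self.buf[:self.len]
            self.buf = buf
            self.free = new_cap - needed
        else:
            self.free -= len(data)
        self.buf[self.len:needed] = data
        self.buf[needed] = 0
        self.len = needed                      # O(1) length bookkeeping

    def shorten(self, new_len: int):
        # Lazy release: keep the memory, just move the bytes into `free`.
        self.free += self.len - new_len
        self.len = new_len

s = SDS(b"Brand")
s.append(b"\x00binary")   # a '\0' inside the value is fine: len-based, not '\0'-based
print(s.len, s.free)      # 12 12  (len < 1MB, so free == len after growth)
```

Note how the length check uses len rather than scanning for '\0', which is exactly what makes the structure binary safe.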

1.2.2.zipList compressed list

From the data-structure diagram above, the compressed list is one of the underlying implementations of three data types: List, Hash, and ZSet.
When the amount of data in a list is relatively small, and the stored values are lightweight (such as small integer values and short strings), Redis uses the compressed list as the underlying implementation.
A ziplist is a sequential data structure composed of a series of specially encoded, contiguous memory blocks. The head of the list holds three fields, zlbytes, zltail and zllen; the body holds multiple entries; and the tail carries a zlend marker. Let's break it down in detail:

  • zlbytes: the number of bytes the whole list occupies
  • zltail: the offset of the tail of the list
  • zllen: the number of entries in the list
  • entry: the storage area; it can contain multiple nodes, each storing an integer or a string
  • zlend: marks the end of the list
[Figure: ziplist memory layout]
To locate the first or the last element, you can jump directly using the zlbytes and zltail header fields, with O(1) complexity. Finding any other element is less efficient: you can only scan entry by entry, so locating entry n is O(N).
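A simplified sketch of this layout in Python, using struct to pack a toy ziplist whose entries are just length-prefixed byte strings (real ziplist entries are more compact and variable-width):

```python
import struct

HEADER = 4 + 4 + 2          # zlbytes (4) + zltail (4) + zllen (2)
ZLEND = 0xFF                # end-of-list marker byte

def make_ziplist(values):
    """Pack values into a toy ziplist: header + entries + zlend."""
    entries, offsets, pos = b"", [], HEADER
    for v in values:
        offsets.append(pos)                    # byte offset of this entry
        e = struct.pack("<B", len(v)) + v      # toy entry: 1-byte length + payload
        entries += e
        pos += len(e)
    zlbytes = HEADER + len(entries) + 1
    zltail = offsets[-1] if offsets else HEADER
    header = struct.pack("<IIH", zlbytes, zltail, len(values))
    return header + entries + bytes([ZLEND])

zl = make_ziplist([b"one", b"two", b"three"])
zlbytes, zltail, zllen = struct.unpack_from("<IIH", zl)
# O(1) jump to the last entry via zltail, no scanning needed:
n = zl[zltail]
print(zl[zltail + 1 : zltail + 1 + n])   # b'three'
```

Reaching any middle entry, by contrast, requires walking the length prefixes one by one from the front, which is the O(N) scan described above.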

1.2.3.linkedlist double-ended list

The Redis List data type is often used for linked lists, message queues, stacks, and ordered object lists (such as the like-order list, comment-order list, and follow timeline in WeChat Moments). As a double-ended list it supports both queue usage (first in, first out) and stack usage (first in, last out) well.
The features of Redis’s linked list implementation can be summarized as follows:

  • Double-ended: The linked list nodes have prev and next pointers. The complexity of getting the previous node and the next node of a node is O(1).
  • Acyclic: the prev pointer of the head node and the next pointer of the tail node both point to NULL; traversal of the linked list ends at NULL.
  • Head pointer/tail pointer: The complexity of obtaining the head node and tail node of the linked list through the head pointer and tail pointer of the list structure is O(1).
  • Linked list length counter: Count the linked list nodes of the list through the len attribute of the list structure. The complexity of obtaining the number of nodes is O(1).
  • Polymorphism: linked-list nodes use void* pointers to hold node values, and type-specific functions can be set for node values through the list structure's three attributes dup, free, and match, so linked lists can hold values of various types.
    The space overhead of a linked list is relatively high: on a 64-bit system a pointer is 8 bytes, so the prev and next pointers alone take 16 bytes per node, and nodes are allocated separately in memory, which worsens memory fragmentation and hurts memory-management efficiency.
    Because of these shortcomings, later Redis versions reworked the list data structure, replacing ziplist and linkedlist with quicklist. As a hybrid of the two, quicklist splits a linkedlist into segments, stores each segment compactly as a ziplist, and links the ziplists together with bidirectional pointers.
    [Figure: quicklist structure - a doubly linked list of ziplists]
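The quicklist idea can be sketched as a sequence of small, capacity-capped chunks, each playing the role of one ziplist. A toy Python model for illustration, not Redis's quicklist.c:

```python
# Toy quicklist: a list of small compact chunks, mirroring the
# "linked list of ziplists" design; chunk capacity stands in for
# the list-max-ziplist-size limit.
class QuickList:
    CHUNK = 4                       # tiny cap for demonstration

    def __init__(self):
        self.chunks = []            # each chunk plays the role of one ziplist

    def push_tail(self, v):
        # Open a new chunk when the last one is full (or none exists yet).
        if not self.chunks or len(self.chunks[-1]) >= self.CHUNK:
            self.chunks.append([])
        self.chunks[-1].append(v)

    def __iter__(self):
        for chunk in self.chunks:
            yield from chunk

ql = QuickList()
for i in range(10):
    ql.push_tail(i)
print(len(ql.chunks), list(ql))   # 3 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The win is fewer separately allocated nodes (less fragmentation, fewer per-node pointers) while keeping cheap insertion at either end.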

1.2.4.Hash dictionary

Whatever the type (string, list, hash, set, zset), Redis stores key-value pairs through a hash structure: the whole is an array, and each element of the array is an independent slot called a hash bucket. For buckets 1 to n in the figure, the corresponding entry holds a pointer to the actual value.
[Figure: global hash table - keys hashed to buckets whose entries point at the values]
Lookups in the global hash table above are O(1): compute the hash of a key to find its bucket, locate the entry in the bucket, and follow it to the data. This execution efficiency is very high.
In order to resolve possible conflicts, chained hashing is used, that is, elements in the same bucket are stored in a linked list.
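A minimal Python sketch of chained hashing as described: each bucket holds a chain of key-value pairs that hashed to the same slot (Redis's dict.c is of course far more elaborate, with incremental rehashing and two tables):

```python
# Chained hashing: colliding keys share a bucket and live in a chain
# (a Python list standing in for the linked list).
class ChainedHash:
    def __init__(self, nbuckets=8):
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def set(self, key, value):
        chain = self._bucket(key)
        for pair in chain:          # update in place if the key exists
            if pair[0] == key:
                pair[1] = value
                return
        chain.append([key, value])  # otherwise append to the chain

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

h = ChainedHash(nbuckets=2)         # tiny table to force collisions
for k in ("a", "b", "c", "d"):
    h.set(k, k.upper())
print(h.get("c"))                   # C
```

With only two buckets, several keys necessarily share a chain, yet lookups still work: the bucket is found in O(1) and the short chain is scanned linearly, which is why Redis rehashes before chains grow long.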

1.2.5.intset integer set

If a set contains only integer-valued elements and the number of elements is small, Redis uses the integer set as the underlying data structure of the Redis Set.

  • encoding: the encoding method
  • length: the number of elements in the array, i.e. the overall length of the array
  • contents[]: the integer set itself; each element of the set is an item of the array, with the following characteristics:
    arranged in increasing order of value;
    contains no duplicates.
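These properties can be sketched in Python: a sorted, duplicate-free list of integers whose "encoding" only ever upgrades as wider values arrive (int16 → int32 → int64 in Redis). A toy model for illustration, not Redis's intset.c:

```python
import bisect

# Toy intset: contents stay sorted and duplicate-free; the encoding
# records the narrowest integer width that fits every stored value.
class IntSet:
    def __init__(self):
        self.contents = []          # kept in increasing order
        self.encoding = 16          # start at int16

    def _width(self, v):
        if -2**15 <= v < 2**15:
            return 16
        if -2**31 <= v < 2**31:
            return 32
        return 64

    def add(self, v):
        i = bisect.bisect_left(self.contents, v)
        if i < len(self.contents) and self.contents[i] == v:
            return                  # no duplicates allowed
        self.contents.insert(i, v)  # insert at the sorted position
        # Encoding upgrades when a wider value arrives, never downgrades.
        self.encoding = max(self.encoding, self._width(v))

s = IntSet()
for v in (5, 3, 5, 100000):
    s.add(v)
print(s.contents, s.encoding)       # [3, 5, 100000] 32
```

Adding 100000 (which does not fit in 16 bits) upgrades the whole set's encoding to 32 bits, mirroring the upgrade behaviour of the real intset.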

1.2.6.skipList skip list

A skiplist (skip list) is an ordered data structure, and one of the underlying implementations of the ZSet data type. By keeping multiple pointers to other nodes in each node, it achieves fast positioning.
Searching a skip list has an average time complexity of O(log N) and a worst-case complexity of O(N), and nodes can also be processed in batches through sequential operations. The skip list is an improvement on the linked list: on top of it, multi-level indexes are added, and a lookup jumps down through the indexes until it reaches the real data item. Does this remind you of a B+ tree? The concept is somewhat similar, as shown in the figure below:
[Figure: skip list with multi-level indexes over an ordered linked list]
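A minimal skip-list sketch in Python: each node carries a tower of forward pointers whose height is chosen randomly, so higher levels skip over more nodes. Illustrative only, not Redis's zskiplist (which also stores scores and spans):

```python
import random

MAX_LEVEL = 8

class Node:
    def __init__(self, value, level):
        self.value = value
        self.forward = [None] * level   # one forward pointer per level

class SkipList:
    def __init__(self):
        self.head = Node(None, MAX_LEVEL)
        self.level = 1

    def _random_level(self):
        # Coin flips: each extra level is half as likely as the last.
        lvl = 1
        while random.random() < 0.5 and lvl < MAX_LEVEL:
            lvl += 1
        return lvl

    def insert(self, value):
        update, x = [self.head] * MAX_LEVEL, self.head
        for i in range(self.level - 1, -1, -1):   # descend level by level
            while x.forward[i] and x.forward[i].value < value:
                x = x.forward[i]
            update[i] = x                          # last node before `value`
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        node = Node(value, lvl)
        for i in range(lvl):                       # splice into each level
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def contains(self, value):
        x = self.head
        for i in range(self.level - 1, -1, -1):
            while x.forward[i] and x.forward[i].value < value:
                x = x.forward[i]
        x = x.forward[0]
        return x is not None and x.value == value

sl = SkipList()
for v in (30, 10, 50, 20):
    sl.insert(v)
print(sl.contains(20), sl.contains(40))   # True False
```

The search walks right as far as possible on the top level, drops one level, and repeats, which is what yields the O(log N) average lookup.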

1.3.Single-threaded model

Redis being single-threaded mainly means that its network I/O and key-value reads and writes are handled by one thread. When Redis processes a client request, the whole pipeline (socket read, parsing, execution, socket write) is handled sequentially by a single main thread; this is the so-called "single thread", and it is the main path through which Redis provides its key-value storage service.
But other Redis functions, such as persistence, asynchronous deletion, and cluster data synchronization, are executed by additional threads. So the Redis worker thread is single-threaded, but Redis as a whole is multi-threaded.

1.3.1. Why single thread?

So what are the main reasons for using a single thread in the main process?

  • Too many threads can reduce overall throughput.
    Adding threads appropriately is meant to use CPU performance effectively and reach an optimal balance with memory. But with Redis's frequent reads and writes, poorly managed threads will not raise system throughput and may even lower it.
  • CPU context switching.
    To run a task, the CPU loads it into registers for computation; when switching to another thread, the current context must be saved into the system kernel so it can be reloaded when execution resumes later.
    It is like being frequently interrupted while concentrating on one thing: the cost is very high.
    [Figure: cost of a CPU context switch]
    Each context switch involves a series of work: saving the context, switching, restoring the context, and so on. The more frequent these operations, the more resources they consume.
  • Concurrency control over shared resources.
    Multithreading introduces uncertainty in execution order, brings a series of concurrent read/write problems, and increases system complexity. There may also be performance losses from thread switching, and even locking, unlocking, and deadlocks.
  • Memory is the core focus.
    For Redis, the main performance bottleneck is memory or network bandwidth, not the CPU.

1.3.2. Benefits of single thread

  1. It avoids the performance cost of creating too many threads, which would reduce overall throughput.
  2. It avoids the extra CPU overhead of context switching.
  3. It avoids contention between threads, such as locking, unlocking, and deadlocks, which cause performance losses.
  4. There is no multithreading-induced program complexity to consider; the code is clearer and the processing logic is simple.
    Does a single thread use the CPU effectively?
    Redis operates purely in memory and executes very fast, so the CPU is usually not the bottleneck: most requests are not CPU-intensive.
    Redis's real performance bottleneck is network I/O, i.e. the network transmission latency between client and server. That is why Redis chose single-threaded I/O multiplexing for its core network model.

1.4.I/O multiplexing model

There are four common I/O models in server-side network programming: synchronous blocking IO (Blocking IO), synchronous non-blocking IO (Non-blocking IO), IO multiplexing (IO Multiplexing), and asynchronous IO (Asynchronous IO).
Redis uses I/O multiplexing to handle connections concurrently. Its multiplexing implementations include select, poll, epoll, and kqueue. Take epoll (currently the newest and best of these) as an example: when a client issues read, write, accept, close, and similar operations, the commands are encapsulated as events, and epoll's multiplexing capability is used to avoid blocking on I/O.
Let's take a look at the difference between the ordinary I/O model and Redis' I/O multiplexing model to analyze how to maintain efficient execution under high-frequency Redis requests.

1.4.1. Common I/O model

Let's first look at how the traditional blocking I/O model works: when read or write is used on a file descriptor (FD) that is not currently readable or writable, the entire Redis service would stop responding to other operations, making the whole service unavailable.
This is the traditional model, i.e. the blocking model commonly used in programming:
[Figure: traditional blocking I/O model]
Although the blocking model is very common in development and easy to understand, it affects the service on every other FD, so when multiple client tasks must be handled, the blocking model is usually not used.

1.4.2.I/O multiplexing

Insert image description here
Multiplexing means that multiple socket connections share (reuse) one thread. In this mode the kernel does not monitor application connections directly; it monitors file descriptors.
When a client initiates a request, sockets with different event types are generated. On the server side, thanks to I/O multiplexing, handling is not blocking and synchronous: messages are placed into a socket queue (see the I/O Multiplexing module in the figure below) and then forwarded by the file event dispatcher to the matching event handlers, such as accept, read, and send.
[Figure: Redis event loop - socket queue, I/O multiplexing module, file event dispatcher, event handlers]
To sum up, we get the following characteristics:

  • In single-threaded mode, the kernel continues to monitor connections and data requests on the socket, and once monitored, is handed over to the Redis thread for processing, achieving the effect of a single thread processing multiple I/O streams.
  • epoll provides an event-based callback mechanism. Different events call corresponding event handlers. Redis can continuously and efficiently process events, and performance can be improved simultaneously.
  • Redis does not block requests initiated by any client, so it can connect to multiple clients at the same time and process requests, and has the ability to execute concurrently.
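The pattern above can be sketched with Python's standard selectors module: one thread, one selector watching several sockets, and a handler dispatched per ready event. This is an illustration of the model, not Redis's actual ae event loop:

```python
import selectors
import socket

# One thread, one selector, several sockets: the kernel watches the
# file descriptors and the loop dispatches a handler per ready event.
sel = selectors.DefaultSelector()
a1, b1 = socket.socketpair()        # two "client <-> server" connections
a2, b2 = socket.socketpair()

def handle(conn):
    data = conn.recv(1024)
    return data.upper()             # stand-in for "execute the command"

# Register the server-side sockets with the handler as attached data.
sel.register(b1, selectors.EVENT_READ, handle)
sel.register(b2, selectors.EVENT_READ, handle)

a1.sendall(b"ping")                 # two clients write "concurrently"
a2.sendall(b"get key")

replies = []
for key, _ in sel.select(timeout=1):   # single thread sees both events
    replies.append(key.data(key.fileobj))
print(sorted(replies))              # [b'GET KEY', b'PING']

for s in (a1, b1, a2, b2):
    s.close()
```

One `select` call returns every ready descriptor at once, so a single thread serves both "clients" without blocking on either, which is exactly the effect described in the bullets above.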

1.5. Summary of high-performance Redis

  • Based on memory rather than disk: most operations are simple accesses, and resources are mainly spent on I/O, so reads are fast.
  • Data structure: efficient data structure based on different business scenarios
  1. Dynamic string (REDIS_STRING): integer (REDIS_ENCODING_INT), string (REDIS_ENCODING_RAW)
  2. Double-ended list (REDIS_ENCODING_LINKEDLIST)
  3. Compressed list (REDIS_ENCODING_ZIPLIST)
  4. Skip list (REDIS_ENCODING_SKIPLIST)
  5. Hash table (REDIS_HASH)
  6. Integer set (REDIS_ENCODING_INTSET)
  • Threading model: Redis's network I/O and key-value command reads and writes are executed by a single thread, avoiding unnecessary context switches and contention.
  • I/O model: Based on I/O multiplexing model, non-blocking I/O model
  • Reasonable data encoding: choose a suitable encoding based on the actual data type
  • Redis itself is a global hash table with O(1) time complexity. In addition, to prevent hash conflicts from making the chains too long, rehash operations expand the table and reduce conflicts.

Origin: blog.csdn.net/d495435207/article/details/131355975