Redis Master-Slave, Sentinel, and Cluster: Principles Explained in Detail

Introduction

Hello everyone, I am the little boy picking up snails. Today, let's learn about Redis master-slave, sentinel, and Redis Cluster together:

Redis master-slave
Redis Sentinel
Redis Cluster

1. Redis master-slave

Interviewers often ask about Redis high availability. The answer covers two levels: first, data must not be lost, or loss must be minimized; second, the Redis service must stay available.
Minimizing data loss is guaranteed by the AOF and RDB persistence mechanisms.
To keep the service available, Redis cannot be deployed as a single point. This brings us to Redis master-slave.

1.1 Redis master-slave concept

In Redis master-slave mode, multiple Redis servers are deployed: one master library and one or more slave libraries, with master-slave replication between them to keep the data copies consistent.
Reads and writes are separated between the master and slave libraries: the master library handles both read and write operations, while the slave libraries handle only read operations.
If the Redis master library goes down, a slave library is switched over to become the new master library.
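A minimal sketch of this read-write separation with the redis-py client (the host names and key are assumptions for illustration):

import redis

# Writes go to the master library.
master = redis.Redis(host="redis-master", port=6379)
master.set("user:1000:name", "snail-boy")

# Reads can go to a slave library.
replica = redis.Redis(host="redis-replica-1", port=6379)
print(replica.get("user:1000:name"))   # b'snail-boy' once replication catches up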

1.2 Redis master-slave synchronization process

[Figure: the three stages of Redis master-slave synchronization]

Redis master-slave synchronization includes three stages.

The first stage: the master and slave libraries establish a connection and negotiate synchronization.
The slave library sends the psync command to the master library to request data synchronization.
After the master library receives the psync command, it responds with FULLRESYNC (indicating that this first replication is a full copy), carrying the master library's runID and its current replication progress offset.

The second stage: the master library synchronizes the data to the slave library, which loads it locally after receiving it.
The master library executes the bgsave command to generate an RDB file and sends it to the slave library. After receiving the RDB file, the slave library first clears its current database and then loads the RDB file.
While the master library is synchronizing data to the slave library, new write operations are recorded in the replication buffer.

The third stage: the master library sends the newly received write commands to the slave library.
After the master library finishes sending the RDB, it sends the write operations accumulated in the replication buffer to the slave library, and the slave library re-executes them. With that, the master and slave libraries are in sync.

1.3 Some notes on Redis master-slave

1.3.1 Master-slave data inconsistency

Because master-slave replication is asynchronous, if the slave library lags behind in executing commands, the master and slave data will be inconsistent.

There are generally two causes of master-slave data inconsistency:
Network delay between the master and slave libraries.
The slave library has received the replicated commands, but it is busy executing a blocking command (such as hgetall).

How to solve the problem of master-slave data inconsistency?
Move to a better hardware and network configuration to keep the network smooth.
Monitor the replication progress between the master and slave libraries, as sketched below.
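A minimal monitoring sketch with redis-py, comparing the master's write offset against each slave's acknowledged offset from INFO replication (the host name is an assumption; the field names follow the INFO replication output):

import redis

master = redis.Redis(host="redis-master", port=6379)
info = master.info("replication")

# master_repl_offset: how far the master has written into the replication stream.
master_offset = info["master_repl_offset"]
for i in range(info["connected_slaves"]):
    slave = info[f"slave{i}"]   # parsed as {'ip': ..., 'port': ..., 'offset': ..., ...}
    print(f"slave{i} lag: {master_offset - slave['offset']} bytes")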

1.3.2 Read expired data

Redis has several strategies for deleting expired data:
Lazy deletion: a key's expiration is checked only when the key is accessed; if it has expired, it is deleted then.
Periodic deletion: at regular intervals, Redis scans a certain number of keys in the expires dictionaries of a certain number of databases and deletes the expired ones.
Active deletion (eviction): when the memory currently in use exceeds the configured maximum, an active cleanup policy is triggered.

If the Redis version is lower than 3.2, a slave library does not check whether data has expired when serving reads, so it can return expired data. Redis 3.2 improved this: if data read on a slave library has expired, the slave does not delete it but returns a null value, preventing the client from reading expired data.
Therefore, in Redis master-slave mode, try to use Redis 3.2 or above. The sketch below shows the client-visible behavior.
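A small client-side sketch of the post-3.2 behavior (the hosts and key are assumptions; run against Redis >= 3.2):

import time
import redis

master = redis.Redis(host="redis-master", port=6379)
replica = redis.Redis(host="redis-replica-1", port=6379)

master.set("session:42", "active", ex=1)   # key expires after 1 second
time.sleep(2)

# On Redis >= 3.2 the replica returns None for the expired key, even if
# the master has not physically deleted it yet; before 3.2 it could
# return the stale value.
print(replica.get("session:42"))   # None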

1.3.3 One master with multiple slaves: pressure on the master library during full replication

In one-master-multiple-slaves mode, if every one of many slave libraries performs a full copy from the master library, the pressure on the master library is heavy: the master library forks a child process to generate the RDB, and the fork operation blocks the main thread from processing normal requests, while transferring large RDB files also consumes the master library's network bandwidth.

This can be solved with the master-slave-slave pattern. What is that? When deploying the master-slave cluster, select a slave library with a better hardware and network configuration and have it establish master-slave relationships with some of the other slave libraries, so their full copies are served by it instead of by the master library. As shown in the figure and the sketch below:
[Figure: master-slave-slave topology]
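A minimal sketch of wiring up such a chain with redis-py's slaveof command (the host names are assumptions; the same effect is achieved with the replicaof/slaveof directive in each node's configuration):

import redis

# The intermediate replica follows the master...
mid_replica = redis.Redis(host="redis-replica-1", port=6379)
mid_replica.slaveof("redis-master", 6379)

# ...and the leaf replica follows the intermediate replica, so its
# full sync is served by the intermediate node, not the master.
leaf_replica = redis.Redis(host="redis-replica-2", port=6379)
leaf_replica.slaveof("redis-replica-1", 6379)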

1.3.4 What should I do if the master-slave network is broken?

After the master and slave libraries complete the full copy, a long-lived network connection is maintained between them, over which the master library transmits subsequent write commands to the slave library; this avoids the overhead of frequently re-establishing connections. But if the network disconnects and then reconnects, is a full copy necessary again?

Before Redis 2.8, yes: after the slave library reconnected to the master library, a full copy was indeed performed again, which is expensive. Since Redis 2.8 this has been optimized: after reconnection, incremental replication is used instead, i.e. only the write commands the master library received while the master-slave network was down are synchronized to the slave library.

After the master and slave libraries reconnect, repl_backlog_buffer is used to implement incremental replication.

While the master and slave libraries are disconnected, the master library writes the write commands it receives into the replication buffer, and also into the repl_backlog_buffer. repl_backlog_buffer is a ring buffer: the master library records the position it has written up to, and the slave library records the position it has read up to.
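Because repl_backlog_buffer is a ring, if a slave library stays disconnected long enough for the master library to lap it and overwrite unread entries, a full copy is forced again. A back-of-the-envelope sizing sketch (the traffic figures are made-up assumptions):

# repl-backlog-size sizing: write throughput x expected outage, with headroom.
write_bytes_per_sec = 2 * 1024 * 1024   # assume ~2 MB/s of write traffic
outage_seconds = 10                     # assume reconnection takes ~10 s
headroom = 2                            # x2 safety margin against bursts

backlog_bytes = write_bytes_per_sec * outage_seconds * headroom
print(backlog_bytes // (1024 * 1024), "MB")   # 40 MB -> repl-backlog-size 40mb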

2. Redis Sentinel

In master-slave mode, once the master node fails and cannot provide service, a slave node must be manually promoted to master, and applications must be notified of the new master node's address. Obviously, most business scenarios cannot accept this way of handling failures. That is why Redis has officially provided the sentinel mechanism since version 2.8.

The role of the sentinel
Introduction to sentinel mode
How the sentinel determines that the master library is offline
How sentinel mode works
How the sentinel selects the new master
Which sentinel performs the master-slave switch
Failover under sentinel

2.1 Sentinel role

Sentinel is actually a Redis process running in a special mode. It has three functions: monitoring, automatic master switching (master election for short), and notification.

While running, the sentinel process monitors all Redis master and slave nodes. It detects whether the master and slave libraries are down by periodically sending them PING commands. If a slave library does not respond to the sentinel's PING within the specified time, the sentinel marks it as offline; if the master library does not respond within the specified time, the sentinel judges the master library offline and starts the master-switching task.

Master election means selecting one of the slave libraries to serve as the new master library according to certain rules. As for notification: after the new master library is elected, the sentinel sends its connection information to the other slave libraries so they can establish master-slave relationships with it, and also notifies clients of the new master library's connection information so they can send their requests to it.
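As a minimal sketch of how a client benefits from this notification mechanism, redis-py's Sentinel helper asks the sentinels for the current topology (the sentinel addresses and the master name "mymaster" are assumptions):

from redis.sentinel import Sentinel

sentinel = Sentinel([("sentinel-1", 26379), ("sentinel-2", 26379)],
                    socket_timeout=0.5)

# Ask the sentinels who the current master and replicas are.
print(sentinel.discover_master("mymaster"))   # e.g. ('10.0.0.5', 6379)

# These proxy objects re-discover the master after a failover.
master = sentinel.master_for("mymaster", socket_timeout=0.5)
replica = sentinel.slave_for("mymaster", socket_timeout=0.5)
master.set("greeting", "hello")
print(replica.get("greeting"))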

2.2 Sentinel mode

Because a Redis sentinel is itself a Redis process, if it goes down, nothing is left monitoring the cluster. With that in mind, let's look at the Redis sentinel mode.

Sentinel mode is a sentinel system composed of one or more sentinel instances. It monitors the Redis master and slave nodes, and when a monitored master node goes offline, it automatically promotes one of that master's slave nodes to be the new master node. A single sentinel process monitoring the Redis nodes is itself a potential problem (a single point of failure), so multiple sentinels are generally used to monitor the Redis nodes, and the sentinels also monitor each other.

[Figure: multiple sentinels monitoring the master and slave nodes]

In fact, the sentinels form a cluster among themselves through a publish-subscribe mechanism. A sentinel also obtains the slave libraries' connection information via the INFO command, so it can establish connections with the slave libraries and monitor them too.

2.3 How does the sentinel determine that the master library is offline?

How does the sentinel determine whether the master library is offline? First, two basic concepts: subjective offline and objective offline.
The sentinel process sends PING commands to the master and slave libraries. If a master or slave library does not respond to the PING within the specified time, the sentinel marks it as subjectively offline.
If the master library is marked as subjectively offline, all the sentinels monitoring it check once per second whether it has really entered the subjectively offline state. When enough sentinels (at least the quorum set by the Redis administrator, typically following majority rule) confirm within the specified time range that the master library is indeed unreachable, the master library is marked as objectively offline. The purpose of this is to avoid misjudging the master library, thereby reducing unnecessary master-slave switches and their overhead.
For example, with N sentinel instances, if N/2+1 of them judge the master library subjectively offline, the node can be marked as objectively offline and the master-slave switch can proceed.

2.4 How sentinel mode works

Each sentinel sends a PING once per second to the master library, the slave libraries, and the other sentinel instances it knows about.
If the time since an instance's last valid reply to PING exceeds the value of the down-after-milliseconds option, the sentinel marks that instance as subjectively offline.

If the master library is marked as subjectively offline, all sentinels monitoring it confirm once per second whether it has indeed entered the subjectively offline state.
When a sufficient number of sentinels (greater than or equal to the quorum specified in the configuration file) confirm within the specified time range that the master library is indeed subjectively offline, it is marked as objectively offline.
Once the master library is marked as objectively offline, master election begins.
If not enough sentinels agree that the master library is subjectively offline, its subjective offline status is removed; and if the master library returns a valid reply to a sentinel's PING, its subjective offline status is likewise removed.

2.5 How does the sentinel choose the master?

Once it is clear that the master library is objectively offline, the sentinels start master election.
Election consists of two major stages: filtering and scoring. Among the multiple slave libraries, unqualified ones are first filtered out according to certain conditions; then the remaining slave libraries are scored one by one according to certain rules, and the slave library with the highest score becomes the new master library.

[Figure: master election by filtering and scoring]

During master election, a slave library's status is checked first: if it is offline, it is filtered out directly.
A slave library whose network is poor and keeps timing out is also filtered out. This is governed by the down-after-milliseconds parameter, the maximum timeout after which we consider the master-slave connection broken.
After the slave libraries unfit to be master are filtered out, the remaining ones are scored by three rules, in order: slave library priority, replication progress, and ID number.
A higher-priority slave library scores higher; the priority is configured via slave-priority. If priorities are equal, the slave library whose replication progress was closest to the old master library is chosen. If both priority and progress are equal, the slave library with the smaller ID number scores higher. A sketch of this filter-then-score logic follows.
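A toy sketch in Python of the filtering and scoring just described (the Candidate data structure and thresholds are made-up assumptions for illustration, not Sentinel's internal representation):

from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    run_id: str        # smaller ID wins the final tie-break
    online: bool
    timeouts: int      # recent PING timeouts
    priority: int      # slave-priority: in Redis a lower value is preferred
    repl_offset: int   # replication progress against the old master

def pick_new_master(slaves: list[Candidate]) -> Optional[Candidate]:
    # 1. Filter: drop offline or chronically flaky slaves.
    ok = [s for s in slaves if s.online and s.timeouts == 0]
    if not ok:
        return None
    # 2. Score: priority first, then most-replicated, then smallest run_id.
    return min(ok, key=lambda s: (s.priority, -s.repl_offset, s.run_id))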

2.6 Which sentinel performs master-slave switching?

After a sentinel marks the master library as subjectively offline, it asks the other sentinels for their opinion to confirm whether the master library has really gone offline, by sending them the is-master-down-by-addr command. Each sentinel responds Y or N according to its own connection with the master library (Y means agree, N means disagree). If the asking sentinel collects enough agreeing votes (the quorum configuration), the master library is marked as objectively offline.

The sentinel that marked the master library objectively offline then sends commands to the other sentinels to initiate a vote, hoping to perform the master-slave switch itself. This voting process is called Leader election: the sentinel that finally performs the master-slave switch is called the Leader, and the vote determines who that is. To become Leader, a sentinel must satisfy two conditions:

It must receive num(sentinels)/2+1 votes in favor.
The number of votes it receives must also be greater than or equal to the quorum value in the sentinel configuration file.
For example, suppose there are 3 sentinels and quorum is configured as 2; a sentinel then needs at least 2 votes to become Leader. The following walk-through illustrates this:

[Figure: Leader election among sentinels A1, A2, and A3]

At time t1, sentinel A1 judges the master library objectively offline and wants to become the Leader of the master-slave switch, so it first votes for itself, then sends voting requests to sentinels A2 and A3 declaring its candidacy.
At time t2, A3 also judges the master library objectively offline and also wants to become Leader, so it first votes for itself, then sends voting requests to A1 and A2.
At time t3, sentinel A1 receives A3's Leader voting request. Because A1 has already voted Y for itself, it cannot vote for any other sentinel, so it votes N for A3.
At time t4, sentinel A2 receives A3's Leader voting request. Because A2 has not voted yet, it replies Y to the first sentinel that sent it a voting request, namely A3.
At time t5, sentinel A2 receives A1's Leader voting request. Because A2 has already voted for A3, it can only vote N for A1.
At time t6, sentinel A1 has received only one Y vote, while sentinel A3 has two Y votes (its own and A2's), so sentinel A3 becomes the Leader.
If sentinel A3 had failed to collect two votes (due to network failure, for example), this round would produce no Leader. The sentinel cluster would then wait for a period (usually twice the sentinel failover timeout) before holding a new election.

2.7 Failover

Assume the sentinel-mode architecture below: three sentinels, one master library M1, and two slave libraries S1 and S2.
[Figure: three sentinels monitoring master M1 and slaves S1, S2]
When the sentinels detect that the Redis master library M1 has failed, the cluster needs to fail over. Assume sentinel 3 is elected Leader. The failover process is as follows:
[Figure: failover steps led by sentinel 3]
Slave library S1 sheds its slave-node identity and is promoted to the new master library.
Slave library S2 becomes a slave library of the new master library.
The original master node, after recovering, becomes a slave node of the new master library.
The client application is notified of the new master node's address.
After failover:

[Figure: topology after failover]

3. Redis Cluster

Sentinel mode is built on top of master-slave mode: it achieves read-write separation and automatic switching, so system availability is higher. However, every node stores the same data, which wastes memory and makes online scaling difficult. Therefore Redis Cluster (an implementation of the slice cluster) came into being. Added in Redis 3.0, it implements distributed storage for Redis by sharding the data, i.e. storing different content on each Redis node, which solves the online scaling problem. It can also hold large amounts of data by spreading them across the Redis instances, and it provides replication and failover as well.
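As a contrast with the single-instance client shown earlier, here is a minimal sketch of talking to a Redis Cluster with redis-py's cluster client, which routes each key to the right shard automatically (the seed node address is an assumption):

from redis.cluster import RedisCluster

# Any reachable cluster node works as a seed; the client discovers the rest.
rc = RedisCluster(host="10.0.0.1", port=6379)
rc.set("user:1000:name", "snail-boy")   # routed to whichever node owns the slot
print(rc.get("user:1000:name"))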

For example, if you store 15 GB or more of data in a single Redis instance, responses become very slow. This is caused by Redis's RDB persistence mechanism: Redis forks a child process to complete the RDB persistence, and the time the fork takes is positively correlated with the amount of data in Redis.
At this point it is natural to think of simply splitting the 15 GB across several instances, and that is exactly the original intent of the Redis slice cluster. What is a slice cluster? For example, to store 15 GB of data with Redis, you can use a single Redis instance, or a slice cluster of three Redis instances. The comparison is as follows:

The difference between slice clusters and Redis Cluster: Redis Cluster is the official solution for implementing slice clusters, available since Redis 3.0.
[Figure: one 15 GB instance versus a three-instance slice cluster]
Since the data is sharded across different Redis instances, how does the client determine which instance holds the data it wants to access? Let's see how Redis Cluster does it.

3.1 Hash slot (Hash Slot)

The Redis Cluster solution uses hash slots (Hash Slot) to handle the mapping relationship between data and instances.

A slice cluster is divided into 16384 slots, and every key-value pair entering Redis is hashed by its key and assigned to one of these 16384 slots. The hash mapping is fairly simple: use the CRC16 algorithm to compute a 16-bit value, then take it modulo 16384. Every key in the database therefore belongs to one of the 16384 slots, and the nodes of the cluster divide the 16384 slots among themselves.
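A small sketch of the slot calculation in Python (Redis Cluster uses the CRC16-CCITT/XMODEM variant; the key is an arbitrary example):

def crc16(data: bytes) -> int:
    # CRC16-CCITT (XMODEM), the polynomial Redis Cluster uses.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    return crc16(key) % 16384

print(key_slot(b"user:1000"))   # a number in 0..16383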

Each node in the cluster is responsible for part of the hash slots. Suppose the current cluster has 3 nodes A, B, and C, with roughly 16384/3 slots per node; one possible distribution:

Node A is responsible for hash slots 0~5460
Node B is responsible for hash slots 5461~10922
Node C is responsible for hash slots 10923~16383

When a client sends a read or write to a Redis instance that does not hold the corresponding data, what happens? Time to learn about MOVED redirection and ASK redirection.

3.2 MOVED redirection and ASK redirection

In Redis Cluster mode, a node processes a request as follows:

Through the hash slot mapping, check whether the requested key's slot lives on the current node.
If the node is not in charge of that hash slot, return a MOVED redirection.
If the node is in charge of the hash slot and the key is in the slot, return the result for the key.
If the key does not exist in that hash slot, check whether the slot is being migrated out (MIGRATING).
If the slot is being migrated out, return an ASK error redirecting the client to the migration's destination node.
If the slot is not being migrated out, check whether it is being imported (IMPORTING).
If the slot is being imported and the client sent the ASKING flag, execute the command directly; otherwise return a MOVED redirection.

3.2.1 MOVED redirection

When a client sends a read or write to a Redis instance and the key's computed slot is not on that node, the node returns a MOVED redirection error, which carries the IP and port of the instance that now holds the hash slot. This is Redis Cluster's MOVED redirection mechanism. The flow is as follows:

[Figure: MOVED redirection flow]
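A bare-bones sketch of following a MOVED redirect by hand with redis-py (real cluster clients such as RedisCluster do this automatically; the host/port values are assumptions, and depending on the redis-py version the error surfaces as MovedError or a plain ResponseError):

import redis

node = redis.Redis(host="10.0.0.1", port=6379)
try:
    value = node.get("user:1000")
except redis.exceptions.ResponseError as e:
    # The node replies "MOVED <slot> <host>:<port>" when it does not own the slot.
    fields = str(e).replace("MOVED", "").split()
    host, port = fields[-1].rsplit(":", 1)
    value = redis.Redis(host=host, port=int(port)).get("user:1000")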

3.2.2 ASK redirection

ASK redirection generally occurs when the cluster scales, which causes slot migration. When we access the source node, the data may already have been migrated to the target node; ASK redirection handles this situation.

[Figure: ASK redirection flow]

3.3 Communication between Cluster nodes: the Gossip protocol

A Redis cluster consists of multiple nodes. How do the nodes communicate with each other? Through the Gossip protocol. Gossip is a rumor-spreading protocol: each node periodically selects k nodes from its node list and spreads the information it holds to them, until all nodes' information is consistent, i.e. the algorithm converges.

The basic idea of the Gossip protocol: a node wants to share some information with the other nodes in the network, so it periodically selects some nodes at random and passes the information to them. The nodes that receive the information then do the same thing, passing it on to some other randomly selected nodes. In general, the information is periodically delivered to N target nodes rather than just one; this N is called the fanout.
Redis Cluster communicates via the Gossip protocol: the nodes constantly exchange information with one another, including node failures, new nodes joining, master-slave changes, slot information, and so on. The gossip protocol has several message types, including ping, pong, meet, and fail.

[Figure: Gossip message types in Redis Cluster]

Meet message: notifies a new node to join. The sender invites the receiver to join the current cluster; after the meet exchange completes normally, the receiving node joins the cluster and exchanges ping and pong messages periodically.
Ping message: every second, a node sends ping messages to other nodes in the cluster; each message carries the addresses, slots, status information, and last communication time of two nodes it knows about.
Pong message: sent in response to a ping or meet message, confirming to the sender that communication is normal; it likewise carries information about two known nodes.
Fail message: when a node judges that another node in the cluster is offline, it broadcasts a fail message to the cluster; upon receiving it, the other nodes update the corresponding node's state to offline.

Notably, the nodes communicate with one another over a cluster bus on a dedicated port: the external service port plus 10000. For example, if a node's service port is 6379, its inter-node communication port is 16379. Communication between nodes uses a special binary protocol.

3.4 Failover

Redis Cluster achieves high availability: when a node in the cluster fails, failover ensures the cluster keeps serving requests.
The Redis cluster discovers faults through ping/pong messages. The mechanism distinguishes subjective offline from objective offline.
Subjective offline: one node believes another node is unavailable, i.e. in the offline state. This is not the final fault verdict; it only represents one node's opinion and may be a misjudgment.

[Figure: subjective offline detection via ping/pong]

Objective offline: the node is really offline; multiple nodes in the cluster consider it unavailable and reach a consensus. If the offline node is a master holding slots, a failover must be performed for it.

If node A marks node B as subjectively offline (pfail), after a while node A spreads node B's status to other nodes via gossip messages. When node C receives the message and parses out node B's pfail state, the objective offline flow may be triggered: the master nodes holding slots vote on node B's status, and when more than half of them report it offline, node B is marked as objectively offline.

The process is as follows:
[Figure: objective offline decision flow]

Fault recovery: after a fault is confirmed, if the offline node is a master node, one of its slave nodes must be elected to replace it, keeping the cluster highly available. The process is as follows:
[Figure: fault recovery (slave promotion) flow]
Qualification check: check whether the slave node is qualified to replace the failed master node.
Prepare the election time: once the qualification check passes, update the time at which the failure election is triggered.
Initiate the election: when the failure election time arrives, hold the election.
Election voting: only the master nodes holding slots have votes; when a slave node collects enough votes (more than half), the operation of replacing the master node is triggered.

3.5 Bonus: why does Redis Cluster have 16384 hash slots?

For a key requested by the client, hash slot = CRC16(key) % 16384. The hash value produced by the CRC16 algorithm is 16 bits, so logically the algorithm can generate 2^16 = 65536 values. Why use 16384 (2^14) rather than 65536?

You can read the author's original answer:
[Figure: antirez's original answer on the number of slots]
Redis stores the slots owned by each instance node as a bitmap of type unsigned char slots[REDIS_CLUSTER_SLOTS/8]:

[Figure: the slots bitmap in the cluster node structure]

When a Redis node sends a heartbeat packet, all of its slots must be placed into the packet. With 65536 slots, the bitmap occupies 65536 / 8 (8 bits per byte) / 1024 (1024 bytes per KB) = 8 KB; with 16384 slots, it occupies 16384 / 8 / 1024 = 2 KB. So 16384 slots save about 6 KB per heartbeat compared with 65536; in a cluster of 100 nodes, each instance saves about 600 KB.
Under normal circumstances, the number of master nodes in a Redis Cluster can hardly exceed 1000, because more nodes would cause network congestion. For clusters with fewer than 1000 nodes, 16384 slots are quite enough.
Since the point is to save memory and network overhead, why not go further and use 8192 (i.e. 16384/2) slots?

8192 / 8 (8 bits per byte) / 1024 (1024 bytes per KB) = 1 KB; only 1 KB would be needed! To answer this, first look at how Redis converts a key into the slot it belongs to.

unsigned int keyHashSlot(char *key, int keylen) {
    int s, e; /* start-end indexes of { and } */

    for (s = 0; s < keylen; s++)
        if (key[s] == '{') break;

    /* No '{' ? Hash the whole key. This is the base case. */
    if (s == keylen) return crc16(key,keylen) & 0x3FFF;

    /* '{' found? Check if we have the corresponding '}'. */
    for (e = s+1; e < keylen; e++)
        if (key[e] == '}') break;

    /* No '}' or nothing between {} ? Hash the whole key. */
    if (e == keylen || e == s+1) return crc16(key,keylen) & 0x3FFF;

    /* If we are here there is both a { and a } on its right. Hash
     * what is in the middle between { and }. */
    return crc16(key+s+1,e-s-1) & 0x3FFF;
}

The way Redis converts a key into its slot: in essence, compute crc16(key) and then reduce it by the number of slots.

Why does the code use 0x3FFF (16383) here instead of taking % 16384? Because when there is no overflow, x % 2^n is equivalent to x & (2^n - 1); that is, x % 16384 == x & 16383.
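A quick sanity check of this identity (the test values are arbitrary):

# x % 2**n == x & (2**n - 1) holds for non-negative integers
for x in (0, 1, 12182, 0xFFFF, 123456789):
    assert x % 16384 == x & 16383
print("ok")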

So why not use 8192?
The theoretical collision probability of CRC16 is 1/65536 per value, but the actual rate can be much higher, just as CRC32's theoretical 1-in-4-billion collision rate has been measured to collide noticeably more often in practice. If the slots were set to 8192, then with 200 instance nodes, in theory roughly one in every 40 different key requests would miss; with 400 nodes, one in every 20. Meanwhile 1 KB does not save much compared with 2 KB, so the cost-performance is not especially attractive, and choosing 16384 is probably more general.
