In-depth study of Redis: master-slave replication

Foreword

In the previous two articles, the memory model of Redis and the persistence of Redis were introduced respectively.

As mentioned in the article on persistence, Redis high-availability solutions include persistence, master-slave replication (and read-write separation), sentinels and clusters. Among them, persistence focuses on stand-alone backup of Redis data (from memory to hard disk), while master-slave replication focuses on multi-machine hot backup of data. In addition, master-slave replication can also provide load balancing and fault recovery.

In this article, I will introduce all aspects of Redis master-slave replication in detail, including: how to use master-slave replication, the principle of master-slave replication (focusing on full replication, partial replication, and the heartbeat mechanism), problems that need attention in practical applications (such as data inconsistency, replication timeout, and replication buffer overflow), and configuration related to master-slave replication (focusing on repl-timeout and client-output-buffer-limit slave).

1. Overview of master-slave replication

Master-slave replication refers to copying the data of one Redis server to other Redis servers. The former is called the master node (master), and the latter is called the slave node (slave); data replication is one-way, only from the master node to the slave node.

By default, each Redis server is a master node; and a master node can have multiple slave nodes (or no slave nodes), but a slave node can only have one master node.

The role of master-slave replication

The functions of master-slave replication mainly include:

  1. Data redundancy: master-slave replication implements hot backup of data, which is a data redundancy method other than persistence.
  2. Fault recovery: When there is a problem with the master node, the slave node can provide services to achieve rapid fault recovery; it is actually a kind of service redundancy.
  3. Load balancing: On the basis of master-slave replication, combined with read-write separation, the master node provides write services and the slave nodes provide read services (that is, the application connects to the master node when writing Redis data and to a slave node when reading Redis data), which shares the server load; especially in read-heavy, write-light scenarios, spreading the read load across multiple slave nodes can greatly increase the concurrency the Redis service can handle.
  4. The cornerstone of high availability: In addition to the above functions, master-slave replication is also the basis for the implementation of sentinels and clusters, so master-slave replication is the basis for high availability of Redis.

2. How to use master-slave replication

To understand master-slave replication more intuitively, before introducing its internal principles, this section first explains how to enable it.

1. Establish replication

It should be noted that the activation of master-slave replication is completely initiated on the slave node; we do not need to do anything on the master node.

There are three ways to enable master-slave replication on slave nodes:

(1) configuration file

Add in the configuration file of the slave server: slaveof <masterip> <masterport>

(2) Start command

Append --slaveof <masterip> <masterport> to the redis-server startup command.

(3) Client command

After the Redis server is started, execute the command directly through the client: slaveof <masterip> <masterport>, then the Redis instance becomes a slave node.

The above three methods are equivalent. Let’s take the client command method as an example to see the changes of the Redis master node and slave node after slaveof is executed.

2. Examples

Preparations: start two nodes

For convenience, the master and slave nodes used in the experiment are different Redis instances on one machine: the master node listens on port 6379 and the slave node listens on port 6380. The port that the slave node listens on can be changed with the port directive in its configuration file:

After startup you can see:

After the two Redis nodes are started (referred to as 6379 nodes and 6380 nodes respectively), they are both master nodes by default.

Establish replication

At this time, execute the slaveof command on the 6380 node to make it a slave node:

Observe the effect

Let's verify that after the master-slave replication is established, the data of the master node will be copied to the slave node.

(1) First query a non-existing key from the slave node:

(2) Then add this key in the master node:

(3) At this time, query the key again in the slave node, and you will find that the operation of the master node has been synchronized to the slave node:

(4) Then delete the key on the master node:

(5) At this time, query the key again on the slave node, and you will find that the deletion on the master node has also been synchronized to the slave node:

3. Disconnect replication

After the master-slave replication relationship is established through the slaveof <masterip> <masterport> command, it can be disconnected through slaveof no one. It should be noted that after the slave node disconnects the replication, the existing data will not be deleted, but the new data changes of the master node will no longer be accepted.

After the slave node executes slaveof no one, it prints the following log; it can be seen that after replication is disconnected, the slave node becomes a master node again.

The master node prints the log as follows:

3. Implementation principle of master-slave replication

In the above section, we introduced how to operate to establish a master-slave relationship; this section will introduce the implementation principle of master-slave replication.

The master-slave replication process can be roughly divided into three stages: the connection establishment stage (that is, the preparation stage), the data synchronization stage, and the command propagation stage; the following will introduce them respectively.

1. Connection establishment phase

The main function of this stage is to establish a connection between the master and slave nodes to prepare for data synchronization.

Step 1: Save the master node information

Two fields are maintained inside the slave node server, namely the masterhost and masterport fields, which are used to store the ip and port information of the master node.

It should be noted that slaveof is an asynchronous command. After the slave node finishes saving the ip and port of the master node, it returns OK directly to the client that sent the slaveof command; the actual replication work starts after that.

During this process, you can see that the slave node prints logs as follows:

Step 2: Establish a socket connection

The slave node calls the replication timing function replicationCron() once per second. If it finds that there is a master node that can be connected, it will create a socket connection according to the ip and port of the master node. If the connection is successful, then:

Slave node: Creates a file event handler dedicated to replication for this socket; the handler is responsible for the subsequent replication work, such as receiving the RDB file and receiving propagated commands.

Master node: After accepting the socket connection from the slave node, it creates the corresponding client state for the socket and treats the slave node as a client connected to the master node. The following steps are carried out in the form of the slave node sending command requests to the master node.

During this process, the slave node prints logs as follows:

Step 3: Send a ping command

After the slave node becomes the client of the master node, it sends a ping command for the first request to check whether the socket connection is available and whether the master node is currently able to process the request.

After the slave node sends the ping command, three situations may occur:

(1) Return pong: indicating that the socket connection is normal, and the master node can currently process the request, and the replication process continues.

(2) Timeout: After a certain period of time, the slave node has not received the reply from the master node, indicating that the socket connection is unavailable, and the slave node disconnects the socket connection and reconnects.

(3) Return a result other than pong: If the master node returns something else, for example because it is busy processing a script that has run past its time limit, the master node is currently unable to process commands, and the slave node disconnects the socket connection and reconnects.

When the master node returns pong, the slave node prints the log as follows:

Step 4: Authentication

If the masterauth option is set in the slave node, the slave node needs to authenticate to the master node; if this option is not set, no authentication is required. The authentication of the slave node is performed by sending the auth command to the master node, and the parameter of the auth command is the value of masterauth in the configuration file.

If the password state of the master node is consistent with the masterauth state of the slave node (consistent means that both are set and the passwords are the same, or that neither is set), authentication passes and the replication process continues; if they are inconsistent, the slave node disconnects the socket connection and reconnects.
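
The consistency rule above can be written down as a small predicate. This is only a sketch of the rule as stated in this article, not Redis source code; the function name is hypothetical, None stands for "option not set", and the master-side password is assumed to come from its requirepass setting.

```python
from typing import Optional


def auth_state_matches(master_requirepass: Optional[str],
                       slave_masterauth: Optional[str]) -> bool:
    """Return True when the two password settings are consistent.

    None means the option is not set. The function is a hypothetical
    illustration of the rule described in the text.
    """
    if master_requirepass is None and slave_masterauth is None:
        return True  # neither side uses a password
    # Otherwise both must be set and carry the same password.
    return master_requirepass is not None and master_requirepass == slave_masterauth


if __name__ == "__main__":
    print(auth_state_matches(None, None))          # neither set
    print(auth_state_matches("s3cret", "s3cret"))  # both set, same value
    print(auth_state_matches("s3cret", None))      # only the master has a password
```

Only the first two calls describe a pair of nodes for which replication would continue past the authentication step.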

Step 5: Send slave node port information

After authentication, the slave node sends the port number it listens on (6380 in the preceding example) to the master node, and the master node saves this information in the slave_listening_port field of the client corresponding to the slave node. On the master node, this port information has no effect other than being displayed when info Replication is executed.

2. Data synchronization phase

After the connection between the master and slave nodes is established, data synchronization can begin, which can be understood as the initialization of the slave node's data. Specifically, the slave node sends the psync command to the master node (before Redis 2.8, the sync command), and synchronization starts.

The data synchronization stage is the core stage of master-slave replication. Depending on the current state of the master and slave nodes, it is carried out as either full replication or partial replication. A later chapter explains these two replication methods and the execution process of the psync command, so they are not detailed here.

It should be noted that before the data synchronization stage, the slave node is a client of the master node, but the master node is not a client of the slave node; from this stage on, the master and slave nodes are clients of each other. The reason is that before this stage the master node only needs to respond to requests from the slave node and never sends requests on its own initiative, whereas in the data synchronization stage and the subsequent command propagation stage the master node must actively send requests to the slave node (such as pushing the write commands held in the buffer) to complete replication.

3. Command Propagation Phase

After the data synchronization phase is completed, the master-slave node enters the command propagation phase; at this stage, the master node sends the write command it executes to the slave node, and the slave node receives and executes the command, thereby ensuring the data consistency of the master-slave node.

In the command propagation phase, in addition to sending write commands, the master and slave nodes also maintain a heartbeat mechanism: PING and REPLCONF ACK. Since the principle of the heartbeat mechanism involves partial replication, the heartbeat mechanism will be introduced separately after introducing the relevant content of partial replication.

Latency and inconsistency

It should be noted that command propagation is an asynchronous process, that is, the master node does not wait for the reply from the slave node after sending the write command; therefore, it is actually difficult to maintain real-time consistency between the master and slave nodes, and delay is inevitable. The degree of data inconsistency is related to the network status between the master and slave nodes, the execution frequency of the write command of the master node, and the repl-disable-tcp-nodelay configuration in the master node.

repl-disable-tcp-nodelay no: This configuration is used in the command propagation phase to control whether the master node disables TCP_NODELAY on its connections to slave nodes; the default is no, that is, TCP_NODELAY is not disabled. When set to yes, TCP merges small packets to reduce bandwidth usage, but data is sent less frequently, so the slave node's data delay increases and consistency becomes worse; the actual sending interval depends on the Linux kernel configuration, and the default is 40ms. When set to no, TCP sends the master node's data to the slave node immediately, which uses more bandwidth but reduces the delay.

Generally speaking, it is set to yes only when the application has a high tolerance for Redis data inconsistency and the network between the master and slave nodes is poor; in most cases, the default value of no is used.
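
To make the double negative in the option name concrete, here is a minimal sketch of how such a setting could map onto the TCP_NODELAY socket option. It uses Python's standard socket module on an unconnected socket; the helper names are hypothetical, and this is an illustration of the mapping, not the Redis implementation.

```python
import socket


def apply_repl_disable_tcp_nodelay(sock: socket.socket, setting: str) -> None:
    """Map a repl-disable-tcp-nodelay value ("yes"/"no") onto the socket.

    "yes" disables TCP_NODELAY (small packets may be merged),
    "no" keeps TCP_NODELAY enabled (data is sent immediately).
    """
    disable = setting == "yes"
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0 if disable else 1)


def nodelay_enabled(sock: socket.socket) -> bool:
    """Report whether TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        for value in ("no", "yes"):
            apply_repl_disable_tcp_nodelay(s, value)
            print(value, "-> TCP_NODELAY enabled:", nodelay_enabled(s))
    finally:
        s.close()
```

The point of the sketch is only that "yes" means the low-latency socket option is turned off, which is why the default of no gives the smaller replication delay.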

4. [Data Synchronization Phase] Full Replication and Partial Replication

Before Redis 2.8, the slave node sent the sync command to the master node to request data synchronization, and synchronization was always full replication. In Redis 2.8 and later, the slave node can send the psync command to request data synchronization; depending on the current state of the master and slave nodes, synchronization may then be either full replication or partial replication. The following introduction takes Redis 2.8 and later versions as the example.

  1. Full replication: used for initial replication or other situations where partial replication is not possible. Sending all data in the master node to the slave node is a very heavy operation.
  2. Partial replication: It is used for replication after network interruption and other situations. Only the write commands executed by the master node during the interruption are sent to the slave nodes, which is more efficient than full replication. It should be noted that if the network interruption time is too long, resulting in the master node not being able to fully save the write commands executed during the interruption, partial replication cannot be performed, and full replication is still used.

1. Full replication

The process of Redis full replication through the psync command is as follows:

(1) The slave node judges that partial replication cannot be performed and sends a full replication request to the master node; or the slave node sends a partial replication request, but the master node judges that partial replication cannot be performed. The specific judgment process is described after the principle of partial replication has been introduced.

(2) After the master node receives the full replication command, it executes bgsave to generate the RDB file in the background, and uses a buffer (called the replication buffer) to record all write commands executed from that moment on.

(3) After the master node's bgsave completes, it sends the RDB file to the slave node; the slave node first clears its old data, then loads the received RDB file, which brings its database to the state the master node was in when it executed bgsave.

(4) The master node sends all the write commands in the replication buffer to the slave node, and the slave node executes them to bring its database to the latest state of the master node.

(5) If AOF is enabled on the slave node, bgrewriteaof is triggered to ensure that the AOF file reflects the latest state of the master node.

The following is the log printed by the master and slave nodes when performing full replication; it can be seen that the contents of the log exactly correspond to the above steps.

The print log of the master node is as follows:

The log printed by the slave node is as follows:

Among them, there are a few points to note: the slave node has received 89260 bytes of data from the master node; the slave node must clear the old data before loading the data from the master node; after the slave node has synchronized the data, it calls bgrewriteaof.

From the process above, we can see that full replication is a very heavy operation:

(1) The master node forks a child process through bgsave to perform RDB persistence. This process consumes a lot of CPU, memory (page table copying), and hard disk IO; for the performance of bgsave, refer to the article In-depth study of Redis: Persistence.

(2) The master node sends the RDB file to the slave node over the network, which consumes a lot of bandwidth on both the master and slave nodes.

(3) While the slave node clears its old data and loads the new RDB file, it is blocked and cannot respond to client commands; if the slave node then executes bgrewriteaof, that brings additional overhead.

2. Partial replication

Since full replication is too inefficient when the master node has a large amount of data, Redis 2.8 began to provide partial replication to handle data synchronization when the network is interrupted.

The realization of partial replication relies on three important concepts:

(1) Replication offset

The master node and the slave node each maintain a replication offset (offset), which represents the number of bytes the master node has passed to the slave node: each time the master node transmits N bytes of data to the slave node, the master node's offset increases by N; each time the slave node receives N bytes of data from the master node, the slave node's offset increases by N.

The offset is used to judge whether the database status of the master and slave nodes is consistent: if the offsets of the two are the same, they are consistent; if the offsets are different, they are inconsistent. At this time, the data missing from the slave node can be found according to the two offsets. For example, if the offset of the master node is 1000, and the offset of the slave node is 500, then the partial replication needs to transfer the data with offset 501-1000 to the slave node. The location where the data with offset 501-1000 is stored is the replication backlog buffer to be introduced below.
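
The offset comparison can be sketched in a few lines. The function below is a hypothetical helper that follows this article's convention (an offset of N means N bytes have been transferred, so the next missing byte is N + 1); it is not taken from Redis.

```python
from typing import Optional, Tuple


def missing_range(master_offset: int, slave_offset: int) -> Optional[Tuple[int, int]]:
    """Return the (first, last) byte offsets the slave is missing.

    Returns None when both offsets are equal, i.e. the nodes are consistent.
    """
    if slave_offset > master_offset:
        raise ValueError("slave offset cannot be ahead of the master offset")
    if slave_offset == master_offset:
        return None
    return (slave_offset + 1, master_offset)


if __name__ == "__main__":
    print(missing_range(1000, 500))   # the example from the text
    print(missing_range(1000, 1000))  # consistent nodes
```

For the example in the text (master at 1000, slave at 500) the helper returns the range 501 to 1000.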

(2) Replication backlog buffer

The replication backlog buffer is a fixed-length, first-in-first-out (FIFO) queue maintained by the master node, with a default size of 1MB; it is created when the master node starts to have slave nodes, and its function is to back up the data recently sent by the master node to the slave nodes. Note that only one replication backlog buffer is required whether the master has one or more slaves.

In the command propagation phase, in addition to sending each write command to the slave nodes, the master node also writes a copy into the replication backlog buffer as a backup. Besides the write commands themselves, the replication backlog buffer also records the replication offset (offset) that corresponds to each byte in it. Since the replication backlog buffer has a fixed length and is first-in-first-out, it holds the write commands most recently executed by the master node; older write commands are squeezed out of the buffer.

Since the length of the buffer is fixed and limited, the write commands it can back up are also limited. When the gap between the master and slave offsets exceeds the length of the buffer, partial replication cannot be performed, and only full replication is possible. Conversely, to increase the probability that partial replication can be used after a network interruption, the size of the replication backlog buffer can be increased as needed (by configuring repl-backlog-size). For example, if the average network interruption lasts 60s and the master node generates on average 100KB of write commands per second (measured in the protocol format), the average demand on the replication backlog buffer is about 6MB. To be on the safe side, it can be set to 12MB so that partial replication can be used in most disconnection cases.
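
The sizing arithmetic in this example is simple enough to check in code. The helper below is a hypothetical sketch; the safety factor of 2 is just the margin used in the example above, not a Redis recommendation.

```python
def backlog_size_kb(avg_outage_seconds: float,
                    write_kb_per_second: float,
                    safety_factor: float = 2.0) -> float:
    """Estimate a repl-backlog-size value in KB.

    The estimate is (outage duration) x (write traffic per second),
    multiplied by a safety margin.
    """
    return avg_outage_seconds * write_kb_per_second * safety_factor


if __name__ == "__main__":
    average_need = backlog_size_kb(60, 100, safety_factor=1.0)
    with_margin = backlog_size_kb(60, 100)
    print(f"average need: {average_need:.0f} KB")
    print(f"with margin:  {with_margin:.0f} KB")
```

With the numbers from the text this gives 6000 KB (roughly 6MB) for the average need and 12000 KB (roughly 12MB) with the margin.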

After the slave node sends the offset to the master node, the master node decides whether to perform partial replication according to the offset and buffer size:

  • If the data after the offset is still in the replication backlog buffer, perform partial replication;
  • If the data after the offset is no longer in the replication backlog buffer (it has been squeezed out), perform full replication.
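
The behaviour of the backlog buffer and the decision above can be simulated with a bounded queue. The class below is a toy model under the article's offset convention (offset N means bytes 1 to N were sent); the class and method names are hypothetical and the real Redis buffer is implemented differently.

```python
from collections import deque


class ReplicationBacklog:
    """Toy fixed-size FIFO holding the most recent bytes sent to slaves."""

    def __init__(self, size: int) -> None:
        self.buf = deque(maxlen=size)  # oldest bytes are squeezed out automatically
        self.master_offset = 0         # total number of bytes ever written

    def feed(self, data: bytes) -> None:
        """Record propagated bytes and advance the master offset."""
        self.buf.extend(data)
        self.master_offset += len(data)

    def first_offset(self) -> int:
        """Offset of the oldest byte still held in the buffer."""
        return self.master_offset - len(self.buf) + 1

    def can_partial_resync(self, slave_offset: int) -> bool:
        """True if every byte after slave_offset is still buffered."""
        return self.first_offset() - 1 <= slave_offset <= self.master_offset

    def bytes_after(self, slave_offset: int) -> bytes:
        """Return the data a slave at slave_offset is missing."""
        if not self.can_partial_resync(slave_offset):
            raise ValueError("data squeezed out of the backlog, full replication needed")
        missing = self.master_offset - slave_offset
        if missing == 0:
            return b""
        return bytes(list(self.buf)[-missing:])


if __name__ == "__main__":
    backlog = ReplicationBacklog(size=8)
    backlog.feed(b"abcdefghij")           # 10 bytes written, only the last 8 are kept
    print(backlog.can_partial_resync(5))  # bytes 6..10 are still buffered
    print(backlog.bytes_after(5))
    print(backlog.can_partial_resync(1))  # byte 2 was already squeezed out
```

In this run the buffer keeps bytes 3 to 10, so a slave at offset 5 can be served from the backlog while a slave at offset 1 cannot.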

(3) Server running ID (runid)

Each Redis node (regardless of master and slave) will automatically generate a random ID (different for each startup) at startup, consisting of 40 random hexadecimal characters; runid is used to uniquely identify a Redis node. Through the info Server command, you can view the runid of the node:

When the master and slave nodes replicate for the first time, the master node sends its own runid to the slave node, and the slave node saves it; when the slave node reconnects after a disconnection, it sends this runid to the master node, and the master node uses it to judge whether partial replication is possible:

  • If the runid saved by the slave node is the same as the current runid of the master node, the master and slave nodes have been synchronized before, and the master node will go on to try partial replication (whether partial replication is actually possible depends on the offset and the replication backlog buffer);
  • If the runid saved by the slave node is different from the current runid of the master node, the Redis node that the slave node synchronized with before the disconnection is not the current master node, and only full replication can be performed.

3. Execution of the psync command

After understanding the replication offset, the replication backlog buffer, and the node runid, this section introduces the parameters and return values of the psync command, and thereby explains how the master and slave nodes determine whether to use full replication or partial replication during the execution of psync.

The execution process of the psync command can be seen in the figure below (picture source: "Redis Design and Implementation"):

(1) First, the slave node decides how to call the psync command according to the current state:

  • If the slave node has not executed slaveof before or executed slaveof no one recently, the slave node sends the command psync ? -1 to request full replication from the master node;
  • If the slave node has executed slaveof before, the command it sends is psync <runid> <offset>, where runid is the runid of the master node it last replicated from, and offset is the replication offset the slave node had saved when the last replication was interrupted.

(2) The master node decides to perform full or partial replication based on the received psync command and the current server status:

  • If the master node version is lower than Redis2.8, it will return -ERR reply, at this time, the slave node resends the sync command to perform full replication;
  • If the master node version is new enough, its runid is the same as the runid sent by the slave node, and the data after the offset sent by the slave node still exists in the replication backlog buffer, it replies +CONTINUE, indicating that partial replication will be performed, and the slave node waits for the master node to send the missing data;
  • If the master node version is new enough, but its runid differs from the runid sent by the slave node, or the data after the offset sent by the slave node is no longer in the replication backlog buffer (it has been squeezed out of the queue), it replies +FULLRESYNC <runid> <offset>, indicating that full replication will be performed; here runid is the current runid of the master node and offset is the current offset of the master node, and the slave node saves these two values for later use.
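
The master-side decision for a new enough master can be summarized as one function. This is a sketch of the rules described in this section, not the Redis implementation: the function and parameter names are hypothetical, backlog_first_offset is assumed to be the offset of the oldest byte still in the replication backlog buffer, and the runids are shortened placeholders rather than real 40-character values.

```python
def psync_reply(master_runid: str,
                master_offset: int,
                backlog_first_offset: int,
                req_runid: str,
                req_offset: int) -> str:
    """Return the reply line for "psync <req_runid> <req_offset>"."""
    same_master = req_runid == master_runid
    # All data after req_offset must still be in the backlog buffer.
    in_backlog = backlog_first_offset - 1 <= req_offset <= master_offset
    if same_master and in_backlog:
        return "+CONTINUE"
    return f"+FULLRESYNC {master_runid} {master_offset}"


if __name__ == "__main__":
    # Reconnecting slave whose missing data is still buffered.
    print(psync_reply("abc123", 1000, 200, "abc123", 500))
    # First replication: the slave sends "psync ? -1".
    print(psync_reply("abc123", 1000, 200, "?", -1))
    # Same master, but the needed data was squeezed out of the backlog.
    print(psync_reply("abc123", 1000, 600, "abc123", 500))
```

Note that the request psync ? -1 needs no special case here: the placeholder runid never matches, so the function falls through to +FULLRESYNC.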

4. Partial replication demo

In the demo below, the master and slave nodes lose their connection during a network outage of a few minutes and then perform partial replication; to make it possible to simulate the outage, the master and slave nodes in this example are on two machines in a local area network.

Network interruption

After the network has been interrupted for a period of time, both the master node and the slave node find that they have lost the connection to each other (the mechanism by which the master and slave nodes judge timeouts is explained later). After that, the slave node starts trying to reconnect to the master node; since the network has not yet recovered, the reconnection fails, and the slave node keeps retrying.

The master node log is as follows:

The slave node logs are as follows:

Network recovery

After the network is restored, the slave node successfully connects to the master node and requests partial replication. After the master node receives the request, the two perform partial replication to synchronize data.

The master node log is as follows:

The slave node logs are as follows:

5. [Command Propagation Phase] Heartbeat Mechanism

In the command propagation phase, in addition to sending write commands, the master and slave nodes also maintain a heartbeat mechanism: PING and REPLCONF ACK. The heartbeat mechanism is useful for timeout judgment and data security of master-slave replication.

1. Master->Slave: PING

At regular intervals, the master node sends a PING command to the slave node. The main purpose of this PING command is to let the slave node make its timeout judgment.

The frequency of PING sending is controlled by the repl-ping-slave-period parameter, in seconds, and the default value is 10s.

There is some controversy about whether the PING command is sent from the master node to the slave node or the other way round, because in the official Redis documentation the comment on this parameter states that the slave node sends the PING command to the master node, as shown in the figure below:

But according to the name of the parameter (including ping-slave) and code implementation, I think the PING command is sent from the master node to the slave node. The relevant code is as follows:

2. Slave -> Master: REPLCONF ACK

In the command propagation phase, the slave node will send the REPLCONF ACK command to the master node at a frequency of once per second; the command format is: REPLCONF ACK {offset}, where offset refers to the replication offset saved by the slave node. The functions of the REPLCONF ACK command include:

(1) Real-time monitoring of the network status between master and slave nodes: The master node uses this command to judge replication timeout. In addition, running info Replication on the master node shows a lag value in each slave node's status, which is the number of seconds since the master node last received a REPLCONF ACK command from that slave node. Under normal circumstances, this value should be 0 or 1, as shown below:

(2) Detecting command loss: The slave node sends its own offset, and the master node compares it with its own offset. If the slave node is missing data (for example because of network packet loss), the master node pushes the missing data (the replication backlog buffer is used here as well). Note that the offset and the replication backlog buffer are used not only for partial replication but also to handle situations such as command loss; the difference is that the former takes place after a disconnection and reconnection, while the latter takes place while the master and slave nodes remain connected.

(3) Helping to guarantee the number and lag of slave nodes: The Redis master node uses the min-slaves-to-write and min-slaves-max-lag parameters to ensure that it does not execute write commands under unsafe conditions; unsafe here means that too few slave nodes are connected or their lag is too high. For example, if min-slaves-to-write and min-slaves-max-lag are 3 and 10 respectively, the master node refuses to execute write commands when fewer than 3 slave nodes have a lag of 10s or less. The lag of each slave node is judged from the time the master node last received its REPLCONF ACK command, that is, the lag value in info Replication mentioned earlier.
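
The write-safety check in point (3) can be sketched as follows. The sketch assumes the rule is "count the slave nodes whose lag is within min-slaves-max-lag and compare that count with min-slaves-to-write"; the function name and the list-of-lags input are hypothetical, and the case where the options are set to 0 to switch the check off is not modelled.

```python
from typing import Iterable


def master_accepts_writes(slave_lags: Iterable[int],
                          min_slaves_to_write: int,
                          min_slaves_max_lag: int) -> bool:
    """Decide whether the master may execute write commands.

    slave_lags holds, for each connected slave, the seconds since the
    master last received a REPLCONF ACK from it (the lag value).
    """
    good_slaves = sum(1 for lag in slave_lags if lag <= min_slaves_max_lag)
    return good_slaves >= min_slaves_to_write


if __name__ == "__main__":
    print(master_accepts_writes([0, 1, 1], 3, 10))   # three healthy slaves
    print(master_accepts_writes([0, 1], 3, 10))      # too few slaves
    print(master_accepts_writes([0, 1, 15], 3, 10))  # one slave lags too much
```

With the example values 3 and 10, only the first call describes a state in which the master keeps accepting writes.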

6. Problems in application

1. Read-write separation and its problems

Read-write separation based on master-slave replication can achieve Redis read load balancing: the master node provides write services, and one or more slave nodes provide read services (multiple slave nodes improve data redundancy and also maximize read capacity). In application scenarios with a heavy read load, this can greatly increase the concurrency the Redis service can handle. The following introduces the issues that need attention when using Redis read-write separation.

(1) Delay and inconsistency issues

As mentioned above, since command propagation in master-slave replication is asynchronous, delay and data inconsistency are inevitable. If the application has a low tolerance for data inconsistency, possible optimization measures include: optimizing the network environment between the master and slave nodes (such as deploying them in the same computer room); monitoring the delay between the master and slave nodes (judged by offset) and, if the delay is too large, notifying the application to stop reading data from that slave node; and using a cluster to scale the write load and the read load at the same time.

Outside the command propagation phase, the data inconsistency of the slave node may be more serious, for example while the connection is in the data synchronization phase or when the slave node has lost its connection to the master node. The slave-serve-stale-data parameter of the slave node controls how the slave node behaves in these cases: if it is yes (the default value), the slave node still responds to client commands; if it is no, the slave node only responds to a few commands such as info and slaveof. The setting of this parameter depends on the application's data consistency requirements; if the requirement is very high, it should be set to no.

(2) Data expiration problem

In the stand-alone version of Redis, there are two deletion strategies:

  • Lazy deletion: The server will not actively delete data, only when the client queries a certain data, the server judges whether the data is expired, and deletes it if it is expired.
  • Periodic deletion: The server executes scheduled tasks to delete expired data, but considering the compromise between memory and CPU (deletion will release memory, but frequent deletion operations are not friendly to CPU), the frequency and execution time of this deletion are limited.

In the master-slave replication scenario, to keep the data of the master and slave nodes consistent, slave nodes do not actively delete data; instead, the master node controls the deletion of expired data on the slave nodes. Since neither the lazy deletion nor the periodic deletion strategy of the master node guarantees that expired data is deleted promptly, a client reading from a slave node can easily read data that has already expired.

In Redis 3.2, when the slave node reads data, it adds a judgment on whether the data is expired: if the data has expired, it will not be returned to the client; upgrading Redis to 3.2 can solve the problem of data expiration.
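
The read-side check described for Redis 3.2 can be illustrated with a toy key-value store. This is only a sketch of the idea (hide expired data on read, but leave the deletion to the master node); the store layout and the function name are hypothetical and do not reflect Redis internals.

```python
import time
from typing import Dict, Optional, Tuple

# key -> (value, absolute expiry timestamp in seconds, or None for no expiry)
Store = Dict[str, Tuple[str, Optional[float]]]


def slave_get(store: Store, key: str, now: Optional[float] = None) -> Optional[str]:
    """Read a key on a slave node without returning expired data."""
    if now is None:
        now = time.time()
    entry = store.get(key)
    if entry is None:
        return None
    value, expire_at = entry
    if expire_at is not None and now >= expire_at:
        # Expired: hide it from the client, but do not delete it locally;
        # the deletion is still driven by the master node.
        return None
    return value


if __name__ == "__main__":
    data: Store = {"session": ("alice", 100.0), "config": ("on", None)}
    print(slave_get(data, "session", now=50.0))   # not yet expired
    print(slave_get(data, "session", now=150.0))  # expired, hidden from the client
    print("session" in data)                      # the key itself is still stored
```

The last line of the demo shows the key is still present after the expired read, which matches the rule that slave nodes do not delete data on their own.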

(3) Failover problem

In a read-write separation scenario without Sentinel, the application connects to different Redis nodes for reads and writes. When a master or slave node fails and the topology changes, the application's read and write connections must be updated promptly. The switch can be done manually or by a monitoring program you write yourself, but the former is slow and error-prone, while the latter is complex to implement and far from cheap.

(4) Summary

Before adopting read-write separation, consider other ways to increase Redis's read capacity: for example, optimize the master node as far as possible (reduce slow queries, reduce blocking caused by persistence and other operations) to raise its capacity, or use Redis Cluster to increase read and write capacity at the same time. If you do use read-write separation, use Sentinel to make master-slave failover as automatic as possible and to reduce intrusion into the application.

2. Replication timeout problem

Master-slave node replication timeout is one of the most important reasons for replication interruption. This section explains the timeout problem separately, and the next section explains other problems that can cause replication interruption.

Why timeout judgment matters

Both during and after the establishment of the replication connection, the master and slave nodes each have a mechanism for judging whether the connection has timed out. Its significance is as follows:

(1) If the master node judges that the connection has timed out, it releases the connection to that slave node and frees the associated resources; otherwise an invalid slave node would continue to occupy the master node's resources (output buffer, bandwidth, connections, etc.). In addition, timeout judgment lets the master node know the number of currently valid slave nodes more accurately, which helps ensure data safety (in combination with parameters such as min-slaves-to-write mentioned earlier).

(2) If the slave node judges that the connection has timed out, it can re-establish the connection promptly and avoid prolonged inconsistency with the master node's data.

Judgment mechanism

The core of the timeout judgment is the repl-timeout parameter, which specifies the timeout threshold (60s by default) and applies to both the master node and the slave node. The conditions under which each side triggers a timeout are as follows:

(1) Master node: The replication timer function replicationCron() is called once per second. In it, the master node checks, for each slave node, whether the time since it last received REPLCONF ACK from that slave node exceeds repl-timeout; if so, it releases the connection to that slave node.

(2) Slave node: The slave node also judges timeouts in the replication timer function. The basic logic is:

  • If it is in the connection establishment phase and the time since it last received information from the master node exceeds repl-timeout, it releases the connection to the master node;
  • If it is in the data synchronization phase and no RDB data has been received from the master node for longer than repl-timeout, it stops the data synchronization and releases the connection;
  • If it is in the command propagation phase and the time since it last received a PING command or data from the master node exceeds repl-timeout, it releases the connection to the master node.

The relevant source code with which the master and slave nodes judge connection timeouts is as follows:

/* Replication cron function, called 1 time per second. */
void replicationCron(void) {
    static long long replication_cron_loops = 0;

    /* Non blocking connection timeout? */
    if (server.masterhost &&
        (server.repl_state == REDIS_REPL_CONNECTING ||
         slaveIsInHandshakeState()) &&
         (time(NULL)-server.repl_transfer_lastio) > server.repl_timeout)
    {
        redisLog(REDIS_WARNING,"Timeout connecting to the MASTER...");
        undoConnectWithMaster();
    }

    /* Bulk transfer I/O timeout? */
    if (server.masterhost && server.repl_state == REDIS_REPL_TRANSFER &&
        (time(NULL)-server.repl_transfer_lastio) > server.repl_timeout)
    {
        redisLog(REDIS_WARNING,"Timeout receiving bulk data from MASTER... If the problem persists try to set the 'repl-timeout' parameter in redis.conf to a larger value.");
        replicationAbortSyncTransfer();
    }

    /* Timed out master when we are an already connected slave? */
    if (server.masterhost && server.repl_state == REDIS_REPL_CONNECTED &&
        (time(NULL)-server.master->lastinteraction) > server.repl_timeout)
    {
        redisLog(REDIS_WARNING,"MASTER timeout: no data nor PING received...");
        freeClient(server.master);
    }

    /* Unrelated code omitted here... */

    /* Disconnect timedout slaves. */
    if (listLength(server.slaves)) {
        listIter li;
        listNode *ln;
        listRewind(server.slaves,&li);
        while((ln = listNext(&li))) {
            redisClient *slave = ln->value;
            if (slave->replstate != REDIS_REPL_ONLINE) continue;
            if (slave->flags & REDIS_PRE_PSYNC) continue;
            if ((server.unixtime - slave->repl_ack_time) > server.repl_timeout)
            {
                redisLog(REDIS_WARNING, "Disconnecting timedout slave: %s",
                    replicationGetSlaveName(slave));
                freeClient(slave);
            }
        }
    }

    /* Unrelated code omitted here... */
}
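The four checks in the function above can be restated as a small Python model, which makes it easy to experiment with different timestamps. This is a sketch only: SlaveView, its field names, and the state strings are hypothetical stand-ins for the server fields used in the C code.

```python
from dataclasses import dataclass

REPL_TIMEOUT = 60  # seconds; the default value of repl-timeout


@dataclass
class SlaveView:
    """What a slave tracks about its link to the master (hypothetical fields)."""
    state: str               # "connecting", "transfer" or "connected"
    last_io: float           # last time handshake or RDB data arrived
    last_interaction: float  # last PING or command received from the master


def slave_timed_out(view: SlaveView, now: float, timeout: float = REPL_TIMEOUT) -> bool:
    """Slave side: mirrors the first three checks in replicationCron()."""
    if view.state in ("connecting", "transfer"):
        return now - view.last_io > timeout
    if view.state == "connected":
        return now - view.last_interaction > timeout
    return False


def master_should_drop(last_ack_time: float, now: float, timeout: float = REPL_TIMEOUT) -> bool:
    """Master side: no REPLCONF ACK from this slave for longer than `timeout`."""
    return now - last_ack_time > timeout
```

Note that the comparison is strictly greater than, as in the C code, so a silence of exactly repl-timeout seconds does not yet count as a timeout.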

Pitfalls to watch out for

Here are some practical issues related to connection timeouts at each stage of replication:

(1) Data synchronization phase: When the master and slave nodes perform full replication, the master node first forks a child process to save the current data to an RDB file (bgsave), and then transfers the RDB file to the slave node over the network. If the RDB file is very large, the master node spends so long on fork plus saving the RDB file that the slave node may receive no data for a long time and trigger a timeout. The slave node then reconnects to the master node, starts full replication again, times out again, reconnects again, and so on: a sad cycle. To avoid it, besides keeping the data volume of a single Redis instance from growing too large, increase repl-timeout appropriately; the specific value can be tuned according to how long bgsave takes.

(2) Command propagation phase: As mentioned earlier, in this phase the master node sends PING commands to the slave node at an interval controlled by repl-ping-slave-period. This interval should be significantly smaller than repl-timeout (the latter should be at least several times the former). Otherwise, if the two values are equal or close, then when network jitter causes a few PING commands to be lost while the master node happens to have no other data to send, the slave node can easily judge a timeout.

(3) Blocking caused by slow queries: If the master or slave node executes slow queries (such as keys * or hgetall on a large key), the server is blocked, which can cause replication to time out.
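The first two pitfalls amount to simple relationships between three numbers, which a small helper can check before a configuration goes live. This is a hedged Python sketch: the function name and the min_ratio threshold of 3 are assumptions (the text only says "several times"), and bgsave_seconds is a duration you would measure yourself.

```python
from typing import List


def check_replication_timeouts(repl_timeout: int, ping_period: int,
                               bgsave_seconds: float, min_ratio: int = 3) -> List[str]:
    """Return warnings for timeout settings that invite the pitfalls above."""
    warnings = []
    if repl_timeout < min_ratio * ping_period:
        # Pitfall (2): a few lost PINGs are enough to trigger a timeout.
        warnings.append("repl-timeout should be several times repl-ping-slave-period")
    if bgsave_seconds >= repl_timeout:
        # Pitfall (1): the slave may wait longer than repl-timeout for the RDB file.
        warnings.append("bgsave takes longer than repl-timeout; full replication may loop")
    return warnings
```

With the defaults (repl-timeout 60, repl-ping-slave-period 10) and a bgsave that finishes in 20 seconds, the helper returns no warnings.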

3. Replication interruption problem

The master-slave node timeout is one of the reasons for replication interruption. In addition, there are other situations that may cause replication interruption, the most important of which is the replication buffer overflow problem.

Replication buffer overflow

As mentioned earlier, during full replication the master node puts the write commands it executes into the replication buffer. The buffer holds the write commands executed by the master node during the following periods: while bgsave generates the RDB file, while the RDB file is sent from the master node to the slave node, and while the slave node clears its old data and loads the RDB file. When the master node holds a large amount of data, or the network latency between master and slave is high, the buffer may exceed its limit, at which point the master node disconnects the slave node. This can lead to a cycle: full replication -> replication buffer overflow interrupts the connection -> reconnection -> full replication -> replication buffer overflow interrupts the connection, and so on.

The size of the replication buffer is configured by client-output-buffer-limit slave {hard limit} {soft limit} {soft seconds}; the default is client-output-buffer-limit slave 256MB 64MB 60, which means that if the buffer exceeds 256MB, or stays above 64MB for 60 consecutive seconds, the master node disconnects the slave node. This parameter can be changed dynamically with the config set command (that is, it takes effect without restarting Redis).
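The hard-limit and soft-limit rule can be expressed as a small state machine. The following Python class is an illustrative sketch of the rule as described here, not the Redis implementation; the class and method names are hypothetical.

```python
from typing import Optional

MB = 1024 * 1024


class OutputBufferLimit:
    """Model of client-output-buffer-limit slave <hard> <soft> <soft seconds>."""

    def __init__(self, hard: int = 256 * MB, soft: int = 64 * MB,
                 soft_seconds: int = 60) -> None:
        self.hard = hard
        self.soft = soft
        self.soft_seconds = soft_seconds
        self._soft_since: Optional[float] = None  # when the soft limit was first exceeded

    def should_disconnect(self, buffered_bytes: int, now: float) -> bool:
        if buffered_bytes > self.hard:
            return True  # hard limit: disconnect immediately
        if buffered_bytes > self.soft:
            if self._soft_since is None:
                self._soft_since = now
            # soft limit: disconnect only if exceeded continuously for too long
            return now - self._soft_since > self.soft_seconds
        self._soft_since = None  # back under the soft limit: reset the timer
        return False
```

The reset in the last branch is what "60 seconds in a row" means: a buffer that dips below 64MB even briefly starts the soft-limit clock over.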

When the replication buffer overflows, the master node prints a log entry like the following:

[Figure: master node log output when the replication buffer overflows]

Note that the replication buffer is a kind of client output buffer, and the master node allocates a separate replication buffer for each slave node; the replication backlog buffer, by contrast, exists only once per master node, no matter how many slave nodes it has.

4. Selection and optimization techniques for replication in each scenario

Having covered the details of Redis replication, we can now summarize, for the following common scenarios, which replication method applies (full or partial) and what to watch out for.

(1) Establishing replication for the first time

Full replication is unavoidable here, but there are still a few points to note. If the master node holds a large amount of data, avoid peak traffic periods so as not to cause congestion. If several slave nodes need to establish replication with the master node, consider staggering them to avoid saturating the master node's bandwidth. In addition, if there are too many slave nodes, you can change the replication topology from one master with many slaves to a tree structure (an intermediate node is both a slave node of its master node and the master node of its own slave nodes). Use the tree structure with caution, however: although it reduces the number of direct slave nodes and therefore the load on the master node, it increases the delay of the lower-layer slave nodes and worsens data consistency, and the more complex structure is considerably harder to maintain.

(2) Restart the master node

A master node restart falls into two cases: a crash caused by a failure, and a planned restart.

Master node crash

After the master node crashes and restarts, its runid changes, so partial replication is not possible and only full replication can be performed.

In practice, when the master node goes down, failover should be performed: one of the slave nodes is promoted to master, and the other slave nodes replicate from the new master node. Failover should be as automated as possible; Sentinel, introduced in a later article, can perform automatic failover.

Safe restart: debug reload

In some scenarios you may want to restart the master node deliberately, for example because its memory fragmentation ratio is too high, or because you want to adjust parameters that can only be set at startup. If the master node is restarted by ordinary means, the runid changes, which may cause unnecessary full replication.

To solve this problem, Redis provides the debug reload restart method: after such a restart, the master node's runid and offset are unaffected, which avoids full replication.

[Figure: runid and offset are unchanged after a debug reload restart]

But debug reload is a double-edged sword: it clears the data currently in memory and reloads it from the RDB file, which blocks the master node, so use it with caution.

(3) Restart the slave node

After a slave node crashes and restarts, the master node runid it had saved is lost, so even if slaveof is executed again, partial replication is not possible.

(4) Network interruption

If a network problem between the master and slave nodes causes a short-term interruption, there are several cases to consider.

Case 1: The network problem is extremely brief and causes only short-term packet loss, and neither the master nor the slave node judges a timeout (repl-timeout is not triggered). In this case the lost data only needs to be retransmitted via REPLCONF ACK.

Case 2: The network problem lasts a long time, the master and slave nodes judge a timeout (repl-timeout is triggered), and so much data has been lost that it exceeds what the replication backlog buffer can hold. In this case partial replication is not possible and only full replication can be performed. To avoid this as far as possible, size the replication backlog buffer appropriately for your workload; detecting and repairing network interruptions promptly also reduces full replication.

Case 3: Between the two cases above: the master and slave nodes judge a timeout, but the lost data is still within the replication backlog buffer. In this case the master and slave nodes can perform partial replication.
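The scenarios in this section all reduce to the same two questions on reconnection: does the slave node still know the master node's current runid, and is the data it is missing still in the replication backlog buffer? A simplified Python sketch of that decision follows. The function name and parameters are hypothetical, and the model assumes the backlog always holds the most recent backlog_size bytes of the replication stream.

```python
from typing import Optional


def resync_mode(slave_known_runid: Optional[str], slave_offset: int,
                master_runid: str, master_offset: int, backlog_size: int) -> str:
    """Return "partial" or "full" for a reconnecting slave (simplified model)."""
    if slave_known_runid != master_runid:
        # Master restarted (new runid) or slave restarted (runid lost).
        return "full"
    missing = master_offset - slave_offset
    if missing < 0 or missing > backlog_size:
        # The missing data is no longer covered by the backlog (case 2).
        return "full"
    return "partial"  # case 3: the backlog still covers the gap
```

For example, a slave node that is 100 bytes behind a master with a 1mb backlog gets partial replication, while the same slave node after a restart (known runid lost) gets full replication regardless of the offset.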

5. Replication-related configuration

This section summarizes the configuration items related to replication, explaining what each does, the phase in which it takes effect, and how to set it. Understanding them deepens your understanding of Redis replication on the one hand, and on the other lets you tune Redis and avoid pitfalls.

The configuration items fall roughly into three groups: those related to the master node, those related to the slave node, and those related to both. They are described below.

(1) Configuration related to both master and slave nodes

First comes the most special configuration item, which determines whether the node is a master or a slave:

1) slaveof <masterip> <masterport>: Takes effect when Redis starts. It establishes the replication relationship: a Redis server with this item enabled becomes a slave node after startup. The item is commented out by default, that is, a Redis server is a master node by default.

2) repl-timeout 60: It is related to the timeout judgment of the master-slave node connection at each stage, see the previous introduction.

(2) Master node related configuration

1) repl-diskless-sync no: Takes effect in the full replication phase and controls whether the master node uses diskless replication. With diskless replication, during full replication the master node no longer writes the data to an RDB file first but writes it directly to the slave node's socket, so the hard disk is not involved at all. Diskless replication is more advantageous when disk I/O is slow and the network is fast. Note that as of Redis 3.0, diskless replication is experimental and disabled by default.

2) repl-diskless-sync-delay 5: Takes effect in the full replication phase. When the master node uses diskless replication, this item sets how long, in seconds, the master node waits before it starts sending data to the slave nodes; it is only effective when diskless replication is enabled, and the default is 5s. The delay exists for two reasons: (1) once transmission to a slave node's socket has started, a newly connected slave node must wait for the current transfer to finish before a new one can begin; (2) several slave nodes are quite likely to establish replication within a short period of time.

3) client-output-buffer-limit slave 256MB 64MB 60: Related to the size of the master node's replication buffer in the full replication phase, see the previous introduction.

4) repl-disable-tcp-nodelay no: related to the delay in the command propagation phase, see the previous introduction.

5) masterauth <master-password>: It is related to the identity verification in the connection establishment phase, see the previous introduction.

6) repl-ping-slave-period 10: It is related to the timeout judgment of the master-slave node in the command propagation phase, see the previous introduction.

7) repl-backlog-size 1mb: The size of the replication backlog buffer, see the previous introduction.

8) repl-backlog-ttl 3600: How long the master node keeps the replication backlog buffer once it has no slave nodes, so that a disconnected slave node can still perform partial replication when it reconnects; the default is 3600s. If set to 0, the replication backlog buffer is never released.

9) min-slaves-to-write 3 and min-slaves-max-lag 10: Specifies the minimum number of slave nodes of the master node and the corresponding maximum delay, see the previous introduction.
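The interplay of min-slaves-to-write and min-slaves-max-lag can be sketched in a few lines of Python. This is an illustrative model: the function name is hypothetical, and slave_lags stands for the lag value, in seconds, that the master node reports for each slave node.

```python
from typing import Iterable


def master_accepts_writes(slave_lags: Iterable[int], min_slaves_to_write: int = 3,
                          min_slaves_max_lag: int = 10) -> bool:
    """Return True if the master would accept a write command."""
    if min_slaves_to_write == 0 or min_slaves_max_lag == 0:
        return True  # setting either value to 0 disables the check
    # A slave counts as "good" if its last acknowledgement is recent enough.
    good = sum(1 for lag in slave_lags if lag <= min_slaves_max_lag)
    return good >= min_slaves_to_write
```

With the values shown above (3 and 10), a master node with slave lags of 1, 2 and 30 seconds refuses writes, because only two slave nodes are within the 10-second limit.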

(3) Slave node related configuration

1) slave-serve-stale-data yes: related to whether the slave node responds to client commands when the data is stale, see the previous introduction.

2) slave-read-only yes: Whether the slave node is read-only; the default is read-only. Enabling writes on a slave node easily makes the master and slave data inconsistent, so this item should be left unchanged wherever possible.

6. Stand-alone memory size limit

In the earlier article on Redis persistence, we discussed how the fork operation limits the memory size of a single Redis instance. In practice, many factors limit the memory size of a single instance. The following summarizes the effects that an overly large instance can have on master-slave replication:

(1) Master switchover: When the master node goes down, a common disaster recovery strategy is to promote one of the slave nodes to master and attach the other slave nodes to the new master node; those slave nodes can then only perform full replication. If a single Redis instance holds 10GB, synchronizing one slave node takes on the order of several minutes, and the more slave nodes there are, the slower the recovery. If the system's read load is high and the slave nodes cannot serve requests during this period, the system comes under great pressure.

(2) Scaling out slave nodes: If traffic suddenly increases and you want to add slave nodes to share the read load, an overly large data set makes slave synchronization too slow to cope with the surge in time.

(3) Buffer overflow: In (1) and (2) the slave node can still synchronize normally (albeit slowly). If the data set is too large, however, the master node's replication buffer overflows during full replication and replication is interrupted; synchronization then falls into the cycle of full replication -> replication buffer overflow interrupts replication -> reconnection -> full replication -> replication buffer overflow interrupts replication, and so on.

(4) Timeout: If the data set is too large, the master node takes too long on fork plus saving the RDB file during full replication, the slave node receives no data for a long time and triggers a timeout, and synchronization may likewise fall into the cycle of full replication -> timeout interrupts replication -> reconnection -> full replication -> timeout interrupts replication, and so on.

Besides keeping the absolute amount of memory used by a master node from growing too large, its share of the host's memory should not be too large either: it is best to use only 50%-65% of the host's memory, leaving 30%-45% for executing bgsave, creating replication buffers, and so on.
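The sizing guidance above is easy to turn into numbers. The Python sketch below computes the suggested memory range for a host and a rough lower bound on the network transfer time of a full replication. Both function names are hypothetical, and the network throughput is an assumed input; the transfer estimate ignores fork, RDB generation, and loading time, so real synchronization takes longer.

```python
from typing import Tuple


def redis_memory_budget(host_memory_gb: float) -> Tuple[float, float]:
    """Suggested Redis memory range (GB): 50%-65% of the host's memory."""
    return host_memory_gb * 0.50, host_memory_gb * 0.65


def transfer_seconds(data_gb: float, network_mb_per_sec: float) -> float:
    """Time to push `data_gb` of RDB data over the network, transfer only."""
    return data_gb * 1024 / network_mb_per_sec
```

On a 32GB host this suggests keeping Redis between 16GB and 20.8GB; sending 10GB at an assumed 100MB/s takes about 102 seconds for the transfer alone, which is consistent with a total synchronization time of several minutes.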

7. info Replication

You can view replication-related status with info Replication in the Redis client, which helps in understanding the current state of the master and slave nodes and in troubleshooting.

Master node:

[Figure: info Replication output on the master node]

Slave node:

[Figure: info Replication output on the slave node]

For a slave node, the upper part of the output shows its status as a slave node; the part starting from connected_slaves shows its status as a potential master node.

Most of the fields shown by info Replication have already been described in this article, so they are not repeated here.
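Because the output of info Replication is plain key:value text, it is straightforward to process in a monitoring script. The Python sketch below parses such text into a dictionary; the function name is hypothetical, and SAMPLE is made-up illustrative input, not output captured from a real server.

```python
from typing import Any, Dict


def parse_info_replication(text: str) -> Dict[str, Any]:
    """Parse `info Replication` text into a dict; slaveN lines become nested dicts."""
    info: Dict[str, Any] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "# Replication" header
        key, _, value = line.partition(":")
        if key.startswith("slave") and key[5:].isdigit():
            # slaveN lines hold comma-separated key=value pairs
            info[key] = dict(pair.split("=", 1) for pair in value.split(","))
        else:
            info[key] = value
    return info


# Made-up sample for illustration only.
SAMPLE = """\
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6380,state=online,offset=1000,lag=1
master_repl_offset:1000
"""
```

Comparing master_repl_offset with each slave's offset gives the number of bytes that slave node is behind, which is a convenient health check for the data inconsistency issues discussed earlier.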

7. Summary

Let's review the main content of this article:

1. The role of master-slave replication: a high-level view of the problems master-slave replication is designed to solve, namely data redundancy, fault recovery, read load balancing, and so on.

2. Using master-slave replication: the slaveof command.

3. The principle of master-slave replication: replication consists of the connection establishment phase, the data synchronization phase, and the command propagation phase. The data synchronization phase has two methods, full replication and partial replication. In the command propagation phase, the master and slave nodes use the PING and REPLCONF ACK commands to check each other's heartbeat.

4. Problems in practice: including the problems of read-write separation (data inconsistency, data expiration, failover, etc.), replication timeout, and replication interruption. The article then summarized the configuration related to master-slave replication, among which repl-timeout, client-output-buffer-limit slave, and others may help solve problems in Redis master-slave replication.

Although master-slave replication solves or alleviates problems such as data redundancy, fault recovery, and read load balancing, its shortcomings are still obvious: fault recovery cannot be automated, writes cannot be load balanced, and storage capacity is limited by a single machine. Solving these problems requires Sentinel and Cluster, which I will introduce in later articles.


Origin blog.csdn.net/qq_41872328/article/details/130047886