Redis replication process in detail

Redis replication consists of two steps, synchronization (sync) and command propagation (command propagate):

  • Synchronization updates the slave server's database state to the master server's current database state.
  • Command propagation is used when the master server's database state is modified and the master and slave become inconsistent; it brings the slave server back to a state consistent with the master.

Synchronization

Redis uses the psync command to synchronize data from master to slave. The synchronization process is divided into full replication and partial replication.

Full replication: generally used for the initial replication. The master node sends all of its data to the slave node in one go. When the data volume is large, this puts heavy overhead on the master node, the slave node, and the network.

Partial replication: used to handle data lost during master-slave replication due to causes such as a brief network interruption. When the slave node reconnects to the master node, if conditions permit, the master node resends the missing data to the slave node. Because the resent data is much smaller than the full data set, this effectively avoids the high overhead of full replication.

The psync command needs the following components to work:

  • The replication offset of the master node and of each slave node
  • The master node's replication backlog buffer
  • The master node's run ID

The master and slave nodes taking part in replication each maintain their own replication offset. After processing a write command, the master node adds the command's byte length to a running total, which is reported as the master_repl_offset metric in info replication. A slave node also accumulates its own offset as it receives commands from the master node, and reports its replication offset to the master node once per second. Comparing the master's and the slave's replication offsets shows whether their data is consistent.
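As a rough illustration of comparing offsets, the sketch below parses the text that info replication returns on a master node and computes how many bytes each slave is behind. The SAMPLE text and its values are invented for illustration, and the helper name replication_lag is hypothetical; only the master_repl_offset field and the slaveN:...offset=... line layout are taken from Redis's output.

```python
def replication_lag(info_text: str) -> dict:
    """Return {slave_name: bytes behind the master} from INFO replication text."""
    master_offset = None
    slave_offsets = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        if key == "master_repl_offset":
            master_offset = int(value)
        elif key.startswith("slave") and "offset=" in value:
            # e.g. slave0:ip=...,port=...,state=online,offset=1106,lag=1
            fields = dict(part.split("=", 1) for part in value.split(","))
            slave_offsets[key] = int(fields["offset"])
    if master_offset is None:
        raise ValueError("master_repl_offset not found")
    return {name: master_offset - off for name, off in slave_offsets.items()}


# Invented sample output, for illustration only.
SAMPLE = """\
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6380,state=online,offset=1106,lag=1
slave1:ip=127.0.0.1,port=6381,state=online,offset=1050,lag=1
master_repl_offset:1106
"""

print(replication_lag(SAMPLE))
```

A result of 0 for a slave means its offset matches the master's, so the two hold the same data at that moment.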

The replication backlog buffer is a fixed-length queue kept on the master node, with a default size of 1MB. It is created when a slave node connects to the master node. When the master node responds to a write command, it not only sends the command to the slave node but also writes it to the replication backlog buffer.

Because the replication backlog buffer has a limited size, it holds only the most recently replicated data, which is used to recover missing commands during partial replication.
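The behavior of a fixed-size backlog can be sketched as follows. This is a simplified model, not Redis's actual implementation: the class name ReplBacklog and its methods are hypothetical, and an offset here simply means "number of bytes the slave has already received".

```python
class ReplBacklog:
    """Toy fixed-size backlog that keeps only the most recent bytes."""

    def __init__(self, size: int):
        self.size = size
        self.buf = bytearray()
        self.master_offset = 0  # total bytes ever written

    def feed(self, data: bytes) -> None:
        """Append propagated command bytes, evicting the oldest if full."""
        self.buf += data
        self.master_offset += len(data)
        overflow = len(self.buf) - self.size
        if overflow > 0:
            del self.buf[:overflow]

    @property
    def first_offset(self) -> int:
        """Offset of the oldest byte still held in the backlog."""
        return self.master_offset - len(self.buf)

    def read_from(self, offset: int):
        """Return the bytes after `offset`, or None if they were evicted."""
        if offset < self.first_offset or offset > self.master_offset:
            return None
        return bytes(self.buf[offset - self.first_offset:])


backlog = ReplBacklog(size=10)
backlog.feed(b"abcdefgh")
backlog.feed(b"ijkl")          # 12 bytes written, only the last 10 are kept
print(backlog.read_from(8))    # data still in the backlog
print(backlog.read_from(1))    # too old: already evicted
```

When read_from returns None, the missing data can no longer be recovered from the backlog, which is the situation where partial replication is not possible.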

Each Redis node is dynamically assigned a 40-character hexadecimal string as its run ID when it starts. The run ID's main purpose is to uniquely identify a Redis node. For example, a slave node saves the master node's run ID to identify which master node it is replicating.
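For a feel of what such an identifier looks like, the sketch below generates a random 40-character hexadecimal string. It only mimics the format described above; how Redis itself generates the run ID is not covered by this article, and the function name new_run_id is hypothetical.

```python
import secrets
import string


def new_run_id() -> str:
    """Return a random 40-character hexadecimal string (20 random bytes)."""
    return secrets.token_hex(20)


run_id = new_run_id()
print(run_id, len(run_id))
assert all(ch in string.hexdigits for ch in run_id)
```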

Full synchronization

After the slaveof command is executed:

  • 1) The slave node sends the psync command to synchronize data. Because this is the first replication, the slave node has neither the master node's run ID nor a replication offset, so the command it sends is PSYNC ? -1.
  • 2) The master node parses PSYNC ? -1, determines that this is a full replication, and replies +FULLRESYNC {runId} {offset}.
  • 3) The slave node receives the master node's response and saves the run ID and the offset.
  • 4) The master node runs bgsave and saves the RDB file locally. For background on RDB, see "Redis RDB persistence explained".
  • 5) The master node sends the RDB file to the slave node. The slave node saves the received RDB file locally and uses it directly as its own data file. After receiving the RDB, the slave node prints the related log entries, from which you can see the amount of data sent by the master node.

Note that caution is needed when the master node holds a large amount of data, for example when the generated RDB file exceeds 6GB. If transferring the RDB takes longer than the configured repl-timeout value, the slave node gives up receiving the RDB file and cleans up the temporary file it has already downloaded, causing the full replication to fail.

  • 6) From the time the master node starts saving the RDB snapshot until the slave node finishes receiving it, the master node still responds to read and write commands, so it stores the write commands from this period in the replication client buffer. Once the slave node has finished loading the RDB file, the master node sends the buffered data to the slave node, ensuring that master and slave data are consistent.

If the master node takes too long to create and transfer the RDB, the master's replication client buffer may overflow. The default configuration is client-output-buffer-limit slave 256MB 64MB 60: if the buffer stays above 64MB for 60 seconds, or exceeds 256MB outright, the master node closes the replication client connection, causing the full synchronization to fail.

  • 7) After receiving all the data sent by the master node, the slave node clears its own old data. This step has a corresponding entry in the slave node's log.
  • 8) After clearing its data, the slave node starts loading the RDB file. For a large RDB file this step is still fairly time-consuming; the total RDB load time can be calculated from the time difference between log entries.
  • 9) On receiving the SYNC command, the master server executes BGSAVE to generate an RDB file in the background, and uses a buffer to record all write commands executed from that point on.
  • 10) When the master server's BGSAVE command finishes, the master server sends the RDB file generated by BGSAVE to the slave server. The slave server receives and loads the RDB file, updating its database state to the master server's database state at the time BGSAVE ran.
  • 11) The master server sends all write commands recorded in the buffer to the slave server. The slave server executes these write commands, updating its database state to the master server's current database state.
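The core of the steps above (snapshot, buffer the writes that arrive meanwhile, clear and load on the slave, then replay the buffer) can be modeled in a few lines. This is a toy in-memory sketch under heavy simplification: plain dicts stand in for the database and the RDB file, and the class and method names (ToyMaster, ToySlave, start_full_sync, and so on) are hypothetical.

```python
class ToyMaster:
    def __init__(self):
        self.data = {}
        self.client_buffer = None  # None means no full sync in progress

    def write(self, key, value):
        """Apply a write; while a sync is running, also buffer it."""
        self.data[key] = value
        if self.client_buffer is not None:
            self.client_buffer.append((key, value))

    def start_full_sync(self):
        """Take a snapshot (stand-in for bgsave) and start buffering writes."""
        self.client_buffer = []
        return dict(self.data)

    def finish_full_sync(self):
        """Hand over the writes buffered since the snapshot was taken."""
        buffered, self.client_buffer = self.client_buffer, None
        return buffered


class ToySlave:
    def __init__(self):
        self.data = {}

    def full_sync(self, snapshot, buffered):
        self.data.clear()            # clear old data
        self.data.update(snapshot)   # load the snapshot (stand-in for the RDB)
        for key, value in buffered:  # replay writes buffered by the master
            self.data[key] = value


master, slave = ToyMaster(), ToySlave()
slave.data["stale"] = 0
master.write("a", 1)
snapshot = master.start_full_sync()
master.write("b", 2)                 # arrives while the snapshot is in flight
slave.full_sync(snapshot, master.finish_full_sync())
print(slave.data == master.data)
```

The write to "b" is not in the snapshot, so the slave only ends up consistent because the master buffered it and sent it afterwards.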

Walking through the whole full replication process shows that full replication is a very time-consuming operation. Its time overhead includes:

  • The master node's bgsave time
  • The network transfer time of the RDB file
  • The time for the slave node to clear its data
  • The time for the slave node to load the RDB
  • Possible AOF rewrite time

Full synchronization not only takes a lot of time; it also involves multiple persistence operations and network transfers, and during this period it consumes a lot of CPU, memory, and network resources on the servers hosting the master and slave nodes. So apart from the first replication, where full replication is unavoidable, other scenarios should avoid full replication and use partial synchronization instead.
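One failure mode noted earlier is the replication client buffer exceeding client-output-buffer-limit slave 256MB 64MB 60. The rule as described (a hard limit, plus a soft limit that must be exceeded continuously for a number of seconds) can be sketched as below. The class name OutputBufferLimit is hypothetical, and the exact boundary comparisons are assumptions rather than Redis's actual code.

```python
from typing import Optional

MB = 1024 * 1024


class OutputBufferLimit:
    """Toy model of a hard limit plus a time-bounded soft limit."""

    def __init__(self, hard=256 * MB, soft=64 * MB, soft_seconds=60):
        self.hard = hard
        self.soft = soft
        self.soft_seconds = soft_seconds
        self.soft_since: Optional[float] = None  # when the soft limit was first exceeded

    def should_close(self, used_bytes: int, now: float) -> bool:
        """Return True if the connection should be closed at time `now`."""
        if used_bytes > self.hard:
            return True
        if used_bytes > self.soft:
            if self.soft_since is None:
                self.soft_since = now
            return now - self.soft_since >= self.soft_seconds
        self.soft_since = None  # dropped back under the soft limit
        return False


limit = OutputBufferLimit()
print(limit.should_close(100 * MB, now=0))    # over the soft limit, just started
print(limit.should_close(100 * MB, now=60))   # still over after 60 seconds
```

Timestamps are passed in explicitly so the behavior is easy to check without waiting in real time.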

Partial synchronization

Partial replication is Redis's optimization for the excessive cost of full replication, implemented with the psync {runId} {offset} command. While a slave node is replicating from the master node, if a brief network interruption or another fault causes commands to be lost, the slave node asks the master node to resend the lost command data. If that data is still in the master node's replication backlog buffer, it is sent directly to the slave node, which keeps master and slave consistent. The resent data is typically much smaller than the full data set, so the overhead is small.

  • 1) When the network between the master and slave nodes is interrupted for longer than repl-timeout, the master node considers the slave node failed and closes the replication connection.
  • 2) While the master-slave connection is down, the master node still responds to commands, but because the replication connection is broken the commands cannot be sent to the slave node. The master node's replication backlog buffer (repl-backlog-buffer) still holds the most recent write command data, up to 1MB by default.
  • 3) When the network between master and slave recovers, the slave node connects to the master node again.
  • 4) Once the connection is restored, the slave node, which earlier saved its own replicated offset and the master node's run ID, sends them to the master node as psync parameters to request partial replication.
  • 5) On receiving the psync command, the master node first checks whether the runId parameter matches its own. If it does, the slave node was previously replicating this master node. The master node then looks in its replication backlog buffer using the offset parameter; if the data after that offset is still in the buffer, it replies +CONTINUE to the slave node, indicating that partial replication can proceed.
  • 6) The master node sends the data in the replication backlog buffer from that offset onward to the slave node, bringing master-slave replication back to its normal state.
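The master node's decision in step 5 can be sketched as a small function. This is a simplified model, not Redis source code: the function name handle_psync and its parameters are hypothetical, and the offset is treated simply as the number of bytes the slave has already received.

```python
def handle_psync(master_run_id: str, master_offset: int,
                 backlog_first_offset: int,
                 req_run_id: str, req_offset: int) -> str:
    """Decide between partial (+CONTINUE) and full (+FULLRESYNC) replication.

    backlog_first_offset is the offset of the oldest byte still held in
    the replication backlog buffer.
    """
    full = f"+FULLRESYNC {master_run_id} {master_offset}"
    if req_run_id != master_run_id:
        return full  # first replication ("?") or a different master
    if req_offset < backlog_first_offset or req_offset > master_offset:
        return full  # requested data is no longer in the backlog
    return "+CONTINUE"


RUN_ID = "a" * 40  # placeholder run ID for illustration

print(handle_psync(RUN_ID, 1000, 200, "?", -1))       # first replication
print(handle_psync(RUN_ID, 1000, 200, RUN_ID, 950))   # short disconnection
print(handle_psync(RUN_ID, 1000, 200, RUN_ID, 100))   # backlog already overwritten
```

The third call shows why the backlog size matters: if the slave stays disconnected long enough for its offset to fall out of the buffer, it has to go through full replication again.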

Heartbeat

After the master and slave nodes establish replication, they maintain a long-lived connection and send heartbeat commands to each other, as shown in the figure.

The key points of the master-slave heartbeat mechanism are:

  • 1) The master and slave nodes each have a heartbeat mechanism, and each acts as a client of the other for communication. The client list command shows replication-related client information: the master node's connection has flags=M and the slave node's connection has flags=S.
  • 2) By default, the master node sends a ping command to the slave node every 10 seconds to check that the slave node is alive and connected. The sending frequency is controlled by the repl-ping-slave-period parameter.
  • 3) In its main thread, the slave node sends the replconf ack {offset} command every second to report its current replication offset to the master node.

The replconf command not only monitors the state of the master-slave network in real time but also reports the slave node's replication offset. The master node uses the reported offset to check whether the slave node has lost data; if it has, the master node pulls the missing data from the replication backlog buffer and sends it to the slave node.
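How a master might keep track of these acknowledgements can be sketched as follows. This is an illustrative model only: the class HeartbeatTracker and its methods are hypothetical names, and time is passed in explicitly instead of being read from a clock.

```python
class HeartbeatTracker:
    """Toy record of the last replconf ack received from each slave."""

    def __init__(self):
        self.acks = {}  # slave_id -> (acked_offset, time_of_ack)

    def on_replconf_ack(self, slave_id: str, offset: int, now: float) -> None:
        """Record the offset a slave reported and when it reported it."""
        self.acks[slave_id] = (offset, now)

    def missing_bytes(self, slave_id: str, master_offset: int) -> int:
        """Bytes the slave has not yet acknowledged."""
        offset, _ = self.acks[slave_id]
        return master_offset - offset

    def seconds_since_ack(self, slave_id: str, now: float) -> float:
        """Time since the slave's last ack; large values suggest trouble."""
        _, last = self.acks[slave_id]
        return now - last


tracker = HeartbeatTracker()
tracker.on_replconf_ack("slave0", offset=900, now=10.0)
print(tracker.missing_bytes("slave0", master_offset=1000))
print(tracker.seconds_since_ack("slave0", now=12.5))
```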

Asynchronous replication and command propagation

The master node is responsible not only for reading and writing data but also for synchronizing write commands to the slave nodes. Write commands are sent asynchronously: after processing a write command, the master node returns to the client directly, without waiting for the slave nodes to finish replicating it.

This asynchronous process is handled by command propagation, which not only sends the write command to all slave servers but also enqueues the write command into the replication backlog buffer.
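A minimal sketch of this behavior, assuming in-memory queues in place of network connections: the master applies the write, queues it for every slave and for the backlog, and replies right away. The names AsyncMaster and drain are hypothetical.

```python
from collections import deque


class AsyncMaster:
    """Toy master that replies before slaves have applied the write."""

    def __init__(self, n_slaves: int):
        self.data = {}
        self.backlog = deque()  # stand-in for the replication backlog buffer
        self.slave_queues = [deque() for _ in range(n_slaves)]

    def set(self, key, value) -> str:
        self.data[key] = value
        command = ("SET", key, value)
        self.backlog.append(command)
        for queue in self.slave_queues:
            queue.append(command)  # queued for sending, not yet applied
        return "OK"  # reply immediately, without waiting for slaves


def drain(queue: deque, slave_data: dict) -> None:
    """Apply every queued command to a slave's data set."""
    while queue:
        _, key, value = queue.popleft()
        slave_data[key] = value


master = AsyncMaster(n_slaves=1)
slave_data = {}
reply = master.set("k", "v")
print(reply, slave_data)          # the client already has its reply
drain(master.slave_queues[0], slave_data)
print(slave_data)                 # the slave catches up afterwards
```

Between the reply and the drain, the slave's data lags behind the master's, which is the window of inconsistency that asynchronous replication accepts.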

Postscript

This is my personal blog; you are welcome to drop by.

 

Origin www.cnblogs.com/remcarpediem/p/11701234.html