Redis master-slave replication and optimization



Master-slave replication

Before looking at master-slave replication, first consider the problems of a single-machine deployment:

  • Machine failure
  • Capacity bottleneck
  • QPS bottleneck

These are the problems a single node runs into, and master-slave replication (one master with one slave, or one master with many slaves) exists to address them.


Master-slave replication provides:

  • A copy of the data (redundancy)
  • Scaled-out read performance

Note:

  • A master can have multiple slaves
  • One slave has only one master
  • Data flow is one-way, from master to slave
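The three rules above can be pictured with a toy data model; the node names below are invented for illustration and this is not Redis code:

```python
# Toy model of the replication topology rules.
topology = {
    "slave-a": "master-1",  # a master can have multiple slaves...
    "slave-b": "master-1",
    "slave-c": "master-2",
}

# ...but each slave has exactly one master (dict keys are unique),
# and data flows one way: a master never replicates from its slaves,
# so no master appears on the slave side of the mapping.
assert all(m not in topology for m in topology.values())
```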

Master-slave replication configuration

There are two ways to configure it:

  • slaveof command

Two machines: master node 47.11.11.11, slave node 47.22.22.22.

Execute the slaveof command on the slave node

47.22.22.22:6379> slaveof 47.11.11.11 6379
OK

To cancel replication:

47.22.22.22:6379> slaveof no one
OK
  • Configuration file
slaveof ip port     // IP and port of the master node
slave-read-only yes // make the slave node read-only
  • Comparison of the two ways: the slaveof command takes effect immediately and needs no restart, but is not persisted; the configuration-file setting survives restarts, but requires a restart to take effect.

  • View master and slave
127.0.0.1:6379> info replication
# Replication
role:master   // master node
connected_slaves:0
master_replid:1d43401335a5343b27b1638fc9843e3a593fc1a7
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

Knowledge points:

  • Node run ID:

After each Redis node starts, it is assigned a 40-character hexadecimal string as its run ID. The run ID uniquely identifies the Redis node; for example, a slave node saves the master's run ID to know which master it is replicating. If only ip+port were used to identify the master, then after the master restarts with a changed data set (for example, a replaced RDB/AOF file), it would be unsafe for the slave to keep replicating based on its offset. Therefore, when the run ID changes, the slave performs a full replication. You can view the current node's run ID with the info server command.

Note that when Redis is shut down and restarted, the run ID changes accordingly.
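The shape of a run ID is easy to reproduce; the sketch below only illustrates the 40-character hexadecimal format, not how Redis actually derives it:

```python
import secrets

def make_run_id() -> str:
    # 20 random bytes rendered as hex -> a 40-character hexadecimal string,
    # the same shape as the run_id shown by `info server`.
    return secrets.token_hex(20)

print(make_run_id())  # e.g. 1d43401335a5343b27b1638fc9843e3a593fc1a7
```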


Full replication and partial replication

Full replication

Full replication is used for the initial sync, or in any situation where partial replication is not possible. The master sends its entire data set to the slave, which incurs heavy network overhead when the data set is large.

Redis 2.8+ full replication process:


Cost:

  1. bgsave time
  2. RDB file network transfer time
  3. Time for the slave node to flush its old data
  4. Time for the slave node to load the RDB file
  5. Possible AOF rewrite time
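As a rough illustration of why these costs add up, here is a back-of-envelope estimate; all throughput figures below are invented assumptions, not Redis measurements:

```python
def full_sync_seconds(rdb_bytes: int, bgsave_mb_s: float,
                      net_mb_s: float, load_mb_s: float) -> float:
    # Sum the three dominant phases: bgsave, network transfer, RDB load
    # (flush time and a possible AOF rewrite would come on top).
    mb = rdb_bytes / (1024 * 1024)
    return mb / bgsave_mb_s + mb / net_mb_s + mb / load_mb_s

# A hypothetical 2 GB RDB file with assumed per-phase throughputs:
t = full_sync_seconds(2 * 1024**3, bgsave_mb_s=200, net_mb_s=100, load_mb_s=150)
print(f"{t:.0f}s")  # tens of seconds even with a fast disk and network
```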

Partial replication

Partial replication handles data loss caused by a network interruption during master-slave replication. When the slave reconnects to the master, the master resends only the missed data to the slave if conditions allow. Because the resent data is far smaller than the full data set, this effectively avoids the heavy cost of full replication. Note, however, that if the interruption lasts too long and the master cannot retain all the write commands executed in the meantime, partial replication is impossible and full replication is used instead.


Replication offset:

  • Both the master and slave nodes participating in replication maintain their own replication offset. After the master processes a write command, it adds the command's byte length to its offset; this statistic is the master_repl_offset field in info replication.
  • The slave reports its replication offset to the master every second, so the master also records each slave's offset: slave0:ip=192.168.1.3,port=6379,state=online,offset=116424,lag=0
  • After the slave receives commands from the master, it likewise accumulates its own offset; this statistic is the slave_repl_offset field in info replication.
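The offset bookkeeping above can be sketched in a few lines; this is an illustration of the accounting, not Redis internals:

```python
class ReplicationPeer:
    """Mirrors master_repl_offset / slave_repl_offset from `info replication`."""

    def __init__(self) -> None:
        self.repl_offset = 0

    def feed(self, command: bytes) -> None:
        # Both sides accumulate the byte length of each propagated write command.
        self.repl_offset += len(command)

master, slave = ReplicationPeer(), ReplicationPeer()
for cmd in [b"SET k1 v1", b"SET k2 v2"]:
    master.feed(cmd)  # master accumulates when it processes the write
    slave.feed(cmd)   # slave accumulates after receiving the command

# When the slave has received everything, the two offsets agree.
assert master.repl_offset == slave.repl_offset
```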

Replication backlog buffer:

  • The replication backlog buffer is a fixed-length queue held on the master, 1MB by default. It is created as soon as the master has a connected slave. From then on, when the master executes a write command, it not only sends the command to the slaves but also appends it to the backlog buffer.
    In the command-propagation phase, the master sends each write command to the slaves and writes a copy into the backlog as a backup. Besides the write commands themselves, the backlog records the replication offset corresponding to each byte it holds. Because the backlog is a fixed-length, first-in-first-out queue, it retains only the master's most recently executed write commands; older commands are pushed out of the buffer.
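A minimal sketch of the backlog mechanics, including the check that decides between partial and full resynchronization; the buffer size and commands below are illustrative (the real default size is 1MB):

```python
from collections import deque

class ReplBacklog:
    """Fixed-length FIFO byte buffer on the master, indexed by replication offset."""

    def __init__(self, size: int) -> None:
        self.buf = deque(maxlen=size)  # oldest bytes are evicted first
        self.master_offset = 0         # offset of the last byte written

    def feed(self, data: bytes) -> None:
        self.buf.extend(data)
        self.master_offset += len(data)

    def first_offset(self) -> int:
        # Offset up to which bytes have already been evicted.
        return self.master_offset - len(self.buf)

    def can_partial_sync(self, slave_offset: int) -> bool:
        # Partial resync works only if every byte the slave missed is still buffered.
        return self.first_offset() <= slave_offset <= self.master_offset

backlog = ReplBacklog(size=16)
backlog.feed(b"SET k1 v1\r\n")            # 11 bytes buffered
assert backlog.can_partial_sync(0)        # everything still available
backlog.feed(b"SET k2 v2\r\n")            # 22 bytes total, oldest 6 evicted
assert not backlog.can_partial_sync(0)    # too far behind -> full resync
assert backlog.can_partial_sync(11)       # recent enough -> partial resync
```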

Common problems in production

Read/write separation

Traffic is split across nodes: the master handles writes and the slaves handle reads. Reads may then run into:

  1. Data replication delay
  2. Reading expired data
  3. Slave node failure
Inconsistent master-slave configuration

  1. For example, an inconsistent maxmemory can cause data loss
  2. For example, inconsistent data-structure optimization parameters (such as hash-max-ziplist-entries) lead to inconsistent memory usage
Avoiding full replication

  1. The first full replication
      -unavoidable; keep each node's data set small and run it during off-peak hours
  2. Node run ID mismatch
      -use failover mechanisms such as Sentinel or Cluster
  3. Replication backlog buffer too small
      -increase the repl-backlog-size configuration and improve the network
Avoiding replication storms

  1. Single-machine replication storm (in Redis < 4.0, when a machine hosting masters goes down and restarts, all the slaves of that machine start full replication at the same time; avoid deploying a whole set of Redis masters on one machine) ====》spread master nodes across multiple machines

Final notes:

  • In the flow above, if the slave has AOF persistence enabled, full replication is still used when it restarts.
  • Data already replicated from the master is not lost, but new data written to the previous master (the 6379 node in the original figure) will no longer be synchronized.
  • slaveof can be used to change which master a node belongs to, i.e. to become the slave of a different master; note that the new master first clears all existing data on the slave node.
  • On read/write separation delay: the master replicates data to the slave asynchronously. If the slave is blocked, the master's write commands reach it late, causing temporary data inconsistency. Generally this issue is not a concern.
  • Reading expired data: Redis deletes expired keys with two strategies. One is lazy deletion: a key is removed only when it is accessed. The other is periodic sampling deletion. When the number of keys is very large, sampling cannot keep up with the rate at which keys expire, so many expired keys are never deleted. Because deletion generally happens on the master (which handles writes), and a slave cannot delete an expired key even when it is queried, the slave may return expired data (fixed in Redis 3.2).
  • Recommended article on Redis master-slave replication: https://www.cnblogs.com/wdliu/p/9407179.html
  • Recommended article on full and partial replication: https://blog.csdn.net/gaobinzhan/article/details/106536326

Personal blog: http://blog.yanxiaolong.cn/

Original link: https://developer.aliyun.com/article/775627?

Copyright statement: The content of this article is voluntarily contributed by Alibaba Cloud real-name registered users. The copyright belongs to the original author. The Alibaba Cloud Developer Community does not own its copyright and does not assume corresponding legal responsibilities. For specific rules, please refer to the "Alibaba Cloud Developer Community User Service Agreement" and the "Alibaba Cloud Developer Community Intellectual Property Protection Guidelines". If you find that there is suspected plagiarism in this community, fill in the infringement complaint form to report it. Once verified, the community will immediately delete the suspected infringing content.


Origin blog.csdn.net/alitech2017/article/details/109095706