In depth: the core principles of Redis Cluster

How clustering works

Building a clustered system mainly means solving two problems: data synchronization and cluster fault tolerance.

A naive approach

A simple, crude solution is to deploy several identical Redis services, put a load balancer in front of them to share the load, and monitor the state of each service. The advantage is that fault tolerance is simple: as long as one instance survives, the cluster as a whole is still available. The problem is that keeping the data consistent across all of these Redis services requires a large amount of data synchronization, which in turn hurts performance and stability.

The Redis Cluster approach

The Redis Cluster scheme is based on the idea of divide and conquer. Redis stores data as key-value pairs, and the data under different keys is mutually independent. Keys can therefore be divided into multiple partitions according to a fixed rule, with different partitions stored on different nodes. The design resembles a hash table. Redis Cluster uses a hash function, CRC16(key) mod 16384, to map each key to an integer from 0 to 16383. Each integer corresponds to the storage for a number of key-value pairs, and this storage abstraction is called a slot. Each Redis Cluster node (more precisely, each master node) is responsible for a range of slots, and together the nodes in the cluster cover the entire range of slots from 0 to 16383.
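The key-to-slot mapping described above can be sketched in a few lines of Python. This is only an illustration, assuming the CRC16 variant is CRC-16/XMODEM (polynomial 0x1021, initial value 0) as described in the Redis Cluster specification. The names crc16_xmodem and key_slot are made up for this example, and hash tags are not handled.

```python
TOTAL_SLOTS = 16384  # slots are numbered 0..16383


def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC-16/XMODEM: polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to its slot number (hypothetical helper, no hash tags)."""
    return crc16_xmodem(key.encode("utf-8")) % TOTAL_SLOTS


if __name__ == "__main__":
    for key in ("foo", "hello", "user:1000"):
        print(key, "->", key_slot(key))
```

Because the function depends only on the key, every client and every node computes the same slot for the same key without any coordination.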

It is said that any problem in computing can be solved by adding a layer of indirection. The slot is exactly such a layer. It sits between the data and the nodes and makes scaling the cluster out and in much easier. The mapping from data to slot is computed by a fixed algorithm and needs no maintenance; only the mapping between slots and nodes has to be maintained, and the nodes themselves maintain it.

Slave nodes

The scheme above only solves the problem of scaling performance; it does nothing to improve the fault tolerance of the cluster. Fault tolerance is generally improved through some form of backup or redundancy. A node responsible for a set of slots is called a master node. To make the cluster more stable, each master node can be configured with several backup nodes, called slave nodes. A slave node generally stores a backup copy of its master's data and replaces the master when it goes down. Where read pressure is relatively high, slave nodes can also serve reads, although the data on a slave may lag slightly behind the master. Writes can only go through the master node.

Request redirection

When a Redis node receives a command for a key whose slot is not within its own range of responsibility, it returns a MOVED redirection error that tells the client which node to contact for that data.

Frequent redirection errors are bound to hurt access performance. Because the mapping from key to slot is a fixed, public algorithm, the client can keep its own slot-to-node mapping, compute the slot for a key itself, and send the request straight to the correct node, which reduces redirection errors. Redis clients for most programming languages currently implement this strategy. The mainstream Redis clients for each language are listed at https://redis.io/clients.
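The client-side strategy described above can be sketched as follows. This is a simplified illustration, not how any particular client library is implemented: it assumes the redirection error text has the form "MOVED <slot> <host>:<port>", and the names parse_moved and SlotCache are hypothetical.

```python
from typing import Dict, Tuple

Address = Tuple[str, int]


def parse_moved(error: str) -> Tuple[int, Address]:
    """Parse an error such as 'MOVED 3999 127.0.0.1:6381'."""
    kind, slot, endpoint = error.split()
    if kind != "MOVED":
        raise ValueError(f"not a MOVED error: {error!r}")
    host, _, port = endpoint.rpartition(":")
    return int(slot), (host, int(port))


class SlotCache:
    """Client-side slot -> node table, corrected whenever MOVED is seen."""

    def __init__(self) -> None:
        self._table: Dict[int, Address] = {}

    def lookup(self, slot: int, default: Address) -> Address:
        # Unknown slots fall back to a default (seed) node.
        return self._table.get(slot, default)

    def handle_error(self, error: str) -> Address:
        # Remember the node named in the redirection for later requests.
        slot, address = parse_moved(error)
        self._table[slot] = address
        return address
```

A client would call lookup before each request and handle_error when a node answers with MOVED, so that only the first request for a given slot pays the cost of a redirection.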

Node communication

Although the nodes store their data independently of one another, they still need to communicate to synchronize node state information. Redis Cluster uses the peer-to-peer Gossip protocol: nodes constantly exchange information, and eventually all nodes reach a consistent view of the cluster state. The common Gossip messages are:

  • ping message: each node continually sends ping messages to other nodes, both to detect whether they are online and to exchange node state information.
  • pong message: the response sent when a ping or meet message is received.
  • meet message: announces that a new node is joining the cluster.
  • fail message: announces that a node is offline.
  • forget message: tells nodes to forget a node, which takes that node offline. The command must be executed on all nodes within 60 seconds; otherwise, after 60 seconds the forgotten node takes part in message exchange again. In practice, taking a node offline by using the forget command directly is not recommended.
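How repeated message exchange leads every node to the same view can be illustrated with a toy simulation. This is a heavily simplified sketch, not the Redis implementation: it assumes each node contacts one randomly chosen peer per round and that the two merge everything they know, and the function name gossip_rounds and the fact string are invented for the example.

```python
import random
from typing import List, Optional, Set

FACT = "node-5 is offline"  # hypothetical piece of state to spread


def gossip_rounds(node_count: int, max_rounds: int = 200,
                  seed: int = 1) -> Optional[int]:
    """Rounds until all nodes know FACT (None if max_rounds is exceeded).

    Requires node_count >= 2. Only node 0 knows FACT at the start.
    """
    rng = random.Random(seed)
    known: List[Set[str]] = [set() for _ in range(node_count)]
    known[0].add(FACT)
    for round_no in range(1, max_rounds + 1):
        for node in range(node_count):
            peer = rng.choice([n for n in range(node_count) if n != node])
            # The ping carries the sender's state, the pong the receiver's.
            merged = known[node] | known[peer]
            known[node] = set(merged)
            known[peer] = set(merged)
        if all(FACT in view for view in known):
            return round_no
    return None
```

No node ever broadcasts to the whole cluster, yet the information still reaches everyone after a small number of rounds.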

Node offline

When a node develops a problem, it takes some time for the news to spread. Only once a majority of master nodes consider the node unavailable can it be marked as truly offline. Taking a node offline in Redis Cluster therefore has two stages: subjective offline (pfail) and objective offline (fail).

  • Subjective offline: when node A has failed to communicate with node B (ping-pong messages) for longer than cluster-node-timeout, node A considers node B unavailable, marks it as subjectively offline, and propagates that state to the other nodes.
  • Objective offline: when a majority of the master nodes in the cluster have marked a node as subjectively offline, the objective offline process is triggered and the node is marked as truly offline.
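The two stages can be expressed as two small checks. This is a minimal sketch of the rules as stated above, assuming timestamps in seconds and the timeout in milliseconds; the function names are hypothetical and the real implementation tracks more state than this.

```python
from typing import Set


def is_subjectively_offline(last_pong_at: float, now: float,
                            node_timeout_ms: int) -> bool:
    """pfail: no successful ping-pong within cluster-node-timeout."""
    return (now - last_pong_at) * 1000 > node_timeout_ms


def is_objectively_offline(reporters: Set[str], masters: Set[str]) -> bool:
    """fail: more than half of the masters report the node as pfail."""
    votes = len(reporters & masters)  # reports from non-masters do not count
    return votes > len(masters) // 2
```

With three masters A, B and C, a node reported by A and B is objectively offline, while a node reported only by A is still just subjectively offline from A's point of view.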

Recovery

After a master node that holds slots goes objectively offline, the cluster elects one of its slave nodes and promotes it to master as the replacement. Redis Cluster selects the slave through an election and voting algorithm. A slave must receive votes from a majority of all master nodes, with the failed master counted in the total, before it can be promoted. In a cluster of 3 masters and 3 slaves, for example, at least two masters must survive for a failover to take place. If two masters are deployed on the same server and that server goes down, the cluster cannot perform a failover.
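The deployment risk in the last two sentences can be checked mechanically. The sketch below assumes a simple model in which a failover needs votes from a strict majority of all masters; the function name and the placement dictionary (master name to server name) are hypothetical.

```python
from typing import Dict, List


def servers_blocking_failover(placement: Dict[str, str]) -> List[str]:
    """Return the servers whose loss leaves too few masters to vote.

    placement maps each master's name to the server it runs on.
    """
    total = len(placement)
    needed = total // 2 + 1  # strict majority of all masters
    risky = []
    for server in sorted(set(placement.values())):
        survivors = sum(1 for s in placement.values() if s != server)
        if survivors < needed:
            risky.append(server)
    return risky
```

For example, placing masters m1 and m2 on server s1 and m3 on s2 makes s1 a single point of failure for failover, while spreading the three masters over three servers leaves no such server.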

By default, if any master node in a Redis Cluster is unavailable, that is, if some slots have no node responsible for them, the entire cluster is unavailable. So from the moment a master fails until the failover completes, the whole cluster is unusable, which some workloads cannot tolerate. You can set cluster-require-full-coverage to no; a master failure then affects only access to the data in the slots it is responsible for and does not affect access to other nodes.
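The effect of this setting can be modelled with a single predicate. This is only a sketch of the behaviour described above, with a hypothetical function name and a plain set standing in for the slots that currently have a working master.

```python
from typing import Set

TOTAL_SLOTS = 16384


def can_serve(slot: int, covered: Set[int],
              require_full_coverage: bool = True) -> bool:
    """Whether a request for `slot` can be served.

    covered is the set of slots that currently have a working master.
    """
    if require_full_coverage and len(covered) < TOTAL_SLOTS:
        return False  # any uncovered slot makes the whole cluster unusable
    return slot in covered
```

With slots 0 to 5460 uncovered, a request for slot 6000 is refused under the default setting but served once full coverage is no longer required; a request for slot 100 fails in both cases.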

Building a cluster

Start a new node

Modify the Redis configuration file to enable cluster mode:

# Enable cluster mode
cluster-enabled yes
# Node timeout, in milliseconds
cluster-node-timeout 15000
# File that stores the cluster node information
cluster-config-file "nodes-6379.conf"

Then start the new node.

Send a meet message to the new node

From a client connected to a cluster node, issue the command cluster meet <ip> <port>. The node sends a meet message to the new node at the specified IP and port, and the new node joins the cluster.

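Assign slots

After the previous step, what we have is an "empty" cluster that is not yet responsible for any slots. To make the cluster usable, all 16384 slots must be assigned to master nodes.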

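Run cluster addslots from the client to assign slots to the node the client is currently connected to. The command takes individual slot numbers, so from a shell that supports brace expansion you can write redis-cli cluster addslots {<a>..<b>} to assign every slot in the range <a> to <b>. After all slots have been assigned to master nodes, run cluster nodes to see which slots each node is responsible for, along with each node's ID.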
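Before assigning slots by hand you need to decide which range goes to which master. One simple way, shown here as a sketch with a hypothetical function name, is to split the 16384 slots into contiguous ranges of nearly equal size. The ranges it produces are not guaranteed to match the ones an automated tool would choose.

```python
from typing import List, Tuple

TOTAL_SLOTS = 16384


def split_slots(master_count: int) -> List[Tuple[int, int]]:
    """Split slots 0..16383 into contiguous, nearly equal (first, last) ranges."""
    base, extra = divmod(TOTAL_SLOTS, master_count)
    ranges = []
    start = 0
    for index in range(master_count):
        size = base + (1 if index < extra else 0)  # early nodes absorb the remainder
        ranges.append((start, start + size - 1))
        start += size
    return ranges


if __name__ == "__main__":
    for first, last in split_slots(3):
        print(f"{first}-{last}")
```

Each (first, last) pair can then be used as the <a> and <b> of one slot assignment on the corresponding master.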

Next, assign the slave nodes. Connect a client to the node that is to become a slave and run cluster replicate <nodeId>, which makes that node a backup of the master identified by <nodeId>.

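Create the cluster directly with a command

Redis 5 added cluster management commands to the redis-cli client.

For example, the following command creates a cluster with 3 masters and 3 slaves: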

redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 \
127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 \
--cluster-replicas 1

If you are on an older version of Redis, you can use the officially provided redis-trib.rb script to create the cluster:

./redis-trib.rb create --replicas 1 127.0.0.1:7000 127.0.0.1:7001 \
127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005

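Scaling the cluster

Scaling out

Scaling out is similar to creating a cluster. The difference is that the last step migrates slots from existing nodes to the new node.

  1. Start the new node: same as when creating a cluster.
  2. Add the new node to the cluster: use the redis-cli --cluster add-node command to add the new node (implemented internally with meet messages).
  3. Migrate slots and data: after the new node has been added, some slots and their data must be migrated from the old nodes to it. Use the redis-cli --cluster reshard command to migrate the slots.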
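For step 3 you need to decide how many slots each existing master gives up. The sketch below computes one possible plan that gives the new node an even share, taking slots from each existing master in proportion to what it holds. The function name is hypothetical, the input is assumed to contain at least one slot, and the resharding tool may choose different numbers.

```python
from typing import Dict


def plan_reshard(slot_counts: Dict[str, int]) -> Dict[str, int]:
    """How many slots each existing master should move to one new node.

    slot_counts maps each existing master to the number of slots it holds.
    """
    total = sum(slot_counts.values())
    target = total // (len(slot_counts) + 1)  # even share for the new node
    # Proportional contribution, rounded down.
    plan = {node: count * target // total for node, count in slot_counts.items()}
    # Hand out the slots lost to rounding, largest holders first.
    leftover = target - sum(plan.values())
    for node in sorted(slot_counts, key=lambda n: (-slot_counts[n], n)):
        if leftover == 0:
            break
        plan[node] += 1
        leftover -= 1
    return plan
```

For three masters holding 5462, 5461 and 5461 slots, the plan moves 1366, 1365 and 1365 slots respectively, leaving all four masters with 4096 slots each.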

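Scaling in

To delete nodes safely, Redis Cluster can only take offline a node that is not responsible for any slots. So to take offline a master that is responsible for slots, you must first migrate its slots to other nodes.

  1. Migrate the slots. Use the redis-cli --cluster reshard command to migrate all slots of the node being deleted to other nodes.
  2. Forget the node. Use the redis-cli --cluster del-node command to delete the node (implemented internally with forget messages).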

Cluster configuration tools

If your redis-cli version is older than 5, you can use the redis-trib.rb script to run the commands above. Both redis-cli and redis-trib.rb provide cluster operation commands; see their documentation for the full list.

Persistence

Redis has two persistence strategies: RDB and AOF.

Pitfalls of RDB persistence

The major pitfalls of RDB persistence:

  • Even if you set save "" to try to turn RDB off, RDB persistence may still be triggered.
  • When a slave node performs a full synchronization (for example, a newly added slave), the master node generates an RDB file and sends it to the slave. In the end, both the slave and the master have a corresponding RDB file.
  • Executing shutdown also triggers RDB persistence if AOF is not enabled.
  • Regardless of how save is configured, as long as an RDB file exists, Redis loads it on startup.

As a result:

  • If you turn off RDB persistence (and AOF persistence), then when Redis restarts it loads the RDB file that was last saved during a full synchronization with a slave or when shutdown was executed. That RDB file is likely to contain long-outdated data.
  • In Cluster mode, if Redis restarts and recovers data from an RDB file but cannot read the nodes file configured in cluster-config-file, it marks itself as a standalone master and claims the slots corresponding to the keys recovered from the RDB data, which prevents the node from being added to another cluster.

Origin blog.51cto.com/14230003/2432331