Hands-on: shrinking a Redis cluster with cluster reshard [detailed steps]

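This year, because of COVID-19, many of you have probably been asked by your company to cut server costs. In the past couple of days my boss's eyes landed on the poor little redis cluster in my development environment. Fine, let's walk through shrinking a redis cluster.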

Inspect the cluster nodes

Use the redis-cli --cluster check command to view the node information.

$ redis-cli --cluster check 192.168.1.195:6379 -a xxxxxxx
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
192.168.1.195:6379 (414f2b74...) -> 37651 keys | 3277 slots | 0 slaves.
192.168.1.195:6679 (e7a768fa...) -> 37614 keys | 3277 slots | 0 slaves.
192.168.1.195:6479 (1edda72f...) -> 37486 keys | 3277 slots | 0 slaves.
192.168.1.195:6579 (24671149...) -> 37639 keys | 3276 slots | 0 slaves.
192.168.1.195:6779 (f9b6419c...) -> 37489 keys | 3277 slots | 0 slaves.
[OK] 187879 keys in 5 masters.
11.47 keys per slot on average.
>>> Performing Cluster Check (using node 192.168.1.195:6379)
M: 414f2b745d2614a7a2bb592ca4d54febf550d3a9 192.168.1.195:6379
   slots:[0-3276] (3277 slots) master
M: e7a768faf1d1b6139f4a6192285329a5888b8381 192.168.1.195:6679
   slots:[9830-13106] (3277 slots) master
M: 1edda72f1e5771a433a2c450cb9f5f555149fc63 192.168.1.195:6479
   slots:[3277-6553] (3277 slots) master
M: 2467114902a123079ca83e48642edbd7df0281ff 192.168.1.195:6579
   slots:[6554-9829] (3276 slots) master
M: f9b6419cf22315108872b81f7d3e76df8b0036e8 192.168.1.195:6779
   slots:[13107-16383] (3277 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

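Command parameters

Parameter            Description
--cluster            Cluster management command
check                Subcommand that checks the cluster information
192.168.1.195:6379   IP address and port; if there are several nodes, giving any one of them is enough
-a xxxxx             Password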

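As you can see, there are currently 5 master nodes and no slave nodes, and all slots are covered.

Start the reshard: migrate the slots of the two nodes on ports 6679 and 6779 evenly onto the other 3 nodes.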
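
To split the slots evenly it helps to work out the per-node share first. The following is a minimal sketch in shell arithmetic, assuming the two departing nodes hold 3277 slots each as shown in the check output; the function name even_share is made up for illustration.

```shell
# Hypothetical helper: split the slots held by the departing nodes evenly
# across the nodes that remain, using integer division.
# Arguments: total slots on departing nodes, number of remaining nodes.
even_share() {
  echo $(( $1 / $2 ))
}

# 6679 and 6779 hold 3277 slots each; three nodes remain.
leaving=$(( 3277 + 3277 ))
share=$(even_share "$leaving" 3)
leftover=$(( leaving % 3 ))
echo "move about $share slots to each remaining node, $leftover left over"
```

The quotient is 2184, which matches the slot count used in the first reshard step; the leftover slots can go to any of the remaining nodes.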

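Using the interactive mode of the reshard command, we first move 2184 slots from 6679 to 6579. The interaction goes as follows:

Q: How many slots do you want to move? Answer 2184.

Q: The ID of the node that receives the slots. Paste the ID of node 6579.

Q: The source nodes to move slots from. You can give several; end the list with "done". Paste the ID of node 6679, press Enter, then type done at Source node #2.

The tool then prints the migration plan and asks whether to proceed. Type yes.

It then starts carrying out the migration.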

$ redis-cli --cluster reshard 192.168.1.195:6379 -a xxxxx
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Performing Cluster Check (using node 192.168.1.195:6379)
M: 414f2b745d2614a7a2bb592ca4d54febf550d3a9 192.168.1.195:6379
   slots:[0-3276] (3277 slots) master
M: e7a768faf1d1b6139f4a6192285329a5888b8381 192.168.1.195:6679
   slots:[9830-13106] (3277 slots) master
M: 1edda72f1e5771a433a2c450cb9f5f555149fc63 192.168.1.195:6479
   slots:[3277-6553] (3277 slots) master
M: 2467114902a123079ca83e48642edbd7df0281ff 192.168.1.195:6579
   slots:[6554-9829] (3276 slots) master
M: f9b6419cf22315108872b81f7d3e76df8b0036e8 192.168.1.195:6779
   slots:[13107-16383] (3277 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 2184
What is the receiving node ID? 2467114902a123079ca83e48642edbd7df0281ff
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1: e7a768faf1d1b6139f4a6192285329a5888b8381
Source node #2: done

Ready to move 2184 slots.
  Source nodes:
    M: e7a768faf1d1b6139f4a6192285329a5888b8381 192.168.1.195:6679
       slots:[9830-13106] (3277 slots) master
  Destination node:
    M: 2467114902a123079ca83e48642edbd7df0281ff 192.168.1.195:6579
       slots:[6554-9829] (3276 slots) master
  Resharding plan:
    Moving slot 9830 from e7a768faf1d1b6139f4a6192285329a5888b8381
    Moving slot 9831 from e7a768faf1d1b6139f4a6192285329a5888b8381
    ......
Do you want to proceed with the proposed reshard plan (yes/no)? yes
Moving slot 9830 from 192.168.1.195:6679 to 192.168.1.195:6579: .......
Moving slot 9831 from 192.168.1.195:6679 to 192.168.1.195:6579: .........
Moving slot 9832 from 192.168.1.195:6679 to 192.168.1.195:6579: ...............
Moving slot 9833 from 192.168.1.195:6679 to 192.168.1.195:6579: ................
......
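
The same move can also be expressed without the prompts. This is only a sketch: it assumes your redis-cli version supports the --cluster-from, --cluster-to, --cluster-slots and --cluster-yes options (redis-cli --cluster help lists what is available). The function name reshard_cmd is hypothetical, and it only prints the command so that it can be reviewed before anything is run.

```shell
# Hypothetical helper: print a non-interactive reshard command.
# Arguments: entry node (host:port), source node ID, target node ID, slot count.
reshard_cmd() {
  printf 'redis-cli --cluster reshard %s --cluster-from %s --cluster-to %s --cluster-slots %s --cluster-yes\n' \
    "$1" "$2" "$3" "$4"
}

# Same move as the interactive session: 2184 slots from 6679 to 6579.
reshard_cmd 192.168.1.195:6379 \
  e7a768faf1d1b6139f4a6192285329a5888b8381 \
  2467114902a123079ca83e48642edbd7df0281ff \
  2184
```

The password is left out of the printed command. It can be passed with -a as in the rest of this article, or through the REDISCLI_AUTH environment variable, which keeps it off the command line.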

After it finishes, check the nodes again. The slot count of node 6679 has dropped to 1093, and that of node 6579 has grown to 5460.

$ redis-cli --cluster check 192.168.1.195:6379 -a xxxxxxx
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
192.168.1.195:6379 (414f2b74...) -> 37651 keys | 3277 slots | 0 slaves.
192.168.1.195:6679 (e7a768fa...) -> 12524 keys | 1093 slots | 0 slaves.
192.168.1.195:6479 (1edda72f...) -> 37486 keys | 3277 slots | 0 slaves.
192.168.1.195:6579 (24671149...) -> 62729 keys | 5460 slots | 0 slaves.
192.168.1.195:6779 (f9b6419c...) -> 37489 keys | 3277 slots | 0 slaves.
[OK] 187879 keys in 5 masters.
11.47 keys per slot on average.
>>> Performing Cluster Check (using node 192.168.1.195:6379)
M: 414f2b745d2614a7a2bb592ca4d54febf550d3a9 192.168.1.195:6379
   slots:[0-3276] (3277 slots) master
M: e7a768faf1d1b6139f4a6192285329a5888b8381 192.168.1.195:6679
   slots:[12014-13106] (1093 slots) master
M: 1edda72f1e5771a433a2c450cb9f5f555149fc63 192.168.1.195:6479
   slots:[3277-6553] (3277 slots) master
M: 2467114902a123079ca83e48642edbd7df0281ff 192.168.1.195:6579
   slots:[6554-12013] (5460 slots) master
M: f9b6419cf22315108872b81f7d3e76df8b0036e8 192.168.1.195:6779
   slots:[13107-16383] (3277 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

That completes this migration. Repeat the steps above until all slots have been migrated. You can see that nodes 6679 and 6779 now both hold 0 slots.

$ redis-cli --cluster check 192.168.1.195:6379 -a xxxxxxx
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
192.168.1.195:6379 (414f2b74...) -> 62680 keys | 5460 slots | 0 slaves.
192.168.1.195:6679 (e7a768fa...) -> 0 keys | 0 slots | 0 slaves.
192.168.1.195:6479 (1edda72f...) -> 62470 keys | 5464 slots | 0 slaves.
192.168.1.195:6579 (24671149...) -> 62729 keys | 5460 slots | 0 slaves.
192.168.1.195:6779 (f9b6419c...) -> 0 keys | 0 slots | 0 slaves.
[OK] 187879 keys in 5 masters.
11.47 keys per slot on average.
>>> Performing Cluster Check (using node 192.168.1.195:6379)
M: 414f2b745d2614a7a2bb592ca4d54febf550d3a9 192.168.1.195:6379
   slots:[0-3276],[12014-14196] (5460 slots) master
M: e7a768faf1d1b6139f4a6192285329a5888b8381 192.168.1.195:6679
   slots: (0 slots) master
M: 1edda72f1e5771a433a2c450cb9f5f555149fc63 192.168.1.195:6479
   slots:[3277-6553],[14197-16383] (5464 slots) master
M: 2467114902a123079ca83e48642edbd7df0281ff 192.168.1.195:6579
   slots:[6554-12013] (5460 slots) master
M: f9b6419cf22315108872b81f7d3e76df8b0036e8 192.168.1.195:6779
   slots: (0 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
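
Before stopping a node it is worth confirming that it really holds no slots. Here is a small sketch that filters the summary lines of the check output; the function name empty_nodes is hypothetical, and the field positions assume the summary format shown above.

```shell
# Hypothetical helper: read `redis-cli --cluster check` output on stdin and
# print the address of every node whose summary line reports 0 slots.
empty_nodes() {
  awk '$3 == "->" && $8 == "slots" && $7 == 0 { print $1 }'
}

# Fed with the summary lines from the check output above.
empty_nodes <<'EOF'
192.168.1.195:6379 (414f2b74...) -> 62680 keys | 5460 slots | 0 slaves.
192.168.1.195:6679 (e7a768fa...) -> 0 keys | 0 slots | 0 slaves.
192.168.1.195:6479 (1edda72f...) -> 62470 keys | 5464 slots | 0 slaves.
192.168.1.195:6579 (24671149...) -> 62729 keys | 5460 slots | 0 slaves.
192.168.1.195:6779 (f9b6419c...) -> 0 keys | 0 slots | 0 slaves.
EOF
```

With this input it prints the addresses of 6679 and 6779, the two nodes that are safe to take away.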

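Remove the nodes

Simply stop the containers of nodes 6679 and 6779. Not using containers? Then stop the redis service directly, shut the machine down, or pull the plug, whatever you like.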

docker stop 54a318487428 2a47bffc66b5

With the cluster check command you can see that the two nodes are gone.

$ redis-cli --cluster check 192.168.1.195:6379 -a xxxxxx
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
Could not connect to Redis at 192.168.1.195:6679: Connection refused
Could not connect to Redis at 192.168.1.195:6779: Connection refused
192.168.1.195:6379 (414f2b74...) -> 62680 keys | 5460 slots | 0 slaves.
192.168.1.195:6479 (1edda72f...) -> 62470 keys | 5464 slots | 0 slaves.
192.168.1.195:6579 (24671149...) -> 62729 keys | 5460 slots | 0 slaves.
[OK] 187879 keys in 3 masters.
11.47 keys per slot on average.
>>> Performing Cluster Check (using node 192.168.1.195:6379)
M: 414f2b745d2614a7a2bb592ca4d54febf550d3a9 192.168.1.195:6379
   slots:[0-3276],[12014-14196] (5460 slots) master
M: 1edda72f1e5771a433a2c450cb9f5f555149fc63 192.168.1.195:6479
   slots:[3277-6553],[14197-16383] (5464 slots) master
M: 2467114902a123079ca83e48642edbd7df0281ff 192.168.1.195:6579
   slots:[6554-12013] (5460 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

But the cluster nodes command still shows the nodes, only with the status fail.

$ redis-cli -c -h 192.168.1.195 -p 6379 -a xxxx cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
e7a768faf1d1b6139f4a6192285329a5888b8381 192.168.1.195:6679@16679 master,fail - 1640661580306 1640661577288 4 disconnected
1edda72f1e5771a433a2c450cb9f5f555149fc63 192.168.1.195:6479@16479 master - 0 1640661762479 8 connected 3277-6553 14197-16383
414f2b745d2614a7a2bb592ca4d54febf550d3a9 192.168.1.195:6379@16379 myself,master - 0 1640661762000 7 connected 0-3276 12014-14196
2467114902a123079ca83e48642edbd7df0281ff 192.168.1.195:6579@16579 master - 0 1640661763486 6 connected 6554-12013
f9b6419cf22315108872b81f7d3e76df8b0036e8 192.168.1.195:6779@16779 master,fail - 1640661560192 1640661556000 5 disconnected
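
The node IDs that need cleaning up can be pulled out of this output rather than copied by hand. This is a sketch under the assumption that the flags sit in the third column, as in the output above; failed_node_ids is a hypothetical name.

```shell
# Hypothetical helper: read `cluster nodes` output on stdin and print the
# ID of every node whose flags (third column) include "fail".
failed_node_ids() {
  awk '{
    n = split($3, flags, ",")
    for (i = 1; i <= n; i++)
      if (flags[i] == "fail") { print $1; break }
  }'
}

# Fed with two lines from the output above: one failed node, one healthy.
failed_node_ids <<'EOF'
e7a768faf1d1b6139f4a6192285329a5888b8381 192.168.1.195:6679@16679 master,fail - 1640661580306 1640661577288 4 disconnected
1edda72f1e5771a433a2c450cb9f5f555149fc63 192.168.1.195:6479@16479 master - 0 1640661762479 8 connected 3277-6553 14197-16383
EOF
```

Only the ID of the failed node 6679 is printed for this sample input.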

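At this point the cluster is usable, but how could a perfectionist like you tolerate this? So remove the unwanted nodes with the cluster forget command. Note that the command has to be run on every remaining node. Otherwise only one node drops the entry, the other nodes still hold it, and after a while it is synced back, so you will find the deleted node has come back. Redis keeps a forgotten node on a ban list for 60 seconds, so run the command on all remaining nodes within that window.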

$ redis-cli -c -h 192.168.1.195 -p 6379 -a xxxx cluster forget e7a768faf1d1b6139f4a6192285329a5888b8381
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
OK
$ redis-cli -c -h 192.168.1.195 -p 6479 -a xxxx cluster forget e7a768faf1d1b6139f4a6192285329a5888b8381
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
OK
$ redis-cli -c -h 192.168.1.195 -p 6579 -a xxxx cluster forget e7a768faf1d1b6139f4a6192285329a5888b8381
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
OK

Run the same operation for the other node ID too, so that both nodes are removed.
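
Typing the command once per remaining node and per removed ID gets tedious. The sketch below prints every forget command needed, assuming the three remaining ports and the two node IDs from this walkthrough; forget_cmds is a hypothetical name, the password option is omitted, and nothing is executed.

```shell
# Hypothetical helper: print one `cluster forget` command for every
# (remaining port, removed node ID) pair. Nothing is executed.
# Arguments: host, space-separated ports, space-separated node IDs.
forget_cmds() {
  for port in $2; do
    for id in $3; do
      echo "redis-cli -h $1 -p $port cluster forget $id"
    done
  done
}

forget_cmds 192.168.1.195 "6379 6479 6579" \
  "e7a768faf1d1b6139f4a6192285329a5888b8381 f9b6419cf22315108872b81f7d3e76df8b0036e8"
```

This prints six commands, one per combination. Review them first, then run them one after another without long pauses.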

Check once more with the cluster nodes command:

$ redis-cli -c -h 192.168.1.195 -p 6479 -a xxxx cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
1edda72f1e5771a433a2c450cb9f5f555149fc63 192.168.1.195:6479@16479 myself,master - 0 1640662129000 8 connected 3277-6553 14197-16383
2467114902a123079ca83e48642edbd7df0281ff 192.168.1.195:6579@16579 master - 0 1640662128592 6 connected 6554-12013
414f2b745d2614a7a2bb592ca4d54febf550d3a9 192.168.1.195:6379@16379 master - 0 1640662130599 7 connected 0-3276 12014-14196
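
As a last sanity check, the slot ranges in this output can be summed to confirm that the three remaining nodes still cover all 16384 slots. This is a minimal sketch, assuming slot ranges start at the ninth column as in the output above; total_slots is a hypothetical name.

```shell
# Hypothetical helper: read `cluster nodes` output on stdin and print the
# total number of slots assigned across all listed nodes.
total_slots() {
  awk '{
    for (i = 9; i <= NF; i++) {
      # skip anything that is not a single slot or a start-end range
      if ($i !~ /^[0-9]+(-[0-9]+)?$/) continue
      n = split($i, r, "-")
      total += (n == 2 ? r[2] - r[1] + 1 : 1)
    }
  } END { print total + 0 }'
}

# Fed with the three lines from the output above.
total_slots <<'EOF'
1edda72f1e5771a433a2c450cb9f5f555149fc63 192.168.1.195:6479@16479 myself,master - 0 1640662129000 8 connected 3277-6553 14197-16383
2467114902a123079ca83e48642edbd7df0281ff 192.168.1.195:6579@16579 master - 0 1640662128592 6 connected 6554-12013
414f2b745d2614a7a2bb592ca4d54febf550d3a9 192.168.1.195:6379@16379 master - 0 1640662130599 7 connected 0-3276 12014-14196
EOF
```

For this input the sum is 5464 + 5460 + 5460 = 16384, so no slot was lost along the way.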

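Perfect!

If a node gets so excited during the reshard that it crashes, your reshard operation fails. What should you do then? See: redis reshard failure recovery, detailed steps

If you accidentally deleted a node you meant to keep while removing nodes, how do you recover? See: How to add a node back after removing it by mistake with cluster forget?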

Reposted from blog.csdn.net/marlinlm/article/details/122184791