Manually triggering a master/slave failover with Redis Sentinel

Test environment

Operating system: Ubuntu 18.04.1 LTS
Redis version: redis 5.0.2

Node layout:

role	ip	port
Master	127.0.0.1	6379
Slave1	127.0.0.1	6380
Slave2	127.0.0.1	6381
Sentinel1	127.0.0.1	26379
Sentinel2	127.0.0.1	26380
Sentinel3	127.0.0.1	26381

Triggering the failover

Run the failover command on any Sentinel node:

root@sbc-VirtualBox:/etc/redis# redis-cli -p 26380 sentinel failover mymaster
OK

Check the log of the Sentinel node

root@sbc-VirtualBox:/etc/redis# vim /opt/soft/redis/data/26380.log

2031:X 26 Dec 2018 15:24:48.825 # Executing user requested FAILOVER of 'mymaster'
2031:X 26 Dec 2018 15:24:48.825 # +new-epoch 1
2031:X 26 Dec 2018 15:24:48.825 # +try-failover master mymaster 127.0.0.1 6379
2031:X 26 Dec 2018 15:24:48.887 # +vote-for-leader 75cd9743a892825573793c4417e7e8465d65b616 1
2031:X 26 Dec 2018 15:24:48.887 # +elected-leader master mymaster 127.0.0.1 6379
2031:X 26 Dec 2018 15:24:48.887 # +failover-state-select-slave master mymaster 127.0.0.1 6379
2031:X 26 Dec 2018 15:24:48.970 # +selected-slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
2031:X 26 Dec 2018 15:24:48.970 * +failover-state-send-slaveof-noone slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
2031:X 26 Dec 2018 15:24:49.033 * +failover-state-wait-promotion slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
2031:X 26 Dec 2018 15:24:49.944 # +promoted-slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
2031:X 26 Dec 2018 15:24:49.944 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
2031:X 26 Dec 2018 15:24:50.010 * +slave-reconf-sent slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
2031:X 26 Dec 2018 15:24:50.953 * +slave-reconf-inprog slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
2031:X 26 Dec 2018 15:24:50.953 * +slave-reconf-done slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
2031:X 26 Dec 2018 15:24:51.028 # +failover-end master mymaster 127.0.0.1 6379
2031:X 26 Dec 2018 15:24:51.028 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380
2031:X 26 Dec 2018 15:24:51.029 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380
2031:X 26 Dec 2018 15:24:51.029 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380

The log shows that the failover was triggered: the former master 6379 becomes a slave, and the former slave 6380 becomes the master.
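The +switch-master line is the one that records the role change, so it can be picked out of the Sentinel log programmatically. The following is a minimal sketch, not part of the original experiment: the function name find_switch_master and the sample string are illustrative, and the field layout is assumed from the log line shown above.

```python
import re
from typing import Optional, Tuple

# Layout assumed from the Sentinel log above:
#   +switch-master <name> <old-ip> <old-port> <new-ip> <new-port>
SWITCH_MASTER = re.compile(r"\+switch-master (\S+) (\S+) (\d+) (\S+) (\d+)\s*$")


def find_switch_master(log_text: str) -> Optional[Tuple[str, str, str]]:
    """Return (master_name, old_addr, new_addr) for the last +switch-master
    event in the log text, or None if no such event is present."""
    result = None
    for line in log_text.splitlines():
        match = SWITCH_MASTER.search(line)
        if match:
            name, old_ip, old_port, new_ip, new_port = match.groups()
            result = (name, f"{old_ip}:{old_port}", f"{new_ip}:{new_port}")
    return result


# Two lines copied from the 26380 log above.
sample = """\
2031:X 26 Dec 2018 15:24:51.028 # +failover-end master mymaster 127.0.0.1 6379
2031:X 26 Dec 2018 15:24:51.028 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380
"""

print(find_switch_master(sample))
```

For the sample lines this reports mymaster moving from 127.0.0.1:6379 to 127.0.0.1:6380.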

Check the log of 6379:

root@sbc-VirtualBox:/etc/redis# vim /opt/soft/redis/data/6379.log

1920:M 26 Dec 2018 15:24:49.034 # Connection with replica 127.0.0.1:6380 lost.
1920:M 26 Dec 2018 15:24:50.014 # Connection with replica 127.0.0.1:6381 lost.
1920:S 26 Dec 2018 15:25:00.069 * Before turning into a replica, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1920:S 26 Dec 2018 15:25:00.069 * REPLICAOF 127.0.0.1:6380 enabled (user request from 'id=1115 addr=127.0.0.1:44936 fd=8 name=sentinel-b6983fbf-cmd age=11 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=148 qbuf-free=32620 obl=36 oll=0 omem=0 events=r cmd=exec')
1920:S 26 Dec 2018 15:25:00.070 # CONFIG REWRITE executed with success.
1920:S 26 Dec 2018 15:25:00.532 * Connecting to MASTER 127.0.0.1:6380
1920:S 26 Dec 2018 15:25:00.532 * MASTER <-> REPLICA sync started
1920:S 26 Dec 2018 15:25:00.532 * Non blocking connect for SYNC fired the event.
1920:S 26 Dec 2018 15:25:00.532 * Master replied to PING, replication can continue...
1920:S 26 Dec 2018 15:25:00.533 * Trying a partial resynchronization (request 3f0a80689afa2c4fe0cc33b82ce97af74b847c88:3987638).
1920:S 26 Dec 2018 15:25:00.533 * Full resync from master: d62e33c39fd886cc9b78408cde7ed815c17ecc2d:3988096
1920:S 26 Dec 2018 15:25:00.533 * Discarding previously cached master state.
1920:S 26 Dec 2018 15:25:00.635 * MASTER <-> REPLICA sync: receiving 189 bytes from master
1920:S 26 Dec 2018 15:25:00.635 * MASTER <-> REPLICA sync: Flushing old data
1920:S 26 Dec 2018 15:25:00.635 * MASTER <-> REPLICA sync: Loading DB in memory
1920:S 26 Dec 2018 15:25:00.635 * MASTER <-> REPLICA sync: Finished with success

6379 has switched from master to slave.

Check the log of node 6380:

root@sbc-VirtualBox:/etc/redis# vim /opt/soft/redis/data/6380.log

1928:M 26 Dec 2018 15:24:49.033 # Setting secondary replication ID to 3f0a80689afa2c4fe0cc33b82ce97af74b847c88, valid up to offset: 3985643. New replication ID is d62e33c39fd886cc9b78408cde7ed815c17ecc2d
1928:M 26 Dec 2018 15:24:49.033 # Connection with master lost.
1928:M 26 Dec 2018 15:24:49.033 * Caching the disconnected master state.
1928:M 26 Dec 2018 15:24:49.033 * Discarding previously cached master state.
1928:M 26 Dec 2018 15:24:49.033 * MASTER MODE enabled (user request from 'id=7 addr=127.0.0.1:51534 fd=11 name=sentinel-75cd9743-cmd age=20204 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=140 qbuf-free=32628 obl=36 oll=0 omem=0 events=r cmd=exec')
1928:M 26 Dec 2018 15:24:49.034 # CONFIG REWRITE executed with success.
1928:M 26 Dec 2018 15:24:50.400 * Replica 127.0.0.1:6381 asks for synchronization
1928:M 26 Dec 2018 15:24:50.400 * Partial resynchronization not accepted: Requested offset for second ID was 3986042, but I can reply up to 3985643
1928:M 26 Dec 2018 15:24:50.400 * Starting BGSAVE for SYNC with target: disk
1928:M 26 Dec 2018 15:24:50.400 * Background saving started by pid 4136
4136:C 26 Dec 2018 15:24:50.414 * DB saved on disk
4136:C 26 Dec 2018 15:24:50.415 * RDB: 0 MB of memory used by copy-on-write
1928:M 26 Dec 2018 15:24:50.502 * Background saving terminated with success
1928:M 26 Dec 2018 15:24:50.502 * Synchronization with replica 127.0.0.1:6381 succeeded
1928:M 26 Dec 2018 15:25:00.533 * Replica 127.0.0.1:6379 asks for synchronization
1928:M 26 Dec 2018 15:25:00.533 * Partial resynchronization not accepted: Requested offset for second ID was 3987638, but I can reply up to 3985643
1928:M 26 Dec 2018 15:25:00.533 * Starting BGSAVE for SYNC with target: disk
1928:M 26 Dec 2018 15:25:00.533 * Background saving started by pid 4137
4137:C 26 Dec 2018 15:25:00.547 * DB saved on disk
4137:C 26 Dec 2018 15:25:00.548 * RDB: 0 MB of memory used by copy-on-write
1928:M 26 Dec 2018 15:25:00.634 * Background saving terminated with success
1928:M 26 Dec 2018 15:25:00.635 * Synchronization with replica 127.0.0.1:6379 succeeded

6380 was promoted from slave to master; afterwards node 6379 rejoined the group as a slave.
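The 6380 log also shows why both slaves ended up with a full resync rather than a partial one: each asked to continue from an offset beyond 3985643, the point up to which the new master's secondary replication ID is valid. A plausible reading is that the old master kept advancing its replication stream after 6380 had been promoted, but the log alone does not confirm the cause. The sketch below only extracts the two numbers from each refusal line and computes the difference; the function name psync_gap is illustrative.

```python
import re

# Message format taken from the 6380 log above.
PSYNC_REFUSED = re.compile(
    r"Requested offset for second ID was (\d+), but I can reply up to (\d+)"
)


def psync_gap(log_line):
    """Return (requested, valid_up_to, gap) parsed from a
    'Partial resynchronization not accepted' line, or None if the
    line does not contain that message."""
    match = PSYNC_REFUSED.search(log_line)
    if match is None:
        return None
    requested = int(match.group(1))
    valid_up_to = int(match.group(2))
    return requested, valid_up_to, requested - valid_up_to


# Refusal lines for 6381 and 6379, copied from the log above.
line_6381 = ("1928:M 26 Dec 2018 15:24:50.400 * Partial resynchronization not accepted: "
             "Requested offset for second ID was 3986042, but I can reply up to 3985643")
line_6379 = ("1928:M 26 Dec 2018 15:25:00.533 * Partial resynchronization not accepted: "
             "Requested offset for second ID was 3987638, but I can reply up to 3985643")

print(psync_gap(line_6381))
print(psync_gap(line_6379))
```

With these two lines, 6381 is 399 bytes past the valid offset and 6379 is 1995 bytes past it, so in both cases the new master falls back to BGSAVE and a full transfer.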

Check the log of node 6381:

root@sbc-VirtualBox:/etc/redis# vim /opt/soft/redis/data/6381.log

1934:S 26 Dec 2018 15:24:50.013 # Connection with master lost.
1934:S 26 Dec 2018 15:24:50.013 * Caching the disconnected master state.
1934:S 26 Dec 2018 15:24:50.013 * REPLICAOF 127.0.0.1:6380 enabled (user request from 'id=7 addr=127.0.0.1:54626 fd=11 name=sentinel-75cd9743-cmd age=20205 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=281 qbuf-free=32487 obl=36 oll=0 omem=0 events=r cmd=exec')
1934:S 26 Dec 2018 15:24:50.014 # CONFIG REWRITE executed with success.
1934:S 26 Dec 2018 15:24:50.399 * Connecting to MASTER 127.0.0.1:6380
1934:S 26 Dec 2018 15:24:50.400 * MASTER <-> REPLICA sync started
1934:S 26 Dec 2018 15:24:50.400 * Non blocking connect for SYNC fired the event.
1934:S 26 Dec 2018 15:24:50.400 * Master replied to PING, replication can continue...
1934:S 26 Dec 2018 15:24:50.400 * Trying a partial resynchronization (request 3f0a80689afa2c4fe0cc33b82ce97af74b847c88:3986042).
1934:S 26 Dec 2018 15:24:50.400 * Full resync from master: d62e33c39fd886cc9b78408cde7ed815c17ecc2d:3986064
1934:S 26 Dec 2018 15:24:50.400 * Discarding previously cached master state.
1934:S 26 Dec 2018 15:24:50.502 * MASTER <-> REPLICA sync: receiving 189 bytes from master
1934:S 26 Dec 2018 15:24:50.502 * MASTER <-> REPLICA sync: Flushing old data
1934:S 26 Dec 2018 15:24:50.502 * MASTER <-> REPLICA sync: Loading DB in memory
1934:S 26 Dec 2018 15:24:50.502 * MASTER <-> REPLICA sync: Finished with success

Node 6381 connects to the new master on port 6380.

Verifying the failover result

Run on node 6380:

root@sbc-VirtualBox:/etc/redis# redis-cli -p 6380 info replication
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6381,state=online,offset=4120328,lag=0
slave1:ip=127.0.0.1,port=6379,state=online,offset=4120328,lag=1
master_replid:d62e33c39fd886cc9b78408cde7ed815c17ecc2d
master_replid2:3f0a80689afa2c4fe0cc33b82ce97af74b847c88
master_repl_offset:4120328
second_repl_offset:3985643
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:3071753
repl_backlog_histlen:1048576

The output shows that the failover succeeded: 6380 reports role:master with two connected slaves, 6381 and 6379.
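The same check can be scripted against the text that info replication prints. This is a rough sketch under the assumption that the output has the key:value layout shown above; parse_info and slave_ports are illustrative names, and the sample is a shortened copy of that output.

```python
import re


def parse_info(text):
    """Parse 'key:value' lines of INFO output into a dict, skipping
    blank lines and '# Section' headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def slave_ports(fields):
    """Return the sorted ports of all slaveN entries whose state is online."""
    ports = []
    for key, value in fields.items():
        if re.fullmatch(r"slave\d+", key):
            attrs = dict(item.split("=", 1) for item in value.split(","))
            if attrs.get("state") == "online":
                ports.append(int(attrs["port"]))
    return sorted(ports)


# Shortened copy of the `redis-cli -p 6380 info replication` output above.
sample = """\
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6381,state=online,offset=4120328,lag=0
slave1:ip=127.0.0.1,port=6379,state=online,offset=4120328,lag=1
"""

info = parse_info(sample)
print(info["role"], info["connected_slaves"], slave_ports(info))
```

A check like this passes when role is master and the online slave ports are 6379 and 6381.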

Process

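1. sentinel failover is run manually. The master is not actually down; the Sentinel that receives the command starts the failover as if the master were unreachable, without waiting for a failure to be detected.
2. Sentinel selects one slave (6380) and sends it slaveof no one, which makes it the new master. At this point 6379 logs that its connection to 6380 is lost.
3. Sentinel orders the other slave (6381) to replicate from the new master (6380).
4. Once the slaves are reconfigured, Sentinel records 6380 as the master of mymaster (+switch-master).
5. Sentinel reconfigures the former master (6379) to replicate from the new master; in the logs above this happens about 10 seconds later, at 15:25:00.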
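The net effect of these steps on the replication topology can be modelled in a few lines. This is a toy model for illustration only, not how Sentinel is implemented: the topology dict and the apply_failover helper are hypothetical, mapping each port to the port it replicates from (None for the master).

```python
from typing import Dict, Optional


def apply_failover(topology: Dict[int, Optional[int]],
                   promoted: int) -> Dict[int, Optional[int]]:
    """Return the topology after promoting `promoted`: the promoted node
    stops replicating (like slaveof no one), and every other node,
    including the former master, replicates from it."""
    if promoted not in topology:
        raise ValueError(f"unknown node {promoted}")
    if topology[promoted] is None:
        raise ValueError(f"node {promoted} is already the master")
    return {port: (None if port == promoted else promoted) for port in topology}


# Layout before the failover: 6379 is master, 6380 and 6381 replicate from it.
before = {6379: None, 6380: 6379, 6381: 6379}
after = apply_failover(before, promoted=6380)
print(after)
```

The resulting mapping has 6380 as the master and both 6379 and 6381 replicating from it, which matches the info replication output on 6380.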
