Learn Redis | Setting Up Sentinel Mode

Setting up Redis Sentinel mode consists of two parts:

Setting up the Master-Slave cluster;
Setting up the Sentinel cluster.

Environment

Master-Slave cluster:
- Master: 192.168.1.128:6379
- Slave01: 192.168.1.128:6380
- Slave02: 192.168.1.128:6381
Sentinel cluster:
- Sentinel01: 192.168.1.129:26379
- Sentinel02: 192.168.1.129:26380
- Sentinel03: 192.168.1.129:26381

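For setting up the Master-Slave cluster, see: Learn Redis | Setting Up the Master-Slave Replication Architecture

For installing Redis, see: Learn Redis | Installing on Linux

Now let's walk through setting up the Sentinel cluster.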

Setup steps

After installing Redis, make three copies of the configuration file.

The sentinel.conf template is in the source directory:

[root@peipei3514 /]# cp /usr/local/src/redis-4.0.9/sentinel.conf /usr/local/redis/sentinel26379.conf
[root@peipei3514 /]# cp /usr/local/src/redis-4.0.9/sentinel.conf /usr/local/redis/sentinel26380.conf
[root@peipei3514 /]# cp /usr/local/src/redis-4.0.9/sentinel.conf /usr/local/redis/sentinel26381.conf

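Edit each configuration file:

# Bind address so the sentinel is reachable from other machines (use the IP address of the sentinel host)
bind 127.0.0.1 192.168.1.129

# Listening port
port 26379

# Run in the background
daemonize yes

# Set the log level to verbose so the log output is easier to follow
loglevel verbose

# Log file
logfile /usr/local/redis/logs/sentinel26379.log

# Working directory
# dir <working-directory>
dir /usr/local/redis/sentinel/26379

# Master alias, master address, port, and quorum (the number of sentinels that must agree the master is down before a failover is attempted)
# sentinel monitor <master-name> <ip> <redis-port> <quorum>
sentinel monitor mymaster 192.168.1.128 6379 2

# If the sentinel gets no valid reply from the master for 30 s (30000 ms), it considers the master down; 30 seconds is the default
sentinel down-after-milliseconds mymaster 30000

# Number of slaves that may be reconfigured to sync with the new master at the same time after a failover
sentinel parallel-syncs mymaster 1

# Failover timeout in milliseconds; the default is 180000 (180 s)
sentinel failover-timeout mymaster 180000

# Password for connecting to the Redis master (configure if needed)
# sentinel auth-pass <master-name> <password>

Modify the configuration for ports 26380 and 26381 in the same way.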
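Since the three files differ only in the port number, they can also be generated from one template. The following is a minimal sketch, not part of the original procedure: the function names `render_sentinel_conf` and `write_sentinel_confs` are made up for illustration, and the template simply repeats the settings listed above.

```python
from pathlib import Path

# Template repeating the settings shown above; {port} is the only value
# that differs between the three sentinel configuration files.
TEMPLATE = """\
bind 127.0.0.1 192.168.1.129
port {port}
daemonize yes
loglevel verbose
logfile /usr/local/redis/logs/sentinel{port}.log
dir /usr/local/redis/sentinel/{port}
sentinel monitor mymaster 192.168.1.128 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
"""


def render_sentinel_conf(port: int) -> str:
    """Return the configuration text for the sentinel listening on `port`."""
    return TEMPLATE.format(port=port)


def write_sentinel_confs(target_dir: str, ports=(26379, 26380, 26381)) -> list:
    """Write one sentinel<port>.conf per port into target_dir; return the paths."""
    base = Path(target_dir)
    base.mkdir(parents=True, exist_ok=True)
    written = []
    for port in ports:
        path = base / f"sentinel{port}.conf"
        path.write_text(render_sentinel_conf(port))
        written.append(path)
    return written


if __name__ == "__main__":
    # Only prints one rendered file; call write_sentinel_confs() to create them.
    print(render_sentinel_conf(26380))
```

Calling `write_sentinel_confs("/usr/local/redis")` would produce the same three file names used in this article. Note that a running sentinel rewrites its own configuration file, so regenerate the files only while the sentinels are stopped.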

Create the corresponding directories:

[root@peipei3514 /]# mkdir -p /usr/local/redis/sentinel/26379
[root@peipei3514 /]# mkdir -p /usr/local/redis/sentinel/26380
[root@peipei3514 /]# mkdir -p /usr/local/redis/sentinel/26381

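Note: we will shortly start three Redis instances, with the one on port 6379 as the master and the other two as slaves. That is why mymaster is followed by the master's IP and port. The final '2' is the quorum: as soon as 2 sentinels consider the master down, it is marked objectively down, a failover starts, and a new master is elected. The quorum should normally not exceed the number of running sentinel instances. Running at least three sentinel instances is recommended.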
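The role of the quorum can be sketched in a few lines. This is an illustrative model, not Sentinel's actual code, and the function names are made up. It assumes the documented Sentinel behaviour that the master becomes objectively down once at least `quorum` sentinels report it down, while actually carrying out a failover additionally requires authorization from a majority of all sentinels.

```python
def is_objectively_down(sdown_reports: int, quorum: int) -> bool:
    """The master is objectively down once at least `quorum` sentinels report it down."""
    return sdown_reports >= quorum


def majority(total_sentinels: int) -> int:
    """Smallest number of sentinels that forms a majority."""
    return total_sentinels // 2 + 1


def can_failover(reachable_sentinels: int, total_sentinels: int, quorum: int) -> bool:
    """A failover needs votes from a majority of sentinels, and never fewer than the quorum."""
    needed = max(quorum, majority(total_sentinels))
    return reachable_sentinels >= needed


if __name__ == "__main__":
    # The setup in this article: 3 sentinels, quorum 2.
    print(is_objectively_down(2, quorum=2))   # True
    print(can_failover(2, 3, quorum=2))       # True: one sentinel may be lost
    print(can_failover(1, 3, quorum=2))       # False: no majority left
```

Under this model, three sentinels with a quorum of 2 tolerate the loss of one sentinel, which is one reason for running at least three.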

Start the three Sentinel instances

Before starting the Sentinel instances, start the Redis server instances, with 6379 as the master and 6380 and 6381 as slaves.

[root@peipei3514 /]# /usr/local/redis/bin/redis-sentinel /usr/local/redis/sentinel26379.conf
[root@peipei3514 /]# /usr/local/redis/bin/redis-sentinel /usr/local/redis/sentinel26380.conf
[root@peipei3514 /]# /usr/local/redis/bin/redis-sentinel /usr/local/redis/sentinel26381.conf

[root@peipei3514 /]# ps -ef | grep redis
root       2078      1  0 22:42 ?        00:00:00 /usr/local/redis/bin/redis-sentinel *:26379 [sentinel]
root       2089      1  0 22:44 ?        00:00:00 /usr/local/redis/bin/redis-sentinel *:26380 [sentinel]
root       2094      1  1 22:44 ?        00:00:00 /usr/local/redis/bin/redis-sentinel *:26381 [sentinel]
root       2099   1366  0 22:44 pts/0    00:00:00 grep --color=auto redis

Startup log (26379); the logs of the three sentinels are essentially the same:

[root@peipei3514 ~]# tail -f /usr/local/redis/logs/sentinel26379.log

1548:X 01 Jul 10:52:13.630 # Redis version=4.0.9, bits=64, commit=00000000, modified=0, pid=1548, just started
1548:X 01 Jul 10:52:13.630 # Configuration loaded
1549:X 01 Jul 10:52:13.643 * Running mode=sentinel, port=26379. # running in sentinel mode on port 26379
1549:X 01 Jul 10:52:13.643 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1549:X 01 Jul 10:52:13.643 # Sentinel ID is 4ed5b983c127292f1a9ebf6c76af4d71c546693a
1549:X 01 Jul 10:52:13.644 # +monitor master mymaster 192.168.1.128 6379 quorum 2   # now monitoring the master
1549:X 01 Jul 10:52:23.522 - Accepted 192.168.1.129:49867   # connection from a newly started sentinel
1549:X 01 Jul 10:52:34.820 - Accepted 192.168.1.129:51706   # connection from a newly started sentinel

Some Sentinel commands

To use the Sentinel commands, connect to a sentinel with redis-cli:

[root@peipei3514 /]# /usr/local/redis/bin/redis-cli -h 192.168.1.129 -p 26379
192.168.1.129:26379>

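INFO: basic status information about the sentinel
SENTINEL masters: lists all monitored masters and their current state; SENTINEL master mymaster shows the state of the named master only
SENTINEL slaves mymaster: lists all slaves of the given master and their current state
SENTINEL get-master-addr-by-name mymaster: returns the IP address and port of the master with the given name
SENTINEL reset <pattern>: resets all masters whose names match the given pattern. The reset clears the master's current state, including any failover in progress, and removes the slaves and sentinels discovered and associated with the master so far.
SENTINEL failover <master name>: forces a failover as if the master were unreachable, without asking the other sentinels for agreement. A new version of the configuration is still published, and the other sentinels update themselves accordingly.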
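To show what such a command looks like on the wire, here is a minimal sketch that sends SENTINEL get-master-addr-by-name using only the Python standard library and the Redis serialization protocol (RESP). The helper names are made up for illustration, and the parser is deliberately simplistic: it only handles the flat array of bulk strings that this command returns. A real client library would be the usual choice.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def parse_bulk_array(reply: bytes):
    """Parse a flat RESP array of bulk strings; return None for a null array (*-1).

    Simplification: splits on CRLF, so values must not contain CRLF themselves.
    """
    lines = reply.split(b"\r\n")
    if not lines[0].startswith(b"*"):
        raise ValueError("unexpected reply: %r" % reply)
    count = int(lines[0][1:])
    if count < 0:
        return None
    items = []
    idx = 1
    for _ in range(count):
        # lines[idx] is the "$<length>" header, lines[idx + 1] the value.
        items.append(lines[idx + 1].decode())
        idx += 2
    return items


def get_master_addr(host: str, port: int, master_name: str, timeout: float = 2.0):
    """Ask one sentinel for the current master address; returns (ip, port) or None."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_command("SENTINEL", "get-master-addr-by-name", master_name))
        reply = sock.recv(4096)  # the reply is small; one recv is assumed to suffice
    items = parse_bulk_array(reply)
    return None if items is None else (items[0], int(items[1]))
```

With the sentinels from this article running, `get_master_addr("192.168.1.129", 26379, "mymaster")` would be expected to return the address of the current master.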

Testing:

(1) Log in to the master:

[root@peipei3514 /]# /usr/local/redis/bin/redis-cli -h 192.168.1.128 -p 6379
192.168.1.128:6379> keys *  # list all keys
1) "lpf01"
192.168.1.128:6379> set lpf02 liupeifeng02  # write a new key
OK

[root@peipei3514 /]# /usr/local/redis/bin/redis-cli -h 192.168.1.128 -p 6380
192.168.1.128:6380> keys * # list all keys; the new key has been replicated
1) "lpf01"
2) "lpf02"
192.168.1.128:6380> set lpf02 liupeifeng02  # writing on a slave is not allowed (read-only by default)
(error) READONLY You can't write against a read only slave.

As shown, slaves in our master-slave setup are read-only by default.

(2) 6379 is currently the master. We kill its process and then look at the log output of the sentinel.

Kill the process of the 6379 server:

[root@peipei3514 ~]# ps -ef | grep redis
root       1658      1  0 13:16 ?        00:00:04 /usr/local/redis/bin/redis-server 192.168.1.128:6379
root       1669      1  0 13:16 ?        00:00:05 /usr/local/redis/bin/redis-server 192.168.1.128:6381
root       1690      1  0 13:21 ?        00:00:03 /usr/local/redis/bin/redis-server 192.168.1.128:6380
root       1700   1265  0 13:29 pts/0    00:00:00 grep --color=auto redis
[root@peipei3514 ~]# kill 1658

Sentinel log:

1549:X 01 Jul 11:06:24.936 # +sdown master mymaster 192.168.1.128 6379  # this sentinel considers the master down (subjectively down)
1549:X 01 Jul 11:06:24.994 # +new-epoch 14
1549:X 01 Jul 11:06:24.998 # +vote-for-leader e062a3797da4b8f8ebc64c5b7cf38fd7495f2c40 14
1549:X 01 Jul 11:06:25.000 # +odown master mymaster 192.168.1.128 6379 #quorum 3/2  # 3 sentinels consider the master down, which meets the quorum of 2 (objectively down)
1549:X 01 Jul 11:06:25.000 # Next failover delay: I will not start a failover before Sun Jul  1 11:12:25 2018 # earliest time at which this sentinel would start a failover itself
1549:X 01 Jul 11:06:25.381 # +config-update-from sentinel e062a3797da4b8f8ebc64c5b7cf38fd7495f2c40 192.168.1.129 26381 @ mymaster 192.168.1.128 6379
1549:X 01 Jul 11:06:25.382 # +switch-master mymaster 192.168.1.128 6379 192.168.1.128 6380 # the master is switched from 6379 to 6380
1549:X 01 Jul 11:06:25.386 * +slave slave 192.168.1.128:6381 192.168.1.128 6381 @ mymaster 192.168.1.128 6380   # 6381 is registered as a slave of the new master 6380
1549:X 01 Jul 11:06:25.386 * +slave slave 192.168.1.128:6379 192.168.1.128 6379 @ mymaster 192.168.1.128 6380   # 6379 is registered as a slave of the new master 6380
1549:X 01 Jul 11:06:55.435 # +sdown slave 192.168.1.128:6379 192.168.1.128 6379 @ mymaster 192.168.1.128 6380 # the sentinel keeps probing 6379, which is still down
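Event lines like the ones above follow a regular shape (process id, timestamp, level marker, event name, arguments), so they are easy to pick apart. The sketch below is an illustration only: the regular expression is derived from the log lines shown in this article rather than from a formal specification, and the function names are hypothetical.

```python
import re

# pid:role, timestamp, one-character level marker, then "+event" or "-event" and its arguments.
EVENT_RE = re.compile(
    r"^(?P<pid>\d+):(?P<role>[A-Z]) "
    r"(?P<ts>\d{2} \w{3} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[#*\-.]) "
    r"(?P<event>[+-][\w-]+) (?P<args>.*)$"
)


def parse_event(line: str):
    """Return (event, args) for a sentinel event line, or None for any other line."""
    match = EVENT_RE.match(line.strip())
    if match is None:
        return None
    return match.group("event"), match.group("args").split()


def switch_master_target(line: str):
    """For a +switch-master line, return (name, old_addr, new_addr); otherwise None."""
    parsed = parse_event(line)
    if parsed is None or parsed[0] != "+switch-master" or len(parsed[1]) < 5:
        return None
    # Only the first five arguments are used, so trailing annotations are ignored.
    name, old_ip, old_port, new_ip, new_port = parsed[1][:5]
    return name, (old_ip, int(old_port)), (new_ip, int(new_port))


if __name__ == "__main__":
    line = ("1549:X 01 Jul 11:06:25.382 # +switch-master mymaster "
            "192.168.1.128 6379 192.168.1.128 6380")
    print(switch_master_target(line))
```

Run against the +switch-master line above, the sketch reports that mymaster moved from 192.168.1.128:6379 to 192.168.1.128:6380.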

Notes

Failover failure: -failover-abort-no-good-slave master mymaster 192.168.1.128 6379. The log is as follows:

2114:X 24 Jun 22:58:14.084 # Sentinel ID is 4ed5b983c127292f1a9ebf6c76af4d71c546693a
2114:X 24 Jun 22:58:14.084 # +monitor master mymaster 192.168.1.128 6379 quorum 2
2114:X 24 Jun 22:58:20.338 * +sentinel-address-switch master mymaster 192.168.1.128 6379 ip 127.0.0.1 port 26380 for bf5144e4f9d40c61ab71485ae3039b0aa5df3506
2114:X 24 Jun 22:58:24.462 * +sentinel-address-switch master mymaster 192.168.1.128 6379 ip 127.0.0.1 port 26381 for e062a3797da4b8f8ebc64c5b7cf38fd7495f2c40
2114:X 24 Jun 22:58:44.095 # +sdown master mymaster 192.168.1.128 6379
2114:X 24 Jun 22:58:44.095 # +sdown slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 192.168.1.128 6379
2114:X 24 Jun 22:58:44.095 # +sdown slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 192.168.1.128 6379
2114:X 24 Jun 22:58:48.110 # +new-epoch 1
2114:X 24 Jun 22:58:48.112 # +vote-for-leader bf5144e4f9d40c61ab71485ae3039b0aa5df3506 1
2114:X 24 Jun 22:58:48.304 # +odown master mymaster 192.168.1.128 6379 #quorum 2/2
2114:X 24 Jun 22:58:48.305 # Next failover delay: I will not start a failover before Sun Jun 24 23:04:48 2018
2114:X 24 Jun 23:04:48.231 # +new-epoch 2
2114:X 24 Jun 23:04:48.231 # +try-failover master mymaster 192.168.1.128 6379
2114:X 24 Jun 23:04:48.233 # +vote-for-leader 4ed5b983c127292f1a9ebf6c76af4d71c546693a 2
2114:X 24 Jun 23:04:48.239 # e062a3797da4b8f8ebc64c5b7cf38fd7495f2c40 voted for 4ed5b983c127292f1a9ebf6c76af4d71c546693a 2
2114:X 24 Jun 23:04:48.239 # bf5144e4f9d40c61ab71485ae3039b0aa5df3506 voted for 4ed5b983c127292f1a9ebf6c76af4d71c546693a 2
2114:X 24 Jun 23:04:48.317 # +elected-leader master mymaster 192.168.1.128 6379
2114:X 24 Jun 23:04:48.317 # +failover-state-select-slave master mymaster 192.168.1.128 6379
2114:X 24 Jun 23:04:48.403 # -failover-abort-no-good-slave master mymaster 192.168.1.128 6379
2114:X 24 Jun 23:04:48.503 # Next failover delay: I will not start a failover before Sun Jun 24 23:10:48 2018
2114:X 24 Jun 23:10:48.390 # +new-epoch 3
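A side note on the "Next failover delay" lines in the log above: the six-minute gaps are consistent with the rule stated in the comments of sentinel.conf that a sentinel waits two times the failover timeout before retrying a failover of the same master. The small calculation below checks this against the timestamps in the log; the function name is made up, and the 180000 ms value is the failover-timeout configured earlier.

```python
from datetime import datetime, timedelta

FAILOVER_TIMEOUT_MS = 180000  # value of "sentinel failover-timeout mymaster" used in this article


def next_failover_allowed(last_attempt: datetime,
                          failover_timeout_ms: int = FAILOVER_TIMEOUT_MS) -> datetime:
    """Earliest retry time, assuming a wait of twice the failover timeout."""
    return last_attempt + timedelta(milliseconds=2 * failover_timeout_ms)


if __name__ == "__main__":
    voted_at = datetime(2018, 6, 24, 22, 58, 48)  # time of the first vote in the log
    print(next_failover_allowed(voted_at))  # 2018-06-24 23:04:48
```

The result matches the "I will not start a failover before Sun Jun 24 23:04:48 2018" line, and applying the same step again gives 23:10:48, the time of the next announcement.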

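The sentinel's log shows that it is receiving information from Redis servers at 127.0.0.1, which is clearly wrong. Take another look at the configuration file (this part is generated automatically by the sentinel):
[Image: auto-generated section of the sentinel configuration file]
As shown, the sentinel believes the master and slave addresses are 127.0.0.1. This reveals the problem: the bind address of our Redis servers is wrong. The fix is to remove 127.0.0.1 from the bind address list.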

Here is a correct one:
[Image: sentinel configuration file with the correct addresses]
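A quick way to spot this problem is to scan the sentinel's rewritten configuration file for loopback addresses. The sketch below is a hypothetical helper, not part of Redis. It assumes the auto-generated entries use the `sentinel known-slave` (or `known-replica` in newer versions) and `sentinel known-sentinel` directives, and the sample text is constructed for illustration from the addresses seen in the log above.

```python
import re

# Matches auto-generated or monitor entries whose address is a loopback address.
LOOPBACK_RE = re.compile(
    r"^sentinel\s+(known-slave|known-replica|known-sentinel|monitor)\s+"
    r"(\S+)\s+(127\.\d+\.\d+\.\d+|localhost)\s+(\d+)"
)


def find_loopback_entries(conf_text: str) -> list:
    """Return (directive, address, port) for every entry pointing at a loopback address."""
    hits = []
    for line in conf_text.splitlines():
        match = LOOPBACK_RE.match(line.strip())
        if match:
            hits.append((match.group(1), match.group(3), int(match.group(4))))
    return hits


if __name__ == "__main__":
    # Hypothetical sample resembling the faulty state described above.
    sample = """\
sentinel monitor mymaster 192.168.1.128 6379 2
sentinel known-slave mymaster 127.0.0.1 6380
sentinel known-slave mymaster 127.0.0.1 6381
"""
    for entry in find_loopback_entries(sample):
        print(entry)
```

Any hit suggests that a server or sentinel announced itself under 127.0.0.1 and that its bind list should be checked.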

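In addition, even after the problem above was fixed, switching the master still failed, for reasons I could not determine. When I retried a few days later it worked, so restarting the machine probably did the trick.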
