redis-sentinel:
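As a storage system, Redis has to be reliable. To that end Redis provides Sentinel, which monitors the state of the master and switches between master and replica (slave) when the master instance fails. It has been bundled with Redis in versions above 2.4.
First, redis-sentinel is the tool shipped officially with Redis for managing multiple Redis server instances; to use it you only need to start a few Sentinel instances.
Second, redis-sentinel exposes a service, so existing tools such as spring-data-redis can follow a switchover.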
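What it does:
1. Monitoring: Sentinel constantly checks whether your master and replica servers are working properly.
2. Notification: when a monitored Redis server runs into trouble, Sentinel can notify the administrator or other applications through an API.
3. Automatic failover: when a master stops working properly, Sentinel starts an automatic failover. It promotes one of the failed master's replicas to be the new master and reconfigures the other replicas to replicate from it. When a client tries to connect to the failed master, the cluster returns the address of the new master, so the new master takes over from the failed one.
Redis Sentinel is a distributed system: you can run several Sentinel processes in one deployment. These processes use gossip protocols to exchange information about whether a master is down, and agreement protocols to decide whether to perform an automatic failover and which replica to promote to master.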
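The last decision, which replica gets promoted, can be illustrated with a small sketch. This is not Sentinel's own code. It assumes the ordering described in the Redis Sentinel documentation (lower replica priority first, then the larger replication offset, then the lexicographically smaller run ID) and ignores the other filters Sentinel applies, such as how long a replica has been disconnected. The `Replica` class, its field names and the run IDs are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    # Hypothetical record; the field names are illustrative, not a Redis API.
    addr: str
    priority: int      # slave_priority from INFO; 0 means "never promote"
    repl_offset: int   # slave_repl_offset from INFO
    run_id: str


def pick_replica(replicas: List[Replica]) -> Optional[Replica]:
    """Return the preferred promotion candidate, or None if there is none."""
    candidates = [r for r in replicas if r.priority > 0]
    if not candidates:
        return None
    # Lower priority wins, then higher offset, then smaller run ID.
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


replicas = [
    Replica("192.168.110.102:6379", 100, 701, "b" * 40),
    Replica("192.168.110.103:6379", 100, 715, "a" * 40),
]
chosen = pick_replica(replicas)
print(chosen.addr)  # 192.168.110.103:6379 (same priority, larger offset)
```

With equal priorities the replica that has received more of the master's data is preferred, which keeps data loss after a failover as small as possible.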
1. Cluster environment
1. Linux servers
The environment is built on four CentOS Linux servers with the following IP addresses:
192.168.110.100
192.168.110.101
192.168.110.102
192.168.110.103
2. Redis deployment layout
192.168.110.100
Runs several Redis Sentinel instances, which together form the Sentinel cluster
192.168.110.101
Runs Redis as the master
192.168.110.102
Runs Redis as a replica of 192.168.110.101
192.168.110.103
Runs Redis as a replica of 192.168.110.101
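2. Configure and start the Redis master-replica cluster
1. Edit the redis.conf configuration file
The master can simply use the default configuration file.
On the replicas, change the configuration file as follows: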
# Master-Slave replication. Use slaveof to make a Redis instance a copy of
# another Redis server. A few things to understand ASAP about Redis replication.
#
# 1) Redis replication is asynchronous, but you can configure a master to
# stop accepting writes if it appears to be not connected with at least
# a given number of slaves.
# 2) Redis slaves are able to perform a partial resynchronization with the
# master if the replication link is lost for a relatively small amount of
# time. You may want to configure the replication backlog size (see the next
# sections of this file) with a sensible value depending on your needs.
# 3) Replication is automatic and does not need user intervention. After a
# network partition slaves automatically try to reconnect to masters
# and resynchronize with them.
#
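# Master-slave replication. The slaveof directive makes this Redis instance a copy of another one.
# Note that the local instance copies data from the remote one, so the local instance can use a different database file, bind a different IP and listen on a different port.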
#
# slaveof <masterip> <masterport>
slaveof 192.168.110.101 6379
Note: make this change on both replicas.
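Because the same edit has to be made on every replica, it can be scripted. The following is only a sketch: `set_slaveof` is a hypothetical helper, not part of Redis, and it works on the text of a redis.conf rather than on a file so that it can be tried without touching a real installation.

```python
import re

# Matches an active (uncommented) slaveof line; "# slaveof ..." is left alone.
_SLAVEOF = re.compile(r"^slaveof[ \t]+\S+[ \t]+\S+[ \t]*$", re.MULTILINE)


def set_slaveof(conf_text: str, master_ip: str, master_port: int) -> str:
    """Replace the first active slaveof line, or append one if none is active."""
    directive = f"slaveof {master_ip} {master_port}"
    if _SLAVEOF.search(conf_text):
        return _SLAVEOF.sub(directive, conf_text, count=1)
    return conf_text.rstrip("\n") + "\n" + directive + "\n"


sample = "# slaveof <masterip> <masterport>\nport 6379\n"
updated = set_slaveof(sample, "192.168.110.101", 6379)
print(updated)
```

The commented template line stays in place, and the IP and port are written as two separate arguments, which is what the `slaveof` directive expects.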
2. Start the Redis master-replica cluster
Start the master 192.168.110.101 first, with the default configuration:
[lizhiwei@localhost bin]$ ./redis-server
Then start the replicas 192.168.110.102 and 192.168.110.103 with the configuration edited above:
./redis-server redis.conf
3. Check the cluster
Replication info of the master 192.168.110.101:
[lizhiwei@localhost bin]$ ./redis-cli -h 192.168.110.101 info Replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.110.102,port=6379,state=online,offset=659,lag=1
slave1:ip=192.168.110.103,port=6379,state=online,offset=659,lag=0
master_repl_offset:659
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:658
Replication info of the replica 192.168.110.102:
[lizhiwei@localhost bin]$ ./redis-cli -h 192.168.110.102 info Replication
# Replication
role:slave
master_host:192.168.110.101
master_port:6379
master_link_status:up
master_last_io_seconds_ago:3
master_sync_in_progress:0
slave_repl_offset:701
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
Replication info of the replica 192.168.110.103:
[lizhiwei@localhost bin]$ ./redis-cli -h 192.168.110.103 info Replication
# Replication
role:slave
master_host:192.168.110.101
master_port:6379
master_link_status:up
master_last_io_seconds_ago:9
master_sync_in_progress:0
slave_repl_offset:715
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
At this point, any data written to the master 192.168.110.101 can be read from both replicas, because each replica keeps a copy of the master's data.
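Reading these listings by eye gets tedious once the cluster is tested repeatedly. As a rough sketch, the `key:value` output of `info Replication` can be parsed into a dictionary. The function names below are illustrative, and the sample text is copied from the master's output above.

```python
def parse_info(text: str) -> dict:
    """Parse INFO output (key:value lines, '#' section headers) into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def replica_addresses(info: dict) -> list:
    """Extract 'ip:port' for every slaveN entry of a master's INFO output."""
    addrs = []
    for key, value in info.items():
        if key.startswith("slave") and key[5:].isdigit():
            fields = dict(item.split("=", 1) for item in value.split(","))
            addrs.append(f"{fields['ip']}:{fields['port']}")
    return addrs


sample = """# Replication
role:master
connected_slaves:2
slave0:ip=192.168.110.102,port=6379,state=online,offset=659,lag=1
slave1:ip=192.168.110.103,port=6379,state=online,offset=659,lag=0
master_repl_offset:659
"""
info = parse_info(sample)
print(info["role"], replica_addresses(info))
```

Fed with the output of each node in turn, the same two functions are enough to tell which node currently reports `role:master` and which replicas it sees.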
3. Configure and start the Sentinel cluster
1. Create the sentinel.conf configuration file
port 26379
# sentinel announce-ip <ip>
# sentinel announce-port <port>
dir /tmp
################################# master001 #################################
sentinel monitor master001 192.168.110.101 6379 2
# sentinel auth-pass <master-name> <password>
sentinel down-after-milliseconds master001 30000
sentinel parallel-syncs master001 1
sentinel failover-timeout master001 180000
# sentinel notification-script <master-name> <script-path>
# sentinel client-reconfig-script <master-name> <script-path>
# Multiple masters can be configured
################################# master002 #################################
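A file like this can be sanity-checked before Sentinel is started. The sketch below is a hypothetical checker, not a Redis tool, and it only understands the per-master `sentinel <option> <master-name> ...` lines used in this article. It rejects a `sentinel monitor` line that does not have exactly three arguments after the master name, which catches an IP, port and quorum that have run together.

```python
# Options that take no master name; they are skipped by this sketch.
GLOBAL_OPTIONS = {"announce-ip", "announce-port"}


def parse_sentinel_conf(text: str) -> dict:
    """Collect `sentinel <option> <master-name> ...` lines per master name."""
    masters = {}
    for raw in text.splitlines():
        parts = raw.split()
        if len(parts) < 3 or parts[0] != "sentinel":
            continue  # comments, blank lines, `port`, `dir`, ...
        option, name, args = parts[1], parts[2], parts[3:]
        if option in GLOBAL_OPTIONS:
            continue
        entry = masters.setdefault(name, {})
        if option == "monitor":
            if len(args) != 3:
                raise ValueError(f"expected '<ip> <port> <quorum>': {raw!r}")
            ip, port, quorum = args
            entry["monitor"] = {"ip": ip, "port": int(port), "quorum": int(quorum)}
        else:
            entry[option] = args[0] if len(args) == 1 else args
    return masters


conf = """port 26379
dir /tmp
sentinel monitor master001 192.168.110.101 6379 2
# sentinel auth-pass <master-name> <password>
sentinel down-after-milliseconds master001 30000
sentinel parallel-syncs master001 1
sentinel failover-timeout master001 180000
"""
masters = parse_sentinel_conf(conf)
print(masters["master001"]["monitor"])
```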
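Notes on the configuration file:
1. port: the port this Sentinel instance listens on.
2. dir: the working directory Sentinel uses for temporary files at runtime.
3. sentinel monitor master001 192.168.110.101 6379 2: tells Sentinel to monitor a Redis master named master001 at IP address 192.168.110.101, port 6379. At least 2 Sentinel processes must agree before this master is judged to have failed; as long as too few Sentinels agree, automatic failover does not run.
4. sentinel down-after-milliseconds master001 30000: the number of milliseconds after which Sentinel considers a Redis instance to have failed. If the instance does not answer PING within this time, or answers with an error, Sentinel marks it as subjectively down. A single Sentinel marking an instance subjectively down does not necessarily trigger a failover: only after enough Sentinels have marked it subjectively down is the instance marked objectively down, and only then does automatic failover run.
5. sentinel parallel-syncs master001 1: the maximum number of replicas that may resynchronize with the new master at the same time during a failover. With many replicas, the smaller this number, the longer synchronization takes and the longer the failover needs to complete.
6. sentinel failover-timeout master001 180000: if the failover does not complete within this time (in ms), it is considered failed.
7. sentinel notification-script <master-name> <script-path>: the alert script Sentinel calls when it detects a problem with the monitored Redis instance. This option is optional but commonly used.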
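The step from subjectively down to objectively down is easiest to see as a vote count. The following sketch models only that counting rule, using the values from the configuration above (`down-after-milliseconds` 30000, quorum 2). The observation data is invented, and real Sentinels exchange these opinions over the network rather than through a shared dictionary.

```python
def is_subjectively_down(ms_since_valid_reply: int, down_after_ms: int) -> bool:
    """One Sentinel's local view: no valid PING reply for longer than the limit."""
    return ms_since_valid_reply > down_after_ms


def is_objectively_down(ms_by_sentinel: dict, down_after_ms: int, quorum: int) -> bool:
    """True once at least `quorum` Sentinels see the master as subjectively down."""
    votes = sum(
        1 for ms in ms_by_sentinel.values()
        if is_subjectively_down(ms, down_after_ms)
    )
    return votes >= quorum


# Hypothetical observations, keyed by Sentinel port: ms since the last valid reply.
two_agree = {26379: 31000, 36379: 30500, 46379: 1200}
one_alone = {26379: 31000, 36379: 900, 46379: 1200}
print(is_objectively_down(two_agree, down_after_ms=30000, quorum=2))  # True
print(is_objectively_down(one_alone, down_after_ms=30000, quorum=2))  # False
```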
2. Start the Sentinel cluster
Create three configuration files, sentinel001.conf, sentinel002.conf and sentinel003.conf, set their ports to 26379, 36379 and 46379 respectively, and start the services:
./redis-sentinel sentinel001.conf
./redis-sentinel sentinel002.conf
./redis-sentinel sentinel003.conf
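The three files differ only in the port, so they can be generated from one template. This is a minimal sketch: the template repeats the settings from this article, the output goes to a temporary directory chosen by the script, and `write_sentinel_confs` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path

TEMPLATE = """port {port}
dir /tmp
sentinel monitor master001 192.168.110.101 6379 2
sentinel down-after-milliseconds master001 30000
sentinel parallel-syncs master001 1
sentinel failover-timeout master001 180000
"""


def write_sentinel_confs(target_dir: Path, ports) -> list:
    """Write sentinel001.conf, sentinel002.conf, ... with one port each."""
    paths = []
    for index, port in enumerate(ports, start=1):
        path = target_dir / f"sentinel{index:03d}.conf"
        path.write_text(TEMPLATE.format(port=port))
        paths.append(path)
    return paths


out_dir = Path(tempfile.mkdtemp())
paths = write_sentinel_confs(out_dir, [26379, 36379, 46379])
for p in paths:
    print(p.name, "->", p.read_text().splitlines()[0])
```

Each Sentinel needs its own file because Sentinel rewrites its configuration file at runtime to record the state it discovers.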
After the three Sentinel services have started, their consoles show the following output:
./redis-sentinel sentinel001.conf, port 26379:
[7743] 01 Oct 06:20:38.162 # Sentinel runid is ba6c42e1accc31290e11d5876275e1562564295d
[7743] 01 Oct 06:20:38.162 # +monitor master master001 192.168.110.101 6379 quorum 2
[7743] 01 Oct 06:20:39.110 * +slave slave 192.168.110.102:6379 192.168.110.102 6379 @ master001 192.168.110.101 6379
[7743] 01 Oct 06:20:39.111 * +slave slave 192.168.110.103:6379 192.168.110.103 6379 @ master001 192.168.110.101 6379
[7743] 01 Oct 06:25:07.595 * +sentinel sentinel 192.168.110.100:36379 192.168.110.100 36379 @ master001 192.168.110.101 6379
[7743] 01 Oct 06:26:11.170 * +sentinel sentinel 192.168.110.100:46379 192.168.110.100 46379 @ master001 192.168.110.101 6379
./redis-sentinel sentinel002.conf, port 36379:
[7795] 01 Oct 06:25:05.538 # Sentinel runid is 52c14768b15837fb601b26328acf150c6bd30682
[7795] 01 Oct 06:25:05.538 # +monitor master master001 192.168.110.101 6379 quorum 2
[7795] 01 Oct 06:25:06.505 * +slave slave 192.168.110.102:6379 192.168.110.102 6379 @ master001 192.168.110.101 6379
[7795] 01 Oct 06:25:06.515 * +slave slave 192.168.110.103:6379 192.168.110.103 6379 @ master001 192.168.110.101 6379
[7795] 01 Oct 06:25:07.557 * +sentinel sentinel 192.168.110.100:26379 192.168.110.100 26379 @ master001 192.168.110.101 6379
[7795] 01 Oct 06:26:11.168 * +sentinel sentinel 192.168.110.100:46379 192.168.110.100 46379 @ master001 192.168.110.101 6379
./redis-sentinel sentinel003.conf, port 46379:
[7828] 01 Oct 06:26:09.076 # Sentinel runid is c8509594be4a36660b2122b3b81f4f74060c9b04
[7828] 01 Oct 06:26:09.076 # +monitor master master001 192.168.110.101 6379 quorum 2
[7828] 01 Oct 06:26:10.063 * +slave slave 192.168.110.102:6379 192.168.110.102 6379 @ master001 192.168.110.101 6379
[7828] 01 Oct 06:26:10.071 * +slave slave 192.168.110.103:6379 192.168.110.103 6379 @ master001 192.168.110.101 6379
[7828] 01 Oct 06:26:11.516 * +sentinel sentinel 192.168.110.100:26379 192.168.110.100 26379 @ master001 192.168.110.101 6379
[7828] 01 Oct 06:26:11.674 * +sentinel sentinel 192.168.110.100:36379 192.168.110.100 36379 @ master001 192.168.110.101 6379
Every Sentinel instance knows about all the others!
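That each Sentinel has discovered its peers can also be checked mechanically from the log. The sketch below assumes the usual space-separated Sentinel event layout, `+sentinel sentinel <name> <ip> <port> @ <master-name> <master-ip> <master-port>` (and the same for `+slave slave`), so it will not match log lines whose fields have been fused together. The function name is illustrative.

```python
import re

# Matches "+sentinel sentinel ..." and "+slave slave ..." discovery events.
EVENT = re.compile(
    r"\+(?P<event>sentinel|slave) (?P=event) \S+ (?P<ip>\S+) (?P<port>\d+) "
    r"@ (?P<master>\S+) \S+ \d+"
)


def discovered_peers(log_text: str) -> dict:
    """Map event kind ('sentinel' or 'slave') to the set of 'ip:port' seen."""
    peers = {"sentinel": set(), "slave": set()}
    for match in EVENT.finditer(log_text):
        peers[match["event"]].add(f"{match['ip']}:{match['port']}")
    return peers


log = """[7743] 01 Oct 06:20:38.162 # +monitor master master001 192.168.110.101 6379 quorum 2
[7743] 01 Oct 06:20:39.110 * +slave slave 192.168.110.102:6379 192.168.110.102 6379 @ master001 192.168.110.101 6379
[7743] 01 Oct 06:20:39.111 * +slave slave 192.168.110.103:6379 192.168.110.103 6379 @ master001 192.168.110.101 6379
[7743] 01 Oct 06:25:07.595 * +sentinel sentinel 192.168.110.100:36379 192.168.110.100 36379 @ master001 192.168.110.101 6379
[7743] 01 Oct 06:26:11.170 * +sentinel sentinel 192.168.110.100:46379 192.168.110.100 46379 @ master001 192.168.110.101 6379
"""
peers = discovered_peers(log)
print(sorted(peers["sentinel"]))
print(sorted(peers["slave"]))
```

For the Sentinel on port 26379 the expected result is the other two Sentinels (36379 and 46379) plus both replicas.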
4. Test the Sentinel cluster
1. Stop the master 192.168.110.101
After stopping the Redis master 192.168.110.101, the Replication info looks like this:
[lizhiwei@localhost bin]$ ./redis-cli -h 192.168.110.101 info Replication
Could not connect to Redis at 192.168.110.101:6379: Connection refused
[lizhiwei@localhost bin]$ ./redis-cli -h 192.168.110.102 info Replication
# Replication
role:slave
master_host:192.168.110.103
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:29128
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
[lizhiwei@localhost bin]$ ./redis-cli -h 192.168.110.103 info Replication
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.110.102,port=6379,state=online,offset=30456,lag=1
master_repl_offset:30456
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:30455
[lizhiwei@localhost bin]$
The Redis master 192.168.110.101 can no longer be reached, and 192.168.110.103 has become the master!
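Clients do not have to inspect every node to find the new master: they can ask any Sentinel with `SENTINEL get-master-addr-by-name <master-name>`. The sketch below shows only the wire format of that exchange in the Redis serialization protocol (RESP). It opens no socket, and the sample reply is written by hand to match what a Sentinel would be expected to return after this failover, not captured from the cluster.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def decode_addr_reply(reply: bytes):
    """Decode a two-element array reply into (ip, port); None for a null reply."""
    lines = reply.split(b"\r\n")
    if lines[0] in (b"*-1", b"$-1"):
        return None  # assumed reply when the master name is unknown
    if lines[0] != b"*2":
        raise ValueError(f"unexpected reply header: {lines[0]!r}")
    # Layout: *2, $<len>, <ip>, $<len>, <port>
    return lines[2].decode(), int(lines[4])


request = encode_command("SENTINEL", "get-master-addr-by-name", "master001")
print(request)

sample_reply = b"*2\r\n$15\r\n192.168.110.103\r\n$4\r\n6379\r\n"
print(decode_addr_reply(sample_reply))
```

In practice a client library such as Jedis or spring-data-redis performs this lookup for you.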
2. Restart the former master 192.168.110.101
After restarting Redis on 192.168.110.101, the Replication info looks like this:
### Start command, still using the default configuration
[lizhiwei@localhost bin]$ ./redis-server
[lizhiwei@localhost bin]$ ./redis-cli -h 192.168.110.101 info Replication
# Replication
role:slave
master_host:192.168.110.103
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:57657
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
[lizhiwei@localhost bin]$ ./redis-cli -h 192.168.110.102 info Replication
# Replication
role:slave
master_host:192.168.110.103
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:60751
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
[lizhiwei@localhost bin]$ ./redis-cli -h 192.168.110.103 info Replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.110.102,port=6379,state=online,offset=63247,lag=1
slave1:ip=192.168.110.101,port=6379,state=online,offset=63247,lag=1
master_repl_offset:63393
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:63392
[lizhiwei@localhost bin]$
After the restart, 192.168.110.101 is still part of the cluster, but now as a replica. 192.168.110.103 remains the master and has two replicas again!
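3. Leave only one Sentinel running, then stop the master 192.168.110.103 and check whether the Redis cluster gets a new master
Stop the Sentinel services until only one is left, then stop the Redis master. The Replication info looks like this: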
[lizhiwei@localhost bin]$ ./redis-cli -h 192.168.110.101 info Replication
# Replication
role:slave
master_host:192.168.110.103
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:184231
master_link_down_since_seconds:43
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
[lizhiwei@localhost bin]$ ./redis-cli -h 192.168.110.102 info Replication
# Replication
role:slave
master_host:192.168.110.103
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:184231
master_link_down_since_seconds:52
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
[lizhiwei@localhost bin]$ ./redis-cli -h 192.168.110.103 info Replication
Could not connect to Redis at 192.168.110.103:6379: Connection refused
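The master 192.168.110.103 can no longer be reached, and no new Redis master exists: the cluster has no master at all! The cause is the last argument of sentinel monitor master001 192.168.110.101 6379 2 in sentinel.conf, which is 2; with a single Sentinel node this argument has to be 1. (I have verified this.)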
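The arithmetic behind this result can be sketched as follows. As far as the Redis Sentinel documentation describes it, the quorum only governs detecting the failure; to actually carry out the failover, a Sentinel must also be authorized by a majority of all Sentinels known for that master, including ones that are currently stopped. The function below is a simplified model under that assumption, and it further assumes that every reachable Sentinel agrees the master is down.

```python
def can_fail_over(known_sentinels: int, reachable_sentinels: int, quorum: int) -> bool:
    """Simplified check: enough Sentinels to detect the failure AND to authorize it."""
    majority = known_sentinels // 2 + 1
    return reachable_sentinels >= max(quorum, majority)


# The test above: three Sentinels configured, two stopped, quorum 2.
print(can_fail_over(known_sentinels=3, reachable_sentinels=1, quorum=2))  # False
# A deployment that only ever had one Sentinel, with quorum 1.
print(can_fail_over(known_sentinels=1, reachable_sentinels=1, quorum=1))  # True
# Lowering the quorum alone does not help while the stopped peers are still known.
print(can_fail_over(known_sentinels=3, reachable_sentinels=1, quorum=1))  # False
```

Under this model, a quorum of 1 enables failover only when the deployment consists of a single Sentinel from the start.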
Note: in production, run at least three Sentinel nodes, preferably not on the same machine (sharing the same network card).
Integrating with Spring
<bean id="jedisPoolConfig" class="redis.clients.jedis.JedisPoolConfig">
    <property name="maxTotal" value="${redis.pool.maxActive}"/>
    <property name="maxIdle" value="${redis.pool.maxIdle}"/>
    <property name="maxWaitMillis" value="${redis.pool.maxWait}"/>
    <property name="testOnBorrow" value="${redis.pool.testOnBorrow}"/>
</bean>
<bean id="redisSentinel" class="redis.clients.jedis.JedisSentinelPool">
    <constructor-arg index="0" value="mymaster"/>
    <constructor-arg index="1">
        <set>
            <value>10.88.140.113:26379</value>
            <value>10.88.140.112:26379</value>
        </set>
    </constructor-arg>
    <constructor-arg index="2" ref="jedisPoolConfig"/>
    <constructor-arg index="3" value="PassW0rd12"/>
</bean>
<bean id="dataBase" class="com.qunar.flight.inter.seo.job.DataBase" init-method="init"/>