Redis High Availability (2.1): Failover (Sentinel)

Sentinel Introduction

Redis Sentinel is the officially recommended high-availability solution for Redis. In a plain master-slave setup, if the master goes down, neither Redis itself nor many of its clients will switch to the standby automatically. Redis Sentinel fills that gap. It runs as an independent process and can be deployed on any machine that can communicate with the Redis instances it monitors.

Sentinel is a tool for monitoring the status of the master in a Redis deployment. Sentinel itself can be deployed as a cluster, and at least 3 Sentinel instances are recommended for production environments.

 

What Sentinel does

1. Cluster monitoring: Sentinel checks whether the Redis master and slave processes are working properly.

2. Failover: if the master becomes abnormal, Sentinel performs a master-slave switch. One of the slaves is promoted to master, and the previous master is reconfigured as a slave.

3. Configuration center: after a master-slave switch, the contents of master_redis.conf, slave_redis.conf and sentinel.conf change. A slaveof line is added to master_redis.conf, the original slaveof line is removed from slave_redis.conf, and the monitoring target in sentinel.conf is updated accordingly. Sentinel also provides service discovery for clients: a client connects to Sentinel, and Sentinel returns the address of the currently available master. If a switch happens because the master went down, Sentinel gives the client the new address.

4. Message notification: if a Redis instance fails, Sentinel sends an alert to the administrator.
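
The service-discovery role in item 3 can be sketched from the client side. This is a minimal illustration in Python, not a real client library: discover_master and the injected ask callable are hypothetical names, and ask stands in for whatever sends SENTINEL get-master-addr-by-name to one Sentinel and returns its [ip, port] reply.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple

Address = Tuple[str, int]


def discover_master(
    sentinels: Iterable[Address],
    ask: Callable[[Address], Optional[Sequence[str]]],
) -> Optional[Address]:
    """Ask each Sentinel in turn and return the first master address reported."""
    for sentinel in sentinels:
        try:
            reply = ask(sentinel)  # hypothetical transport, supplied by the caller
        except OSError:
            continue  # this Sentinel is unreachable, try the next one
        if reply:
            ip, port = reply
            return ip, int(port)
    return None  # no Sentinel could name a master
```

Trying the Sentinels one after another means the client keeps working as long as at least one Sentinel is reachable.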

 

How Sentinel Works

1. Each Sentinel sends a PING command once per second to every Master, Slave and other Sentinel instance it knows about.

2. If the time since an instance's last valid reply to PING exceeds the value of the down-after-milliseconds option, that Sentinel marks the instance as subjective offline.

3. If a Master is marked as subjective offline, all Sentinels monitoring that Master check, at a frequency of once per second, whether the Master has indeed entered the subjective offline state.

4. When a sufficient number of Sentinels (greater than or equal to the value specified in the configuration file) confirm within the specified time range that the Master has entered the subjective offline state, the Master is marked as objective offline.

5. Under normal circumstances, each Sentinel sends an INFO command to all known Masters and Slaves once every 10 seconds.

6. When a Master is marked as objective offline, Sentinel sends INFO to all slaves of that Master once per second instead of once every 10 seconds.

7. If not enough Sentinels agree that the Master is offline, the Master's objective offline status is removed. If the Master again returns a valid reply to a Sentinel's PING command, its subjective offline status is removed.
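
The timing rules in steps 2, 5 and 6 can be written down as two small functions. This is only a sketch in Python of the rules as stated here, not Sentinel's actual code; the function names and the millisecond timestamps are assumptions made for illustration.

```python
def is_subjectively_down(last_valid_reply_ms: int, now_ms: int,
                         down_after_ms: int = 30000) -> bool:
    """Step 2: subjective offline once the silence exceeds down-after-milliseconds."""
    return (now_ms - last_valid_reply_ms) > down_after_ms


def info_period_ms(master_objectively_down: bool) -> int:
    """Steps 5 and 6: INFO every 10 s normally, every 1 s for an ODOWN master's slaves."""
    return 1000 if master_objectively_down else 10000
```

With the default of 30000 ms used above, a master that last answered at time 0 is still considered up at 30000 ms and is marked subjective offline from 30001 ms onward.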

 

Subjective offline and objective offline

Subjective offline: Subjectively Down, abbreviated SDOWN, is the offline judgment that a single Sentinel instance makes about a Redis server.

Objective offline: Objectively Down, abbreviated ODOWN, is the offline judgment reached about a master server when multiple Sentinel instances have each judged it SDOWN and have exchanged that judgment through the SENTINEL is-master-down-by-addr command. Once the master is ODOWN, failover can start.
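
The step from SDOWN to ODOWN is a vote count. The Python sketch below assumes that the local Sentinel counts its own SDOWN judgment as one vote and adds the answers it received from the other Sentinels; is_objectively_down and its parameters are hypothetical names chosen for this illustration.

```python
from typing import Iterable


def is_objectively_down(own_sdown: bool, peer_says_down: Iterable[bool],
                        quorum: int) -> bool:
    """ODOWN when this Sentinel sees SDOWN and total votes reach the quorum."""
    if not own_sdown:
        return False  # a Sentinel only escalates a master it considers down itself
    votes = 1 + sum(1 for down in peer_says_down if down)
    return votes >= quorum
```

With a quorum of 2, one agreeing peer is enough; with no agreeing peer the master stays at SDOWN on this Sentinel only.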

 

The Redis Sentinel system manages multiple Redis servers and provides high availability for a master node. It mainly performs three tasks:

a. Monitoring: Redis Sentinel continuously monitors the running status of the master and slave servers.

b. Notification: when a monitored Redis server has a problem, Redis Sentinel can notify the system administrator, or notify other programs through its API.

c. Failover

The architecture of a simple master-slave structure plus Sentinel cluster is as follows: one master node and one slave node, plus a cluster of two deployed Sentinels. The Sentinels communicate with each other, exchange what they know about the state of the Redis nodes, reach a judgment and act on it. The subjective offline and objective offline states are the important ones here, because they decide whether a failover takes place. By subscribing to the relevant channels, the administrator can be notified when a server fails.

 

For the configuration file, refer to sentinel.conf.

Add the following settings:

# run in the background
daemonize yes
# log file
logfile /usr/local/redis/logs/sentinel.log

Modify the following settings:

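# listening port (make sure the firewall allows this port)
port 26379
# bind to this machine's IP (within a sentinel cluster this is the only item that differs between instances; everything else can stay the same)
bind 192.168.0.100
#### The settings below can be repeated as several groups to monitor different masters ####
# Master node to monitor (mymaster is the name under which the sentinel cluster monitors one group of redis instances; it can be any name, as long as it is the same on every sentinel)
# mymaster: custom name; the settings that follow are keyed on it and apply only to this master
# 192.168.0.200: master node IP
# 6379: master node port
# 2: number of votes, <= count(sentinel). After the master is marked subjective offline, the other sentinels in the cluster check whether the master is down and the votes are tallied; if the number of votes is greater than or equal to the configured value (2 here), the master is marked objective offline and failover follows
sentinel monitor mymaster 192.168.0.200 6379 2
# sentinel pings the master; if no valid reply arrives within the configured time (30 s), the master is marked subjective offline
sentinel down-after-milliseconds mymaster 30000
# failover timeout: if the failover does not complete within this time (ms), it is considered failed
sentinel failover-timeout mymaster 180000
# number of slaves allowed to resync with the new master at the same time during failover. A lower number makes the whole failover take longer, but fewer slaves are unavailable at once; with 1, only one slave at a time cannot serve requests
sentinel parallel-syncs mymaster 1
# password of the master and slave nodes; if a password is set, it must be the same on all of them
sentinel auth-pass mymaster 123456
# The next two options only take effect on a node acting as master, so they belong in the redis.conf of the Redis nodes rather than in sentinel.conf. They require at least 1 slave whose replication lag does not exceed 10 seconds; otherwise the master refuses writes. This helps reduce data loss caused by asynchronous replication and split-brain.
min-slaves-to-write 1
min-slaves-max-lag 10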
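
The sentinel monitor line above packs four values into one directive. As a hedged illustration, the Python sketch below splits such a line and checks the rule that the vote count must not exceed the number of Sentinels. MonitorDirective, parse_monitor_line and quorum_is_valid are hypothetical helper names, not part of Redis.

```python
from typing import NamedTuple


class MonitorDirective(NamedTuple):
    name: str    # custom master name, e.g. mymaster
    host: str    # master node IP
    port: int    # master node port
    quorum: int  # votes needed to mark the master objective offline


def parse_monitor_line(line: str) -> MonitorDirective:
    """Parse 'sentinel monitor <name> <ip> <port> <quorum>'."""
    parts = line.split()
    if len(parts) != 6 or parts[:2] != ["sentinel", "monitor"]:
        raise ValueError(f"not a sentinel monitor directive: {line!r}")
    name, host, port, quorum = parts[2:]
    return MonitorDirective(name, host, int(port), int(quorum))


def quorum_is_valid(directive: MonitorDirective, sentinel_count: int) -> bool:
    """The vote count must be at least 1 and at most the number of Sentinels."""
    return 1 <= directive.quorum <= sentinel_count
```

For the line used in this article, a quorum of 2 is valid with 3 Sentinels but could never be reached with a single Sentinel.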

 

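Common commands:

# connect to the current sentinel (same as connecting to redis)
redis-cli -h 127.0.0.1 -p 26379

# basic status information of the sentinel
info

# show all monitored masters and their status
sentinel masters

# show the information and status of the specified master
sentinel master [master name defined in sentinel]

# show all slaves of the specified master and their status
sentinel slaves [master name defined in sentinel]

# return the ip and port of the specified master; if a failover is in progress or has completed, the ip and port of the slave promoted to master are shown
sentinel get-master-addr-by-name [master name defined in sentinel]

# reset the state of every master whose name matches the given pattern, clearing its previous state information and its slaves information
sentinel reset [master name defined in sentinel]

# list the other sentinels monitoring the specified master
sentinel sentinels [master name defined in sentinel]

# when the master has failed, force an automatic failover without asking the other Sentinels for their opinion; the sentinel still sends the latest configuration to the other sentinels, which update themselves from it
sentinel failover [master name defined in sentinel]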
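
When these commands are issued from a program instead of redis-cli, the reply still has to be decoded. The Python sketch below assumes the RESP2 shape of the replies: sentinel master returns a flat list of alternating field names and values, and sentinel get-master-addr-by-name returns the IP and port as two strings. pairs_to_dict and master_address are hypothetical helper names, and the sample values are made up.

```python
from typing import Dict, Sequence, Tuple


def pairs_to_dict(flat_reply: Sequence[str]) -> Dict[str, str]:
    """Turn ['name', 'mymaster', 'ip', ...] into {'name': 'mymaster', 'ip': ...}."""
    if len(flat_reply) % 2 != 0:
        raise ValueError("expected an even number of items (field/value pairs)")
    items = iter(flat_reply)
    return dict(zip(items, items))  # zip pulls two consecutive items per pair


def master_address(reply: Sequence[str]) -> Tuple[str, int]:
    """Decode the two-element reply of get-master-addr-by-name."""
    ip, port = reply
    return ip, int(port)
```

For example, pairs_to_dict(["name", "mymaster", "quorum", "2"])["quorum"] yields the string "2"; converting values to numbers is left to the caller.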

 

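Adding and removing sentinel nodes

A newly added sentinel is discovered automatically. To remove a sentinel:

(1) Stop the sentinel process.

(2) Run SENTINEL RESET * on all sentinels to clear all master state.

(3) Run SENTINEL MASTER mastername on all sentinels and check that they all agree on the number of sentinels.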
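
Step (3) amounts to comparing one field of the SENTINEL MASTER reply across instances. Assuming the num-other-sentinels value has already been collected from each remaining Sentinel (how it is collected is left out), the comparison could look like this in Python; sentinels_agree and the addresses in the usage note are hypothetical.

```python
from typing import Dict


def sentinels_agree(num_other_sentinels: Dict[str, int]) -> bool:
    """True when every Sentinel reports the same num-other-sentinels value."""
    return len(set(num_other_sentinels.values())) <= 1
```

If sentinels_agree({"192.168.0.100:26379": 1, "192.168.0.101:26379": 2}) returns False, at least one Sentinel still remembers the removed instance and the reset should be repeated there.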

 

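Failover process

1. A sentinel initiates a failover on its own, or finds that the master has entered the objective offline state.

2. The sentinel increments the current epoch and tries to get elected as the leader of this failover in that epoch.

3. If it is not elected, it tries again after twice the configured failover timeout. If it is elected, it carries out the following steps.

4. Select one slave redis instance to be promoted to master.

5. Send SLAVEOF NO ONE to the selected slave so that it becomes the master redis instance.

6. Propagate the updated configuration to all other Sentinels through publish/subscribe; the other Sentinels update their own configuration.

7. Send SLAVEOF to the slaves of the offline master so that they replicate the new master.

8. When all slave redis instances have started replicating the new master, the leader Sentinel ends the failover.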
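
Step 4 does not say how the slave is chosen. The Python sketch below follows the ordering described in the Redis Sentinel documentation: lower replica priority first (a priority of 0 means never promote), then the larger replication offset, then the lexicographically smaller run ID. The Slave class and select_slave are hypothetical names, and the health checks Sentinel applies before ranking (for example skipping disconnected slaves) are left out.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Slave:
    run_id: str
    priority: int     # replica priority; 0 means "never promote"
    repl_offset: int  # how much of the master's data stream was processed


def select_slave(slaves: Iterable[Slave]) -> Optional[Slave]:
    """Pick the promotion candidate, or None if no slave is eligible."""
    candidates = [s for s in slaves if s.priority != 0]
    if not candidates:
        return None
    # Lower priority wins, then higher offset, then smaller run ID.
    return min(candidates, key=lambda s: (s.priority, -s.repl_offset, s.run_id))
```

Preferring the larger replication offset picks the slave that has received the most data from the old master, which limits how many writes are lost in the switch.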

 

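Log messages

+reset-master <instance details>: the master was reset.

+slave <instance details>: a new slave was detected and attached by Sentinel.

+failover-state-reconf-slaves <instance details>: the failover state changed to reconf-slaves.

+failover-detected <instance details>: another Sentinel started a failover, or a slave turned into a master.

+slave-reconf-sent <instance details>: the leader Sentinel sent the SLAVEOF command to the instance to give it a new master.

+slave-reconf-inprog <instance details>: the instance is configuring itself as a slave of the specified master, but the synchronization is not finished yet.

+slave-reconf-done <instance details>: the slave has completed synchronization with the new master.

-dup-sentinel <instance details>: one or more Sentinels monitoring the given master were removed as duplicates. This happens when a Sentinel instance is restarted.

+sentinel <instance details>: a new Sentinel monitoring the given master was detected and added.

+sdown <instance details>: the given instance is now in the subjective offline state.

-sdown <instance details>: the given instance is no longer in the subjective offline state.

+odown <instance details>: the given instance is now in the objective offline state.

-odown <instance details>: the given instance is no longer in the objective offline state.

+new-epoch <instance details>: the current epoch was updated.

+try-failover <instance details>: a new failover is in progress, waiting to be elected by the majority of Sentinels.

+elected-leader <instance details>: won the election for the specified epoch and can carry out the failover.

+failover-state-select-slave <instance details>: the failover is now in the select-slave state: Sentinel is looking for a slave that can be promoted to master.

no-good-slave <instance details>: Sentinel could not find a suitable slave to promote. It will try again after some time, or give up the failover altogether.

selected-slave <instance details>: Sentinel found a suitable slave to promote.

failover-state-send-slaveof-noone <instance details>: Sentinel is promoting the specified slave to master and waiting for the promotion to complete.

failover-end-for-timeout <instance details>: the failover was terminated because of a timeout, but the slaves will eventually be configured to replicate the new master anyway.

failover-end <instance details>: the failover completed successfully. All slaves have started replicating the new master.

+switch-master <master name> <oldip> <oldport> <newip> <newport>: the configuration changed and the master's IP address and port are now the new ones. This is the message most external users care about.

+tilt: entered tilt mode.

-tilt: exited tilt mode.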
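
Since +switch-master is the message most consumers act on, here is a small Python sketch that extracts the old and new address from a line containing that event, using the field order listed above. parse_switch_master is a hypothetical helper, and the new master address 192.168.0.201 in the usage note is made up for the example.

```python
from typing import Dict, Optional, Tuple, Union

Event = Dict[str, Union[str, Tuple[str, int]]]


def parse_switch_master(line: str) -> Optional[Event]:
    """Return master name, old and new address, or None for any other line."""
    tokens = line.split()
    if "+switch-master" not in tokens:
        return None
    start = tokens.index("+switch-master") + 1
    fields = tokens[start:start + 5]
    if len(fields) != 5:
        return None  # truncated or malformed event
    name, old_ip, old_port, new_ip, new_port = fields
    return {
        "master": name,
        "old": (old_ip, int(old_port)),
        "new": (new_ip, int(new_port)),
    }
```

Calling it on "+switch-master mymaster 192.168.0.200 6379 192.168.0.201 6379" gives a "new" entry of ("192.168.0.201", 6379); any other event, such as +sdown, yields None.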

 

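Notes:

1. Neither the redis server nor the jedis client supports read/write splitting well. With the current structure, real read/write splitting would require modifying the jedis source code or wrapping it, which is costly. In practice, once the amount of cached data grows or read pressure becomes high, the Redis project recommends storing data in a cluster (by slot) instead: multiple slots spread out the hot data, and master-slave plus failover keeps redis highly available.

2. The high availability described in this article is that of a (single) master node. When a master cluster is set up later, the high availability of each master in the cluster works the same way as described here, that is, master-slave plus failover, although failover in a master cluster is not based on the sentinel mechanism.

3. One master with multiple slaves is also a kind of redis cluster.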

