文章目录
Redis之以经典的3节点方式部署哨兵集群
Redis之Sentinal(哨兵)
Redis之哨兵底层原理解析
1. 哨兵的配置文件
redis的解压目录下sentinel.conf
为模板文件
最小的配置
每一个哨兵都可以去监控多个maser-slaves的主从架构
因为可能在公司里,为不同的项目,部署了多个master-slaves的redis主从集群
相同的一套哨兵集群,就可以去监控不同的多个redis主从集群
你自己给每个redis主从集群分配一个逻辑的名称,下文中的mymaster和resque均为逻辑名称,可根据实际需要自由配置
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 60000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1
sentinel monitor resque 192.168.75.3 6380 4
sentinel down-after-milliseconds resque 10000
sentinel failover-timeout resque 180000
sentinel parallel-syncs resque 5
sentinel monitor mymaster 127.0.0.1 6379
类似这种配置,来指定对一个master的监控,给监控的master指定的一个名称,因为后面分布式集群架构里会讲解,可以配置多个master做数据拆分
sentinel down-after-milliseconds mymaster 60000 #1分钟
sentinel failover-timeout mymaster 180000 #3分钟
sentinel parallel-syncs mymaster 1
上面的三个配置,都是针对某个监控的master配置的,给其指定上面分配的名称即可
上面这段配置,就监控了两个master node
这是最小的哨兵配置,如果发生了master-slave故障转移,或者新的哨兵进程加入哨兵集群,那么哨兵会自动更新自己的配置文件
sentinel monitor master-group-name hostname port quorum
quorum
:
- 至少多少个哨兵要一致同意,master进程挂掉了,或者slave进程挂掉了,或者要启动一个故障转移操作
- quorum是用来识别故障的,真正执行故障转移的时候,还是要在哨兵集群执行选举,选举一个哨兵进程出来执行故障转移操作
- 假设有5个哨兵,quorum设置了2,那么如果5个哨兵中的2个都认为master挂掉了; 2个哨兵中的一个就会做一个选举,选举一个哨兵出来,执行故障转移; 如果5个哨兵中有3个哨兵都是运行的,那么故障转移就会被允许执行
down-after-milliseconds
:
超过多少毫秒跟一个redis实例断了连接,哨兵就可能认为这个redis实例挂了
parallel-syncs
:
新的master别切换之后,同时有多少个slave被切换到去连接新master,重新做同步,数字越低,花费的时间越多
假设部署的redis是1个master,4个slave
然后master宕机了,4个slave中有1个切换成了master,剩下3个slave就要挂到新的master上面去
这个时候,如果parallel-syncs
是1,那么3个slave,一个一个地挂接到新的master上面去,1个挂接完,而且从新的master sync完数据之后,再挂接下一个
如果parallel-syncs
是3,那么一次性就会把所有slave挂接到新的master上去
failover-timeout
:执行故障转移的timeout超时时长
2. 搭建过程
2.1 配置
哨兵默认用26379端口,默认不能跟其他机器在指定端口连通,只能在本地访问
执行命令
mkdir -p /var/sentinel/5000
mkdir -p /etc/sentinel
参考模板文件sentinel.conf
修改配置内容修改如下,并重命名为5000.conf
,将该文件放置到/etc/sentinel
目录下
- 不包含注释的配置文件内容 master节点配置文件
port 5000
bind 192.168.75.143
dir /var/sentinel/5000
sentinel monitor mymaster 192.168.75.143 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1
slave节点配置,IP 192.168.75.144
和192.168.75.145
均为slave节点IP
192.168.75.144
机器下的/etc/sentinel/5000.conf
文件内容
port 5000
bind 192.168.75.144
dir /var/sentinel/5000
sentinel monitor mymaster 192.168.75.143 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1
192.168.75.145
机器下的/etc/sentinel/5000.conf
文件内容
port 5000
bind 192.168.75.145
dir /var/sentinel/5000
sentinel monitor mymaster 192.168.75.143 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1
- 包含注释的配置文件内容
# Example sentinel.conf
# *** IMPORTANT ***
#
# By default Sentinel will not be reachable from interfaces different than
# localhost, either use the 'bind' directive to bind to a list of network
# interfaces, or disable protected mode with "protected-mode no" by
# adding it to this configuration file.
#
# Before doing that MAKE SURE the instance is protected from the outside
# world via firewalling or other means.
#
# For example you may use one of the following:
#
bind 192.168.75.143 #修改成机器IP
#
# protected-mode no
# port <sentinel-port>
# The port that this sentinel instance will run on
port 5000
# sentinel announce-ip <ip>
# sentinel announce-port <port>
#
# The above two configuration directives are useful in environments where,
# because of NAT, Sentinel is reachable from outside via a non-local address.
#
# When announce-ip is provided, the Sentinel will claim the specified IP address
# in HELLO messages used to gossip its presence, instead of auto-detecting the
# local address as it usually does.
#
# Similarly when announce-port is provided and is valid and non-zero, Sentinel
# will announce the specified TCP port.
#
# The two options don't need to be used together, if only announce-ip is
# provided, the Sentinel will announce the specified IP and the server port
# as specified by the "port" option. If only announce-port is provided, the
# Sentinel will announce the auto-detected local IP and the specified port.
#
# Example:
#
# sentinel announce-ip 1.2.3.4
# dir <working-directory>
# Every long running process should have a well-defined working directory.
# For Redis Sentinel to chdir to /tmp at startup is the simplest thing
# for the process to don't interfere with administrative tasks such as
# unmounting filesystems.
dir /var/sentinel/5000 #需要修改部分
# sentinel monitor <master-name> <ip> <redis-port> <quorum>
#
# Tells Sentinel to monitor this master, and to consider it in O_DOWN
# (Objectively Down) state only if at least <quorum> sentinels agree.
#
# Note that whatever is the ODOWN quorum, a Sentinel will require to
# be elected by the majority of the known Sentinels in order to
# start a failover, so no failover can be performed in minority.
#
# Slaves are auto-discovered, so you don't need to specify slaves in
# any way. Sentinel itself will rewrite this configuration file adding
# the slaves using additional configuration options.
# Also note that the configuration file is rewritten when a
# slave is promoted to master.
#
# Note: master name should not include special characters or spaces.
# The valid charset is A-z 0-9 and the three characters ".-_".
sentinel monitor mymaster 192.168.75.143 6379 2 #机器IP为需要修改部分
# sentinel auth-pass <master-name> <password>
#
# Set the password to use to authenticate with the master and slaves.
# Useful if there is a password set in the Redis instances to monitor.
#
# Note that the master password is also used for slaves, so it is not
# possible to set a different password in masters and slaves instances
# if you want to be able to monitor these instances with Sentinel.
#
# However you can have Redis instances without the authentication enabled
# mixed with Redis instances requiring the authentication (as long as the
# password set is the same for all the instances requiring the password) as
# the AUTH command will have no effect in Redis instances with authentication
# switched off.
#
# Example:
#
# sentinel auth-pass mymaster MySUPER--secret-0123passw0rd
# sentinel down-after-milliseconds <master-name> <milliseconds>
#
# Number of milliseconds the master (or any attached slave or sentinel) should
# be unreachable (as in, not acceptable reply to PING, continuously, for the
# specified period) in order to consider it in S_DOWN state (Subjectively
# Down).
#
# Default is 30 seconds.
sentinel down-after-milliseconds mymaster 30000
# sentinel parallel-syncs <master-name> <numslaves>
#
# How many slaves we can reconfigure to point to the new slave simultaneously
# during the failover. Use a low number if you use the slaves to serve query
# to avoid that all the slaves will be unreachable at about the same
# time while performing the synchronization with the master.
sentinel parallel-syncs mymaster 1
# sentinel failover-timeout <master-name> <milliseconds>
#
# Specifies the failover timeout in milliseconds. It is used in many ways:
#
# - The time needed to re-start a failover after a previous failover was
# already tried against the same master by a given Sentinel, is two
# times the failover timeout.
#
# - The time needed for a slave replicating to a wrong master according
# to a Sentinel current configuration, to be forced to replicate
# with the right master, is exactly the failover timeout (counting since
# the moment a Sentinel detected the misconfiguration).
#
# - The time needed to cancel a failover that is already in progress but
# did not produced any configuration change (SLAVEOF NO ONE yet not
# acknowledged by the promoted slave).
#
# - The maximum time a failover in progress waits for all the slaves to be
# reconfigured as slaves of the new master. However even after this time
# the slaves will be reconfigured by the Sentinels anyway, but not with
# the exact parallel-syncs progression as specified.
#
# Default is 3 minutes.
sentinel failover-timeout mymaster 180000
# SCRIPTS EXECUTION
#
# sentinel notification-script and sentinel reconfig-script are used in order
# to configure scripts that are called to notify the system administrator
# or to reconfigure clients after a failover. The scripts are executed
# with the following rules for error handling:
#
# If script exits with "1" the execution is retried later (up to a maximum
# number of times currently set to 10).
#
# If script exits with "2" (or an higher value) the script execution is
# not retried.
#
# If script terminates because it receives a signal the behavior is the same
# as exit code 1.
#
# A script has a maximum running time of 60 seconds. After this limit is
# reached the script is terminated with a SIGKILL and the execution retried.
# NOTIFICATION SCRIPT
#
# sentinel notification-script <master-name> <script-path>
#
# Call the specified notification script for any sentinel event that is
# generated in the WARNING level (for instance -sdown, -odown, and so forth).
# This script should notify the system administrator via email, SMS, or any
# other messaging system, that there is something wrong with the monitored
# Redis systems.
#
# The script is called with just two arguments: the first is the event type
# and the second the event description.
#
# The script must exist and be executable in order for sentinel to start if
# this option is provided.
#
# Example:
#
# sentinel notification-script mymaster /var/redis/notify.sh
# CLIENTS RECONFIGURATION SCRIPT
#
# sentinel client-reconfig-script <master-name> <script-path>
#
# When the master changed because of a failover a script can be called in
# order to perform application-specific tasks to notify the clients that the
# configuration has changed and the master is at a different address.
#
# The following arguments are passed to the script:
#
# <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
#
# <state> is currently always "failover"
# <role> is either "leader" or "observer"
#
# The arguments from-ip, from-port, to-ip, to-port are used to communicate
# the old address of the master and the new address of the elected slave
# (now a master).
#
# This script should be resistant to multiple invocations.
#
# Example:
#
# sentinel client-reconfig-script mymaster /var/redis/reconfig.sh
2.2 启动哨兵进程
在配置好的各台机器上分别启动哨兵进程,组成一个集群
如果开启安全认证的话,执行命令不成功,所以就把安全认证关闭掉了
关闭方法
redis-sentinel /etc/sentinel/5000.conf
redis-server /etc/sentinel/5000.conf --sentinel
日志里会显示出来,每个哨兵都能去监控到对应的redis master,并能够自动发现对应的slave
哨兵之间,互相会自动进行发现,用的就是之前说的pub/sub,消息发布和订阅channel消息系统和机制
2.3 检查哨兵状态
redis-cli -h 192.168.75.143 -p 5000
命令
sentinel master mymaster
sentinel slaves mymaster
sentinel sentinels mymaster
sentinel get-master-addr-by-name mymaster
-
sentinel master mymaster
-
sentinel slaves mymaster
-
sentinel sentinels mymaster
-
sentinel get-master-addr-by-name mymaster
3. 哨兵节点管理
3.1 哨兵节点增加与删除
-
增加sentinel,会自动发现
-
删除sentinel的步骤
- 停止sentinel进程
- sentinel reset*,在所有sentinel上执行,清理所有的master状态
- sentinel master mastername,在所有sentinel上执行,查看所有sentinel对数量是否达成了一致
3.2 slave的永久下线
让master去除某个已经下线的slave:sentinel reset mastername
,在所有的哨兵上面执行
3.3 slave切换为Master的优先级
slave->master选举优先级:slave-priority
,值越小优先级越高
3.4 基于哨兵集群架构下的安全认证
每个slave都有可能切换成master,所以每个实例都要配置两个指令
master上启用安全认证:requirepass
master连接口令:masterauth
sentinel:sentinel auth-pass <master-group-name> <pass>
3.5 故障恢复
将旧的master重新启动,查看是否被哨兵自动切换成slave节点
(1)手动杀掉master
(2)哨兵能否执行主备切换,将slave切换为master
(3)哨兵完成主备切换后,新的master能否使用
(4)故障恢复,将旧的master重新启动
(5)哨兵能否自动将旧的master变为slave,挂接到新的master上面去,而且也是可以使用的
4. 哨兵的生产环境部署
配置后台启动
mkdir -p /var/log/sentinel/5000
编辑/etc/sentinel/5000.conf
daemonize yes
logfile /var/log/sentinel/5000/sentinel.log