Redis Sentinel mechanism: the key points you need to understand

Abstract

  • Understand the principles of Redis Sentinel
  • Be able to configure Redis Sentinel

I. Redis Sentinel

Disadvantage of Redis master-slave replication: it has no way to elect a new master dynamically. When the master goes down, a new master has to be chosen by hand, so the Sentinel mechanism is needed to carry out the election automatically.

What is the Redis Sentinel mechanism?

Sentinel is a distributed system: you can run multiple Sentinel processes in one architecture. These processes use gossip protocols to exchange information about whether the Master is offline, and use agreement protocols to decide whether to perform an automatic failover and which Slave to promote to the new Master (the vote among the Sentinels is Raft-like).

Redis's Sentinel mode is available from version 2.8 onward.
The Sentinel process monitors the working status of the Master servers in a Redis deployment;
when a Master fails, Sentinel switches the roles of Master and Slave to keep the system highly available (HA).

Redis's sentinel system is used to manage multiple Redis servers. The system performs the following three tasks:

  • Monitoring : The sentinel will constantly check whether your Master and Slave are operating normally.

  • Notification : When a monitored Redis node has a problem, sentinel can send notifications to administrators or other applications through the API.

  • Automatic failover : When a Master fails to work normally, the sentinel will start an automatic failover operation.

Each Sentinel periodically sends messages to the other Sentinels, the Masters, and the Slaves to confirm that they are alive. If a peer does not respond within the specified (configurable) time, the Sentinel provisionally considers it dead. This is the so-called Subjectively Down state, abbreviated sdown.
If enough Sentinels in the group (at least the configured quorum) report that a Master is not responding, the system considers the Master really dead. This is the Objectively Down state, abbreviated odown. The Sentinels then hold a vote (a Raft-like algorithm), one of the remaining Slave nodes is promoted to Master, and the relevant configuration is updated automatically.
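
The step from subjective to objective failure can be sketched in a few lines of Python. This is an illustrative model only, not Sentinel's implementation: the function name `is_objectively_down` and the dictionary of per-Sentinel opinions are made up for the example, and `quorum` stands for the last argument of the `sentinel monitor` directive.

```python
def is_objectively_down(sdown_votes, quorum):
    """Return True when at least `quorum` Sentinels consider the master sdown.

    sdown_votes maps a (hypothetical) Sentinel name to True if that Sentinel
    currently flags the master as subjectively down.
    """
    agreeing = sum(1 for flagged in sdown_votes.values() if flagged)
    return agreeing >= quorum


# Three Sentinels, two of which have lost contact with the master.
votes = {"sentinel-a": True, "sentinel-b": True, "sentinel-c": False}
print(is_objectively_down(votes, quorum=2))  # True
print(is_objectively_down(votes, quorum=3))  # False
```

With a quorum of 1, as in the sample configuration in this article, a single Sentinel's opinion is enough to mark the master as objectively down.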

Although Sentinel is released as a separate executable, redis-sentinel, it is actually just a Redis server running in a special mode. You can also start Sentinel by launching an ordinary Redis server with the --sentinel option.

Some design ideas of sentinel are very similar to zookeeper.

Sentinel mode configuration

Implementation steps:

  1. Copy sentinel.conf to the etc directory
    cp sentinel.conf /usr/local/redis/etc

  2. Modify sentinel.conf configuration file

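# IP and port of the Redis master that the Sentinel monitors
# master-name: a name you choose for the master; it may contain only the letters A-z, the digits 0-9, and the three characters ".-_"
# quorum: when this many Sentinels consider the master unreachable, the master is objectively considered down
# sentinel monitor <master-name> <master ip> <master port> <quorum>
sentinel monitor mymaster 192.168.137.6 6379 1
# Run in the background
daemonize yes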

  3. Description of other configuration items

    sentinel.conf


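# Port the Sentinel instance listens on (default 26379)
port 26379
# Working directory of the Sentinel
dir /tmp
# IP and port of the Redis master that the Sentinel monitors
# master-name: a name you choose for the master; it may contain only the letters A-z, the digits 0-9, and the three characters ".-_"
# quorum: when this many Sentinels consider the master unreachable, the master is objectively considered down
# sentinel monitor <master-name> <ip> <redis-port> <quorum>
sentinel monitor mymaster 127.0.0.1 6379 1
# When requirepass foobared is enabled on a Redis instance, every client connecting to it must supply the password
# Password the Sentinel uses to connect to the master and slaves. The master and slaves must share the same password; skip this setting if no password is used
# sentinel auth-pass <master-name> <password>
sentinel auth-pass mymaster MySUPER--secret-0123passw0rd
# Number of milliseconds without a reply from the master after which the Sentinel subjectively considers it down (default 30 seconds)
# sentinel down-after-milliseconds <master-name> <milliseconds>
sentinel down-after-milliseconds mymaster 30000
# Maximum number of slaves that may resynchronize with the new master at the same time during a failover.
# The smaller the number, the longer the failover takes; the larger the number, the more slaves are
# unavailable because of replication. Setting it to 1 ensures that only one slave at a time is unable
# to serve command requests.
# sentinel parallel-syncs <master-name> <numslaves>
sentinel parallel-syncs mymaster 1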
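# Failover timeout. failover-timeout is used in the following ways:
# 1. The interval between two failovers of the same master by the same Sentinel.
# 2. The time, counted from the moment a slave is found replicating from a wrong master, until it is corrected to replicate from the right master.
# 3. The time needed to cancel a failover that is already in progress.
# 4. The maximum time a failover waits for all slaves to be reconfigured to point to the new master. Even after this timeout the slaves are still reconfigured to point to the master, but no longer following the parallel-syncs rule.
# Default: three minutes
# sentinel failover-timeout <master-name> <milliseconds>
sentinel failover-timeout mymaster 180000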
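# SCRIPTS EXECUTION
# Scripts can be configured to run when certain events occur, for example to e-mail the people responsible when the system misbehaves.
# The following rules apply to the script's exit status:
# If the script exits with 1, it is executed again later; the retry count currently defaults to 10.
# If the script exits with 2 or a higher value, it is not executed again.
# If the script is terminated by a signal while running, the behavior is the same as for exit status 1.
# A script may run for at most 60 seconds; past this limit it is terminated with SIGKILL and executed again.
# Notification script: called for any warning-level Sentinel event (for example the subjective or objective failure of a Redis instance). The script should notify the system administrator by e-mail, SMS, or similar that the system is not running normally. It is called with two arguments: the event type and the event description.
# If a script path is configured in sentinel.conf, the script must exist at that path and be executable, otherwise Sentinel will not start.
# Notification script
# sentinel notification-script <master-name> <script-path>
sentinel notification-script mymaster /var/redis/notify.sh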
# Client reconfiguration script
# Called when the master changes because of a failover, to inform clients that the master's address has changed.
# The following arguments are passed to the script:
# <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
# <state> is currently always "failover",
# <role> is either "leader" or "observer".
# The arguments from-ip, from-port, to-ip and to-port give the addresses of the old master and the new master (the former slave)
# The script should be generic and safe to call multiple times.
# sentinel client-reconfig-script <master-name> <script-path>
sentinel client-reconfig-script mymaster /var/redis/reconfig.sh

  4. Start the sentinel service through redis-sentinel

    ./redis-sentinel sentinel.conf
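
A script of the kind referenced by `sentinel notification-script` could look roughly like the Python sketch below. It relies only on what the configuration comments state: the script receives the event type and the event description as two arguments, and its exit status decides whether it is retried. The log file location and the function name `handle_event` are hypothetical, and a real script would send mail or SMS instead of appending to a file.

```python
#!/usr/bin/env python3
import os
import sys
import tempfile
import time

# Hypothetical location for the event log; adjust to your environment.
DEFAULT_LOG = os.path.join(tempfile.gettempdir(), "sentinel-events.log")


def handle_event(event_type, description, log_path=DEFAULT_LOG):
    """Record one Sentinel event and return the script's exit status.

    0 means success, 1 asks for a retry, 2 or higher means do not retry.
    """
    line = f"{time.strftime('%Y-%m-%d %H:%M:%S')} {event_type} {description}\n"
    try:
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(line)
    except OSError:
        return 1  # possibly transient failure: let the script be retried
    return 0


# Act only when called as a script with exactly the two expected arguments;
# with any other argument count nothing is logged and the script exits 0.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(handle_event(sys.argv[1], sys.argv[2]))
```

Because the configured path must point to an existing executable file, a script like this would also need the executable bit set before Sentinel starts.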

Notes:

  1. When Sentinel mode is active and your master server goes down, Sentinel automatically elects a new master from the slave servers. The new master accepts both read and write operations.
  2. If the master that went down is later repaired and brought back into service, it can only perform read operations and automatically follows the new master elected by Sentinel.
  3. You can run ./redis-cli and enter info replication to view a server's status information.

Sentinel summary

1. The role of Sentinel

  1. Master status monitoring
  2. If the Master is abnormal, a master-slave switch is performed: one of the Slaves becomes the Master, and the previous Master becomes a Slave
  3. After a master-slave switch, the contents of master_redis.conf, slave_redis.conf and sentinel.conf are rewritten: a slaveof line is added to master_redis.conf, and the monitoring target in sentinel.conf is changed accordingly.

2. How Sentinel works:

  1. Each Sentinel sends a PING command once per second to every Master, Slave, and other Sentinel instance it knows.
  2. If the time since an instance's last valid reply to PING exceeds the value of the down-after-milliseconds option, the Sentinel marks that instance as subjectively offline.
  3. If a Master is marked as subjectively offline, all Sentinels monitoring this Master check once per second to confirm that it has indeed entered the subjectively offline state.
  4. When a sufficient number of Sentinels (at least the quorum specified in the configuration file) confirm within the specified time range that the Master is subjectively offline, the Master is marked as objectively offline.
  5. In general, each Sentinel sends an INFO command to all Masters and Slaves it knows once every 10 seconds.
  6. When a Master is marked as objectively offline, the Sentinels send INFO commands to all Slaves of that Master once per second instead of once every 10 seconds.
  7. If not enough Sentinels agree that the Master is offline, the Master's objectively offline status is removed. If the Master returns a valid reply to a Sentinel's PING command again, its subjectively offline status is removed.
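
Steps 1 to 6 boil down to two timers per monitored Master. The sketch below models them in Python. The class `InstanceState` and its method names are hypothetical, and times are passed in explicitly rather than read from a clock. Only the intervals named in the list (`down-after-milliseconds`, and INFO every 10 seconds or every second) are taken from the text.

```python
class InstanceState:
    """What one Sentinel tracks about one monitored master (toy model)."""

    def __init__(self, down_after_ms, now_ms=0):
        self.down_after_ms = down_after_ms
        self.last_valid_reply_ms = now_ms
        self.odown = False  # set by the caller once enough Sentinels agree

    def on_valid_ping_reply(self, now_ms):
        # A valid reply resets the timer, which clears the sdown condition.
        self.last_valid_reply_ms = now_ms

    def is_sdown(self, now_ms):
        # Step 2: no valid PING reply for longer than down-after-milliseconds.
        return now_ms - self.last_valid_reply_ms > self.down_after_ms

    def slave_info_period_ms(self):
        # Steps 5 and 6: INFO goes to the slaves every 10 s, or every 1 s
        # while their master is objectively down.
        return 1000 if self.odown else 10000


master = InstanceState(down_after_ms=30000)
master.on_valid_ping_reply(now_ms=5000)
print(master.is_sdown(now_ms=20000))  # False: only 15 s of silence
print(master.is_sdown(now_ms=36000))  # True: 31 s of silence
```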

Heartbeat detection

In the command propagation phase, the slave server sends the following command to the master server once per second by default:

REPLCONF ACK <replication_offset> // replication_offset is the slave server's current replication offset.

The role of heartbeat detection:

  • Check the network connection status of the master and slave servers;
  • Assist in implementing the min-slaves options;
  • Detect command loss.

  1. Check the network connection status of the master and slave servers

By sending the INFO replication command to the master server, you can list its slave servers and see how many seconds have passed since each slave last sent a REPLCONF ACK command to the master.

localhost:6377> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6379,state=online,offset=110180,lag=0 # REPLCONF ACK was sent just now
slave1:ip=127.0.0.1,port=6378,state=online,offset=110180,lag=1 # REPLCONF ACK was sent 1 second ago
master_replid:55c2177dd69fc21dbea4e9f8a3f4fb0ee948855d
master_replid2:a80967516d1b0821c315fd2eb550f2ff0597010c
master_repl_offset:110313
second_repl_offset:25348
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:11612
repl_backlog_histlen:98702

The lag value should normally alternate between 0 and 1. If it exceeds 1, the connection between the master and the slave has a problem.
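
Checking the lag by eye gets tedious with many slaves. As a rough sketch, the Python below extracts the `lag` field of every `slaveN` line from `INFO replication` output in the format shown above. The function name `slave_lags` is hypothetical, and the sample text is trimmed from that output with the trailing annotations removed.

```python
def slave_lags(info_text):
    """Map each slave's "ip:port" to its lag in seconds."""
    lags = {}
    for line in info_text.splitlines():
        key, sep, value = line.partition(":")
        # Slave entries look like slave0:ip=...,port=...,state=...,offset=...,lag=...
        if not sep or not key.startswith("slave") or not key[5:].isdigit():
            continue
        fields = dict(item.split("=", 1) for item in value.strip().split(","))
        lags[f"{fields['ip']}:{fields['port']}"] = int(fields["lag"])
    return lags


sample = """\
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6379,state=online,offset=110180,lag=0
slave1:ip=127.0.0.1,port=6378,state=online,offset=110180,lag=1
"""
lags = slave_lags(sample)
print(lags)  # {'127.0.0.1:6379': 0, '127.0.0.1:6378': 1}
print([addr for addr, lag in lags.items() if lag > 1])  # []
```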

  2. Assist in implementing the min-slaves options

Through configuration, Redis can prevent the master server from executing write commands when it is unsafe to do so:

min-slaves-to-write 3
min-slaves-max-lag 10

The above configuration means: when fewer than 3 slave servers are connected, or fewer than 3 of them have a lag of 10 seconds or less, the master server refuses to execute write commands. The lag here is the lag value reported by the INFO replication command above.
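
The write guard can be expressed as a small predicate. This is a sketch of the rule rather than Redis's source code: the function name `master_accepts_writes` is hypothetical, and it assumes that a slave counts as healthy as long as its lag does not exceed `min-slaves-max-lag` (the exact boundary is an assumption here).

```python
def master_accepts_writes(lags, min_slaves_to_write=3, min_slaves_max_lag=10):
    """Return True if enough slaves are recent enough for writes to proceed.

    lags holds the lag in seconds of every connected slave.
    """
    healthy = sum(1 for lag in lags if lag <= min_slaves_max_lag)
    return healthy >= min_slaves_to_write


print(master_accepts_writes([0, 1, 1]))   # True: three healthy slaves
print(master_accepts_writes([0, 1]))      # False: only two slaves connected
print(master_accepts_writes([0, 1, 12]))  # False: one slave lags too far
```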

  3. Detect command loss

If, because of a network fault, a write command propagated by the master server is lost on its way to the slave server, then when the slave sends the REPLCONF ACK <replication_offset> command to the master, the master notices that the slave's current replication offset is smaller than its own. The master then uses the replication offset submitted by the slave to find the missing data in the replication backlog buffer and resends that data to the slave.

This operation of resending missing data is very similar in principle to partial resynchronization. The difference is that the resend happens while the master and slave are still connected, whereas partial resynchronization happens after the master and slave have been disconnected and reconnected.
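
The resend step can be sketched as follows. This toy model keeps the whole replication stream in memory, whereas the real backlog is a fixed-size buffer (see `repl_backlog_size` in the output above). Commands are written in inline form for readability, and the class `ToyMaster` and its methods are hypothetical names used only for illustration.

```python
class ToyMaster:
    """Keeps a byte stream of propagated commands and its replication offset."""

    def __init__(self):
        self.backlog = b""

    @property
    def offset(self):
        return len(self.backlog)

    def propagate(self, command):
        # Append the command to the backlog; delivery to the slave may fail.
        self.backlog += command
        return command

    def on_replconf_ack(self, slave_offset):
        # The slave reports how much of the stream it has processed. Anything
        # between its offset and ours was lost and must be sent again.
        if slave_offset < self.offset:
            return self.backlog[slave_offset:]
        return b""


master = ToyMaster()
received = bytearray()
received += master.propagate(b"SET a 1\r\n")
master.propagate(b"SET b 2\r\n")       # lost in transit, never received
missing = master.on_replconf_ack(len(received))
received += missing
print(missing)                         # b'SET b 2\r\n'
print(len(received) == master.offset)  # True
```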

If you are not familiar with partial resynchronization, see the author's article on the principles of Redis master-slave replication.

Origin blog.csdn.net/taurus_7c/article/details/104351665