Redis Sentinel: Principles and Hands-On Practice

This post briefly introduces the principles behind Redis Sentinel, and ends with a hardcore hands-on tutorial so that, once you understand the principles, you can experience the entire process for yourself.

The previous article covered Redis master-slave replication, its underlying principles, and its shortcomings. For details, see the earlier article I wrote on Redis master-slave replication.

In general, master-slave replication alone is clearly not enough to meet the high-availability requirements of a real, complex production environment. For example, when the master node goes down, the master-slave switch has to be done by hand: someone has to perform the failover manually.
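To make that pain concrete: under plain master-slave replication, a manual failover means logging in and re-pointing replication by hand, roughly like the sketch below. The host names here are placeholders I made up for illustration; they are not part of the original setup.

$ redis-cli -h slave1-host slaveof no one              # promote one slave to be the new master
$ redis-cli -h slave2-host slaveof slave1-host 6379    # point the other slave at the new master
# ...and then reconfigure every application to write to slave1-host instead of the old master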

At the same time, in terms of traffic, a master-slave architecture can only scale read requests by adding slave nodes; write capacity cannot be scaled out, because it is limited by the resources of the single master node.

This is why we need to introduce Sentinel.

Sentinel

Function overview

The general function of Sentinel is shown in the figure below.

Sentinel is one of Redis' high-availability solutions. It is itself a distributed architecture, consisting of multiple Sentinel nodes and multiple Redis nodes. Each Sentinel node monitors the Redis nodes and the other Sentinel nodes.

When a Sentinel finds that a node is unreachable, and that node is the master, it negotiates with the other Sentinel nodes. Once most Sentinel nodes agree that the master is unreachable, one Sentinel node is elected to perform a failover on the master and to notify the Redis clients of the change.

Compared with the manual failover required under plain master-slave replication, Sentinel's failover is fully automatic and needs no manual intervention.

Sentinel itself is highly available

Nice. But how do I know how many Sentinel nodes need to be deployed for Sentinel itself to be highly available?

Because Sentinel itself is distributed, you need to deploy multiple instances to keep the Sentinel cluster itself highly available. And there is a minimum requirement for this number: at least three.

Wait, you say three and it's three? I deployed only two just today.

Hold on... let me explain why you need three...

Because when Sentinel performs a failover, a majority of the Sentinels must agree to it. If there are only two Sentinel instances, normal operation is fine, like this.

But if the machine hosting one of the Sentinels fails, because of a power outage in the data center, a cut fiber, or some other extreme situation, then even if the other Sentinel detects the master failure and wants to perform a failover, it can never get agreement from any other Sentinel node, so the failover can never be performed. Isn't Sentinel just decoration at that point?

So we need at least three nodes to keep the Sentinel cluster itself highly available. And of course these three Sentinel nodes should be deployed on different machines: if they all run on the same machine, then once that machine goes down, the whole Sentinel cluster goes with it.

quorum & majority

Most? Come on, this is a production environment. "Most" is far too vague; can't we be a bit more precise?

The "most Sentinels agree" mentioned above actually involves two parameters. One is called quorum: if at least quorum Sentinels in the cluster think the master is down, the master is objectively considered down. The other is called majority...

Wait, wait, isn't quorum already enough? Why do we also need this majority?

Can you wait for me to finish...

Quorum, as just described, is used to judge whether the master is down; it is only about making that judgment. In real production, we don't stop at judging that the master is down. Don't we also have to perform a failover so that the cluster keeps working normally?

That is where majority comes in: when the Sentinel cluster starts a failover, at least majority Sentinels must agree before one Sentinel node is finally elected to carry out the failover. For example, with three Sentinels and quorum set to 2, two Sentinels marking the master as down is enough to declare it objectively down, and two Sentinels (the majority of three) must also agree before the failover is actually executed.

Subjective downtime & objective downtime

Did you just say objectively down? Don't tell me there is also such a thing as subjectively down?

Sentinel considers that there are two types of node failure:

  • Subjective Down, sdown for short: a single Sentinel subjectively believes the master is down
  • Objective Down, odown for short: the Sentinels objectively agree that the master is down

When a Sentinel node communicates with a Redis node A that it monitors and finds it cannot connect, that Sentinel node will subjectively consider Redis node A to be down. Why subjectively? First we need to know what subjective means.

Reaching conclusions, making decisions, and reacting based only on one's own view, without analysis or calculation and without carefully checking against the differing views of others, is what we call subjective.

In short, the network problem may exist only between the current Sentinel node and node A; the other Sentinel nodes may still be communicating with A perfectly normally.

This is why odown is needed: only when at least quorum Sentinel nodes think that a node is down do we objectively consider that node to be down.

When the Sentinel cluster objectively decides that the master is down, it elects one Sentinel node out of all of them to actually perform the failover of the master.

So what exactly does this failover do? Let's look at it through a picture. Roughly, the elected Sentinel will:

  • Promote one of the slave nodes to be the new master
  • Notify the calling clients that the master has changed
  • Tell the remaining original slave nodes to replicate the new master elected by Sentinel
  • If the original master recovers later, make it replicate the new master as well, so that it becomes a new slave node
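One detail worth calling out about the "notify the clients" step: a Sentinel-aware client does not hard-code the master address; it asks any Sentinel node for it. You can do the same thing by hand with a standard Sentinel command (26379 is the default Sentinel port, and mymaster is the master name used later in this tutorial):

$ redis-cli -p 26379 sentinel get-master-addr-by-name mymaster
# replies with the IP and port of the current master for mymaster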

Hardcore tutorial

This hardcore tutorial aims to get you, as quickly as possible, to build a Redis master-slave architecture and a Sentinel cluster locally, and to experience the entire failover process.

Prerequisites

  1. Docker installed
  2. Docker-compose installed

Prepare compose file

First, prepare a directory and create two subdirectories inside it, as follows.

$ tree .
.
├── redis
│   └── docker-compose.yml
└── sentinel
    ├── docker-compose.yml
    ├── sentinel1.conf
    ├── sentinel2.conf
    └── sentinel3.conf

2 directories, 5 files
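If you want to reproduce this layout quickly, a few shell commands are enough (a minimal sketch; the file contents are filled in below):

$ mkdir -p redis sentinel
$ touch redis/docker-compose.yml
$ touch sentinel/docker-compose.yml sentinel/sentinel1.conf sentinel/sentinel2.conf sentinel/sentinel3.conf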

Build Redis master-slave server

The content of docker-compose.yml in the redis directory is as follows.

version: '3'
services:
  master:
    image: redis
    container_name: redis-master
    ports:
      - 6380:6379
  slave1:
    image: redis
    container_name: redis-slave-1
    ports:
      - 6381:6379
    command: redis-server --slaveof redis-master 6379
  slave2:
    image: redis
    container_name: redis-slave-2
    ports:
      - 6382:6379
    command: redis-server --slaveof redis-master 6379

A brief explanation of the slaveof option used in the command above:

It makes the two slave nodes replicate the node whose container_name is redis-master, forming a simple three-node master-slave architecture (one master, two slaves).

Then open a terminal in this directory and simply run docker-compose up; docker-compose takes care of the rest and starts all the nodes we need.
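Optionally, you can confirm that replication actually came up before going further. A hedged check using the container name from the compose file above:

$ docker exec -it redis-master redis-cli info replication
# expect role:master and connected_slaves:2 in the output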

At this point we also need to get the IP of the master node we just started. The brief steps are as follows:

  1. Use docker ps to find the container ID of the master node:

     $ docker ps
     CONTAINER ID   IMAGE   COMMAND                  CREATED         STATUS         PORTS                    NAMES
     9f682c199e9b   redis   "docker-entrypoint.s…"   3 seconds ago   Up 2 seconds   0.0.0.0:6381->6379/tcp   redis-slave-1
     2572ab587558   redis   "docker-entrypoint.s…"   3 seconds ago   Up 2 seconds   0.0.0.0:6382->6379/tcp   redis-slave-2
     f70a9d9809bc   redis   "docker-entrypoint.s…"   3 seconds ago   Up 2 seconds   0.0.0.0:6380->6379/tcp   redis-master

     Here the master's container ID is f70a9d9809bc.
  2. Get the IP of that container through docker inspect f70a9d9809bc; it is in the NetworkSettings -> Networks -> IPAddress field.

Record this value; in my case it is 172.28.0.3.
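If you prefer not to read the inspect JSON by eye, a Go-template one-liner gives the same value (a convenience sketch, not from the original article; the container name comes from the compose file):

$ docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' redis-master
172.28.0.3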

Set up a Sentinel cluster

The content of docker-compose.yml in the sentinel directory is as follows.

version: '3'
services:
  sentinel1:
    image: redis
    container_name: redis-sentinel-1
    ports:
      - 26379:26379
    command: redis-sentinel /usr/local/etc/redis/sentinel.conf
    volumes:
      - ./sentinel1.conf:/usr/local/etc/redis/sentinel.conf
  sentinel2:
    image: redis
    container_name: redis-sentinel-2
    ports:
      - 26380:26379
    command: redis-sentinel /usr/local/etc/redis/sentinel.conf
    volumes:
      - ./sentinel2.conf:/usr/local/etc/redis/sentinel.conf
  sentinel3:
    image: redis
    container_name: redis-sentinel-3
    ports:
      - 26381:26379
    command: redis-sentinel /usr/local/etc/redis/sentinel.conf
    volumes:
      - ./sentinel3.conf:/usr/local/etc/redis/sentinel.conf
networks:
  default:
    external:
      name: redis_default

A quick explanation of the command here as well.

The redis-sentinel command starts Redis in Sentinel mode; a Sentinel is essentially a Redis server running in a special mode.

The difference from redis-server is that they load different command tables: a Sentinel cannot execute the usual Redis data commands such as SET and GET. Also note the networks section at the bottom: it attaches these containers to the redis_default network that docker-compose created for the previous project (the directory was named redis), so the Sentinels can reach the Redis nodes at their 172.28.0.x addresses.
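You can see the different command table for yourself once the Sentinels are up: data commands are rejected, while SENTINEL subcommands work. A rough sketch (container names are the ones from the compose file above):

$ docker exec -it redis-sentinel-1 redis-cli -p 26379 get foo
# fails: data commands are not part of the sentinel command table
$ docker exec -it redis-sentinel-1 redis-cli -p 26379 sentinel masters
# works: lists the masters this Sentinel is monitoring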

Create three identical files named sentinel1.conf, sentinel2.conf, and sentinel3.conf, with the following content:

port 26379
dir "/tmp"
sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 172.28.0.3 6379 2
sentinel config-epoch mymaster 1
sentinel leader-epoch mymaster 1

As you can see, the line sentinel monitor mymaster 172.28.0.3 6379 2 in the Sentinel configuration file tells Sentinel to monitor a master named mymaster. Note that the IP here must be the IP of your own master node, and the trailing 2 is the quorum we discussed earlier.
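For completeness, a few other standard Sentinel directives are usually tuned in real deployments. They are not required for this tutorial (the defaults apply to the config above), but they control how quickly Sentinel reacts and how the failover proceeds:

# how long a node must be unreachable before it is marked sdown (30s is the default)
sentinel down-after-milliseconds mymaster 30000
# overall failover timeout
sentinel failover-timeout mymaster 180000
# how many slaves resync with the new master at the same time
sentinel parallel-syncs mymaster 1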

Then, from the command line, enter the directory named sentinel and run docker-compose up. The Sentinel cluster is now up and running.
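To check that the Sentinels have discovered the master, the slaves, and each other, you can ask any one of them directly (container names are the ones from the compose file above):

$ docker exec -it redis-sentinel-1 redis-cli -p 26379 sentinel master mymaster
# shows the monitored master, including num-other-sentinels (should be 2) and quorum (2)
$ docker exec -it redis-sentinel-1 redis-cli -p 26379 sentinel slaves mymaster
# lists the two slave nodes discovered through the master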

Manually simulate the master hanging

Next we manually simulate the master going down, to verify whether the Sentinel cluster we just built performs the failover correctly.

Enter the directory named redis from the command line and run the following command.

docker-compose pause master

At this point the master container is suspended. After waiting about 10 seconds, you can see log output like the following from the Sentinels.

redis-sentinel-2 | 1:X 07 Dec 2020 01:58:05.459 # +sdown master mymaster 172.28.0.3 6379
......
......
......
redis-sentinel-1 | 1:X 07 Dec 2020 01:58:06.932 # +switch-master mymaster 172.28.0.3 6379 172.28.0.2 6379

Come on, what's the point of just dumping a pile of logs? Padding the word count? Who can make sense of this?

Fair enough. Reading the log line by line, even I would be confused looking at it again two weeks from now. The log does describe every detail of what the Sentinel cluster did, from detecting the failure to finally executing the failover, but reading it raw is not the easiest way to understand it.

So, to make the process more intuitive, I have abstracted it into a picture. Looking at the picture alongside the log should make it much easier to follow.

The relevant explanations of the key steps are also included in the picture.
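Since that picture cannot be reproduced here, below is a rough outline of the event names you will typically find in logs like the ones above, and what each one means. The names come from the standard Sentinel events; the exact ordering and the extra events in your own run may differ slightly.

+sdown                                    # one Sentinel subjectively marks the master as down
+odown                                    # enough Sentinels (quorum) agree: the master is objectively down
+vote-for-leader / +elected-leader        # the Sentinels elect the one that will perform the failover
+selected-slave                           # the leader picks a slave to promote
+promoted-slave                           # that slave is promoted to be the new master
+slave-reconf-sent / +slave-reconf-done   # the remaining slaves are reconfigured to replicate the new master
+switch-master                            # the master address for mymaster officially changes
+convert-to-slave                         # later, when the old master comes back, it is turned into a slave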

The end result is that the master has switched from the original 172.28.0.3 to 172.28.0.2, which was one of the original slave nodes. We can now connect to the 172.28.0.2 container and check its current state with a command.
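The info replication command is what produces the output below. Which container actually ended up at 172.28.0.2 can differ between runs, so check with docker inspect first; in this sketch I assume it was redis-slave-1:

$ docker exec -it redis-slave-1 redis-cli info replication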

role:master
connected_slaves:1
slave0:ip=172.28.0.4,port=6379,state=online,offset=18952,lag=0
master_replid:f0bf5d1c843ec3ab005c5ac2b864f7ffdc6a8217
master_replid2:72c43e1f9c05d4b08bea6bf9b2549997587e261c
master_repl_offset:18952
second_repl_offset:16351
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:18952

As you can see, the role of node 172.28.0.2 has become master, and only one slave is connected to it, because the original master has not been brought back up yet: only two instances are currently running.

Restart the original master

Next, we simulate restarting the original master to see what happens.

Again, enter the local directory named redis from the command line, and run docker-compose unpause master to simulate the original master coming back online after recovering from its failure. Then connect to the original master container as before.

$ docker exec -it f70a9d9809bc1e924a5be0135888067ad3eb16552f9eaf82495e4c956b456cd9 /bin/sh; exit
# redis-cli
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:172.28.0.2
master_port:6379
master_link_status:up
......

After being disconnected and then reconnecting, the original master's role has become slave, replicating the new master (the 172.28.0.2 node).

We can confirm this from the other side as well, by looking at the replication state of the new master node.

# Replication
role:master
connected_slaves:2
slave0:ip=172.28.0.4,port=6379,state=online,offset=179800,lag=0
slave1:ip=172.28.0.3,port=6379,state=online,offset=179800,lag=1
......

Shortly after the original master reconnected, the new master's connected_slaves went up to 2, and the original master, 172.28.0.3, is clearly listed as slave1. This matches the principles described at the beginning of the article and in the figure.

Origin blog.csdn.net/doubututou/article/details/110928871