1. Redis High Availability

1.1 What is High Availability

For web servers, high availability refers to the proportion of time during which the server can be accessed normally. It is measured by how long normal service is provided relative to total time (99.9%, 99.99%, 99.999%, and so on).

Availability is calculated as 1 - downtime / (downtime + running time), which is somewhat similar to the bit error rate used for network transmission. Availability is usually expressed as a number of nines:

2 nines: 99%, downtime within one year: 1% × 365 days = 3.65 days = 87.6 h

4 nines: 99.99%, downtime within one year: 0.01% × 365 days = 52.56 min

5 nines: 99.999%, downtime within one year: 0.001% × 365 days = 5.256 min

Eleven nines: 99.999999999%, downtime within one year: far less than one second (about 0.3 ms)
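
The figures above can be reproduced with a few lines of shell. This is a minimal sketch, assuming a 365-day year; downtime_minutes is a hypothetical helper name introduced only for illustration.

```shell
# Yearly downtime (in minutes) allowed by an availability percentage.
# Assumes a 365-day year; downtime_minutes is a hypothetical helper name.
downtime_minutes() {
    awk -v a="$1" 'BEGIN {
        minutes_per_year = 365 * 24 * 60   # 525600 minutes
        printf "%.3f\n", (1 - a / 100) * minutes_per_year
    }'
}

downtime_minutes 99        # two nines
downtime_minutes 99.99     # four nines
downtime_minutes 99.999    # five nines
```

Dividing the two-nines result by 60 gives the 87.6 hours quoted above.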

In the context of Redis, however, high availability has a broader meaning. Besides keeping the service available (for example through master-slave separation and fast disaster recovery), it also covers expanding data capacity and keeping data safe from loss.

1.2 High availability technology of Redis

In Redis, the technologies used to achieve high availability are mainly persistence, master-slave replication, Sentinel, and Cluster. The following describes what each one does and which problems it solves.

Persistence: Persistence is the simplest high-availability method (sometimes it is not even counted as one). Its main role is data backup: data is stored on disk so that it is not lost when the process exits.
Master-slave replication: Master-slave replication is the foundation of highly available Redis; both Sentinel and Cluster build on it. It mainly provides multi-machine backup (and synchronization) of data, load balancing for read operations, and simple fault recovery.
- Defects: Fault recovery cannot be automated; write operations cannot be load-balanced; storage capacity is limited by a single machine.
Sentinel: Built on master-slave replication, Sentinel adds automatic fault recovery. (Sentinel nodes monitor the master; when the master goes down, a slave is chosen to become the new master.)
- Defects: Write operations cannot be load-balanced; storage capacity is limited by a single machine.
Cluster: Redis Cluster removes the limits that write operations cannot be load-balanced and that storage capacity is bounded by a single machine, giving a relatively complete high-availability solution. (This guide starts with 6 nodes, in pairs: 3 masters and 3 slaves.)

2. Redis master-slave replication

Master-slave replication refers to copying the data of one Redis server to other Redis servers. The former is called the master node (Master), and the latter the slave node (Slave). Data replication is one-way, only from the master node to the slave node.

By default, each Redis server is a master node. A master node can have multiple slave nodes (or none), but a slave node can have only one master node.

2.1 The role of master-slave replication

Data redundancy: master-slave replication implements hot backup of data, which is a data redundancy method other than persistence.
Fault recovery: When the master node has a problem, a slave node can take over serving requests, which allows rapid fault recovery; this is effectively a form of service redundancy.
Load balancing: On top of master-slave replication, combined with read-write separation, the master node serves writes and the slave nodes serve reads (that is, the application connects to the master node when writing Redis data and to a slave node when reading), which spreads the server load. Especially in write-light, read-heavy scenarios, spreading reads across multiple slave nodes can greatly increase the concurrency Redis can handle.
The cornerstone of high availability: In addition to the above, master-slave replication is what Sentinel and Cluster are built on, so it is the foundation of Redis high availability.

2.2 Master-slave replication process

(1) When a slave process starts, it sends a sync command to the master to request a synchronization connection.

(2) Whether it is a first connection or a reconnection, the master starts a background process that saves a data snapshot to a data file (an RDB operation). Meanwhile, the master records all commands that modify data and buffers them.

(3) After the background process finishes the snapshot, the master sends the data file to the slave. The slave saves the file to disk and then loads it into memory. The master then also sends all the buffered data-modifying commands to the slave. If a slave goes down because of a failure, it reconnects automatically once it is back to normal.

(4) After the master receives a connection from a slave, it sends its complete data file to that slave. If the master receives synchronization requests from several slaves at the same time, it starts a single background process to save the data file and then sends that file to all of the slaves, so that every slave ends up in a normal state.

3. Build Redis master-slave replication

3.1 Install Redis on all nodes

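# Stop the firewall (setenforce 0 switches SELinux to permissive mode)
systemctl stop firewalld
setenforce 0
# Install the build dependencies and compilation tools
yum install -y gcc gcc-c++ make

# Upload the source package and extract it
cd /opt/
tar zxvf redis-5.0.7.tar.gz -C /opt/
cd /opt/redis-5.0.7/
# Compile with 2 parallel jobs and install to /usr/local/redis
make -j2 && make PREFIX=/usr/local/redis install
# The Redis source package ships its own Makefile, so there is no need to run ./configure after extracting it; run make and make install directly.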

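# Run the install_server.sh script shipped with the package to set up the configuration files the Redis service needs
cd /opt/redis-5.0.7/utils
./install_server.sh
.......# keep pressing Enter
Please select the redis executable path [] /usr/local/redis/bin/redis-server
# The default here is /usr/local/bin/redis-server; change it manually to /usr/local/redis/bin/redis-server, and take care to type it correctly in one go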

 ---------------------- Annotated output between the dashed lines ----------------------
 Selected config:
 Port: 6379                                      # default listening port is 6379
 Config file: /etc/redis/6379.conf               # configuration file path
 Log file: /var/log/redis_6379.log               # log file path
 Data dir : /var/lib/redis/6379                  # data file path
 Executable: /usr/local/redis/bin/redis-server   # executable path
 Cli Executable : /usr/local/bin/redis-cli       # client command-line tool
 -----------------------------------------------------------------------------------

# Once install_server.sh has finished, the Redis service is already running and listens on port 6379 by default
netstat -natp | grep redis

# Link the Redis executables into a directory on PATH so the system can find them
ln -s /usr/local/redis/bin/* /usr/local/bin/

# Redis service control
/etc/init.d/redis_6379 stop     # stop
/etc/init.d/redis_6379 start    # start
/etc/init.d/redis_6379 restart  # restart
/etc/init.d/redis_6379 status   # show status

3.2 Modify the configuration file of the master node

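vim /etc/redis/6379.conf
 bind 0.0.0.0                      # line 70: change the listening address to 0.0.0.0 (in production, use the IP of the physical NIC)
 daemonize yes                     # line 137: run as a daemon in the background
 logfile /var/log/redis_6379.log   # line 172: log file path
 dir /var/lib/redis/6379           # line 264: working directory
 appendonly yes                    # line 700: enable AOF persistence

/etc/init.d/redis_6379 restart     # restart the Redis service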

3.3 Modify the configuration file of the slave nodes

Modify the configuration file of slave1, then scp it to slave2

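# Modify the configuration file of slave1
vim /etc/redis/6379.conf
 bind 0.0.0.0                        # line 70: change the listening address to 0.0.0.0 (in production, use the IP of the physical NIC)
 daemonize yes                       # line 137: run as a daemon in the background
 logfile /var/log/redis_6379.log     # line 172: log file path
 dir /var/lib/redis/6379             # line 264: working directory
 replicaof 192.168.121.10 6379       # line 288: IP and port of the master node to replicate from
 appendonly yes                      # line 700: change to yes to enable AOF persistence

# Copy the configuration file to slave2
scp /etc/redis/6379.conf 192.168.121.30:/etc/redis/

/etc/init.d/redis_6379 restart  # restart Redis
netstat -natp | grep redis      # check whether master and slaves have established connections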
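
Besides looking at the TCP connections, the replication state can be read from the output of the INFO replication command. The following is a minimal sketch: info_field and replica_is_synced are hypothetical helper names, and the functions only parse text on standard input, so they assume the usual field:value layout of INFO output.

```shell
# Print the value of one field from INFO-style "field:value" text on stdin.
# info_field is a hypothetical helper name.
info_field() {
    tr -d '\r' | awk -F: -v f="$1" '$1 == f { print $2 }'
}

# Succeed only if the text on stdin describes a slave whose link to the master is up.
# replica_is_synced is a hypothetical helper name.
replica_is_synced() {
    local info
    info="$(cat)"
    [ "$(printf '%s\n' "$info" | info_field role)" = "slave" ] &&
        [ "$(printf '%s\n' "$info" | info_field master_link_status)" = "up" ]
}
```

On a running setup it could be used as: redis-cli -h 192.168.121.30 -p 6379 info replication | replica_is_synced && echo "replica is in sync" (192.168.121.30 is the slave2 address used in the scp step above).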

4. Redis sentinel mode

With plain master-slave replication, when the master goes down a slave has to be switched to master by hand. This manual intervention is time-consuming and laborious, and it leaves the service unavailable for a period of time. The Sentinel mechanism was introduced to address this shortcoming of master-slave replication.

Sentinel's core function: Based on master-slave replication, Sentinel introduces automatic failover of the master node.

4.1 The role of sentinel mode

Monitoring: Sentinel constantly checks whether the master and slave nodes are functioning properly.
Automatic failover: When the master node stops working normally, Sentinel starts an automatic failover. It promotes one of the failed master's slave nodes to be the new master and makes the other slave nodes replicate from the new master instead.
Notification (reminder): Sentinel can send the result of the failover to the client.

4.2 Sentinel structure

Sentinel node: The sentinel system consists of one or more sentinel nodes, which are special Redis nodes that do not store data.

Data Nodes: Both master and slave nodes are data nodes.

4.3 Failover mechanism

1. The sentinel nodes monitor regularly to detect whether the master node has failed.

Every second, each sentinel node sends a PING command to the master node, the slave nodes, and the other sentinel nodes as a heartbeat check. If the master node does not reply within a certain time or replies with an error, that sentinel considers the master subjectively offline (a unilateral judgment). When enough sentinel nodes agree that the master is subjectively offline (the configured quorum; more than half in this guide's setup), the master is objectively offline.

2. When the master node fails, the sentinel nodes use the Raft algorithm (an election algorithm) to jointly elect one sentinel node as the leader, which is responsible for the failover and for notifications. For this reason, a Sentinel deployment should run at least 3 sentinel nodes.

3. The leader sentinel node performs failover, the process is as follows:

Upgrade a slave node to a new master node, and let other slave nodes point to the new master node;
If the original master node recovers, it becomes a slave node and points to the new master node;
Notify the client that the master node has been replaced.

Note that objective offline is a concept unique to the master node. If a slave node or a sentinel node fails and is marked subjectively offline by a sentinel, no objective offline judgment or failover follows.
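
The difference between subjective and objective offline can be sketched as a simple vote count. This is an illustration only, not Sentinel's actual implementation; master_state is a hypothetical helper, and the example assumes a quorum of 2 among 3 sentinels.

```shell
# Classify the master's state from the sentinels' votes.
#   $1 = number of sentinels that currently consider the master subjectively offline
#   $2 = quorum required to declare the master objectively offline
# master_state is a hypothetical helper, not a Redis command.
master_state() {
    local down_votes="$1" quorum="$2"
    if [ "$down_votes" -eq 0 ]; then
        echo "online"
    elif [ "$down_votes" -lt "$quorum" ]; then
        echo "subjectively offline"   # some sentinels cannot reach it, but not enough agree
    else
        echo "objectively offline"    # enough sentinels agree, so failover can start
    fi
}

master_state 1 2   # only one of three sentinels has lost contact
master_state 2 2   # two of three sentinels agree
```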

4.4 Election of the master node

1. Filter out unhealthy (offline) slave nodes, that is, those that have not responded to the sentinel's ping.

2. Select the slave node with the highest priority set in the configuration file. (replica-priority, default value 100; a lower value means a higher priority)

3. Select the slave node with the largest replication offset, that is, the one whose copy of the data is most complete.
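
The three selection steps can be sketched as a filter followed by a sort. This is a simplified illustration under stated assumptions, not Sentinel's code: pick_new_master is a hypothetical helper, and the input format (one slave per line as "name status priority offset") is invented for the example.

```shell
# Choose a new master from candidate slaves given on stdin, one per line:
#   <name> <status> <replica-priority> <replication-offset>
# pick_new_master and this input format are hypothetical, for illustration only.
pick_new_master() {
    awk '$2 == "online" { print $1, $3, $4 }' |   # step 1: drop offline slaves
        sort -k2,2n -k3,3nr |                     # step 2: lowest priority value first
                                                  # step 3: then largest offset first
        head -n 1 |
        cut -d' ' -f1
}

printf '%s\n' \
    'slave1 online 100 5000' \
    'slave2 online 100 7200' \
    'slave3 offline 50 9000' | pick_new_master
```

In this example slave3 is excluded because it is offline, and slave2 wins over slave1 because both have the same priority but slave2 has replicated more data.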

Sentinel depends on the master-slave setup, so Sentinel mode must be set up after master-slave replication is in place.

5. Build Redis sentinel mode

Modify the sentinel.conf configuration file of the sentinel nodes (perform on all sentinel nodes)

vim /opt/redis-5.0.7/sentinel.conf
......
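protected-mode no                # line 17: uncomment to disable protected mode
port 26379                       # line 21: default listening port of Redis Sentinel
daemonize yes                    # line 26: run Sentinel in the background
logfile "/var/log/sentinel.log"  # line 36: log file path
dir "/var/lib/redis/6379"        # line 65: database storage path
sentinel monitor mymaster 192.168.121.10 6379 2  # line 84: modify
# This sentinel node monitors the master node at 192.168.121.10:6379, which is named mymaster.
# The trailing 2 relates to how master failure is judged: at least 2 sentinel nodes must agree before the master is judged to have failed and a failover is performed.

sentinel down-after-milliseconds mymaster 3000  # line 113: time after which a server is judged down; the default is 30000 ms (30 seconds)
sentinel failover-timeout mymaster 180000  # line 146: interval between two failovers of the same master by the same sentinel (180 seconds)

# Copy the file to the other 2 sentinel nodes
scp /opt/redis-5.0.7/sentinel.conf  192.168.121.50:/opt/redis-5.0.7/
scp /opt/redis-5.0.7/sentinel.conf  192.168.121.60:/opt/redis-5.0.7/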

Start sentinel mode (perform on all sentinel nodes)

# Start the three sentinels
cd /opt/redis-5.0.7/
redis-sentinel sentinel.conf &
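
Once the sentinels are running, their view of the master can be checked from the output of redis-cli -p 26379 info sentinel, which contains a line such as master0:name=mymaster,status=ok,address=...,slaves=...,sentinels=.... The sketch below extracts one field from that line. sentinel_master_field is a hypothetical helper, and it assumes the output keeps this comma-separated key=value layout.

```shell
# Print one field of the masterN line for a named master from
# `info sentinel` text on stdin.
#   $1 = master name (for example mymaster), $2 = field (status, address, slaves, sentinels)
# sentinel_master_field is a hypothetical helper name.
sentinel_master_field() {
    tr -d '\r' | awk -v n="$1" -v f="$2" '
        /^master[0-9]+:/ {
            sub(/^master[0-9]+:/, "")            # keep only the key=value list
            count = split($0, pairs, ",")
            master = ""; found = ""
            for (i = 1; i <= count; i++) {
                eq  = index(pairs[i], "=")
                key = substr(pairs[i], 1, eq - 1)
                val = substr(pairs[i], eq + 1)
                if (key == "name") master = val
                if (key == f) found = val
            }
            if (master == n) print found
        }'
}
```

For example: redis-cli -p 26379 info sentinel | sentinel_master_field mymaster sentinels should print 3 when all three sentinels have discovered each other. The address of the current master can also be queried directly with redis-cli -p 26379 sentinel get-master-addr-by-name mymaster.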

6. Redis cluster mode

Cluster, namely Redis Cluster, is a distributed storage solution introduced in Redis 3.0.

The cluster consists of multiple nodes, and Redis data is distributed among them. Nodes are divided into master nodes and slave nodes: only master nodes handle read and write requests and maintain cluster information; slave nodes only replicate the data and state of their master node.

6.1 The role of the cluster

(1) Data partitioning: Data partitioning (or data sharding) is the core function of the cluster.

The cluster distributes data across multiple nodes. On the one hand, this breaks through the memory limit of a single Redis machine and greatly increases storage capacity; on the other hand, every master node can serve reads and writes, which greatly improves the responsiveness of the cluster.
The limit on single-machine memory also affects persistence and master-slave replication: when a single instance holds too much data, a slave node may be unable to provide services for a long time, and the master node's replication buffer may overflow during the full replication phase.

(2) High availability: The cluster supports master-slave replication and automatic failover of the master node (similar to Sentinel); when any node fails, the cluster can still provide external services.

Through the cluster, Redis solves the problem that the write operation cannot be load-balanced and the storage capacity is limited by a single machine, and realizes a relatively complete high-availability solution.

6.2 Data sharding in Redis Cluster

Redis Cluster introduces the concept of hash slots.

Redis Cluster has 16384 hash slots (numbered 0-16383).

Each node of the cluster is responsible for a portion of the hash slots.

The CRC16 checksum of each key is taken modulo 16384 to decide which hash slot the key belongs to. From this value the node responsible for that slot is found, and the client is automatically redirected to that node to perform the operation.

Take a cluster composed of 3 nodes as an example:

Node A contains hash slots from 0 to 5460

Node B contains hash slots 5461 to 10922

Node C contains hash slots 10923 to 16383
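
The slot calculation can be sketched in shell as follows. This is an illustration only: crc16, key_slot, and slot_owner are hypothetical helper names, the CRC16 variant is assumed to be XMODEM (polynomial 0x1021, initial value 0), hash tags in braces are not handled, and slot_owner hard-codes the three-node layout of the example above.

```shell
# CRC16, XMODEM variant (polynomial 0x1021, initial value 0), of a string.
# crc16 is a hypothetical helper name.
crc16() {
    local crc=0 byte i
    for byte in $(printf '%s' "$1" | od -An -v -tu1); do
        crc=$(( crc ^ (byte << 8) ))
        i=0
        while [ "$i" -lt 8 ]; do
            if [ $(( crc & 0x8000 )) -ne 0 ]; then
                crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
            else
                crc=$(( (crc << 1) & 0xFFFF ))
            fi
            i=$(( i + 1 ))
        done
    done
    echo "$crc"
}

# Hash slot of a key: CRC16(key) mod 16384 (hash tags are ignored in this sketch).
key_slot() {
    echo $(( $(crc16 "$1") % 16384 ))
}

# Owner of a slot in the three-node example (A: 0-5460, B: 5461-10922, C: 10923-16383).
slot_owner() {
    if [ "$1" -le 5460 ]; then echo "A"
    elif [ "$1" -le 10922 ]; then echo "B"
    else echo "C"
    fi
}

key_slot foo                    # prints 12182
slot_owner "$(key_slot foo)"    # prints C
```

On a running cluster, the result can be compared with the output of redis-cli cluster keyslot foo.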

6.3 Master-slave replication model in cluster mode

Suppose the cluster has three nodes A, B, and C. If node B fails, the whole cluster becomes unavailable because the slots in the range 5461-10922 are missing.
Now add one slave node to each of them (A1, B1, and C1), so that the cluster consists of three master nodes and three slave nodes. After node B fails, the cluster elects B1 as the new master and continues serving. If both B and B1 fail, the cluster becomes unavailable.

7. Build a Redis cluster

Enable the cluster function

Modify the configuration file on any one server, then copy it to the other hosts with the scp command

cd /opt/redis-5.0.7/
vim redis.conf
......
bind 192.168.121.10                       # line 69: listen on this host's own physical NIC IP
protected-mode no                         # line 88: change to no to disable protected mode
port 6379                                 # line 92: default Redis listening port
daemonize yes                             # line 136: run as a daemon, as a standalone process
appendonly yes                            # line 700: change to yes to enable AOF persistence
cluster-enabled yes                       # line 832: uncomment to enable cluster mode
cluster-config-file nodes-6379.conf       # line 840: uncomment; name of the cluster configuration file
cluster-node-timeout 15000                # line 846: uncomment; cluster node timeout


# Copy the file to the other 5 nodes; afterwards each node must change the bind address to its own IP
scp redis.conf 192.168.121.20:`pwd`
scp redis.conf 192.168.121.30:`pwd`
scp redis.conf 192.168.121.40:`pwd`
scp redis.conf 192.168.121.50:`pwd`
scp redis.conf 192.168.121.60:`pwd`
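
Because every node receives the same file, the bind line has to be adjusted on each host. A minimal sketch of doing this non-interactively is shown below; set_bind_address is a hypothetical helper, and it assumes the file contains a single uncommented line starting with bind.

```shell
# Rewrite the bind directive of a Redis configuration file in place.
#   $1 = path to the configuration file, $2 = IP address this node should listen on
# set_bind_address is a hypothetical helper name.
set_bind_address() {
    local conf="$1" ip="$2"
    # Write to a temporary file first so a failed sed never truncates the original.
    sed "s/^bind .*/bind ${ip}/" "$conf" > "${conf}.tmp" && mv "${conf}.tmp" "$conf"
}
```

For example, on the second node: set_bind_address /opt/redis-5.0.7/redis.conf 192.168.121.20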

7.3 All nodes start redis service

cd /opt/redis-5.0.7/
redis-server redis.conf   # start the Redis node

7.4 Start the cluster

Just start the cluster on any node.

redis-cli --cluster create 192.168.121.10:6379 192.168.121.20:6379 192.168.121.30:6379 192.168.121.40:6379 192.168.121.50:6379 192.168.121.60:6379 --cluster-replicas 1
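# The six hosts are divided into three groups, three masters and three slaves; the hosts listed first become masters and those listed later become slaves. At the interactive prompt, type yes to create the cluster. "--cluster-replicas 1" means each master node gets one slave node.
# The first three hosts are masters; the last three are slaves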
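
After the cluster has been created, the role of each node can be checked from the output of redis-cli -h 192.168.121.10 -p 6379 cluster nodes, where the third column of every line holds the node's flags (for example myself,master or slave). The following sketch counts the roles; count_cluster_roles is a hypothetical helper, and it assumes that column layout.

```shell
# Count master and slave nodes in `cluster nodes` text read from stdin.
# The third whitespace-separated column is expected to hold the node flags.
# count_cluster_roles is a hypothetical helper name.
count_cluster_roles() {
    tr -d '\r' | awk '
        NF >= 3 {
            if ($3 ~ /master/) masters++
            else if ($3 ~ /slave/) slaves++
        }
        END { printf "masters=%d slaves=%d\n", masters, slaves }'
}
```

With the six hosts above, redis-cli -h 192.168.121.10 -p 6379 cluster nodes | count_cluster_roles is expected to report three masters and three slaves.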
