Redis high availability and optimization

Table of contents

One: Redis High Availability

Two: Redis persistence

1. Persistence function

2. Redis provides two methods for persistence

3. RDB persistence

(1) Trigger conditions

(1.1) Manual trigger

(1.2) Automatic trigger

(1.3) Other automatic trigger mechanisms

4. Execution process

5. Load at startup

6. AOF persistence

(1) Turn on AOF

7. Execution process

(1) Instruction addition (append)

(2) File writing (write) and file synchronization (sync)

(3) File rewriting (rewrite)

(3.1) The reason why file rewriting can compress AOF files

(3.2) The trigger of file rewriting is divided into manual trigger and automatic trigger

(3.3) Process of file rewriting

8. Load at startup

9. Advantages and disadvantages of RDB and AOF

Three: Redis performance management

1. View Redis memory usage

2. Memory fragmentation rate

(1) How does memory fragmentation occur?

(2) Tracking the memory fragmentation rate is very important to understand the resource performance of the Redis instance

(3) Solve the problem of high fragmentation rate

3. Memory usage

4. Key reclamation

5. Other restrictions

Four: Redis master-slave replication

1. Introduction to Redis master-slave replication

2. The role of master-slave replication

3. Master-slave replication process

4. Build Redis master-slave replication

(1) Install Redis

(2) Modify the Redis configuration file (Master node operation)

(3) Modify the Redis configuration file (Slave node operation)

(4) Verify the master-slave effect

Five: Redis sentinel mode

1. The method of master-slave switching technology

2. The principle of sentinel mode

3. The role of sentinel mode

4. Failover mechanism

5. Build Redis sentinel mode

(1) Environment construction

(2) Modify the configuration file of Redis sentinel mode (all node operations)

(3) Start sentinel mode

(4) View sentinel information

(5) Fault simulation

(6) Verification result

Six: Redis cluster mode

1. Redis cluster concept

2. The role of the cluster

(1) Data partition: data partition (or data sharding) is the core function of the cluster

(2) High availability: the cluster supports master-slave replication and automatic failover of the master node (similar to Sentinel)

3. Data sharding of the Redis cluster

4. Build Redis cluster mode

(1) All six servers need to install the redis database

(2) Turn on the cluster function

(3) Start the redis node

(4) Test the cluster

5. Add nodes to the Cluster and expand capacity dynamically

(1) Create a new master node

(2) Set the master-slave nodes

(3) Allocate slots to the new node

(4) View the cluster status

Summary


One: Redis High Availability

In the context of web servers, high availability refers to the proportion of time the server is accessible, measured as the percentage of time normal service can be provided (99.9%, 99.99%, 99.999%, etc.).
In the context of Redis, however, high availability has a broader meaning: besides ensuring normal service (e.g. master-slave separation and rapid disaster recovery), it also covers expanding data capacity and keeping data safe from loss.

In Redis, technologies to achieve high availability mainly include persistence, master-slave replication, sentinels, and Cluster clusters.
Persistence : Persistence is the simplest high-availability method (sometimes it is not even classified as one). Its main function is data backup, i.e. storing data on the hard disk so that it is not lost when the process exits.
Master-slave replication : Master-slave replication is the foundation of highly available Redis; both Sentinel and Cluster build on it to achieve high availability. It provides multi-machine backup of data, load balancing for read operations, and simple fault recovery. Defects: failure recovery cannot be automated; write operations cannot be load-balanced; storage capacity is limited to a single machine.
Sentinel : On top of master-slave replication, Sentinel adds automatic failure recovery. Defects: write operations cannot be load-balanced; storage capacity is limited to a single machine.
Cluster : Through clustering, Redis solves the problems that write operations cannot be load-balanced and that storage capacity is limited to a single machine, achieving a relatively complete high-availability solution.

Two: Redis persistence

1. Persistence function

Persistence function: Redis is an in-memory database and keeps its data in memory. To avoid permanent data loss when the Redis process exits abnormally (e.g. after a server power failure), the data in Redis must be saved regularly from memory to disk in some form (either data or commands); the next time Redis restarts, the persisted file is used to restore the data. In addition, the persisted files can be copied to a remote location for disaster backup.

2. Redis provides two methods for persistence

RDB persistence : The principle is to save snapshots of the Redis database records in memory to disk at regular intervals.
AOF persistence (append only file): The principle is to write Redis's operation log to a file in append mode, similar to MySQL's binlog.

Because AOF persistence has better real-time behavior, i.e. less data is lost when the process exits unexpectedly, AOF is currently the mainstream persistence method, but RDB persistence still has its place.

3. RDB persistence 

RDB persistence saves a snapshot of the data in the current process's memory to the hard disk within a specified time interval (hence it is also called snapshot persistence). The file is stored with binary compression and has the suffix .rdb; when Redis restarts, it can read the snapshot file to restore the data.

(1) Trigger conditions

The triggering of RDB persistence is divided into manual triggering and automatic triggering.

(1.1) Manual trigger

Both the save command and the bgsave command can generate RDB files.
The save command blocks the Redis server process until the RDB file has been created; while blocked, the server cannot process any command requests.
The bgsave command forks a child process that is responsible for creating the RDB file, while the parent process (i.e. the Redis main process) continues to handle requests.

During bgsave, only the fork of the child process blocks the server, whereas save blocks the server for the whole duration. save has therefore been essentially abandoned, and its use must be avoided in production environments.
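As a quick sketch of the manual trigger (assuming redis-cli can reach a local instance on the default port), bgsave returns immediately while the snapshot is written in the background, which can be observed via lastsave:

```shell
# Manual RDB trigger sketch (assumes a local Redis on 127.0.0.1:6379)
redis-cli lastsave    # Unix timestamp of the last successful snapshot
redis-cli bgsave      # replies "Background saving started" and returns at once
redis-cli lastsave    # the timestamp advances once the child process finishes
```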

(1.2) Automatic trigger

When RDB persistence is triggered automatically, Redis also chooses bgsave rather than save.

The most common automatic trigger is the save m n directive:
specifying save m n in the configuration file means that if at least n changes occur within m seconds, bgsave is triggered to take a snapshot.

vim /usr/local/redis/conf/redis.conf
--line 433--default RDB save policy
# save 3600 1 300 100 60 10000
#bgsave is invoked when any one of the following three save conditions is met
save 3600 1 : execute bgsave if the Redis data changed at least 1 time within 3600 seconds
save 300 10 : execute bgsave if the Redis data changed at least 10 times within 300 seconds
save 60 10000 : execute bgsave if the Redis data changed at least 10000 times within 60 seconds

--line 454--whether to enable RDB file compression
rdbcompression yes
--line 481--specify the RDB file name
dbfilename dump.rdb
--line 504--specify the directory for the RDB and AOF files
dir /usr/local/redis/data

(1.3) Other automatic trigger mechanisms

In addition to save mn, there are other situations that trigger bgsave:
●In the master-slave replication scenario, if the slave node performs a full copy operation, the master node will execute the bgsave command and send the rdb file to the slave node.
● When the shutdown command is executed, RDB persistence is automatically executed.

4. Execution process

(1) The Redis parent process first checks whether a child process of save, bgsave, or bgrewriteaof is currently running; if so, the bgsave command returns directly. The bgsave/bgrewriteaof child processes cannot run at the same time, mainly for performance reasons: two concurrent child processes both performing heavy disk writes could cause serious performance problems.
(2) The parent process forks a child process. During the fork, the parent process is blocked and Redis cannot execute any client commands.
(3) After the fork completes, the bgsave command returns the "Background saving started" message and no longer blocks the parent process, which can respond to other commands.
(4) The child process creates the RDB file: it generates a temporary snapshot file from the parent process's memory snapshot and atomically replaces the original file when done.
(5) The child process sends a signal to the parent process to indicate completion, and the parent process updates its statistics.

5. Load at startup

The loading of the RDB file is automatically executed when the server starts, and there is no special command. However, because AOF has a higher priority, when AOF is enabled, Redis will prioritize loading AOF files to restore data; only when AOF is disabled, will RDB files be detected and automatically loaded when the Redis server starts. The server is blocked while loading the RDB file until the loading is complete.
When Redis loads the RDB file, it will verify the RDB file. If the file is damaged, an error will be printed in the log, and Redis will fail to start.

6. AOF persistence

RDB persistence writes process data to a file, while AOF persistence records every write and delete command executed by Redis in a separate log file (query operations are not recorded); when Redis restarts, the commands in the AOF file are executed again to restore the data.
Compared with RDB, AOF has better real-time behavior, so it has become the mainstream persistence solution.

(1) Turn on AOF

The Redis server enables RDB and disables AOF by default; to enable AOF, configure it in the configuration file:
vim /usr/local/redis/conf/redis.conf
--line 1380--modify to enable AOF
appendonly yes
--line 1407--specify the AOF file name
appendfilename "appendonly.aof"
--line 1505--whether to ignore a possibly incomplete last instruction
aof-load-truncated yes


systemctl restart redis-server.service

7. Execution process

Since each write command of Redis needs to be recorded, AOF does not need to be triggered. The following describes the execution process of AOF.

The execution process of AOF consists of:
command append (append) : append Redis write commands to the aof_buf buffer;
file write (write) and file sync (sync) : synchronize the contents of aof_buf to the hard disk;
file rewrite (rewrite) : periodically rewrite the AOF file to achieve compression.

(1) Instruction addition (append)

Redis first appends write commands to a buffer rather than writing them directly to the file, mainly to avoid a disk write for every single command, which would make disk IO the bottleneck of the Redis load.
Commands are appended in the protocol format of Redis command requests. This is a plain-text format with good compatibility, strong readability, easy processing, simple operation, and no secondary decoding overhead. In the AOF file, apart from the select command used to specify the database (e.g. select 0 selects database 0), which is added by Redis itself, everything else consists of write commands sent by clients.
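As an illustration, a single write command such as set mykey v1 appears in the AOF file as the following plain-text protocol frame (an argument count followed by length-prefixed arguments):

```
*3
$3
set
$5
mykey
$2
v1
```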

(2) File writing (write) and file synchronization (sync)

Redis provides several file-synchronization strategies for the AOF buffer. These strategies involve the operating system's write and fsync functions.
To improve file-writing efficiency, modern operating systems usually buffer data in memory when a user calls write to write to a file, and only flush the buffer to disk when it is full or a time limit is exceeded. This improves efficiency but introduces a safety problem: if the machine goes down, the data in the memory buffer is lost. The system therefore also provides synchronization functions such as fsync and fdatasync, which force the operating system to write the buffered data to disk immediately, ensuring data safety.

There are three synchronization strategies for the AOF buffer:
vim /usr/local/redis/conf/redis.conf
--line 1439--
●appendfsync always: after a command is written to aof_buf, the system fsync operation is called immediately to sync it to the AOF file, and the thread returns once fsync completes. In this case every write command must be synced to the AOF file, disk IO becomes the performance bottleneck, and Redis can only sustain roughly a few hundred write TPS, severely degrading its performance; even with a solid-state drive (SSD), only a few tens of thousands of commands per second can be handled, and the SSD's lifespan is greatly shortened.

●appendfsync no: after a command is written to aof_buf, the system write operation is called, but no fsync is performed on the AOF file; synchronization is left to the operating system, usually on a 30-second cycle. In this case the timing of file synchronization is uncontrollable, a large amount of data piles up in the buffer, and data safety cannot be guaranteed.

●appendfsync everysec: after a command is written to aof_buf, the system write operation is called and the thread returns once write completes; the fsync file-synchronization operation is invoked once per second by a dedicated thread. everysec is a compromise between the two strategies above, balancing performance and data safety, so it is Redis's default configuration and also the recommended one.

(3) File rewriting (rewrite)

As time goes by, the Redis server executes more and more write commands and the AOF file grows larger and larger; an oversized AOF file not only affects the normal operation of the server but also makes data recovery take too long.

File rewriting refers to periodically rewriting the AOF file to reduce the size of the AOF file. It should be noted that AOF rewriting is to convert the data in the Redis process into write commands and synchronize to the new AOF file; it will not perform any read or write operations on the old AOF file!

Another point to note about file rewriting: for AOF persistence, file rewriting is strongly recommended but not strictly necessary; even without rewriting, the data is still persisted and imported when Redis starts. For this reason, some production deployments disable automatic rewriting and instead run the rewrite at a fixed time each day via a scheduled task.

(3.1) The reason why file rewriting can compress AOF files

●Outdated data is no longer written to the file.
●Invalid commands are no longer written to the file: for example, values that were overwritten (set mykey v1, set mykey v2) or deleted (set myset v1, del myset).
●Multiple commands can be merged into one: for example sadd myset v1, sadd myset v2, sadd myset v3 can be merged into sadd myset v1 v2 v3.
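The compaction effect can be simulated without Redis at all; replaying a small write log and keeping only the final state shows why the rewritten file shrinks:

```shell
# Toy illustration of AOF compaction (pure shell/awk, no Redis required)
printf '%s\n' "set mykey v1" "set mykey v2" "set myset v1" "del myset" |
awk '{
  if ($1 == "set")      state[$2] = $3    # a later set overwrites the earlier one
  else if ($1 == "del") delete state[$2]  # deleted keys disappear entirely
} END {
  for (k in state) print "set", k, state[k]
}'
# prints: set mykey v2
```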

From the above content, it can be seen that since the commands executed by AOF are reduced after rewriting, file rewriting can not only reduce the space occupied by the file, but also speed up the recovery speed.

(3.2) The trigger of file rewriting is divided into manual trigger and automatic trigger

Manual trigger : call the bgrewriteaof command directly. Its execution is similar to bgsave: both fork a child process to do the actual work, and both block only during the fork.
Automatic trigger : BGREWRITEAOF is executed automatically according to the auto-aof-rewrite-min-size and auto-aof-rewrite-percentage options. AOF rewriting (i.e. the bgrewriteaof operation) is triggered automatically only when both options are satisfied at the same time.

vim /usr/local/redis/conf/redis.conf
--line 1480--
●auto-aof-rewrite-percentage 100 : BGREWRITEAOF is performed when the current AOF file size (aof_current_size) is twice the AOF file size at the last rewrite (aof_base_size)
●auto-aof-rewrite-min-size 64mb : the minimum AOF file size for running BGREWRITEAOF, which avoids frequent rewrites caused by the small file size when Redis has just started
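Numerically, the two conditions combine as follows (all sizes below are made up for illustration):

```shell
min_size=$((64 * 1024 * 1024))          # auto-aof-rewrite-min-size 64mb
percentage=100                          # auto-aof-rewrite-percentage 100
aof_base_size=$((70 * 1024 * 1024))     # AOF size after the last rewrite
aof_current_size=$((150 * 1024 * 1024)) # current AOF size

awk -v cur="$aof_current_size" -v base="$aof_base_size" \
    -v min="$min_size" -v pct="$percentage" 'BEGIN {
  # both conditions must hold at the same time
  if (cur > min && (cur - base) * 100 / base >= pct) print "rewrite triggered"
  else print "no rewrite"
}'
# prints: rewrite triggered  (150MB > 64MB and growth ~114% >= 100%)
```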


Two points about the file-rewrite flow deserve special attention: (1) the rewrite is performed by a child process forked from the parent process; (2) write commands executed by Redis during the rewrite must also be appended to the new AOF file, for which Redis introduces the aof_rewrite_buf buffer.

(3.3) Process of file rewriting

(1) The Redis parent process first checks whether a bgsave or bgrewriteaof child process is currently running: if a bgrewriteaof child is running, the bgrewriteaof command returns directly; if a bgsave child is running, the rewrite waits until bgsave completes before executing.
(2) The parent process executes the fork operation to create a child process, and the parent process is blocked during this process.
(3.1) After the parent process forks, the bgrewriteaof command returns the message "Background append only file rewrite started" and no longer blocks the parent process, and can respond to other commands. All write commands of Redis are still written into the AOF buffer, and are synchronized to the hard disk according to the appendfsync strategy to ensure the correctness of the original AOF mechanism.
(3.2) Since the fork operation uses the copy-on-write technology, the child process can only share the memory data during the fork operation. Since the parent process is still responding to the command, Redis uses the AOF rewrite buffer (aof_rewrite_buf) to save this part of the data to prevent the loss of this part of the data during the generation of the new AOF file. That is to say, during the execution of bgrewriteaof, Redis write commands are simultaneously appended to two buffers, aof_buf and aof_rewrite_buf.
(4) According to the memory snapshot, the child process writes to the new AOF file according to the command merging rules.
(5.1) After the child process writes the new AOF file, it sends a signal to the parent process, and the parent process updates the statistical information, which can be viewed through info persistence.
(5.2) The parent process writes the data in the AOF rewrite buffer to the new AOF file, thus ensuring that the database state saved in the new AOF file is consistent with the current state of the server.
(5.3) Replace the old file with the new AOF file to complete AOF rewriting.

8. Load at startup

When AOF is turned on, Redis will first load the AOF file to restore data when it starts; only when AOF is turned off, it will load the RDB file to restore data.
When AOF is enabled but the AOF file does not exist, it will not be loaded even if the RDB file exists.
When Redis loads the AOF file, it will verify the AOF file. If the file is damaged, an error will be printed in the log, and Redis will fail to start. However, if the end of the AOF file is incomplete (sudden machine downtime, etc. may easily cause the end of the file to be incomplete), and the aof-load-truncated parameter is enabled, a warning will be output in the log, Redis ignores the end of the AOF file, and the startup is successful. The aof-load-truncated parameter is enabled by default.
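If an AOF file is damaged beyond what aof-load-truncated can skip, the repair tool bundled with Redis can be tried (a sketch; the path follows the dir and appendfilename settings used above, and on Redis 7 the AOF may be split into multiple files under an appendonlydir):

```shell
redis-check-aof /usr/local/redis/data/appendonly.aof        # check only
redis-check-aof --fix /usr/local/redis/data/appendonly.aof  # truncate to the last valid command
```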

9. Advantages and disadvantages of RDB and AOF

Advantages of RDB persistence : RDB files are compact, small in size, fast in network transmission, and suitable for full copy; recovery speed is much faster than AOF. Of course, one of the most important advantages of RDB compared to AOF is the relatively small impact on performance.

Disadvantages : The fatal weakness of RDB is that snapshot-based persistence cannot be real-time. Now that data is increasingly important, losing a large amount of data is often unacceptable, so AOF persistence has become mainstream. In addition, RDB files must conform to a specific format and have poor compatibility (for example, old versions of Redis are not compatible with RDB files from new versions).
For RDB persistence, on the one hand, the Redis main process will be blocked when bgsave performs the fork operation; on the other hand, writing data to the hard disk by the child process will also bring IO pressure.

AOF persistence
In contrast to RDB persistence, the advantages of AOF are second-level durability and good compatibility; the disadvantages are a larger file, slower recovery, and a greater impact on performance.
With AOF persistence, data is written to disk far more frequently (every second under the everysec strategy), the IO pressure is higher, and AOF append blocking problems may even occur.
AOF file rewriting is similar to RDB's bgsave: there is blocking during the fork and IO pressure from the child process. Relatively speaking, since AOF writes to disk more frequently, it has a greater impact on the performance of the Redis main process.
 

Three: Redis performance management

1. View Redis memory usage 

127.0.0.1:6379> info memory

2. Memory fragmentation rate

mem_fragmentation_ratio : Memory fragmentation rate. mem_fragmentation_ratio = used_memory_rss / used_memory
used_memory_rss : It is the memory requested by Redis from the operating system.
used_memory : It is the memory occupied by the data in Redis.
used_memory_peak : The peak value of redis memory usage.
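The ratio formula can be sanity-checked with made-up numbers (pure shell, no Redis needed):

```shell
# mem_fragmentation_ratio = used_memory_rss / used_memory (illustrative values)
used_memory_rss=1250000
used_memory=1000000
awk -v rss="$used_memory_rss" -v used="$used_memory" \
    'BEGIN { printf "%.2f\n", rss / used }'
# prints: 1.25  (within the normal 1 to 1.5 range)
```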
 

(1) How does memory fragmentation occur?

Redis has its own internal memory manager to manage the application and release of memory in order to improve the efficiency of memory usage.
When the value in Redis is deleted, the memory is not directly released and returned to the operating system, but to the internal memory manager of Redis.
When Redis needs to allocate memory, it first checks whether its own memory manager has enough memory available.
This mechanism improves memory utilization, but it leaves some memory that Redis itself is not using yet has not released, which manifests as memory fragmentation.

(2) Tracking the memory fragmentation rate is very important to understand the resource performance of the Redis instance

  • A memory fragmentation rate between 1 and 1.5 is normal. It indicates that fragmentation is relatively low and that Redis is not being swapped.
  • If the rate exceeds 1.5, Redis is consuming 150% of the physical memory it actually needs, the extra 50% being memory fragmentation.
  • If the rate is below 1, Redis has allocated more memory than the physical memory available and the operating system is swapping; increase the available physical memory or reduce Redis's memory usage.

(3) Solve the problem of high fragmentation rate

If your Redis version is below 4.0, run the shutdown save command in redis-cli to make the Redis database perform a save and shut down, then restart the service. After the restart, Redis returns the unused memory to the operating system and the fragmentation rate drops.

Starting with Redis 4.0, memory fragmentation can be defragmented online without a restart.
config set activedefrag yes     #automatic defragmentation; memory is then cleaned up automatically
memory purge					#manual defragmentation

3. Memory usage

If a Redis instance's memory usage exceeds the maximum available memory, the operating system begins swapping between memory and swap space.

Ways to avoid memory swapping:
●Choose and install a Redis instance according to the size of the cached data
●Use the Hash data structure for storage as much as possible
●Set the expiration time of the key

4. Key reclamation

The in-memory data eviction strategy ensures that Redis's limited memory resources are allocated sensibly.

When the configured maximum threshold is reached, a key eviction policy must be chosen. The default policy is to refuse eviction (no keys are deleted).

vim /usr/local/redis/conf/redis.conf
--line 1149--
maxmemory-policy noeviction
●volatile-lru: use the LRU algorithm to evict from the set of keys with an expire time set (remove the least recently used keys among keys with a TTL)
●volatile-ttl: pick keys about to expire from the set of keys with an expire time set and evict them (remove the keys closest to expiration)
●volatile-random: randomly pick keys from the set of keys with an expire time set and evict them (randomly remove keys with a TTL)
●allkeys-lru: use the LRU algorithm to evict from all keys (remove the least recently used keys among all keys)
●allkeys-random: pick keys arbitrarily from all keys and evict them (randomly remove keys)
●noeviction: forbid evicting data (delete nothing; report an error on writes once memory is full)
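A minimal redis.conf fragment pairing a memory cap with one of these policies (the 2gb value is purely illustrative):

```
maxmemory 2gb
maxmemory-policy allkeys-lru
```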


5. Other restrictions

maxclients
Sets how many clients Redis can serve at the same time; the default is 10000 clients.
If this limit is reached, Redis rejects new connection requests and replies "max number of clients reached" to them.

maxmemory
Setting this is recommended; otherwise memory can fill up and bring the server down.
Sets the amount of memory Redis may use. Once the limit is reached, Redis tries to evict data according to the rules specified by maxmemory-policy.
If Redis cannot evict data according to those rules, or if the policy is "do not evict", Redis returns an error for commands that need to allocate memory, such as SET and LPUSH.
Commands that do not allocate memory, such as GET, are still served normally. If your Redis is a master (i.e. the deployment uses master-slave replication), reserve some system memory for the synchronization queue buffers when setting the memory limit; this factor can be ignored only when the policy is "do not evict".
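A session sketch of this behavior (assuming the instance has hit maxmemory with a "do not evict" policy):

```
127.0.0.1:6379> set mykey v1
(error) OOM command not allowed when used memory > 'maxmemory'.
127.0.0.1:6379> get mykey
(nil)
```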

maxmemory-samples
Sets the sample size. Neither the LRU algorithm nor the minimum-TTL algorithm is exact; they work on estimates, so the sample size is configurable. By default Redis checks this many keys and evicts the least recently used among them.
A value of 3 to 7 is typical: the smaller the value, the less accurate the sampling, but the lower the performance cost.

Four: Redis master-slave replication

1. Introduction to Redis master-slave replication

Master-slave replication refers to copying the data of one Redis server to other Redis servers. The former is called the master node (Master), and the latter is called the slave node (Slave); data replication is one-way, only from the master node to the slave node.

By default, each Redis server is a master node; and a master node can have multiple slave nodes (or no slave nodes), but a slave node can only have one master node.

2. The role of master-slave replication

Data redundancy : Master-slave replication implements hot backup of data, which is a data redundancy method other than persistence.
Fault recovery : When the master node has a problem, the slave node can provide services to realize rapid fault recovery; it is actually a kind of service redundancy.
Load balancing : on the basis of master-slave replication, with read-write separation, the master node can provide write services, and the slave nodes can provide read services (that is, the application connects to the master node when writing Redis data, and the application connects to the slave node when reading Redis data ), to share the server load; especially in the scenario of writing less and reading more, sharing the read load through multiple slave nodes can greatly increase the concurrency of the Redis server.
The cornerstone of high availability : In addition to the above functions, master-slave replication is also the basis for the implementation of sentinels and clusters. Therefore, master-slave replication is the basis for high availability of Redis.

3. Master-slave replication process

(1) When a Slave process starts, it sends a sync command to the Master to request a synchronization connection.
(2) Whether it is a first connection or a reconnection, the Master starts a background process to save a data snapshot to a data file (performing the rdb operation), and the Master also records all commands that modify the data and caches them.
(3) When the background process finishes, the Master sends the data file to the Slave; the Slave saves it to disk and then loads it into memory. After that, the Master sends all subsequent data-modifying operations to the Slave. If the Slave fails and goes down, it automatically reconnects once it recovers.
(4) After the Master receives a connection from the Slave, it sends its complete data file to the Slave. If the Master receives synchronization requests from multiple Slaves at the same time, it starts a single background process to save the data file and then sends it to all the Slaves, ensuring that all Slaves end up in a consistent state.

4. Build Redis master-slave replication

(1) Install Redis

//Environment preparation
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -i 's/enforcing/disabled/' /etc/selinux/config

#Tune kernel parameters
vim /etc/sysctl.conf
vm.overcommit_memory = 1
net.core.somaxconn = 2048

sysctl -p


//Install redis
yum install -y gcc gcc-c++ make

tar zxvf /opt/redis-7.0.9.tar.gz -C /opt/
cd /opt/redis-7.0.9
make
make PREFIX=/usr/local/redis install
#The Redis source package ships with a Makefile, so after unpacking there is no need to run ./configure first; make and make install can be executed directly.

#Create the redis working directories
mkdir /usr/local/redis/{conf,log,data}

cp /opt/redis-7.0.9/redis.conf /usr/local/redis/conf/

useradd -M -s /sbin/nologin redis
chown -R redis.redis /usr/local/redis/

#Environment variable
vim /etc/profile 
PATH=$PATH:/usr/local/redis/bin		#append this line

source /etc/profile


//Define a systemd service management script
vim /usr/lib/systemd/system/redis-server.service
[Unit]
Description=Redis Server
After=network.target

[Service]
User=redis
Group=redis
Type=forking
TimeoutSec=0
PIDFile=/usr/local/redis/log/redis_6379.pid
ExecStart=/usr/local/redis/bin/redis-server /usr/local/redis/conf/redis.conf
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true

[Install]
WantedBy=multi-user.target

(2) Modify the Redis configuration file (Master node operation)

vim /usr/local/redis/conf/redis.conf
bind 0.0.0.0									#line 87: change the listen address to 0.0.0.0
protected-mode no								#line 111: set protected mode to no
port 6379										#line 138: Redis's default listen port 6379
daemonize yes									#line 309: run as a daemon in the background
pidfile /usr/local/redis/log/redis_6379.pid		#line 341: specify the PID file
logfile "/usr/local/redis/log/redis_6379.log"	#line 354: specify the log file
dir /usr/local/redis/data						#line 504: specify the directory for persistence files
#requirepass abc123								#line 1037: optional, set a redis password
appendonly yes									#line 1380: enable AOF


systemctl restart redis-server.service

 

(3) Modify the Redis configuration file (Slave node operation)

vim /usr/local/redis/conf/redis.conf
bind 0.0.0.0									#line 87: change the listen address to 0.0.0.0
protected-mode no								#line 111: set protected mode to no
port 6379										#line 138: Redis's default listen port 6379
daemonize yes									#line 309: run as a daemon in the background
pidfile /usr/local/redis/log/redis_6379.pid		#line 341: specify the PID file
logfile "/usr/local/redis/log/redis_6379.log"	#line 354: specify the log file
dir /usr/local/redis/data						#line 504: specify the directory for persistence files
#requirepass abc123								#line 1037: optional, set a redis password
appendonly yes									#line 1380: enable AOF
replicaof 192.168.80.10 6379					#line 528: specify the IP and port of the Master node to replicate
#masterauth abc123								#line 535: optional, the Master node's password; needed only if requirepass is set on the Master


systemctl restart redis-server.service

(4) Verify the master-slave effect

View logs on the Master node

tail -f /usr/local/redis/log/redis_6379.log 
Replica 192.168.80.11:6379 asks for synchronization
Replica 192.168.80.12:6379 asks for synchronization
Synchronization with replica 192.168.80.11:6379 succeeded
Synchronization with replica 192.168.80.12:6379 succeeded

Verify the slave nodes on the master node

redis-cli info replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.80.11,port=6379,state=online,offset=1246,lag=0
slave1:ip=192.168.80.12,port=6379,state=online,offset=1246,lag=1

Five: Redis sentinel mode

1. The method of master-slave switching technology

Traditional master-slave switching works like this: when the master server goes down, a slave must be manually promoted to master. This requires human intervention, which is time-consuming and labor-intensive, and the service is unavailable for a period of time. The sentinel mechanism was introduced to remedy these shortcomings of master-slave replication.

Sentinel's core function: Based on master-slave replication, Sentinel introduces automatic failover of the master node.

2. The principle of sentinel mode

Sentinel : It is a distributed system that monitors every server in the master-slave structure. When a failure occurs, it selects a new master through a voting mechanism and points all slaves to the new master. Therefore, the cluster running Sentinel must contain no fewer than three sentinel nodes, and the number of sentinels must be odd.

 

3. The role of sentinel mode

  • Monitoring: Sentinel constantly checks whether the master node and slave nodes are running normally.
  • Automatic failover: when the master node cannot work normally, Sentinel starts an automatic failover, promotes one of the failed master's slave nodes to be the new master, and makes the other slave nodes replicate the new master.
  • Notification (alerting): Sentinel can send failover results to clients.

The Sentinel structure consists of two parts: Sentinel nodes and data nodes

  • Sentinel node : The sentinel system consists of one or more sentinel nodes, which are special redis nodes that do not store data.
  • Data nodes : both master and slave nodes are data nodes

 4. Failover mechanism

1. The sentinel nodes monitor periodically to discover whether the master node has failed.
Every sentinel node sends a ping command to the master node, the slave nodes, and the other sentinel nodes once per second as a heartbeat check.

  • If the master node does not reply within a certain time window, or replies with an error, that sentinel considers the master node subjectively offline (a unilateral judgment).
  • When more than half of the sentinel nodes consider the master subjectively offline, it is objectively offline.
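The two bullet points above amount to a quorum count over per-sentinel verdicts. A minimal sketch of just the counting rule (illustrative only; the quorum value mirrors the "sentinel monitor ... 2" setting used later, not Sentinel's internal implementation):

```python
def is_objectively_down(verdicts, quorum):
    """verdicts: one boolean per sentinel (True = that sentinel's subjective-down view).
    The master is objectively down once at least `quorum` sentinels agree."""
    return sum(verdicts) >= quorum

# 3 sentinels, quorum 2: two subjective-down votes suffice
assert is_objectively_down([True, True, False], quorum=2)
# a single sentinel's view is only subjective offline, not objective
assert not is_objectively_down([True, False, False], quorum=2)
```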

2. When the master node fails, the sentinel nodes use the Raft algorithm (an election algorithm) to jointly elect one sentinel node as the leader, responsible for carrying out the failover and sending notifications. This is why the sentinel cluster must contain no fewer than 3 nodes.

3. The failover is performed by the leader sentinel node, the process is as follows :

  • Upgrade a slave node to a new master node, and let other slave nodes point to the new master node;
  • If the original master node recovers, it becomes a slave node and points to the new master node
  • Notify the client that the primary node has been replaced.

Note that objective offline is a concept unique to the master node; if a slave node or a sentinel node fails and is marked subjectively offline by a sentinel, no objective offline or failover follows.

Failover analogy: think of a gang with several branch bosses and one big boss, where the branch bosses regularly go to the big boss for meetings. If the big boss slips up and is removed, one of the branch bosses takes his place. Even if the original big boss later wants his old position back, he cannot reclaim it directly; a new election is still required.

The election of the new master node:

1. Filter out unhealthy (offline) slave nodes and those that have not responded to the sentinel's ping.
2. Select the slave node with the best priority configured in the configuration file (replica-priority, default 100; a lower value means a higher priority).
3. Select the slave node with the largest replication offset, i.e. the most complete copy of the data.
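The three rules above can be sketched as a single sort over candidate replicas: drop unhealthy ones, prefer the best replica-priority, then the largest offset. An illustrative sketch (the field names are invented for the example; real Sentinel also breaks remaining ties by run ID, which the final key component imitates):

```python
def elect_new_master(replicas):
    """replicas: dicts with 'healthy', 'priority', 'offset', 'run_id'.
    Lower priority value wins; among equals, the largest offset wins."""
    candidates = [r for r in replicas if r["healthy"]]  # rule 1: filter unhealthy
    if not candidates:
        return None
    # rule 2: lowest replica-priority; rule 3: largest offset; tiebreak: smallest run_id
    return min(candidates, key=lambda r: (r["priority"], -r["offset"], r["run_id"]))

replicas = [
    {"healthy": True,  "priority": 100, "offset": 1200, "run_id": "a"},
    {"healthy": True,  "priority": 100, "offset": 1246, "run_id": "b"},
    {"healthy": False, "priority": 1,   "offset": 9999, "run_id": "c"},  # offline, filtered out
]
assert elect_new_master(replicas)["run_id"] == "b"  # most complete healthy replica
```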

Sentinel depends on master-slave mode, so master-slave replication must be set up before building sentinel mode.
 

5. Build Redis sentinel mode 

(1) Environment construction

This builds on the master-slave replication environment set up above.

host     operating system   IP address       software / installation package / tools
Master   CentOS7            192.168.80.10    redis-7.0.9.tar.gz
Slave1   CentOS7            192.168.80.11    redis-7.0.9.tar.gz
Slave2   CentOS7            192.168.80.12    redis-7.0.9.tar.gz

(2) Modify the configuration file of Redis sentinel mode (all node operations)

systemctl stop firewalld
setenforce 0


cp /opt/redis-7.0.9/sentinel.conf /usr/local/redis/conf/
chown redis.redis /usr/local/redis/conf/sentinel.conf

vim /usr/local/redis/conf/sentinel.conf
protected-mode no									#line 6, disable protected mode
port 26379											#line 10, default Sentinel listening port
daemonize yes										#line 15, run sentinel as a background daemon
pidfile /usr/local/redis/log/redis-sentinel.pid		#line 20, specify the PID file
logfile "/usr/local/redis/log/sentinel.log"			#line 25, specify the log path
dir /usr/local/redis/data							#line 54, specify the data directory
sentinel monitor mymaster 192.168.80.10 6379 2		#line 73, monitor the master 192.168.80.10:6379 under the name mymaster; the trailing 2 is the quorum: at least 2 sentinels must agree before the master is judged failed and a failover is started
#sentinel auth-pass mymaster abc123					#line 76, optional, the Master node's password; needed only if the Master sets requirepass
sentinel down-after-milliseconds mymaster 3000		#line 114, time window for judging a server as down; the default is 30000 ms (30 s), shortened here to 3000 ms
sentinel failover-timeout mymaster 180000			#line 214, minimum interval between two failovers of the same master by the same sentinel (180 s)

 (3) Start sentinel mode

#On all servers; start the Master first, then the Slaves
cd /usr/local/redis/conf/
redis-sentinel sentinel.conf &

(4) View sentinel information

redis-cli -p 26379 info Sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=192.168.80.10:6379,slaves=2,sentinels=3

(5) Fault simulation

[root@master ~]# ps aux | grep redis
avahi      6123  0.0  0.1  62276  2076 ?        Ss   15:07   0:00 avahi-daemon: running [redis.local]
root       9917  0.1  0.4 156452  7860 ?        Ssl  18:46   0:06 /usr/local/redis/bin/redis-server 0.0.0.0:6379
root      10707  0.2  0.4 153892  7828 ?        Ssl  19:58   0:01 redis-sentinel *:26379 [sentinel]
root      10934  0.0  0.0 112728   988 pts/0    S+   20:08   0:00 grep --color=auto redis

[1]+  Done                  redis-sentinel sentinel.conf

Kill the redis-server process on the Master node (use the PID shown by ps; here it is 9917)

kill -9 9917 #PID of redis-server on the Master node

 (6) Verification result

tail -f /usr/local/redis/log/sentinel.log

redis-cli -p 26379 INFO Sentinel

 Six: Redis cluster mode

1. Redis cluster concept

Cluster, namely Redis Cluster, is a distributed storage solution introduced by Redis 3.0.

The cluster is composed of multiple groups of nodes (Node), and Redis data is distributed among these nodes. Nodes are divided into master nodes and slave nodes: only master nodes handle read and write requests and maintain cluster information; slave nodes only replicate the data and state information of their master.

2. The role of the cluster

(1) Data partition: Data partition (or data fragmentation) is the core function of the cluster.

The cluster distributes data to multiple nodes. On the one hand, it breaks through the limit of Redis single-machine memory size, and the storage capacity is greatly increased; on the other hand, each master node can provide external read and write services, which greatly improves the responsiveness of the cluster.
The limit on single-machine Redis memory was mentioned when introducing persistence and master-slave replication: if a single instance holds too much memory, the fork in bgsave/bgrewriteaof may block the main process, a master switch may leave slave nodes unable to serve for a long time, and the master's replication buffer may overflow during the full replication phase.

(2) High availability: the cluster supports master-slave replication and automatic failover of the master node (similar to Sentinel)

When any node fails, the cluster can still provide external services.

3. Data fragmentation of Redis cluster

  • Redis cluster introduces the concept of hash slots
  • Redis cluster has 16384 hash slots (numbered 0-16383)
  • Each set of nodes in the cluster is responsible for a portion of the hash slots
  • Each key is run through CRC16 and the result is taken modulo 16384 to determine which hash slot it lands in; that slot value identifies the node responsible for the slot, and the client is automatically redirected to that node for the access.
#Example with a cluster of 3 nodes:
Node A holds hash slots 0-5460
Node B holds hash slots 5461-10922
Node C holds hash slots 10923-16383
#The master-slave replication model of a Redis cluster
With only the three masters A, B and C, if node B fails the whole cluster becomes unavailable because the slot range 5461-10922 is missing.
Adding a slave to each node (A1, B1, C1) gives the cluster three masters and three slaves; after node B fails, the cluster elects B1 as the new master and keeps serving. Only when both B and B1 fail does the cluster become unavailable.
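The slot mapping described above, CRC16(key) mod 16384, can be reproduced in a few lines. Redis Cluster uses the CRC-16/XMODEM variant (polynomial 0x1021, initial value 0); as a sanity check, the key name hashes to slot 5798, matching the redirect shown in the cluster test below:

```python
def crc16(data: bytes) -> int:
    """CRC-16/XMODEM, the variant Redis Cluster uses for key hashing."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def keyslot(key: str) -> int:
    """Hash slot for a key; a non-empty {tag} hash tag is hashed instead of the whole key."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

assert crc16(b"123456789") == 0x31C3   # standard XMODEM check value
assert keyslot("name") == 5798          # matches the redirect seen in the cluster test
```

Keys sharing a hash tag, e.g. {user}.name and {user}.age, land in the same slot, which is what makes multi-key operations possible in a cluster.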

4. Build Redis cluster mode

A Redis cluster generally requires 6 nodes: 3 masters and 3 slaves. For convenience, all six nodes here are simulated on the same server and
distinguished by port number: master ports 6001/6002/6003, with corresponding slave ports 6004/6005/6006.

(1) Prepare the six Redis instances (normally each of six servers would install Redis; here six instance directories share one machine)

cd /usr/local/redis/
mkdir -p redis-cluster/redis600{1..6}

for i in {1..6}
do
cp /opt/redis-7.0.9/redis.conf /usr/local/redis/redis-cluster/redis600$i
cp /opt/redis-7.0.9/src/redis-cli /opt/redis-7.0.9/src/redis-server /usr/local/redis/redis-cluster/redis600$i
done

(2) Turn on the cluster function

#Adjust the config files in the other 5 folders the same way; note that all 6 ports must differ.
cd /usr/local/redis/redis-cluster/redis6001
vim redis.conf
#bind 127.0.0.1									#line 87, comment out bind to listen on all interfaces
protected-mode no								#line 111, disable protected mode
port 6001										#line 138, change the Redis listening port
daemonize yes									#line 309, run as a daemon in the background
pidfile /usr/local/redis/log/redis_6001.pid		#line 341, specify the PID file
logfile "/usr/local/redis/log/redis_6001.log"	#line 354, specify the log file
dir ./											#line 504, directory for persistence files
appendonly yes									#line 1379, enable AOF
cluster-enabled yes								#line 1576, uncomment to enable the cluster feature
cluster-config-file nodes-6001.conf				#line 1584, uncomment, set the cluster config file name
cluster-node-timeout 15000						#line 1590, uncomment, set the cluster node timeout

(3) Start the redis node

#Start the Redis nodes
#Enter each of the six folders and run: redis-server redis.conf
cd /usr/local/redis/redis-cluster/redis6001
redis-server redis.conf

for d in {1..6}
do
cd /usr/local/redis/redis-cluster/redis600$d
./redis-server redis.conf
done

ps -ef | grep redis

#Create the cluster
redis-cli --cluster create 127.0.0.1:6001 127.0.0.1:6002 127.0.0.1:6003 127.0.0.1:6004 127.0.0.1:6005 127.0.0.1:6006 --cluster-replicas 1

#The six instances form three groups of one master and one slave each; the instances listed first become masters, the later ones slaves. Enter yes at the interactive prompt to create the cluster.
--cluster-replicas 1 means each master node gets 1 slave node.

 

 

 (4) Test cluster

#Test the cluster
redis-cli -p 6001 -c					#-c lets the client follow redirections between nodes
127.0.0.1:6001> cluster slots			#show each node's hash-slot ranges
1) 1) (integer) 5461
   2) (integer) 10922									#hash slot range
   3) 1) "127.0.0.1"
      2) (integer) 6003									#master node IP and port
      3) "fdca661922216dd69a63a7c9d3c4540cd6baef44"
   4) 1) "127.0.0.1"
      2) (integer) 6004									#slave node IP and port
      3) "a2c0c32aff0f38980accd2b63d6d952812e44740"
2) 1) (integer) 0
   2) (integer) 5460
   3) 1) "127.0.0.1"
      2) (integer) 6001
      3) "0e5873747a2e26bdc935bc76c2bafb19d0a54b11"
   4) 1) "127.0.0.1"
      2) (integer) 6006
      3) "8842ef5584a85005e135fd0ee59e5a0d67b0cf8e"
3) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "127.0.0.1"
      2) (integer) 6002
      3) "816ddaa3d1469540b2ffbcaaf9aa867646846b30"
   4) 1) "127.0.0.1"
      2) (integer) 6005
      3) "f847077bfe6722466e96178ae8cbb09dc8b4d5eb"

127.0.0.1:6001> set name zhangsan
-> Redirected to slot [5798] located at 127.0.0.1:6003
OK

127.0.0.1:6001> cluster keyslot name					#show the slot number of the key name

redis-cli -p 6004 -c
127.0.0.1:6004> keys *							#the corresponding slave node also holds this key, but the other nodes do not
1) "name"


redis-cli -p 6001 -c cluster nodes

 

 5. Add nodes to the Cluster cluster and dynamically expand capacity

1. Create a new master node

Redis clusters (since Redis 5) support adding nodes and dynamic expansion while under load.

The existing cluster has 6 nodes, 127.0.0.1:6001 - 127.0.0.1:6006, in 3 master-slave groups. We now add a 4th master-slave group: 127.0.0.1:6007 and 127.0.0.1:6008.
Create a new master node 127.0.0.1:6007. The command must name an existing node so the cluster information can be obtained; here 127.0.0.1:6001 is used.

redis-cli -p 6001 --cluster add-node 127.0.0.1:6007 127.0.0.1:6001
or
redis-cli -p 6001
cluster meet 127.0.0.1 6007
cluster meet 127.0.0.1 6008

2. Set the master-slave node

Create 127.0.0.1:6008 as a slave of 127.0.0.1:6007. The command must name an existing node so the cluster information and the master's node ID can be obtained.
redis-cli -p 6001 --cluster add-node 127.0.0.1:6008 127.0.0.1:6001 --cluster-slave --cluster-master-id e44678abed249e22482559136bf45280fd3ac281
or
redis-cli -p 6008
cluster replicate e44678abed249e22482559136bf45280fd3ac281

3. Allocate slots to new nodes

A newly added master node holds no slots; slots are distributed automatically only when the cluster is first created, so a master added afterwards must be assigned slots manually.
redis-cli -p 6007 --cluster reshard 127.0.0.1:6001 --cluster-from e1a033e07f0064e6400825b4ddbcd6680c032d10 --cluster-to e44678abed249e22482559136bf45280fd3ac281 --cluster-slots 1000 --cluster-yes
or
redis-cli -p 6007 --cluster reshard 127.0.0.1:6001
How many slots do you want to move (from 1 to 16384)? 1000                    #number of slots to move
What is the receiving node ID? e44678abed249e22482559136bf45280fd3ac281       #node ID of the master that receives the slots
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1: e1a033e07f0064e6400825b4ddbcd6680c032d10           #node ID of the master that gives up slots
Source node #2: done                                               #input finished; start the transfer

4. View the cluster status

redis-cli -p 6001 cluster nodes

Summary

 1. Redis optimization
Enable AOF persistence.
Run config set activedefrag yes to enable automatic memory defragmentation, or run memory purge periodically to clean up memory fragments.
Set the eviction policy maxmemory-policy so that memory usage never exceeds the system's maximum; maxmemory caps the memory Redis may occupy, and maxmemory-samples sets the sample size for the eviction algorithm.
Prefer the Hash data type for storing data; a Hash with few fields occupies very little space.
Set expiration times on keys; keep key names and values short and control value sizes.
Run config set requirepass to enable password authentication.
Set reasonable values for maxclients, the maximum connection count (10000); tcp-backlog, the connection queue length (1024); and timeout, the idle-connection timeout (30000).
Deploy master-slave replication to back up data, and use Sentinel or Cluster solutions to achieve high availability.


2. Double-write consistency between the cache and the database
Update the database first, then delete the cache, and set a cache expiration time: after the cached data expires, the next read request reloads the cache directly from the database.
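As a sketch of the pattern above, with plain dicts standing in for the database and Redis (illustrative only; real code would also set a TTL on the refilled entry and handle a failed cache delete):

```python
db, cache = {}, {}

def read(key):
    if key in cache:                 # cache hit
        return cache[key]
    value = db.get(key)              # miss: load from the database
    if value is not None:
        cache[key] = value           # refill the cache (with a TTL in real Redis)
    return value

def write(key, value):
    db[key] = value                  # 1. update the database first
    cache.pop(key, None)             # 2. then delete the cache entry

write("user:1", "alice")
assert read("user:1") == "alice"     # miss, loaded from db, cache refilled
write("user:1", "bob")               # db updated, stale cache entry deleted
assert read("user:1") == "bob"       # next read sees the fresh value
```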


3. Cache avalanche
A large portion of the cache expires at the same time, so subsequent requests all fall on the database, which collapses under the burst of requests arriving in a short period.

Solutions:
Randomize the expiration times of cached data so that large amounts of data do not expire at the same moment.
When concurrency is not especially high, the most common solution is locking and queueing.
Attach a cache tag to each cached entry that records whether it is stale; when the tag marks the entry stale, refresh it from the database.
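The first mitigation above, randomized expiry, is simply adding jitter to the TTL so that keys written in the same batch do not expire together. A sketch (the base and jitter values are arbitrary examples):

```python
import random

def ttl_with_jitter(base_seconds=3600, jitter_seconds=600):
    """Spread expirations over [base, base + jitter] to avoid a mass expiry."""
    return base_seconds + random.randint(0, jitter_seconds)

ttls = [ttl_with_jitter() for _ in range(1000)]
assert all(3600 <= t <= 4200 for t in ttls)
assert len(set(ttls)) > 1   # expirations are spread out, not identical
```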


4. Cache breakdown
The cache holds no data but the database does (usually because the cached entry expired). A burst of concurrent users all miss the cache for the same key at the same time and all go to the database, so database load spikes instantly and the pressure becomes excessive.
Unlike a cache avalanche, cache breakdown means many concurrent reads of the same piece of data; an avalanche means many different keys have expired, so many lookups miss the cache and go to the database.

Solutions:
Set hotspot data to never expire.
Add a mutex so that only one request rebuilds the expired cache entry while the others wait.
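The mutex solution above lets exactly one request rebuild an expired hot key while the others wait, instead of all of them hitting the database. A single-process sketch using threading.Lock (in a distributed deployment this would be a Redis SET ... NX PX lock instead; the names here are illustrative):

```python
import threading

cache = {}
lock = threading.Lock()
db_hits = 0

def load_from_db(key):
    global db_hits
    db_hits += 1                 # count how many requests reach the database
    return f"value-of-{key}"

def get(key):
    if key in cache:
        return cache[key]
    with lock:                   # only one thread rebuilds the key
        if key in cache:         # double-check: another thread may have filled it
            return cache[key]
        cache[key] = load_from_db(key)
        return cache[key]

threads = [threading.Thread(target=get, args=("hot",)) for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
assert db_hits == 1              # 50 concurrent reads, a single database hit
```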


5. Cache penetration
Requests ask for data that exists neither in the cache nor in the database, so every request falls through to the database, which collapses under the burst of requests.

Solutions:
Validate at the interface layer: authenticate users, perform basic id checks, and reject ids <= 0 outright.
When a lookup misses both the cache and the database, cache the key with a null value and a short TTL, such as 30 seconds (too long a TTL would keep the key unusable once real data appears). This prevents an attacker from brute-forcing the same id repeatedly.
Use a Bloom filter: hash all possibly existing keys into a sufficiently large bitmap; a key that definitely does not exist is intercepted by the bitmap, shielding the underlying storage from the query pressure.
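The Bloom filter mentioned above can be sketched with a plain bit array and a few derived hash functions (a toy version for illustration; production setups typically use the RedisBloom module or a tuned library):

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)   # the bitmap

    def _positions(self, item: str):
        # derive num_hashes positions by salting a single hash function
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means probably present
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

bf = BloomFilter()
for user_id in ("1001", "1002", "1003"):   # preload all existing ids
    bf.add(user_id)
assert bf.might_contain("1002")            # existing id passes the filter
```

A request whose id fails the filter check is rejected before it ever reaches the cache or the database.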


Origin blog.csdn.net/A1100886/article/details/131475062