Redis (9) ------ Sentinel mode, cache penetration and avalanche

15. Sentinel mode

15.1 Overview

  • Redis has officially provided Sentinel (sentinel mode) since version 2.8.
  • With plain master-slave replication, when the master goes down, one of the slaves has to be promoted to master manually. This requires human intervention, is time-consuming and labor-intensive, and leaves the service unavailable for a period of time. This approach is not recommended, so in most cases sentinel mode is the better choice.
  • Sentinel is a special mode. Redis provides the sentinel command for it, and Sentinel runs as an independent process.
  • Principle: Sentinel monitors multiple running Redis instances by sending them commands and waiting for their responses.

  • Sentinel serves two purposes:
    • It sends commands to the Redis servers and uses their replies to monitor the running status of both the master and the slaves.
    • When Sentinel detects that the master is down, it automatically promotes a slave to master, then uses publish/subscribe to notify the other slaves and rewrite their configuration so that they follow the new master.
  • However, a single Sentinel process monitoring the Redis servers can itself fail. For this reason, multiple sentinels can be used for monitoring.
  • The sentinels also monitor each other, forming a multi-sentinel setup.

  • Suppose the master goes down and Sentinel 1 detects this first. The system does not fail over immediately; at this point only Sentinel 1 considers the master unavailable, which is called subjective down (SDOWN). When other sentinels also detect that the master is unavailable and their number reaches the configured quorum, the master is considered objectively down (ODOWN). The sentinels then hold a vote, and the elected sentinel carries out the failover. After the switch succeeds, the sentinels use publish/subscribe to make the servers they monitor follow the new master.
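
The decision described above can be sketched in a few lines of Python. This is a simplified model, not Sentinel's actual implementation: the class and function names are made up for illustration, and the rule that the leader needs votes from a majority of all sentinels (and at least the quorum) is stated here as an assumption of the model.

```python
from dataclasses import dataclass


@dataclass
class MasterView:
    """Hypothetical model of how a group of sentinels sees one master."""
    quorum: int           # sentinels that must agree before the master is ODOWN
    total_sentinels: int  # number of sentinels monitoring this master


def is_objectively_down(view: MasterView, sdown_reports: int) -> bool:
    """ODOWN: at least `quorum` sentinels each report subjective down."""
    return sdown_reports >= view.quorum


def can_start_failover(view: MasterView, sdown_reports: int, votes_for_leader: int) -> bool:
    """Failover needs ODOWN plus a leader elected by enough sentinels.

    Assumption of this sketch: the leader needs a majority of all
    sentinels and never fewer votes than the quorum.
    """
    majority = view.total_sentinels // 2 + 1
    needed_votes = max(majority, view.quorum)
    return is_objectively_down(view, sdown_reports) and votes_for_leader >= needed_votes


view = MasterView(quorum=2, total_sentinels=3)
# Only Sentinel 1 has noticed the outage: subjective down, no failover yet.
print(is_objectively_down(view, sdown_reports=1))
# A second sentinel agrees: quorum reached, the master is objectively down.
print(is_objectively_down(view, sdown_reports=2))
# Two of three sentinels vote for the same leader, so failover may begin.
print(can_start_failover(view, sdown_reports=2, votes_for_leader=2))
```

With a quorum of 1, as in the test below, a single sentinel's report is enough to mark the master objectively down, which is why production setups usually run several sentinels with a higher quorum.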

15.2 Testing

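  • The current server setup is one master and two slaves
  • Create the sentinel configuration file: sentinel.conf
# sentinel monitor <name for the monitored master> <host (master IP)> <port (master port)> <quorum>
# The final 1 is the quorum: how many sentinels must agree that the master is unreachable before it is marked objectively down and a failover can begin
sentinel monitor myredis 127.0.0.1 6379 1
  • Start sentinel mode: redis-sentinel xxx/sentinel.conf (xxx is the directory name)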
[root@iz2ze0l46im3eg03queta2z bin]# redis-sentinel myredisconfig/sentinel.conf 
26866:X 04 Nov 2020 16:47:08.573 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
26866:X 04 Nov 2020 16:47:08.573 # Redis version=6.0.9, bits=64, commit=00000000, modified=0, pid=26866, just started
26866:X 04 Nov 2020 16:47:08.573 # Configuration loaded
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 6.0.9 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in sentinel mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 26379
 |    `-._   `._    /     _.-'    |     PID: 26866
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

26866:X 04 Nov 2020 16:47:08.574 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
26866:X 04 Nov 2020 16:47:08.576 # Sentinel ID is 276060e845a7365dbef0fe397a9c4907dab24e90
26866:X 04 Nov 2020 16:47:08.576 # +monitor master myredis 127.0.0.1 6379 quorum 1
26866:X 04 Nov 2020 16:47:08.577 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ myredis 127.0.0.1 6379
26866:X 04 Nov 2020 16:47:08.579 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ myredis 127.0.0.1 6379
  • Sentinel is now monitoring the master on port 6379. When the master goes down, Sentinel runs a failover and selects a new master from among the slaves.

15.3 Advantages and Disadvantages of Sentinel Mode

  • Advantages:
    • A sentinel cluster is built on master-slave replication, so it has all the advantages of master-slave replication.
    • Master and slave can be switched and faults can be failed over, so the system has better availability.
    • Sentinel mode is an upgrade of master-slave mode: switching goes from manual to automatic, which makes the system more robust.
  • Disadvantages:
    • Redis is not easy to scale online. Once the cluster reaches its capacity limit, expanding it online is very troublesome.
    • Configuring sentinel mode is actually quite troublesome, because there are many configuration options.

15.4 Sentinel mode full configuration

  • Sentinel mode configuration file: sentinel.conf
# Example sentinel.conf
 
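# Port this sentinel instance runs on (default 26379)
port 26379

# Working directory of the sentinel
dir /tmp

# IP and port of the Redis master that this sentinel monitors
# master-name: a name you choose for the master; it may contain only letters A-z, digits 0-9 and the three characters ".-_"
# quorum: when this many sentinels consider the master unreachable, the master is considered objectively down
# sentinel monitor <master-name> <ip> <redis-port> <quorum>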
sentinel monitor mymaster 127.0.0.1 6379 1
 
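# When a password is enabled on the Redis instances (requirepass foobared), every client connecting to them must provide that password
# Password the sentinel uses to connect to the master and slaves; note that the master and slaves must be configured with the same password
# sentinel auth-pass <master-name> <password>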
sentinel auth-pass mymaster MySUPER--secret-0123passw0rd
 
 
# Number of milliseconds without a reply from the master after which this sentinel subjectively considers it down (default 30 seconds)
# sentinel down-after-milliseconds <master-name> <milliseconds>
sentinel down-after-milliseconds mymaster 30000
 
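# Maximum number of slaves that may synchronize with the new master at the same time during a failover.
# The smaller this number, the longer the failover takes to complete,
# but the larger this number, the more slaves are unavailable at once because of replication.
# Setting it to 1 ensures that only one slave at a time is unable to serve command requests.
# sentinel parallel-syncs <master-name> <numslaves>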
sentinel parallel-syncs mymaster 1
 
 
 
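# Failover timeout. failover-timeout is used in the following ways:
# 1. The interval between two failovers of the same master by the same sentinel.
# 2. The time counted from when a slave starts replicating from a wrong master until it is corrected to replicate from the right master.
# 3. The time needed to cancel a failover that is already in progress.
# 4. The maximum time allowed, during a failover, to reconfigure all slaves to point to the new master. Even after this timeout the slaves are still reconfigured to point to the master, but no longer according to the parallel-syncs setting.
# Default: three minutes
# sentinel failover-timeout <master-name> <milliseconds>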
sentinel failover-timeout mymaster 180000
 
# SCRIPTS EXECUTION
 
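# Scripts to run when certain events occur, for example to notify administrators by email when the system is not running normally.
# The following rules apply to the script's result:
# If the script exits with 1, it is executed again later; the number of retries currently defaults to 10.
# If the script exits with 2, or a value higher than 2, it is not executed again.
# If the script is terminated by a signal while running, the behavior is the same as for exit code 1.
# A script may run for at most 60 seconds; beyond that it is terminated with SIGKILL and then executed again.

# Notification script: called whenever sentinel generates a warning-level event (for example, a Redis instance becoming subjectively or objectively down).
# The script should notify the system administrator by email, SMS, and so on, that the system is not running normally. It is called with two arguments:
# the first is the event type,
# the second is the event description.
# If a script path is configured in sentinel.conf, the script must exist at that path and be executable, otherwise sentinel cannot start.
# Notification script
# sentinel notification-script <master-name> <script-path>
sentinel notification-script mymaster /var/redis/notify.sh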
 
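# Client reconfiguration script
# Called when the master changes because of a failover, to inform the relevant clients that the master's address has changed.
# The following arguments are passed to the script:
# <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
# <state> is currently always "failover",
# <role> is either "leader" or "observer".
# The arguments from-ip, from-port, to-ip, to-port are the addresses of the old master and the new master (the former slave)
# The script should be generic and safe to call multiple times.
# sentinel client-reconfig-script <master-name> <script-path>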
sentinel client-reconfig-script mymaster /var/redis/reconfig.sh
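
The notification-script rules in the comments above (two arguments, exit code 1 means retry, 2 or higher means do not retry) can be illustrated with a small Python sketch. The configuration in this article points to a shell script, /var/redis/notify.sh; the function names, the `send_alert` hook, and the choice of exit codes per situation are all hypothetical.

```python
from typing import Sequence

RETRY_LATER = 1   # exit code 1: Sentinel runs the script again later
DO_NOT_RETRY = 2  # exit code 2 or higher: Sentinel does not run it again


def send_alert(message: str) -> bool:
    """Hypothetical delivery hook; a real script would send email or SMS."""
    print(message)
    return True


def notify(argv: Sequence[str]) -> int:
    """Return the exit code for one notification-script invocation.

    `argv` holds the two arguments Sentinel passes: event type and description.
    """
    if len(argv) != 2:
        # Malformed arguments will not fix themselves, so do not ask for a retry.
        return DO_NOT_RETRY
    event_type, description = argv
    delivered = send_alert(f"[sentinel] {event_type}: {description}")
    # A failed delivery may be transient, so ask Sentinel to try again.
    return 0 if delivered else RETRY_LATER


# As a standalone script, the last line would be:
#   sys.exit(notify(sys.argv[1:]))
exit_code = notify(["+sdown", "master mymaster 127.0.0.1 6379"])
```

Whatever the script does, it should finish well within the 60-second limit mentioned above, since a script that runs longer is killed and run again.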

16. Cache penetration and avalanche

16.1 High availability issues of services

  • Using Redis as a cache greatly improves the performance and efficiency of applications, especially for data queries. It also introduces some problems, the most critical of which is data consistency (Redis transactions do not guarantee atomicity at runtime). Strictly speaking, this problem has no complete solution.
  • If the data must be highly consistent, caching should not be used
  • Other typical problems are cache penetration, cache avalanche, and cache breakdown. There are some widely used solutions for each

16.2 Cache penetration

  • Caused by data that cannot be found
16.2.1 Overview
  • The concept of cache penetration is simple. A user queries a piece of data and the Redis in-memory database does not have it, that is, the cache misses, so the query goes to the persistence-layer database. The database does not have it either, so the query fails.
  • When many users do this and every request misses the cache (for example, during a flash sale), they all hit the persistence-layer database and put it under heavy pressure. This is cache penetration: because the database cannot find the data, nothing is ever cached, and every request keeps going to the database.
16.2.2 Solution
Bloom filter
  • A Bloom filter is a data structure that stores a hash-based summary of all possible query parameters so that it can quickly determine whether a value may exist. Requests are checked at the control layer first; those that fail the check are rejected immediately, which reduces the pressure on the underlying storage system.
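
As a rough illustration of the idea, here is a minimal Bloom filter in pure Python. It is a sketch rather than a production implementation: the class name, the bit-array size, the number of hashes, and the `user:` keys are all assumptions made for this example. A Bloom filter can report a value as possibly present when it is not (a false positive), but it never reports an added value as absent.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array."""

    def __init__(self, size_bits: int = 65536, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not added"; True means "possibly added".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Load every valid key into the filter ahead of time (hypothetical keys).
valid_ids = BloomFilter()
for user_id in ("user:1", "user:2", "user:3"):
    valid_ids.add(user_id)


def handle_request(user_id: str) -> str:
    """Reject keys that cannot exist before they reach the cache or the database."""
    if not valid_ids.might_contain(user_id):
        return "rejected"
    return "forwarded"
```

Requests for keys that were never added are almost always rejected at this layer, so they reach neither Redis nor the database.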

Cache empty objects
  • When the storage layer misses, the empty result is also cached, with an expiration time set. Subsequent requests for that data are then served from the cache, which protects the back-end data source. In other words, if a request finds nothing in either the cache or the database, an empty object is placed in the cache to handle later requests.

  • There are two problems with this approach:
    • If null values are cached, the cache needs more space to store more keys, because there may be many keys with null values.
    • Even if an expiration time is set for null values, the cache layer and the storage layer will still be inconsistent for a period of time, which affects businesses that need to maintain consistency.
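
The empty-object approach can be sketched as follows. A plain dictionary stands in for Redis and another for the database; the class name, the TTL values, and the `now` parameter (used only to make expiry easy to demonstrate) are assumptions for this example.

```python
import time

_MISSING = object()  # marker cached for keys that the database does not have


class NullCachingStore:
    """Cache-aside lookup that also caches misses, with a shorter TTL."""

    def __init__(self, database, null_ttl=30.0, value_ttl=300.0):
        self.database = database   # stands in for the persistence layer
        self.null_ttl = null_ttl   # seconds to remember "not found" (assumed value)
        self.value_ttl = value_ttl # seconds to keep real values (assumed value)
        self.cache = {}            # key -> (value, expires_at); stands in for Redis
        self.db_queries = 0

    def _query_database(self, key):
        self.db_queries += 1
        return self.database.get(key)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self.cache.get(key)
        if entry is not None and entry[1] > now:
            value = entry[0]
            return None if value is _MISSING else value
        # Cache miss or expired entry: ask the database and cache the answer,
        # even when the answer is "not found".
        value = self._query_database(key)
        ttl = self.null_ttl if value is None else self.value_ttl
        self.cache[key] = (_MISSING if value is None else value, now + ttl)
        return value


store = NullCachingStore({"item:1": "apple"})
store.get("item:404", now=0)   # goes to the database, caches the miss
store.get("item:404", now=1)   # served from the cached miss
print(store.db_queries)
```

Keeping the TTL for misses short limits both problems listed above: empty entries are evicted sooner, and the window in which the cache disagrees with the database is smaller.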

16.3 Cache breakdown (too much volume, cache expires)

  • Caused by a very large query volume hitting a key whose cache entry has just expired.
16.3.1 Overview
  • Compared with cache penetration, cache breakdown is more targeted. For a key that does exist, a large number of requests arrive at the moment its cache entry expires. These requests all pass through to the DB, causing a sudden spike in DB requests and pressure. This is cache breakdown: the cache is unavailable for one particular key, while other keys can still be served from the cache.
16.3.2 Solution
  • Set hot data to never expire: hot data then never expires, but when Redis memory is full some data will be evicted, and this approach consumes space. The more hot data there is, the more space it occupies.
  • Add a mutex (distributed lock): before accessing the key, use SETNX (set if not exists) to set another short-lived key that locks access to the current key, and delete the short-lived key after the access. This ensures that only one thread accesses the key at a time. It places very high demands on the lock
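
The mutex approach can be sketched with threads and an in-memory stand-in for Redis. `FakeRedis` is not a real client; it only mimics the get, set, SETNX, and delete behavior needed here, and the function name and retry parameters are assumptions. A real lock key would also need an expiration time so that a crashed holder cannot block everyone else, which this sketch omits.

```python
import threading
import time


class FakeRedis:
    """In-memory stand-in for the few Redis commands used below."""

    def __init__(self):
        self._data = {}
        self._guard = threading.Lock()

    def get(self, key):
        with self._guard:
            return self._data.get(key)

    def set(self, key, value):
        with self._guard:
            self._data[key] = value

    def setnx(self, key, value):
        """Set only if the key does not exist; return True if it was set."""
        with self._guard:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def delete(self, key):
        with self._guard:
            self._data.pop(key, None)


def get_with_mutex(store, key, load_from_db, retry_delay=0.01, max_retries=500):
    """Rebuild an expired key under a lock so only one caller hits the database."""
    lock_key = f"lock:{key}"
    for _ in range(max_retries):
        value = store.get(key)
        if value is not None:
            return value
        if store.setnx(lock_key, "1"):
            try:
                # Re-check: another caller may have refilled the cache
                # between our miss and our acquiring the lock.
                value = store.get(key)
                if value is None:
                    value = load_from_db(key)
                    store.set(key, value)
                return value
            finally:
                store.delete(lock_key)
        # Someone else holds the lock; wait briefly and read the cache again.
        time.sleep(retry_delay)
    raise TimeoutError(f"could not load {key!r}")
```

The second read inside the lock is what keeps late arrivals from reloading a value that the first lock holder has already written back.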

16.4 Cache Avalanche

16.4.1 Overview
  • Cache avalanche means that the cache layer fails and cannot work properly, so all requests reach the storage layer, calls to the storage layer increase dramatically, and the storage layer may go down. A typical cause is a large number of keys being given the same expiration time: they all expire at the same moment, producing a sudden burst of DB requests and pressure that triggers the avalanche

16.4.2 Solution
  • Redis high availability: since a Redis instance may go down, add more Redis instances so that when one fails the others can keep working. In practice this means building a cluster.
  • Rate limiting and degradation: after a cache entry expires, use locking or queuing to limit the number of threads that read the database and write the cache. For example, allow only one thread per key to query the data and write the cache while the other threads wait
  • Data preheating: before formal deployment, access the data that is likely to be requested in advance, so that data likely to receive heavy traffic is already loaded into the cache. Before a burst of concurrent traffic, manually trigger loading of the various keys and give them different expiration times so that cache expirations are spread as evenly as possible.
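
Spreading expiration times during preheating can be sketched as below. The function names, the base TTL of one hour, and the ten-minute jitter are assumptions for illustration; `cache_set` is a placeholder for whatever call writes a key with a TTL (with Redis, typically a SET with an expiry).

```python
import random


def ttl_with_jitter(base_seconds: int, jitter_seconds: int, rng: random.Random) -> int:
    """Return the base TTL plus a random offset so keys do not expire together."""
    return base_seconds + rng.randint(0, jitter_seconds)


def preheat(keys, load_value, cache_set, base_ttl=3600, jitter=600, seed=None):
    """Load each key ahead of traffic and store it with a randomized TTL.

    `load_value(key)` reads from the data source; `cache_set(key, value, ttl)`
    writes to the cache. Both are supplied by the caller.
    """
    rng = random.Random(seed)
    for key in keys:
        cache_set(key, load_value(key), ttl_with_jitter(base_ttl, jitter, rng))


# Usage with a dictionary standing in for the cache.
stored = {}
preheat(
    [f"product:{i}" for i in range(5)],
    load_value=lambda key: key.upper(),
    cache_set=lambda key, value, ttl: stored.__setitem__(key, (value, ttl)),
)
```

Every key still lives for at least the base TTL, but the expirations are spread over the jitter window instead of landing on the same second.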

Origin blog.csdn.net/weixin_44176393/article/details/123744195