1. Performance management of redis
Redis data is cached in memory
Check Redis memory usage:
info memory
used_memory:853688 The memory occupied by data in redis
used_memory_rss:10522624 Memory requested by redis from the operating system
used_memory_peak:853688 The peak value of memory used by redis
Routine system inspection covers hardware checks plus software checks of the database, nginx, redis, docker, k8s, and so on.
Memory fragmentation ratio: used_memory_rss / used_memory = memory fragmentation ratio
A high ratio means the system has allocated memory to redis, but redis cannot use it effectively.
Query the ratios:
redis-cli info memory | grep ratio
allocator_frag_ratio:1.26
The allocator fragmentation ratio: memory fragments produced while the redis main process runs. The closer to 1 the better; higher values mean more wasted memory.
allocator_rss_ratio:4.54
The ratio of physical memory held by the allocator: how much physical memory the main process occupies while running.
rss_overhead_ratio:1.36
The extra overhead of the RSS (the memory requested from the system) over the physical space redis actually occupies. The lower the better: the closer redis's actual memory usage is to the memory it requested from the system, the lower the extra overhead.
mem_fragmentation_ratio:15.16
The overall memory fragmentation ratio: the lower the better; a high value means much of the allocated memory is wasted as fragments.
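The fragmentation ratio can also be recomputed by hand from the raw INFO fields. A minimal sketch: the field names are real Redis INFO fields, and the sample values are the ones quoted earlier (against a live server you would pipe `redis-cli info memory` in instead).

```shell
# Compute the fragmentation ratio (used_memory_rss / used_memory) yourself,
# using the sample values from above.
info="used_memory:853688
used_memory_rss:10522624"

ratio=$(printf '%s\n' "$info" | awk -F: '
  /^used_memory:/     { um  = $2 }   # memory occupied by data
  /^used_memory_rss:/ { rss = $2 }   # memory requested from the OS
  END { printf "%.2f", rss / um }')
echo "$ratio"
```

A ratio far above 1, as in this sample, indicates heavy fragmentation and is the signal to clean up.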
Clean up fragmentation:
Automatic cleanup:
vim /etc/redis/6379.conf
Append on the last line:
activedefrag yes
This turns on automatic defragmentation.
Set a maximum memory threshold for redis.
Once the threshold is reached, fragments are cleaned up automatically and the key eviction mechanism is triggered.
Line 567
Set the maximum memory occupied
maxmemory 1gb
Be sure to set the threshold for memory usage
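The same settings can be applied to a running instance without a restart; a sketch (assumes a local Redis, and uses the standard CONFIG SET / CONFIG REWRITE commands):

```shell
# Apply the conf-file settings above at runtime (no restart needed)
redis-cli config set activedefrag yes   # enable automatic defragmentation
redis-cli config set maxmemory 1gb      # cap memory usage
redis-cli config rewrite                # persist runtime changes back to the conf file
```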
Line 598
Set up the key eviction mechanism:
Key eviction policies:
maxmemory-policy volatile-lru
Use redis's built-in LRU algorithm to evict among keys that have an expiration time set, removing the least recently used ones (only keys with a TTL are candidates).
maxmemory-policy volatile-ttl
Among keys with an expiration time set, evict those closest to expiring.
maxmemory-policy volatile-random
Randomly evict among keys that have an expiration time set.
maxmemory-policy allkeys-lru
Evict the least recently used keys according to the LRU algorithm, across all keys.
maxmemory-policy allkeys-random
Randomly evict among all keys.
maxmemory-policy noeviction
Do not evict any keys: once redis fills its memory, further writes fail with an error.
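A sketch of switching the policy at runtime, and of why TTLs matter under the volatile-* family (assumes a local Redis; the key names are made up for illustration):

```shell
redis-cli config set maxmemory-policy volatile-lru
redis-cli config get maxmemory-policy
# Under any volatile-* policy, only keys with a TTL are eviction candidates:
redis-cli set session:1 v1 EX 300   # has a TTL, so it may be evicted
redis-cli set config:site v2        # no TTL, so it is never evicted under volatile-*
```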
At work, be sure to set a threshold for the memory occupied by redis.
Manual cleanup:
redis-cli memory purge
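MEMORY PURGE asks the allocator (jemalloc) to hand unused dirty pages back to the OS; a quick before/after check (assumes a local Redis):

```shell
redis-cli info memory | grep mem_fragmentation_ratio   # ratio before purging
redis-cli memory purge                                 # release unused allocator pages
redis-cli info memory | grep mem_fragmentation_ratio   # ratio after purging
```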
How redis memory usage is handled in practice:
1. During daily inspections, monitor the occupancy of redis
2. Set the threshold for redis to occupy system memory to avoid occupying all system memory.
3. Memory fragmentation cleaning, manual and automatic cleaning
4. Configure a suitable key eviction policy
Redis cache avalanche:
A cache avalanche: a large number of application requests cannot be served from the redis cache, so they all hit the backend database.
The database itself has poor concurrency capability; once concurrency is high it quickly collapses.
Causes of avalanches:
Large-scale failure of the redis cluster
A large amount of data in the redis cache expires at the same time, so a large number of requests cannot be served
The redis instance is down
Solution:
Beforehand: a highly available architecture that prevents the entire cache from failing: master-slave replication, sentinel mode, redis cluster.
During: circuit breaking, degradation, and rate limiting (commonly implemented with Hystrix) to reduce losses once an avalanche happens.
Afterwards: restore from redis backups and warm the cache up again quickly.
Redis cache breakdown:
Cause:
The main reason is that the hotspot data cache has expired or been deleted. Multiple requests access the hotspot data concurrently, and the requests are also forwarded to the database, resulting in a rapid decline in database performance.
Frequently requested (hot) cache data is best set to never expire.
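Besides never-expiring hot keys, a common mitigation is a mutex rebuild: only the client that wins a lock rebuilds the hot key while the others retry the cache. A hedged sketch (assumes a local Redis; the key names `lock:hotkey` and `hotkey` are illustrative):

```shell
# SET ... NX EX is atomic: only one client gets "OK" and rebuilds the cache
if [ "$(redis-cli set lock:hotkey 1 NX EX 10)" = "OK" ]; then
  # ...query the database here, then repopulate the cache:
  redis-cli set hotkey "fresh-value" EX 600
  redis-cli del lock:hotkey
fi
```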
redis cache penetration:
The data exists neither in the cache nor in the database, yet someone keeps issuing requests for these non-existent keys, often in very large volumes. Usually this is a hacker exploiting the gap to overwhelm the application's database.
A variant: the key still exists but its value has been tampered with; after the original request misses, it also falls through to the backend database, which is a form of breakdown as well.
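A standard defense against penetration is caching a short-lived null marker, so repeated requests for the same missing key stop reaching the database. A sketch (assumes a local Redis; the key name is a hypothetical missing-user id):

```shell
# After a lookup misses both the cache and the database:
redis-cli set user:99999 "NULL" EX 60   # placeholder expires in 60s
# Subsequent requests hit the "NULL" marker in cache instead of the database.
```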
redis cluster
High availability solutions:
- Persistence
- High availability: master-slave replication, sentinel mode, cluster
Master-slave replication is the basis for redis to achieve high availability. Sentinel mode and cluster are both based on master-slave replication to achieve high availability.
Master-slave replication realizes multi-level backup of data and separation of reading and writing (the master is responsible for writing and the slave is responsible for reading)
Disadvantages: failures cannot be recovered automatically, manual intervention is required, and write operations cannot be load balanced.
How master-slave replication works:
- The master-slave structure consists of a master node and slave nodes. Data replication is one-way: it can only flow from the master to the slaves.
Architecture configuration:
Master-slave replication:
Master: 20.0.0.41
Slave1: 20.0.0.42
Slave2: 20.0.0.43
Turn off firewall and security mechanisms
Modify the configuration file of the master node
vim /etc/redis/6379.conf
bind 0.0.0.0 #line 70, change the listening address to 0.0.0.0 (in production, especially with multiple NICs, it is best to fill in the IP of the physical NIC)
daemonize yes #line 137, run as a daemon in the background
logfile /var/log/redis_6379.log #line 172, specify the log file storage directory
dir /var/lib/redis/6379 #line 264, specify the working directory
appendonly yes #line 700, enable the AOF persistence function
/etc/init.d/redis_6379 restart #Restart the redis service
Modify the configuration files of the two slave nodes
#Configuration of slave1 (repeat on slave2)
vim /etc/redis/6379.conf
bind 0.0.0.0 #line 70, change the listening address to 0.0.0.0 (in production, fill in the IP of the physical NIC)
daemonize yes #line 137, run as a daemon in the background
logfile /var/log/redis_6379.log #line 172, specify the log file directory
dir /var/lib/redis/6379 #line 264, specify the working directory
replicaof 20.0.0.41 6379 #line 288, specify the IP and port of the master node to synchronize from
appendonly yes #line 700, change to yes to enable the AOF persistence function
/etc/init.d/redis_6379 restart #Restart redis
netstat -natp | grep redis #Check whether the master-slave server has established a connection
tail -f /var/log/redis_6379.log
Experimental test:
Create data on the master node and check whether the slave nodes synchronize it.
Only the master node can read and write; the slave nodes can only read.
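The read-only split can be checked directly (IPs follow the lab topology above; the key names are made up):

```shell
redis-cli -h 20.0.0.41 set k1 v1   # write on the master: succeeds
redis-cli -h 20.0.0.42 get k1      # read on a slave: returns v1 once synced
redis-cli -h 20.0.0.42 set k2 v2   # write on a slave: rejected with a READONLY error
```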
View master-slave status information:
redis-cli info replication
Sentinel mode:
Master-slave comes first, then the sentinels: on top of master-slave replication, sentinel mode adds automatic failover when the master node fails.
The principle of sentinel mode:
It is a distributed system: a sentinel runs alongside each node and monitors every redis service in the master-slave structure.
When the master node fails, the slave nodes elect a new master by voting.
Sentinel mode also requires at least three nodes
Structure of sentinel mode:
Sentinel node: monitoring, does not store any data
Data node: Master node and slave node are both data nodes
redis working mechanism:
The sentinels monitor the redis data nodes, and also discover and check each other.
Each sentinel sends a PING once per second to check the heartbeat of the master and slave nodes.
If the master does not reply within the configured time, or replies with an error, that sentinel subjectively marks it as down ("subjectively down"). If more than half of the sentinel nodes agree, the master is considered "objectively down".
The sentinel nodes then use the Raft election algorithm to vote for a leader, which promotes a new master from the slaves, completing failover and fault-recovery notification for the master node.
The process of master node election:
- A slave node that is offline will not be selected as the new master.
- Select the slave node with the highest priority in the configuration file (replica-priority, default 100; a lower number means higher priority).
- Among those, select the slave whose copy of the data is most complete.
Sentinel mode is included in the source package.
The sentinel configuration is the same on the master and the slaves:
vim /opt/redis-5.0.7/sentinel.conf
Line 17: protected-mode no
Uncomment this line to turn off protected mode
Line 21: the default sentinel port (26379)
Line 26: whether sentinel runs in the background; change it to yes
Line 36: the log file
Line 65: the working directory
Line 84: specify the initial master server
The trailing 2 here means at least 2 sentinels must agree that the master is down before a master-slave switchover happens.
Line 113: how long a server must be unresponsive before it is judged down
Line 146: the maximum failover timeout for a failed node
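Put together, the touched sentinel.conf lines look roughly like this. The directive names are the standard Redis 5 sentinel options; the monitor name `mymaster`, the timeout values, and the logfile/dir paths are the defaults and lab conventions, assumed here rather than taken from the notes:

```conf
protected-mode no
port 26379
daemonize yes
logfile "/var/log/sentinel.log"
dir "/var/lib/redis/6379"
sentinel monitor mymaster 20.0.0.41 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel failover-timeout mymaster 180000
```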
After configuring, start the sentinel service.
Start the master's sentinel first, then the slaves'.
Each one is started from inside the redis source package directory:
redis-sentinel sentinel.conf &
Monitoring Sentinel Cluster
View the sentinel status of the entire cluster:
redis-cli -p 26379 info sentinel
Simulate a failure:
Kill the master's process or pause the node.
The sentinels then vote and promote a new master.
Test result: create data on the new master and confirm the slaves synchronize it.
Fault recovery: restart the original master server; it rejoins the cluster as a slave.
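The failover steps above can be driven from the command line; a sketch (assumes the topology above and a monitor named `mymaster`, which is the usual default):

```shell
redis-cli -h 20.0.0.41 -p 6379 shutdown nosave                # simulate master failure
redis-cli -p 26379 sentinel get-master-addr-by-name mymaster  # which node is master now?
redis-cli -p 26379 info sentinel                              # sentinel's view of the cluster
```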