【Redis】Why Redis slows down: causes and troubleshooting

        This article covers the common reasons Redis slows down and how to troubleshoot them. The examples are explained in detail and should be a useful reference for your study or work. Let's take a look; I hope it helps.

Cause 1: The instance reaches its memory limit

Troubleshooting approach

        If your Redis instance has a memory limit (maxmemory) configured, that limit itself can also cause Redis to slow down.

        When we use Redis as a pure cache, we usually set a maxmemory limit on the instance together with a data eviction policy. Once the instance's memory reaches maxmemory, you may notice that every subsequent write of new data comes with higher latency.
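
        For reference, here is a minimal sketch of how these two options are typically set (the values are placeholders, adjust them to your environment):

# redis.conf (example values)
maxmemory 4gb
maxmemory-policy allkeys-lru

# or change them at runtime (add CONFIG REWRITE if you want them persisted)
$ redis-cli CONFIG SET maxmemory 4gb
$ redis-cli CONFIG SET maxmemory-policy allkeys-lru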

Cause of the slowdown

        When Redis memory reaches maxmemory, before each new write Redis must first evict some data from the instance to keep total memory below maxmemory; only then can the new data be written.

        This eviction logic takes time, and how long it takes depends on the eviction policy you configure:

  • allkeys-lru: evict the least recently accessed keys, regardless of whether they have an expiration set
  • volatile-lru: evict the least recently accessed keys, but only among keys with an expiration set
  • allkeys-random: randomly evict keys, regardless of whether they have an expiration set
  • volatile-random: randomly evict keys, but only among keys with an expiration set
  • volatile-ttl: among keys with an expiration set, evict the ones closest to expiring
  • noeviction: do not evict any key; once the instance reaches maxmemory, writing new data returns an error
  • allkeys-lfu: evict the least frequently accessed keys, regardless of whether they have an expiration set (Redis 4.0+)
  • volatile-lfu: evict the least frequently accessed keys, but only among keys with an expiration set (Redis 4.0+)

        Which policy to use depends on your business scenario; allkeys-lru / volatile-lru are the most commonly used. Redis implements them as an approximate LRU: each time it randomly samples a batch of keys from the instance (the sample size is configurable), evicts the least recently accessed one, and keeps the remaining candidates in a pool; on the next pass it samples another batch, compares it against the keys already in the pool, and again evicts the least recently accessed key. This repeats until the instance's memory falls below maxmemory.
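
        The sample size used by this approximate LRU/LFU is controlled by the maxmemory-samples option (it defaults to 5); a larger value makes eviction more accurate but also more expensive:

# check and, if needed, adjust the eviction sample size
$ redis-cli CONFIG GET maxmemory-samples
$ redis-cli CONFIG SET maxmemory-samples 10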

        Note that Redis runs this eviction logic, just like expired-key deletion, before the command itself executes. In other words, it adds latency to your Redis operations, and the higher the write OPS, the more noticeable the extra latency.

        In addition, if your instance also stores bigkeys, evicting and deleting a bigkey to free memory takes even longer.

        See it? The hazards of bigkeys show up everywhere, which is why I keep reminding you to avoid storing bigkeys whenever possible.
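
        To confirm whether eviction is actually happening, and whether the instance holds bigkeys, you can check, for example:

# number of keys evicted since the instance started
$ redis-cli INFO stats | grep evicted_keys

# scan the instance for bigkeys (uses SCAN internally, but still run it off-peak)
$ redis-cli --bigkeys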

Solution

  • Avoid storing bigkeys, to reduce the time spent freeing memory
  • Switch to a random eviction policy, which is much faster than LRU (adjust according to your business needs)
  • Split the instance, spreading the eviction workload across multiple instances
  • If you run Redis 4.0 or above, enable the lazy-free mechanism so that the memory freed by evicted keys is reclaimed in a background thread (lazyfree-lazy-eviction = yes; a config sketch follows this list)
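
        As a minimal sketch, on Redis 4.0+ the lazy-free option can be enabled like this:

# redis.conf
lazyfree-lazy-eviction yes

# or at runtime
$ redis-cli CONFIG SET lazyfree-lazy-eviction yes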

Cause 2: Memory huge pages are enabled

Troubleshooting approach

  • As we all know, when an application requests memory from the operating system, it does so in units of memory pages, and the conventional page size is 4KB.
  • Starting with version 2.6.38, the Linux kernel supports a huge page mechanism that allows applications to request memory from the operating system in 2MB units.
  • Each request for memory from the operating system becomes larger, which also means each allocation takes longer.

Cause of the slowdown

  • When Redis performs a background RDB snapshot or AOF rewrite, it forks a child process to do the work. After the fork, the main process can still accept write requests, and those writes modify memory data using copy-on-write (COW).
  • That is, once the main process needs to modify some data, Redis does not modify it in the existing memory directly; it first copies that memory, then applies the modification to the new copy. This is "copy-on-write".
  • Copy-on-write can also be understood as: whoever needs to write must copy first, then modify.
  • The benefit is that any write by the parent process does not affect the child process's persistence: the child only persists the data of the entire instance as it existed at the moment of the fork and does not care about later changes, because all it needs is that memory snapshot, which it then persists to disk.
  • But note that while the main process is copying memory data, it has to allocate new memory. If the operating system has huge pages enabled, then during this period even a client modifying only 10B of data forces Redis to request memory from the operating system in 2MB units. Allocations take longer, which increases the latency of every write request and hurts Redis performance.
  • Likewise, if a write request touches a bigkey, the main process has to copy a larger memory block in one go, which takes even longer. Once again, bigkeys hurt performance here. (A quick check of the fork cost is shown after this list.)
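
        As a rough check related to this mechanism, Redis records how long the most recent fork took; it does not cover the copy-on-write allocations afterwards, but a large value here already signals trouble:

# duration of the latest fork, in microseconds
$ redis-cli INFO stats | grep latest_fork_usec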

Solution

Disable the memory huge page mechanism.

First, check whether huge pages are enabled on the Redis machine:

$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

If the highlighted option is always, the memory huge page mechanism is currently enabled and we need to turn it off:

$ echo never > /sys/kernel/mm/transparent_hugepage/enabled

        In fact, the advantage of the huge page mechanism provided by the operating system is that it reduces, to some extent, the number of memory allocation requests an application needs to make.

        But for a database as performance- and latency-sensitive as Redis, we want every memory allocation to take as little time as possible, so I do not recommend enabling this mechanism on machines that run Redis.

Cause 3: Swap is in use

Troubleshooting approach

        If you find that Redis has suddenly become very slow, with every operation taking hundreds of milliseconds or even seconds, check whether Redis is using Swap. In that state, Redis basically cannot deliver high-performance service at all.

Cause of the slowdown

What is Swap, and why does using it degrade Redis performance?

        If you know a bit about operating systems, you know that to soften the impact of insufficient memory on applications, the operating system can swap part of the data in memory out to disk, buffering the memory used by applications. The disk area this data is swapped out to is called Swap.

        The problem is that once data has been swapped to disk, the next time Redis needs it, it has to read it back from disk, and accessing disk is hundreds of times slower than accessing memory. For a database like Redis, with extremely high performance requirements and extreme latency sensitivity, this kind of delay is unacceptable.

        At this point, check the memory usage of the Redis machine to confirm whether Swap is in use. You can check whether the Redis process is using Swap as follows:

# First, find the Redis process ID
$ ps aux | grep redis-server
 
# Check the Redis process's Swap usage
$ cat /proc/$pid/smaps | egrep '^(Swap|Size)'

The output looks like this:

Size:               1256 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB
Size:                132 kB
Swap:                  0 kB
Size:              63488 kB
Swap:                  0 kB
Size:                132 kB
Swap:                  0 kB
Size:              65404 kB
Swap:                  0 kB
Size:            1921024 kB
Swap:                  0 kB
...

This result lists the memory usage of the Redis process.

        Each Size line shows the size of one block of memory used by Redis, and the Swap line below it shows how much of that block has been swapped out to disk. If the two values are equal, that entire block has been swapped to disk.

        If only a small amount of data has been swapped out, for example each Swap value is a small fraction of its corresponding Size, the impact is minor. But if hundreds of megabytes, or even gigabytes, have been swapped to disk, you need to be on alert: in that case Redis performance will definitely drop sharply.
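
        Assuming the same smaps output shown above, a quick way to total up how much of the process has been swapped out:

# sum all Swap entries of the Redis process (result in kB)
$ awk '/^Swap:/ {sum += $2} END {print sum " kB swapped"}' /proc/$pid/smaps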

Solution

  • Increase the machine's memory so that Redis has enough memory to use
  • Free up memory on the machine so Redis has enough to use, then release Redis's Swap so Redis can use physical memory again (a sketch of the commands involved follows this list)
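
        A rough sketch of the second option, assuming the machine now has enough free memory; note that swapoff blocks until all swapped pages are read back from disk, so do this with care on a production machine:

# check overall memory and Swap usage first
$ free -h

# move swapped pages back into physical memory (needs root and enough free RAM)
$ swapoff -a && swapon -a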

        Releasing Redis's Swap usually requires restarting the instance. To avoid the restart affecting the business, a master-replica switchover is generally performed first, then the old master's Swap is released and the old master instance is restarted, and finally another master-replica switchover is performed.

        As you can see, once Redis is using Swap it basically cannot meet high-performance requirements (you can think of its martial arts skills as having been crippled), so you also need to guard against this situation in advance.

        The way to prevent it is to monitor the memory and Swap usage of the Redis machine, raise an alert when memory runs low or Swap starts being used, and deal with it promptly.
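
        For example, two machine-level checks that are easy to wire into such monitoring (the exact tooling is up to you):

# machine-wide Swap usage
$ grep -E 'SwapTotal|SwapFree' /proc/meminfo

# swap-in / swap-out activity: the si/so columns should stay at 0
$ vmstat 1 5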

Cause 4: Network bandwidth overload

Troubleshooting approach

        Suppose you have avoided all of the performance pitfalls above and Redis has been running stably for a long time, but after a certain point in time its operations suddenly start to slow down and stay slow. What causes that?

        At this point, check whether the network bandwidth of the Redis machine is overloaded, and whether a single instance is saturating the bandwidth of the entire machine.

Cause of the slowdown

When network bandwidth is overloaded, the server experiences packet delays and packet loss at the TCP and network layers.

        Besides operating on memory, the high performance of Redis lies in network I/O. If network I/O becomes a bottleneck, it also severely affects Redis performance.
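
        To see how much traffic a single instance is generating, Redis exposes instantaneous throughput counters; whole-machine NIC traffic can be checked with tools such as sar (from the sysstat package), for example:

# per-instance network throughput reported by Redis
$ redis-cli INFO stats | grep -E 'instantaneous_(input|output)_kbps'

# whole-machine NIC traffic, refreshed every second
$ sar -n DEV 1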

Solution

  • Confirm promptly whether the Redis instance is saturating the machine's network bandwidth. If the traffic comes from normal business access, expand capacity or migrate the instance in time, so that one instance's excessive traffic does not affect the other instances on the machine.
  • At the operations level, add monitoring for the Redis machine's various metrics, including network traffic, and alert in advance when traffic reaches a certain threshold, so the cause can be confirmed and capacity expanded in time.

Cause 5: Other causes

1) Frequent short connections

Your application should use long-lived connections to operate Redis and avoid frequent short-lived connections.

        Frequent short-lived connections cause Redis to spend a lot of time establishing and releasing connections, and TCP's three-way handshake and four-way teardown add further access latency.
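
        A quick way to spot connection churn is to compare Redis's connection counters over time: if total_connections_received grows much faster than connected_clients, clients are opening and closing connections frequently:

# current number of connected clients
$ redis-cli INFO clients | grep connected_clients

# total connections accepted (and rejected) since startup
$ redis-cli INFO stats | grep -E 'total_connections_received|rejected_connections'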

2) Operation and maintenance monitoring

As mentioned earlier, good monitoring is essential for predicting Redis slowdowns in advance.

        Monitoring essentially means collecting Redis's runtime metrics. The usual approach is for a monitoring program to periodically collect Redis's INFO output, then display the data and raise alerts based on the status fields in it.

        A reminder here: do not take this lightly, whether you are writing your own monitoring scripts or using open-source monitoring components.

        When writing monitoring scripts that access Redis, collect status information over long-lived connections and avoid frequent short-lived ones. Also control how often you query Redis, so that monitoring does not affect business requests.

        When using open-source monitoring components, it is best to understand how they are implemented and configure them correctly, so that a bug or misconfiguration in the component does not produce a flood of short-lived operations against Redis and hurt its performance.

        This actually happened to us: a DBA was using an open-source component, and due to configuration and usage problems, the monitoring program kept connecting to and disconnecting from Redis, which caused slow Redis responses.

3) Other programs competing for resources

        The last reminder: your Redis machine should be dedicated, used only to deploy Redis instances, with no other applications on it. Give Redis a relatively "quiet" environment, and keep other programs from consuming CPU, memory, and disk resources, so Redis is not left with too few resources and degraded performance.

Summary:

The following can cause Redis to slow down:

  • The instance reaches its memory limit
  • Memory huge pages are enabled
  • Swap is in use
  • Network bandwidth is overloaded
  • Other causes

Source: blog.csdn.net/gongzi_9/article/details/127004980