Design and Practice of Tongcheng's Phoenix Cache System Based on Redis (repost)

 

This article is excerpted from the book "In-depth Distributed Cache". The pitfalls Tongcheng ran into while using Redis, as described in the article, are very real and very instructive. After reading it, I noted the following:

 

1. The Redis master-slave + Keepalived scheme has problems!

This should have been a good solution, but it ignored the case where the primary data node stops responding. Redis's single-process, single-threaded design is the cornerstone of its simplicity and stability: as long as the server itself does not fail, Redis will not go down under normal circumstances. At the same time, however, that single-threaded design means Redis becomes busy computing and stops responding to everything else when it receives an expensive command. A command on a sorted set (zset) or a KEYS command, for example, may take Redis a little longer to compute; Keepalived then decides the master has stopped responding, moves the virtual IP directly to the slave, and performs a master-slave switch. Shortly afterwards, clients send the same zset or KEYS commands again, so the new master blocks in turn, and Keepalived keeps flipping the virtual IP back and forth between the two machines. Eventually both the master and the slave were blocked. When Keepalived detected this, it simply released the virtual IP, so no client could connect to Redis until the operations team rebound it manually.
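One common mitigation, offered here as a hedged sketch rather than Tongcheng's actual fix, is to fail over only after several consecutive failed health probes, so that a single slow command does not trigger a switch. Keepalived's `vrrp_script` block expresses a similar idea through its `fall` and `rise` settings. The `FailoverDecider` class below is hypothetical and models only the decision logic; it does not probe Redis itself.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverDecider:
    """Hypothetical failover policy: require `fall` consecutive failed probes
    before declaring the master down, and `rise` consecutive successful
    probes before declaring it healthy again."""

    fall: int = 3  # consecutive failures needed to mark the master down
    rise: int = 2  # consecutive successes needed to mark it up again
    healthy: bool = True
    _fails: int = field(default=0, init=False, repr=False)
    _oks: int = field(default=0, init=False, repr=False)

    def observe(self, probe_ok: bool) -> bool:
        """Record one probe result and return the current health verdict."""
        if probe_ok:
            self._oks += 1
            self._fails = 0
            if not self.healthy and self._oks >= self.rise:
                self.healthy = True
        else:
            self._fails += 1
            self._oks = 0
            if self.healthy and self._fails >= self.fall:
                self.healthy = False
        return self.healthy


if __name__ == "__main__":
    decider = FailoverDecider(fall=3, rise=2)
    # One probe times out (e.g. during a slow KEYS call), then recovers:
    # no switch. Three timeouts in a row: the master is marked down.
    probes = [False, True, False, False, False, True, True]
    print([decider.observe(ok) for ok in probes])
    # prints [True, True, True, True, False, False, True]
```

Raising `fall` trades slower detection of a genuinely dead master for tolerance of slow commands. It does not remove the root cause: KEYS and expensive zset operations still block the single Redis thread for every client.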

 

2. Persisting RDB data to disk has problems!

Persisting data to disk also caused plenty of problems. RDB persistence is non-blocking: Redis forks a child process that writes the in-memory data to the RDB file, while the main process keeps handling client commands. But the child's data is a copy of the parent's. With copy-on-write the two processes initially share memory pages, yet every page the parent modifies while the snapshot is running has to be duplicated, so under heavy writes it is as if two Redis processes of the same size were running on the system, and memory usage rises sharply. If RDB snapshots are enabled on a server whose memory is already tight, memory usage easily reaches 100%, the system starts swapping to disk, and the performance of the whole Redis service plummets.
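The memory risk above can be estimated with simple arithmetic. The sketch below is a rough model under stated assumptions: copy-on-write duplicates each modified page once, and `dirty_fraction` (a hypothetical input you would have to measure) is the share of pages written while the child process is alive. The 90% headroom and the 32 GB host are illustrative numbers, not figures from the article.

```python
def peak_memory_during_bgsave(used_gb: float, dirty_fraction: float) -> float:
    """Rough estimate of memory in use while an RDB child process exists.

    Assumption: after fork() the parent and child share every page via
    copy-on-write, and each page the parent modifies during the snapshot is
    duplicated once. dirty_fraction is the share of pages written while the
    child is running (0.0 = read-only traffic, 1.0 = every page touched).
    """
    if not 0.0 <= dirty_fraction <= 1.0:
        raise ValueError("dirty_fraction must be between 0.0 and 1.0")
    return used_gb * (1.0 + dirty_fraction)


def bgsave_fits(used_gb: float, dirty_fraction: float,
                physical_gb: float, headroom: float = 0.9) -> bool:
    """True if the estimated peak stays under a hypothetical RAM ceiling
    (90% by default), leaving room for the OS so the host does not swap."""
    return peak_memory_during_bgsave(used_gb, dirty_fraction) <= physical_gb * headroom


if __name__ == "__main__":
    # Hypothetical host: 32 GB of RAM holding a 20 GB dataset.
    print(peak_memory_during_bgsave(20, 0.5))  # 30.0
    print(bgsave_fits(20, 0.5, 32))  # False: 30 GB exceeds 28.8 GB
    print(bgsave_fits(20, 0.1, 32))  # True
```

Setting `dirty_fraction=1.0` gives the worst case the article describes, two processes of the same size. Because copy-on-write works at page granularity, scattered small writes can dirty far more memory than the bytes actually written.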

 

3. Network jitter between the master and slave Redis causes problems!

There was a brief bit of jitter on the network between the master and slave Redis. Think about what that means for master-slave synchronization of an instance this large. The synchronization broke, and when it failed Redis fell straight back to a full resynchronization, so a 200GB Redis started a full sync all at once and the network card was saturated instantly. To keep Redis serving, the operations team simply shut down the slave; with no master-slave synchronization left, traffic returned to normal. But the master-slave backup architecture had become a standalone Redis, and everyone was still on edge. As the saying goes, misfortunes never come singly: because of a degradation in the layer below, concurrent operations on Redis climbed above 40,000 per second, and AOF and RDB persistence obviously could not keep up. To keep the service running, the operations team also turned off AOF and RDB persistence. Even the last line of protection was gone (in practice that protection was useless anyway, since recovering a 200GB Redis from it would take far too long).
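Redis can often resume replication after a short disconnect with a partial resynchronization, provided the data the replica missed is still in the master's replication backlog (sized by the `repl-backlog-size` setting); otherwise it falls back to a full sync. The helpers below are a simplified model of that size condition plus back-of-envelope arithmetic. Real Redis also checks the replication ID and backlog offsets, and the 5 MiB/s write rate and 10 Gbit/s link are hypothetical numbers, not figures from the article.

```python
def needs_full_resync(master_offset: int, replica_offset: int, backlog_size: int) -> bool:
    """Simplified model: a partial resync is possible only if every byte the
    replica missed still fits in the master's replication backlog. This
    sketch ignores the replication-ID and backlog-start checks Redis does."""
    missed = master_offset - replica_offset
    return missed > backlog_size


def min_backlog_bytes(write_bytes_per_sec: float, outage_sec: float,
                      safety_factor: float = 2.0) -> int:
    """Backlog size that would absorb an outage of outage_sec at the given
    write rate, multiplied by an assumed safety factor."""
    return int(write_bytes_per_sec * outage_sec * safety_factor)


def full_sync_seconds(dataset_gb: float, link_gbps: float) -> float:
    """Rough transfer time for a full sync, assuming the bytes on the wire
    roughly equal the dataset size: gigabytes * 8 bits / link Gbit/s.
    Ignores RDB generation time and protocol overhead."""
    return dataset_gb * 8 / link_gbps


if __name__ == "__main__":
    # A short blip: 1,000 missed bytes fit easily in a 1 MiB backlog.
    print(needs_full_resync(10_000, 9_000, 1024 * 1024))  # False
    # Hypothetical 5 MiB/s of writes and a 60 s outage, doubled for safety.
    print(min_backlog_bytes(5 * 1024 * 1024, 60))  # 629145600 (600 MiB)
    # The article's 200GB dataset over a hypothetical 10 Gbit/s link.
    print(full_sync_seconds(200, 10))  # 160.0
```

Under these assumptions a single full sync of the 200GB instance would keep the link saturated for close to three minutes, which is consistent with the network card filling up the moment the resync began.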

 

4. Problems caused by abusing Redis commands!

Thousands of clients were connected to a single Redis, concurrency was tens of thousands of requests per second, and the one CPU core Redis runs on was close to 90% utilization, so Redis was already starting to be overwhelmed. Then a program wrote a single 7MB log entry to this Redis-based logging component, and Redis blocked. Once it blocked, thousands of clients could not connect, and every logging operation failed.
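A defensive measure, sketched here as an assumption rather than what Tongcheng actually did, is to cap the size of each log entry on the client before it is ever sent, so one oversized message cannot hold the single Redis thread. `MAX_LOG_BYTES`, `TRUNCATION_MARKER`, and `prepare_log_entry` are hypothetical names, and the 64 KiB cap is only an illustrative value.

```python
MAX_LOG_BYTES = 64 * 1024  # hypothetical per-entry cap; tune for your workload
TRUNCATION_MARKER = "...[truncated]"


def prepare_log_entry(message: str, max_bytes: int = MAX_LOG_BYTES) -> bytes:
    """Encode a log message for a Redis-backed logger, truncating it so the
    encoded result never exceeds max_bytes."""
    marker = TRUNCATION_MARKER.encode("utf-8")
    if max_bytes < len(marker):
        raise ValueError("max_bytes must be at least the marker length")
    data = message.encode("utf-8")
    if len(data) <= max_bytes:
        return data
    keep = max_bytes - len(marker)
    # Cut on a byte boundary, then drop any partial UTF-8 character at the end.
    head = data[:keep].decode("utf-8", errors="ignore").encode("utf-8")
    return head + marker


if __name__ == "__main__":
    huge = "x" * (7 * 1024 * 1024)  # a 7MB message like the one in the article
    entry = prepare_log_entry(huge)
    print(len(entry))  # 65536
    print(prepare_log_entry("hello"))  # b'hello'
```

Truncating on the client keeps the policy out of Redis entirely. If full payloads must be kept, they would belong in a store designed for large blobs rather than in the shared Redis instance.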

 

Practical experience, standardized operating procedures, and lessons learned from past pitfalls are therefore essential for operations. Developers, in turn, must understand how their middleware works and design their services and interfaces well!

 

"Design and Practice of Tongcheng's Phoenix Cache System Based on Redis"

http://www.sohu.com/a/212424883_494947

 

 
