Performance optimization: a single 4-core, 8 GB machine serving 50,000 QPS!

Preface

This article records the performance optimization of a Python service: the problems encountered along the way and how they were solved, in the hope of giving you an approach to optimization. To be clear up front, my method is not the only one; there is always more than one way to solve the problems you will run into on the road to better performance.

How to optimize

First of all, be clear that talking about optimization divorced from the requirements is meaningless. So if someone tells you they achieved millions of concurrent connections on such-and-such a machine, you can safely assume they are pretending to know what they are talking about: a concurrency number detached from the business means nothing. Secondly, before optimizing we need a goal, that is, how far the optimization has to go; optimization without a clear target is uncontrollable. Finally, we must pinpoint exactly where the performance bottleneck is, rather than poking around aimlessly.

Description of Requirement

This project is a module I was solely responsible for at my previous company. It was originally part of the main site's code base; later, because its concurrency grew too high, it was split out into its own service (a job I handled alone) to keep any problems from dragging down the main site. The acceptance criteria for the split-out module were: stress-test QPS no lower than 30,000, database load no higher than 50%, server load no higher than 70%, single-request latency no longer than 70 ms, and error rate no higher than 5%.

The environment configuration is as follows:
Server: 4 cores, 8 GB RAM, CentOS 7, SSD drive
Database: MySQL 5.7, maximum of 800 connections
Cache: Redis, 1 GB capacity
All of the above were services purchased from Tencent Cloud.
Stress-testing tool: Locust, run distributed by means of Tencent Cloud's elastic scaling.

The requirements are as follows:
When a user lands on the homepage, the service queries the database for a suitable pop-up window configuration. If there is none, nothing happens and we wait for the next request; if there is a suitable configuration, it is returned to the front end. From there, several conditional branches follow: if the user clicks the pop-up, the click is recorded and the configuration is not returned again within the time configured for it; if the user does not click, the configuration is returned again after 24 hours; if the user clicks but there is no follow-up configuration, we simply wait for the next one.
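
To make those branches concrete, here is a minimal sketch of the decision flow. It is only an illustration of the requirement, not the project's actual code: the helper methods on db (find_active_config, last_interaction, find_next_config) and the field names are hypothetical.

from datetime import datetime, timedelta

RESHOW_AFTER_NO_CLICK = timedelta(hours=24)  # if the user never clicked, show the popup again after 24h

def popup_for_user(user_id, db):
    """Decide which pop-up configuration, if any, to return for this request."""
    config = db.find_active_config(user_id)          # hypothetical query helper
    if config is None:
        return None                                  # nothing suitable; wait for the next request

    last = db.last_interaction(user_id, config.id)   # hypothetical: last time this popup was shown/clicked
    if last is None:
        return config                                # first time: return the configuration

    now = datetime.utcnow()
    if last.clicked:
        # The user clicked: suppress the popup for the window configured on the popup itself,
        # then move on to the follow-up configuration (which may not exist yet).
        if now - last.time < timedelta(seconds=config.suppress_seconds):
            return None
        return db.find_next_config(user_id)          # may return None: wait for the next request
    else:
        # The user saw it but did not click: return the same configuration again after 24 hours.
        if now - last.time < RESHOW_AFTER_NO_CLICK:
            return None
        return config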

Key Analysis

From the requirements we can identify several key points: 1. We need to find the pop-up configuration that suits the user. 2. We need to work out when the configuration should next be returned to the user and record that in the database. 3. We need to record what the user did with the returned configuration, and write that to the database as well.


Tuning

All three key points above involve database operations, and not just reads but also writes. Without a cache, every request would hit the database directly, which would inevitably exhaust all the connections and cause access-denied errors; at the same time, slow SQL would keep requests from returning in time. So the first things to do are to move the write operations out of the request path, improve the response time of each request, and optimize the database connections. The architecture of the whole system looks like this:

[Figure: system architecture with database writes moved into a message queue]

Write operations are pushed into a first-in, first-out message queue. To keep complexity down, a Redis list is used as this queue.
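
As a rough sketch of that idea (not the project's actual code), the request handler only pushes the pending write onto a Redis list, and a separate worker process pops items off and performs the real database writes; the key name, payload format, and record_click helper here are assumptions.

import json
import redis

r = redis.Redis(host="localhost", port=6379, db=0)
WRITE_QUEUE = "popup:write_queue"  # assumed key name

def enqueue_click(user_id, config_id):
    # Called from the request handler: enqueue the event instead of writing to MySQL inline.
    r.rpush(WRITE_QUEUE, json.dumps({"user_id": user_id, "config_id": config_id}))

def consume_forever(db):
    # Runs in a separate worker process: drain the queue and do the actual INSERT/UPDATE.
    while True:
        _, raw = r.blpop(WRITE_QUEUE)  # blocks until an item is available; RPUSH + BLPOP gives FIFO order
        event = json.loads(raw)
        db.record_click(event["user_id"], event["config_id"])  # hypothetical DB helper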

Then we ran a stress test, with the following results:
QPS was around 6,000; 502 errors spiked to 30%; server CPU bounced between 60% and 70%; the database connections were maxed out and roughly 6,000 TCP connections were in use. Clearly the problem was still the database. After going through the SQL, the cause turned out to be that finding the right configuration for each user read the database on every request, which exhausted the connections. Since our connection limit is only 800, once requests pile up the database inevitably becomes the bottleneck. With the problem found, we kept optimizing; the updated architecture is as follows:

[Figure: updated architecture with the configurations loaded into the cache]

We load all configurations into the cache, and only read the database when there is no configuration in the cache.
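
A minimal cache-aside sketch of that, assuming the configurations are serialized as JSON under a single Redis key; the key name, TTL, and load_all_configs helper are illustrative only.

import json
import redis

r = redis.Redis(host="localhost", port=6379, db=0)
CONFIG_CACHE_KEY = "popup:configs"   # assumed key name
CACHE_TTL = 300                      # seconds; illustrative value

def get_all_configs(db):
    cached = r.get(CONFIG_CACHE_KEY)
    if cached is not None:
        return json.loads(cached)                      # cache hit: no database access at all
    configs = db.load_all_configs()                    # hypothetical query; runs only on a cache miss
    r.setex(CONFIG_CACHE_KEY, CACHE_TTL, json.dumps(configs))
    return configs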

Next we ran another stress test, with the following results:
QPS plateaued at around 20,000 and would not go any higher; server CPU jumped between 60% and 80%; database connections were around 300; TCP connections per second were around 15,000.

This problem bothered me for a long time: QPS was 20,000, yet the number of TCP connections never reached 20,000. My guess was that the TCP connection count was the bottleneck, but I could not yet find the reason.

At this point I wondered: since TCP connections could not be established, could the server be limiting the number of socket connections? To check, I ran ulimit -n in the terminal, which showed 65535. Seeing that, I figured the socket limit probably was not what was holding us back; still, to verify the guess, I raised the limit to 100001.
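
For what it's worth, the same limit can also be checked (and, up to the hard limit, raised) from inside the Python process itself via the standard resource module; this is just an alternative to ulimit -n, not something the original setup necessarily used.

import resource

# RLIMIT_NOFILE is the per-process cap on open file descriptors, which includes sockets.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")

# A non-root process may raise its soft limit up to the hard limit;
# going past the hard limit needs root (typically via /etc/security/limits.conf).
resource.setrlimit(resource.RLIMIT_NOFILE, (min(100001, hard), hard))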

We ran the stress test again, and the results were as follows:

QPS topped out at about 22,000 and would not go any higher; server CPU jumped between 60% and 80%; database connections were around 300; TCP connections per second were around 17,000.

There was a slight improvement, but nothing substantial. For the next few days I could not find a way forward, which was genuinely frustrating. A few days later, going over the problem again, I noticed that although the number of available socket connections was sufficient, they were not all being used. My speculation was that after each request the TCP connection was not released immediately, so the socket could not be reused. After digging through documentation, I found the cause:

A TCP connection is not released immediately after the four-way termination (FIN/ACK exchange) completes. Instead, the side that closed first sits in the TIME_WAIT state for a while, so that the final ACK can be retransmitted if it was lost and delayed packets from the old connection are not mistaken for data on a new one.

With the problem identified, we could keep optimizing. The first idea was to shorten how long a connection waits after it is closed, but Linux does not expose that as a tunable kernel parameter; changing it requires recompiling the kernel. Fortunately there is another parameter, net.ipv4.tcp_max_tw_buckets, the maximum number of sockets held in TIME_WAIT, which defaults to 180000. We lowered it to 6000, then enabled fast recycling and reuse of TIME_WAIT sockets. The full set of parameter changes is as follows:

# Maximum number of sockets held in TIME_WAIT; the default is 180000.
net.ipv4.tcp_max_tw_buckets = 6000

# Range of local ports available for outgoing connections.
net.ipv4.ip_local_port_range = 1024 65000

# Enable fast recycling of TIME_WAIT sockets.
net.ipv4.tcp_tw_recycle = 1

# Enable reuse: allow sockets in TIME_WAIT to be reused for new TCP connections.
net.ipv4.tcp_tw_reuse = 1

We ran the stress test once more, and the results showed:
QPS 50,000; server CPU 70%; database connections normal; TCP connections normal; average response time 60 ms; error rate 0%.

Conclusion

At this point the development, tuning, and stress testing of the service were complete. Looking back on this round of tuning, I learned a great deal. Most importantly, I came to appreciate that web development is not an isolated discipline but an engineering practice that draws on networking, databases, programming languages, and operating systems. It demands solid fundamentals from web developers; without them, you will not know how to analyze and locate problems when they appear.

PS: Enabling tcp_tw_recycle and tcp_tw_reuse on the server side has side effects: in particular, tcp_tw_recycle is known to break connections from clients behind NAT (it relies on per-host TCP timestamps) and has been removed from Linux since kernel 4.12. Every optimization trades one thing for another, and we need to be clear about what we are giving up.

