Practical methods for performance problem analysis and troubleshooting

A student in my Knowledge Planet community ran into a performance problem: static resources of roughly 10 MB each are served by Nginx, which is deployed with Docker. During a stress test, the static resources loaded very slowly, and the student asked in the group how to troubleshoot and analyze the issue.

This is a very common performance problem, usually caused by insufficient resources such as bandwidth or memory. That said, conclusions about performance problems should not be drawn from guesswork and experience alone. The problem should be analyzed and investigated with an engineering mindset, and the resulting optimization should then be verified.

In this article, drawing on my own experience, I will talk about practical methods for analyzing and troubleshooting performance problems.

Performance problem analysis chain

Let’s take a look at the mind map below, which is the analysis method I often use when encountering performance problems in my work. I call it the analysis chain.

As shown in the figure above, the analysis chain should look like this (the data is for reference only):

  • Observe the symptoms: for example, in a test environment with a 2C4G service configuration (2 CPU cores, 4 GB of memory), concurrency is stepped up from 20 to 200; once concurrency reaches 100, RT soars and 20% of requests fail;

  • Find the chain of evidence: locate where the problem shows up, such as saturated bandwidth, 100% memory usage, large numbers of request timeouts, error reports, or exception stack traces;

  • Analyze the cause: why do these problems occur? Analysis generally proceeds top-down, in the order script, data, scenario, configuration, code, system architecture;

  • Optimize and verify: use monitoring and logs to narrow down the likely causes by elimination (this relies on solid experience), then debug to verify the hypothesis. Once the cause is confirmed, fix it, rerun the test, and watch the monitoring and logs to confirm that the problem has been resolved.
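
The first step of the chain, spotting where a concurrency ramp starts to degrade, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Step` record, the `first_degraded_step` helper, the thresholds, and every number in `ramp` are hypothetical and chosen to mirror the reference figures in the list (RT soaring and 20% errors at 100 concurrent users); they are not measurements.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One step of a concurrency ramp (hypothetical record layout)."""
    concurrency: int
    avg_rt_ms: float
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


def first_degraded_step(
    steps: List[Step],
    rt_factor: float = 2.0,
    max_error_rate: float = 0.01,
) -> Optional[Step]:
    """Return the first step whose RT exceeds rt_factor times the RT of the
    first step, or whose error rate exceeds max_error_rate.
    Returns None if no step degrades (or if steps is empty)."""
    if not steps:
        return None
    baseline_rt = steps[0].avg_rt_ms
    for step in steps:
        if step.avg_rt_ms > rt_factor * baseline_rt or step.error_rate > max_error_rate:
            return step
    return None


# Illustrative numbers only, not real measurements.
ramp = [
    Step(20, 110.0, 0.0),
    Step(40, 120.0, 0.0),
    Step(60, 135.0, 0.0),
    Step(80, 160.0, 0.0),
    Step(100, 900.0, 0.20),
    Step(120, 1500.0, 0.35),
]

knee = first_degraded_step(ramp)
print(knee)
```

With this sample data the helper reports the step at 100 concurrent users. In practice the thresholds would come from the service's own performance targets, and the step where degradation begins is where the search for evidence (bandwidth, memory, timeouts, stack traces) should start.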

Performance analysis practice case

Taking the student's question from the beginning of the article as an example, how should we analyze it?

First, this stress test scenario loads static files, and common static resources are mainly images or front-end pages. Second, since each resource is 10+ MB, it is reasonable to assume that it is an image or a short video. Third, the problem description says that Nginx is deployed with Docker and that the static resources are mounted into Nginx, so the storage resources available to Nginx need to be considered. Why?

A stress test generally simulates many requests from different users accessing a URL. If each request returns a different image and concurrency is relatively high, the disk IO pressure on the service will be relatively high. Another factor to consider is the network bandwidth between the stress test cluster and the service under test. If the bandwidth is only 100 Mbit/s, the theoretical peak transfer rate is only 12.5 MB/s. In this scenario, no matter how high the concurrency is, the actual TPS for resources of 10+ MB may be <= 1.
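
The bandwidth arithmetic can be checked with a short Python sketch. The function names are made up for illustration, and the calculation is an upper bound that assumes the link is the only bottleneck and ignores protocol overhead, so real figures will be lower.

```python
def max_tps(bandwidth_mbit_s: float, payload_mb: float) -> float:
    """Upper bound on responses per second that a link can carry.

    bandwidth_mbit_s: link bandwidth in megabits per second
    payload_mb: response size in megabytes
    """
    if bandwidth_mbit_s <= 0 or payload_mb <= 0:
        raise ValueError("bandwidth and payload size must be positive")
    mb_per_second = bandwidth_mbit_s / 8  # 8 bits per byte
    return mb_per_second / payload_mb


def min_seconds_for_batch(concurrency: int, bandwidth_mbit_s: float, payload_mb: float) -> float:
    """Lower bound on the time needed to deliver one response to each of
    `concurrency` simultaneous requests over the shared link."""
    return concurrency / max_tps(bandwidth_mbit_s, payload_mb)


print(max_tps(100, 10))                     # 1.25
print(max_tps(100, 12.5))                   # 1.0
print(min_seconds_for_batch(100, 100, 10))  # 80.0
```

The last line shows why response time soars as concurrency rises: 100 simultaneous requests for a 10 MB file need at least 80 seconds on a 100 Mbit/s link, however well Nginx itself is tuned.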

A mistake that many testers make in performance testing is to ignore the actual business scenario, as well as the configuration and network bandwidth of the service under test, and to simulate high-concurrency requests blindly. This is neither scientific nor reasonable. Instead, the specific business scenario and the configuration of the service under test should be considered first, and the scripts designed accordingly. High concurrency is appropriate to simulate in scenarios such as the flash sale (seckill) scenario common in e-commerce.

There are two points to consider in the problem above. First, relatively large static resources are mounted on Nginx; a more reasonable technical solution is to put static resources or large files in a dedicated file storage service, for example by serving images from a CDN. The second point is easily overlooked: to improve performance, it is best to compress files before transmitting them and decompress them at the display layer. Of course, the premise is that there is no high-resolution requirement for the images.
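
How much transfer compression helps depends heavily on the content. The sketch below uses lossless compression from Python's standard `zlib` module as a stand-in technique; the sample data is synthetic, with random bytes standing in for media that is already compressed (formats such as JPEG or MP4 gain little from a second, lossless pass). For images, the advice above more likely means lowering resolution or quality, which this sketch does not cover.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return compressed size divided by original size (lower is better)."""
    if not data:
        raise ValueError("data must not be empty")
    return len(zlib.compress(data, level)) / len(data)


# Synthetic samples: repetitive text (like CSS or HTML) versus random bytes,
# which behave like data that has already been compressed.
text_like = b"body { margin: 0; padding: 0; }\n" * 4096
random_like = os.urandom(len(text_like))

print(f"text-like ratio:   {compression_ratio(text_like):.3f}")
print(f"random-like ratio: {compression_ratio(random_like):.3f}")

# The receiving side restores the original bytes exactly.
restored = zlib.decompress(zlib.compress(text_like))
print(restored == text_like)
```

Text-based assets shrink dramatically, while the random sample does not shrink at all. Measuring the ratio on the real resources before the stress test shows whether compression is worth enabling for them.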

Many articles on the Internet explain how to use stress testing tools, how to prepare test data, and how to simulate concurrency, but in my opinion these are only means. The most important part of performance testing is the performance requirements analysis stage. At that stage, consider as fully as possible the characteristics of the business scenario under test, whether the system architecture and technical implementation behind it are reasonable, and whether there are potential performance bottlenecks. Stress testing is only a means of verification, not the goal.

In recent years, everyone has been talking about shift-left testing. Beyond built-in quality and quality gates, analysis and evaluation at the requirements stage and the preparation of a fallback strategy are actually more important.

Origin blog.csdn.net/wx17343624830/article/details/132667615