Performance test analysis and use

1. Why performance testing?

The xx system has been successfully released. According to the project plan, it is expected to serve 1000+ customers, so the amount of data in the business system will inevitably grow rapidly.
As the system stabilizes in production, we can shift more attention to performance questions:

  1. How much data can be tolerated?

  2. What is the bottleneck of the system?

  3. How is the quality of the code?

    These questions need to be answered through performance testing.

2. What should be paid attention to?

1. Clarify the purpose of the test

Performance testing has two purposes:
  • Verification: verify that the system meets the relevant performance requirements, e.g. supporting 500 concurrent requests

  • Locating and tuning: collect data through testing, then analyze it to locate bottlenecks and tune the system.

2. Determine the test content

  • Clarify business points and priorities

  • Prioritize business flows according to their functional and performance importance, and decide which flows will be tested and which will not

  • Determine relevant performance requirements for different businesses and scenarios: throughput, response time, etc.

3. Understand the performance testing category

  1. Network-level testing (our current tests)

    Throughput and response time

  2. Operating-system-level testing

    CPU utilization, disk swap rate

  3. Database-level testing

    The number of concurrent database connections, the number of lock resources used, and the size of I/O traffic

4. Influencing factors

  1. Network bandwidth, e.g. Alibaba Cloud 50M / 100M

  2. Number of servers, e.g. Alibaba Cloud (Aliyun) instances

  3. Server CPU, memory and disk performance, e.g. Alibaba Cloud 2G / 8G instances, SSD vs. mechanical hard drive

  4. Server OS version

  5. Code quality

3. Basic operations

JMeter usage documentation (online): https://blog.csdn.net/r657225738/article/details/114981779

1. Simple HTTP request

2. Custom User Variables

3. Extract request data and store it in a file

Scenario: batch addition, batch query
Required components: [Regular Expression Extractor -> BeanShell PostProcessor] (see the sketch below)
Note: text encoding: UTF-8
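For reference, a minimal sketch of the BeanShell PostProcessor step; the variable name `newId` and the output file name are made up for this illustration, and the real reference name comes from your Regular Expression Extractor:

```java
// BeanShell PostProcessor sketch: append the value captured by the
// Regular Expression Extractor (reference name assumed to be "newId")
// to a local file, one value per line, for later reuse.
import java.io.BufferedWriter;
import java.io.FileWriter;

String newId = vars.get("newId");          // value extracted from the response
if (newId != null) {
    BufferedWriter out = new BufferedWriter(new FileWriter("ids.csv", true)); // append mode
    out.write(newId);
    out.newLine();
    out.close();
}
```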

4. Use a CSV file to read data

Scenario: batch editing, batch deletion

Required components: [CSV Data Set Config]
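As a minimal sketch (the file name, column names and values below are invented), the CSV Data Set Config points at a file such as ids.csv, with Variable Names set to id,name and the delimiter set to a comma:

```
1001,Alice
1002,Bob
1003,Carol
```

Each thread then reads one line per iteration, and the batch-edit or batch-delete request can reference the values as ${id} and ${name}.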

4. Let the data speak

1. Comparison before and after code optimization

Issue list - My issues

[Figure: "Issue list - My issues" test results before and after code optimization (image unavailable)]

2. Performance gap between a single replica and two replicas

[Figure: performance comparison between a single replica and two replicas (image unavailable)]

5. Experience summary

1. Code optimization ideas

Inspect API calls through APM distributed tracing (link tracing)

  1. Reduce unnecessary database access

  2. Reduce unnecessary data calls

Example: the "Issue list - My issues" API before and after the change
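The project's own code is not shown here, but the kind of change behind "reduce unnecessary database access" can be sketched in a language-neutral way; the types and method names below are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types, purely for illustration.
record Issue(long id, String title) {}

interface IssueRepository {
    Issue findById(long id);                 // one database query per call
    List<Issue> findByIds(List<Long> ids);   // single batched query, e.g. WHERE id IN (...)
}

class IssueListExample {
    // Before: one database round trip per id (the N+1 pattern).
    static List<Issue> before(List<Long> ids, IssueRepository repo) {
        List<Issue> result = new ArrayList<>();
        for (Long id : ids) {
            result.add(repo.findById(id));   // N separate queries
        }
        return result;
    }

    // After: fetch all rows in a single round trip.
    static List<Issue> after(List<Long> ids, IssueRepository repo) {
        return repo.findByIds(ids);          // 1 query
    }
}
```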

Through testing and code tuning, two points deserve everyone's attention:

  1. Test tool: JMeter for concurrency testing

  2. Problem location: trace the specific API through intranet APM distributed tracing to find the direction for code optimization

Conclusions:

  1. In the concurrency test, making the API asynchronous directly improves performance. If the API is not written asynchronously, throughput with 20 concurrent users may drop to single digits; after changing the API to Task-based asynchronous code, throughput reaches 150-200 with 100 concurrent users (a language-neutral sketch follows this list)

  2. Removing unnecessary API calls from the code improves performance further, and the improvement is obvious (proportional to the number of irrelevant calls removed). In the test, throughput reached 200-250 with 100 concurrent users
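The first conclusion refers to .NET Task-based asynchronous APIs; as a language-neutral sketch of the same idea (all names below are invented), the key point is to stop blocking the request-handling thread on slow I/O:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

// Sketch of a blocking vs. asynchronous API shape (hypothetical names).
class ReportApi {
    private final Executor ioPool = Executors.newFixedThreadPool(32);

    // Blocking version: the request thread waits for the slow query,
    // so with 20+ concurrent callers throughput collapses.
    String getReportBlocking() {
        return slowQuery();
    }

    // Asynchronous version: the slow query runs on an I/O pool and the
    // request thread is released immediately, which lets throughput scale.
    CompletableFuture<String> getReportAsync() {
        return CompletableFuture.supplyAsync(this::slowQuery, ioPool);
    }

    private String slowQuery() {
        try { Thread.sleep(200); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return "report";
    }
}
```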

2. Handling API errors during performance testing

Assuming the API address and parameters are filled in correctly, errors that still occur fall into the following two cases:

1. The API returns 500

1.1 The database has crashed or hung and needs to be restarted

2. The API returns 404

2.1 The API service of the enterprise (tenant) in question has hung and needs to be restarted

6. Tips

6.1 Meaning of the fields in the Aggregate Report

  • Min: the minimum time from sending the request to receiving the response from the server

  • Max: the maximum time from sending the request to receiving the response from the server

  • Error % (abnormal rate): the proportion of requests that were disconnected or whose connections failed

  • Throughput (TPS): how many requests can be processed per unit of time, e.g. 120/sec means 120 network requests processed per second, 20/min means 20 per minute

  • Average: the average of all request response times

  • 99%: 99% of requests fall below this value, e.g. 354ms / 99% means 99% of requests completed their response within 354 ms
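For example (made-up numbers, purely to show how the fields relate): if a 30-second test completes 6,000 samples, the throughput shown is 6000 / 30 = 200/sec; and if the 99% column shows 354 ms, only 1% of those samples took longer than 354 ms to respond.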

6.2 The effect of increasing Replicated (replicas) / Scale

1. My own understanding: when a failure occurs, Docker spends less time creating a new instance on another node, which helps improve performance and reduce the failure rate. For example, with three replicas, when one replica (server) can no longer cope, work is automatically shifted to an available replica.

2. A more formal explanation: take a service with a single instance as an example. Now suppose there is a failure. Docker Swarm will notice that the service failed and restart it, but the restart is not instant; let's say it takes 5 seconds. During those 5 seconds your service is unavailable: a single point of failure.

What if the service has 3 instances? When one of them fails (no service is perfect), Docker Swarm notices that one instance is unavailable and creates a new one. During that time you still have 2 healthy instances serving requests, so to users of the service there appears to be no downtime. The component is no longer a single point of failure.

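For reference, replicas in Docker Swarm are controlled through the standard service commands (the service name web and the image are placeholders):

```
# create a service with 3 replicas, or scale an existing one
docker service create --name web --replicas 3 nginx
docker service scale web=5

# see where the tasks are running and the state of the nodes
docker service ps web
docker node ls
```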

6.3 Scale scheduling rules

If node1 goes down or its Docker service is stopped, the task instances running on it are moved to other nodes. When node1 recovers, the tasks that were moved away do not migrate back on their own; node1 only receives task instances again when another node fails. Running the command "docker node ls" shows that node1 is no longer in the swarm cluster.

6.4 High concurrency in plain terms

For example, let's say you open a snack bar.

You have 3 cashiers (web request processing threads), 5 chefs (database connections).

At first, the business has only just opened and there are few customers; 1 cash register can handle them and the others sit idle.

Later, business gets better and better, so all 3 cash registers are working (concurrency).

Then an Internet celebrity visits, the store suddenly becomes popular, and a large crowd (high concurrency) flocks in.

As a result, long queues form in front of the 3 cashiers (requests are blocked and queued), and since the cashiers are not robots, they sometimes get orders wrong (concurrency exceptions).

In the kitchen, orders keep piling up. The chefs cook non-stop, grow more and more tired, and the dishes come out slower and slower (the database is under heavy load and request response times keep growing).

The slower the dishes come out, the longer the queue grows (a vicious circle once requests are blocked).

Some customers leave without waiting (request timeouts) and even leave bad reviews, which sours the mood of others (affecting upstream services).

As for the chefs, after making N dishes they finally cannot keep up any longer (the database connections are exhausted).

So the small shop can only stop taking customers for a while (the server rejects requests and returns error 502).

6.5 Throughput rate and concurrency in plain terms

Case 1: Throughput rate and concurrency are two completely independent concepts. Take a bank counter as an example: concurrency is how many people rush to the counter at the same time, while throughput rate is how many people the counter can serve within a given period of time.

Case 2: One faucet left running for a day and a night produces 10 tons of water; 10 faucets running for 1 second produce 0.1 tons. Of course the single faucet produces the larger total volume, but can you say that 1 faucet's water-output capability is better than that of 10 faucets? No, so we have to bring in unit time and compare how much water comes out in 1 second. That per-unit-time figure, e.g. 0.1 tons of water per second, is the throughput rate.
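Putting numbers on case 2: the single faucet delivers 10 tons over roughly 86,400 seconds, i.e. about 0.00012 tons per second, while the 10 faucets together deliver 0.1 / 1 = 0.1 tons per second; so although the single faucet produced more water in total, the throughput rate of the 10 faucets is several hundred times higher.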
