How Should Performance Testing Be Done?

I happened to see Alibaba's middleware performance test report for Dubbo, and I think the people who produced it don't really understand performance testing. I'm afraid this report will mislead the public, so I want to write this article and do a little popular science.

First of all, the main problems with this test report are as follows:

1) It uses averages throughout. Honestly, averages don't cut it.

2) Response time is not correlated with throughput (TPS/QPS). The report only tested at low request rates, which is completely wrong.

3) Response time and throughput are not correlated with the success rate.

 

Why Averages Don't Cut It

As for why averages don't cut it: if you read the news, you'll often see phrases like "average wage", "average house price", "average spending", and so on, and you'll know how embarrassing averages can be. (These are all math games; science and engineering students should have a natural immunity to them.)

Software performance testing is the same: averages don't cut it. For the details, see the article "Why Averages Suck and Percentiles are Great"; here I'll just say a few words.

We know that in performance testing, the measured results are never identical; they fluctuate high and low. If you simply compute the average, consider this case: out of 10 test runs, 9 take 1ms and one takes 1s. The average is then about 100ms, which completely fails to reflect the real performance of the system: the 1s request is probably an outlier, noise that should be removed. This is the same reason judged competitions drop the highest and lowest scores before computing the average.
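A minimal sketch in Python, using the made-up numbers from the example above, shows how a single outlier dominates the mean:

```python
# Nine requests at 1 ms and one outlier at 1000 ms (1 s).
latencies_ms = [1] * 9 + [1000]

mean_ms = sum(latencies_ms) / len(latencies_ms)
print(mean_ms)  # 100.9 -- nowhere near the typical 1 ms

# Dropping the single highest sample tells a very different story.
trimmed = sorted(latencies_ms)[:-1]
print(sum(trimmed) / len(trimmed))  # 1.0
```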

Furthermore, the median may work a little better than the average. The median is the number that sits in the middle position when a set of data is sorted by size, which means at least 50% of the data is below it and at least 50% is above it.

Of course, the most correct approach is to report the statistical distribution as percentiles. In English this is TP - Top Percentile: TP50 means 50% of requests take less than a certain value, and TP90 means 90% of requests take less than a certain value.

For example: suppose we have the data set [10ms, 1s, 200ms, 100ms]. Sorting it in ascending order gives [10ms, 100ms, 200ms, 1s]. TP50 is the time that 50% of requests fall at or below, which is the ceil(4 * 0.5) = 2nd value: 100ms. TP90 is the time that 90% of requests fall at or below, which is the ceil(4 * 0.9) = 4th value: 1s. So: TP50 is 100ms, and TP90 is 1s.
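The ceil-based indexing above can be written out directly. Here is a small Python sketch (the function name `tp` is my own, not from any standard library):

```python
import math

def tp(latencies_ms, percentile):
    """Return the TP value: the latency that `percentile` of requests
    fall at or below, using ceil(n * p) as the 1-based rank."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(len(ordered) * percentile)
    return ordered[rank - 1]

data = [10, 1000, 200, 100]   # the example above, in milliseconds
print(tp(data, 0.5))  # 100  -> TP50 is 100ms
print(tp(data, 0.9))  # 1000 -> TP90 is 1s
```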

When I did performance testing at Thomson Reuters, the financial system's response-time requirement was: 99.9% of requests must complete in under 1ms, and the average response time must also be under 1ms. Two conditions, both enforced.

Why Response Time (Latency) Must Be Tied to Throughput

Looking at a system's throughput without looking at its response time is meaningless. My system might handle 100,000 requests per second, but if the response time has climbed to 5 seconds, the system is already unusable, and that throughput figure means nothing.

We know that as concurrency (throughput) increases, the system becomes more and more unstable: response times fluctuate more and more widely and grow slower and slower, while throughput plateaus (as shown below); CPU usage behaves the same way. So once the system becomes unstable, throughput loses its meaning. Throughput is only meaningful while the system is stable.

[Figure: BenchmarkOptimalRate]

Therefore, a throughput figure must come with a response-time qualifier. For example: with TP99 under 100ms, the system can sustain at most 1000 qps. This means we must keep testing at different concurrency levels to find the software's maximum stable throughput.
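The search described above can be sketched as a simple step-up loop. In this Python sketch, `measure_tp99_ms` is a hypothetical stand-in for your actual load generator, which would drive the system at the given rate and report the observed TP99:

```python
def find_max_rate(measure_tp99_ms, rates_qps, tp99_cap_ms=100):
    """Step through increasing offered rates and return the highest
    rate whose measured TP99 still stays under the latency cap."""
    best = None
    for rate in sorted(rates_qps):
        if measure_tp99_ms(rate) <= tp99_cap_ms:
            best = rate
        else:
            break  # latency blew the budget; higher rates won't help
    return best

# Example with a fake latency curve: TP99 stays low until ~1000 qps.
fake = lambda rate: 20 if rate <= 1000 else 500
print(find_max_rate(fake, [250, 500, 1000, 2000]))  # 1000
```

In a real test each rate would be held long enough to collect a meaningful latency distribution, not just a single sample.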

 

Why Throughput and Response Time Must Be Tied to the Success Rate

This should not be hard to understand: if the requests aren't even succeeding, the performance test is pointless. For example, if I claim my system handles 100,000 concurrent requests but the failure rate is 40%, then that 100,000 figure is a complete joke.

The failure rate tolerated in a performance test should be very low. For some critical systems, 100% of requests must succeed; there is no room for fuzziness.

 

How to Do Rigorous Performance Testing

In general, performance testing should consider these factors together: throughput, latency (response time), resource utilization (CPU / MEM / IO / bandwidth ...), success rate, and system stability.

The following approach to performance testing basically comes from my former employer Thomson Reuters, a company that builds real-time financial data systems.

First, you need to define the system's latency target (TP99 is recommended) along with a success-rate target. Reuters' definition, for example: 99.9% of responses must come back within 1ms, the average response time must be under 1ms, and 100% of requests must succeed.
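Such a pass/fail gate is easy to express in code. A sketch, with thresholds mirroring the Reuters-style criteria above (the function name and parameters are my own):

```python
import math

def meets_slo(latencies_ms, ok_flags, tp999_cap_ms=1.0, mean_cap_ms=1.0):
    """Check the three criteria at once: TP99.9 latency under the cap,
    mean latency under the cap, and 100% of requests successful."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(len(ordered) * 0.999)
    tp999 = ordered[rank - 1]
    mean = sum(latencies_ms) / len(latencies_ms)
    return tp999 <= tp999_cap_ms and mean <= mean_cap_ms and all(ok_flags)

# 1000 requests, all under 1 ms and all successful -> passes.
print(meets_slo([0.5] * 1000, [True] * 1000))              # True
# One failed request is enough to fail the whole run.
print(meets_slo([0.5] * 1000, [True] * 999 + [False]))     # False
```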

Second, within this response-time constraint, find the highest throughput. The test data should include payloads of various sizes, small, medium, and large, possibly mixed. Data from the production line is best.

Third, run a Soak Test at that throughput. For example: use the throughput obtained in step two to drive continuous, uninterrupted load against the system for 7 days. Then collect CPU, memory, disk/network IO, and other metrics to see whether the system is stable, e.g. whether CPU and memory usage hold steady. If so, that value is the system's performance.
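A crude way to judge "holds steady" from the collected samples is to compare the start and end of the run; a steady upward drift in memory, for instance, often signals a leak. This is a sketch of my own, not a standard method:

```python
def looks_stable(samples, max_drift=0.05):
    """Compare the average of the first and last quarters of a
    soak-test metric series (e.g. hourly CPU% or memory readings).
    Returns False if the metric drifted more than `max_drift` (5%)."""
    q = max(1, len(samples) // 4)
    avg_first = sum(samples[:q]) / q
    avg_last = sum(samples[-q:]) / q
    return abs(avg_last - avg_first) <= max_drift * max(avg_first, 1e-9)

print(looks_stable([40, 41, 40, 42, 41, 40, 41, 40]))          # True: CPU% flat
print(looks_stable([100, 120, 150, 200, 260, 340, 450, 600]))  # False: climbing
```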

Fourth, find the system's limit. For example: with a 100% success rate (ignoring response time), find the throughput the system can sustain for 10 minutes.

Fifth, run a Burst Test. Run at the throughput from step two for 5 minutes, then at the limit from step four for 1 minute, then back to the step-two throughput for 5 minutes, then the step-four limit for 1 minute, alternating like this for some period, say two days. Collect the system metrics (CPU, memory, disk/network IO, etc.), watch their curves along with the corresponding response times, and make sure the system stays stable.
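The alternating schedule above can be generated mechanically. A small sketch (the function and its parameters are illustrative, not from any tool):

```python
def burst_schedule(normal_qps, peak_qps, hours=48):
    """Yield (duration_seconds, offered_qps) phases that alternate
    5 minutes at the sustainable rate with 1 minute at the limit."""
    elapsed = 0
    total = hours * 3600
    while elapsed < total:
        for duration, rate in ((300, normal_qps), (60, peak_qps)):
            yield duration, rate
            elapsed += duration
            if elapsed >= total:
                break

phases = list(burst_schedule(1000, 5000, hours=1))
print(phases[:4])  # [(300, 1000), (60, 5000), (300, 1000), (60, 5000)]
```

A load generator would then hold each offered rate for the given duration while the metric collection from the soak test keeps running.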

Sixth, test at low throughput and with small network packets. Sometimes low throughput can actually increase latency; for example, leaving TCP_NODELAY off can add latency (see the article "Those TCP Things"). And small network packets can fail to fill the bandwidth, which can also make performance falter. Therefore, performance testing also needs to cover these two scenarios, chosen according to the actual situation.
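For reference, TCP_NODELAY is a standard socket option; enabling it disables Nagle's algorithm so small writes go out immediately instead of being buffered. In Python it looks like this:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Disable Nagle's algorithm: send small writes immediately rather than
# coalescing them, trading a little bandwidth for lower latency.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
print(nodelay_enabled)  # True
sock.close()
```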


Origin www.cnblogs.com/JerryTomcat/p/12532886.html