A typical performance tuning process for a test load generator

In performance testing, it is not only the system under test whose performance problems need to be located, analyzed, and optimized. The load generators responsible for issuing bulk requests often hit all kinds of performance bottlenecks too. After all, using a machine that costs a few thousand to bring down machines that cost millions takes a bit of skill.

Here is a typical load generator performance optimization process. It covers a disk IO problem, insufficient CPU, and memory exhaustion, each resolved by parameter tuning or code changes.

First, the load generator disk is 100% busy

(A) Issue

An important function of the load generator is to act as a stub: on receiving packet A, it returns the corresponding packet B.

Performance testing found that the local queue receiving the packets was congested. After a batch of messages arrived, the stub took a long time to finish processing them, i.e. it returned packets slowly. For example, the A packets were received over 10 minutes in total, but the stub needed 17 minutes to return all the B packets.

(B) Analysis

First look at Task Manager -> Resource Monitor on the PC, starting with CPU, memory, network, and disk.

The PC's disk "Highest Active Time" stayed at 100% for long periods.

Next, check which process uses the disk most. It turned out to be Java, but there were many Java applications running, so the specific application still had to be pinned down.

In Task Manager, select the process identified above, right-click it, and choose "Open file location". This showed that the program using the disk most was MQ.

(C) Resolution

Because MQ is writing too much to disk, the simplest fix is to reduce the number of write IOs. MQ's disk writes are nothing more than log writes and message writes. Reducing them requires the tester to know something about MQ.

1. Linear logging vs. circular logging

MQ has two kinds of log, linear and circular:
1) A linear log stores even the message content, so if a message in a queue is deleted it can be restored from the log; i.e. a linear log stores more.
2) A circular log does not store message content.
Check whether the MQ log is linear. If it is, switching to circular logging can relieve the disk.
If that does not work, go to the next step.

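2. Triple write vs. single write

By default, MQ writes its log using triple write. Change it to single write.
This saves a large amount of disk IO.
If that is still not enough, go to the next step.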

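3. Persistent messages vs. non-persistent messages

A persistent message is written to a file on disk and is also logged (two disk writes).

A non-persistent message is not logged and is generally not written to disk. It goes to disk only when the MQ buffer is too small to hold the current message. (A non-persistent message is discarded only when it cannot be sent; under normal conditions it is not lost.) In other words, it normally causes no disk write.

For a test load generator, losing messages under abnormal conditions is acceptable, so the messages can be changed to non-persistent, which removes another write IO.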

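4. Increase the MQ buffer

As noted above, a non-persistent message is not logged and is generally not written to disk; it goes to disk only when the MQ buffer is too small to hold the current message. Increasing the MQ buffer can therefore also save a write IO under some load conditions.
In this scenario, we set the MQ buffer to 10M.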
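
The effect of steps 3 and 4 on write IO can be illustrated with a toy count. This is only a sketch of the rules as the text states them (two writes for a persistent message, none for a non-persistent message that fits in the buffer, one when it spills), not a model of MQ internals. The class name, the 2 KB message size, and the 64 KB "small buffer" figure are hypothetical, and the model assumes a worst-case backlog in which no consumer drains the buffer.

```python
from dataclasses import dataclass


@dataclass
class WriteModel:
    """Toy count of disk writes per message, following the rules in the text."""
    buffer_bytes: int      # size of the in-memory queue buffer (hypothetical parameter)
    buffered: int = 0      # bytes of non-persistent messages currently held in memory
    disk_writes: int = 0   # running total of disk writes

    def put(self, size: int, persistent: bool) -> None:
        if persistent:
            # Persistent: one write to the queue file plus one write to the log.
            self.disk_writes += 2
        elif self.buffered + size <= self.buffer_bytes:
            # Non-persistent and it fits in the buffer: no disk write.
            self.buffered += size
        else:
            # Non-persistent but the buffer is full: the message spills to disk.
            self.disk_writes += 1


def count_writes(n_messages: int, size: int, persistent: bool, buffer_bytes: int) -> int:
    """Put n_messages of the given size and return the total number of disk writes."""
    model = WriteModel(buffer_bytes=buffer_bytes)
    for _ in range(n_messages):
        model.put(size, persistent)
    return model.disk_writes


KB = 1024
persistent_writes = count_writes(1000, 2 * KB, True, 64 * KB)
small_buffer_writes = count_writes(1000, 2 * KB, False, 64 * KB)
large_buffer_writes = count_writes(1000, 2 * KB, False, 10 * KB * KB)  # 10M, as in the text
```

Under these assumptions, switching to non-persistent messages removes most of the writes, and the 10M buffer removes the rest because the whole backlog fits in memory.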

(D) Effect

In the end, the disk "Highest Active Time" went from 100% to below 10%, and TPS rose substantially.

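Second, load generator CPU at 80~90%

(A) Issue

Seven load generators were expected to send 750 messages/second in total, but in practice they could only send a little over 500 messages/second.

We again started from Task Manager -> Resource Monitor on the PC, analyzing CPU, memory, network, and disk.

This time, each load generator's CPU was at 80~90%. In other words, once the disk IO problem was solved, CPU became the next bottleneck.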

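(B) Analysis

In this situation, it is usually the code that is using too much CPU.
Analysis of the code showed that on each iteration of the performance testing tool (each time the thread is invoked), the thread handled only one MQ message. That is, handling one MQ message required opening and closing the MQ queue once, and opening and closing a queue is very CPU-intensive.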

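(C) Resolution

On each iteration of the performance testing tool (each time the thread is invoked), let the thread handle multiple MQ messages.

This raises a question: handling more per iteration is not always better, so how many MQ messages should each iteration handle?

We used the minimum of the "current queue depth" and a "specified parameter".

Why not simply the "current queue depth"? That setting means each iteration handles every message in the queue. If the queue is deep (say, a depth of 100), the thread has to process those messages sequentially, which is slow. The other threads get no messages to process, and the performance testing tool loses the benefit of concurrent processing.

If a parameter is set (say, 10) so that each thread handles at most 10 messages per iteration, the other threads get a chance to pick up messages and process them concurrently.
So we used the minimum of the "current queue depth" and the "specified parameter".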
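
The batching rule can be sketched as follows. This is a minimal illustration that uses an in-memory deque as a stand-in for the MQ queue; the names `MAX_BATCH`, `batch_size`, and `run_iteration` are hypothetical and not taken from the author's code, and the real tool would open the queue once before the loop and close it once after.

```python
from collections import deque
from typing import Callable, Deque, List

MAX_BATCH = 10  # the "specified parameter"; 10 is the example value from the text


def batch_size(queue_depth: int, max_batch: int = MAX_BATCH) -> int:
    """Messages to handle in one iteration: min(current queue depth, parameter)."""
    return min(queue_depth, max_batch)


def run_iteration(queue: Deque[str],
                  handle: Callable[[str], str],
                  max_batch: int = MAX_BATCH) -> List[str]:
    """One tool iteration: take up to max_batch messages and return the replies.

    In a real client the queue open/close would wrap this whole loop,
    so its cost is paid once per batch instead of once per message.
    """
    n = batch_size(len(queue), max_batch)
    return [handle(queue.popleft()) for _ in range(n)]


# Usage: 25 waiting "A" packets are drained in batches of 10, 10, and 5.
pending = deque(f"A{i}" for i in range(25))
to_reply = lambda msg: msg.replace("A", "B", 1)
first = run_iteration(pending, to_reply)
second = run_iteration(pending, to_reply)
third = run_iteration(pending, to_reply)
```

Capping the batch keeps each iteration short, so with several threads the remaining messages are left for the others to pick up.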

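(D) Effect

After the code change, the expected 750 messages/second were actually sent. On top of that, CPU dropped from 80-90% to 30-60%, saving about half of the CPU resources.
TPS rose by 50% while CPU fell by half. Taken together, the TPS supported per unit of CPU rose by 200% (previously 40% CPU utilization supported 250 TPS; now it supports 750 TPS).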
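
The 200% figure can be checked with a short calculation. The inputs here are representative round numbers chosen to match the author's parenthetical, not measurements: 500 TPS at 80% CPU before (the text says "a little over 500" at 80-90%), and 750 TPS at 40% CPU after (the text gives a 30-60% range).

```python
def tps_per_cpu_point(tps: float, cpu_percent: float) -> float:
    """Throughput delivered per percentage point of CPU utilization."""
    return tps / cpu_percent


before = tps_per_cpu_point(500, 80)   # 250 TPS per 40% CPU
after = tps_per_cpu_point(750, 40)    # 750 TPS per 40% CPU
gain_percent = (after / before - 1) * 100

print(f"before: {before} TPS per CPU point")
print(f"after:  {after} TPS per CPU point")
print(f"gain:   {gain_percent}%")
```

With these inputs the ratio is 3x, which is the 200% improvement per unit of CPU that the text describes.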

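Third, after the scenario finishes, load generator CPU goes to 100%

(A) Issue

While the scenario was running, load generator CPU was at 60-70%. After it finished, load generator CPU went to 100%.

(B) Analysis

This situation must be caused by a loop that has no interval set.

Either no interval is set inside the thread, or no interval is set between two iterations of the thread. Analysis showed it was the latter.

Why was no interval set in the first place? Because the job of this code is to pick up arriving messages in real time and process them; with an interval it would be less real-time.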

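(C) Resolution

Setting an interval will certainly solve the problem, but how should it be set?

1) If an interval is set between two iterations of the thread, receiving and processing messages incurs a fixed delay. That is not what we want (it artificially lengthens the response time).

2) Set the interval inside the application thread.
When there are messages to process, apply no delay and pick them up in real time. When there are none, wait 20ms before reading the queue again.

The approach is to check the queue depth: if the depth is 0, delay 20ms; if it is not 0, do not delay.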
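
A minimal sketch of this polling rule, again using an in-memory deque in place of the MQ queue. The function name `poll_once` and the injectable `sleep` argument are hypothetical additions for illustration; only the depth check and the 20ms value come from the text.

```python
import time
from collections import deque
from typing import Callable, Deque

IDLE_DELAY_S = 0.020  # 20 ms, the value given in the text


def poll_once(queue: Deque[str],
              handle: Callable[[str], None],
              sleep: Callable[[float], None] = time.sleep) -> bool:
    """Process one message if the queue is non-empty; otherwise back off briefly.

    Returns True if a message was processed, False if the thread slept.
    """
    if len(queue) == 0:
        # Depth is 0: wait 20 ms so the empty loop does not spin the CPU.
        sleep(IDLE_DELAY_S)
        return False
    # Depth is not 0: handle the message immediately, with no added delay.
    handle(queue.popleft())
    return True


# Usage: two messages are handled without delay, then the idle poll sleeps once.
inbox = deque(["A1", "A2"])
handled = []
naps = []
results = [poll_once(inbox, handled.append, naps.append) for _ in range(3)]
```

Because the delay applies only when the queue is empty, response time under load is unchanged, while an idle thread wakes at most about 50 times per second instead of spinning.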

(D) Effect

After the code change, CPU fell back naturally once the scenario finished (from 100% to below 10%).

Fourth, memory exhaustion

(A) Issue

While running a test, the PC (4G of memory) had only 200M free, and mouse and keyboard operation was very slow.

(B) Analysis

The JVM of the performance testing tool was pre-allocated 1.5G but did not actually use that much.

(C) Resolution

Reduce the JVM memory of the performance testing tool from 1.5G to 1G.

(D) Effect

After the adjustment, the system showed 840M of available memory, and the mouse and keyboard responded smoothly.

As the description, analysis, and resolution of these four problems show, some problems require knowledge of the products involved and some require knowledge of code. A complete performance test runs into problems of many kinds, so a performance tester must be versatile.

 

Author: Yang Jianxu. Source: http://www.talkwithtrend.com/home/space.php?p=blog&t=&uid=898849&page=3. Copyright in this article is shared by the author and Blog Park. Reprinting is welcome, but unless the author agrees otherwise, this statement must be retained and a link to the original must be given in a prominent position on the article page.

Origin www.cnblogs.com/1737623253zhang/p/11543258.html