How to do performance testing

Common performance concepts


Throughput (TPS, QPS): simply the number of transactions or queries completed per second. It shows how many requests per unit of time the system can handle; usually, the higher the TPS the better.


Response time: the time from when a request is sent to when the system's response is received. Response times are usually not averaged directly; instead, the unstable values are removed before averaging. For example, the commonly used 90% response time removes the slowest 10% of responses and averages the remaining, stable 90%. Statistically speaking, this is removing outliers.
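As a concrete illustration, the trimmed 90% average described above can be computed with a short pipeline (the latency values below are hypothetical):

```shell
# Hypothetical response times in ms; sort, drop the slowest 10%,
# and average the stable 90% that remain.
printf '%s\n' 120 80 95 2200 70 88 110 75 90 85 |
  sort -n |
  awk '{ a[NR] = $1 }
       END { n = int(NR * 0.9)                 # keep the fastest 90%
             for (i = 1; i <= n; i++) s += a[i]
             printf "%.1f\n", s / n }'
```

The single outlier (2200 ms) would drag a plain average up to about 301 ms, while the trimmed 90% value stays at 90.3 ms.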


Error rate: the ratio of failed requests to total requests. As the pressure increases, the system may fail to keep up with some requests, and the error count will keep growing.


The three metrics are closely related; no single number in isolation explains anything. A typical relationship: as throughput increases, response time may rise, and the error rate may rise with it. Therefore, a bare "TPS of 100,000" by itself proves nothing.


Performance tuning ideas


Generally, tuning has a precondition: the system must be under pressure, whether real production traffic or an offline stress test that magnifies the problem, so that there is evidence to work from.


Form an initial impression of the problem from the more obvious symptoms, gather evidence to verify whether that impression holds, then analyze the cause of the symptoms and try to solve the problem.


1. Performance diagnostic test


For a new system, or one that has had larger code changes, a diagnostic test is necessary. Generally, the most thorough approach is a single-machine stress test. A stress test can roughly tell you the system's limit TPS, whether errors or problems surface as the pressure rises, what the system's resource consumption looks like, and where the performance bottlenecks might be.


The following is the configuration and result of one diagnostic test. This is the result of pressing 10 machines with 12,000 concurrent users: TPS peaks at just over 70,000, the average response time is 82 ms, and the error rate is 2.5%.


What other information can be read from the figure? First, TPS falls off rapidly in the later stage; the system can no longer support such a large amount of concurrency and collapses. There are several possibilities: either the system simply cannot handle this much concurrency, or something went wrong in the system midway that caused TPS to fall. Second, as time goes on, the error rate rises significantly, indicating that the system can no longer deal with so many requests. Combined with the previously stable average response time, we can infer that the system generally cannot withstand this level of concurrency. In addition, since this is 10 machines, a single machine handles roughly 7,000 TPS, which can serve as a baseline for future tuning.

At this point it is also worth characterizing the application: what resources does it mainly consume? For example, is it CPU-intensive or IO-intensive (the latter can be further broken down into disk-intensive or network-intensive)?







2. Define the goal of optimizing performance


You often hear people say: let's do a performance optimization, the higher the throughput the better; or, let's do a performance test, the TPS target is 50,000. With only this information, can you actually run the test? Is the goal clear enough?


In fact, in my opinion, doing performance testing without a clearly defined goal is simply irresponsible.


A performance optimization target is generally of the form: throughput at least X, 90% response time no more than Y, error rate below Z. We also need to watch other performance indicators: cpu usage, memory usage, disk usage, bandwidth usage, and so on. For a problem already found in diagnostic testing, the optimization can target that specific problem. For example, if load is high and cpu consumption is too large, the target may be to reduce cpu load while keeping TPS, response time, and error rate unchanged. Or if memory grows too fast and gc is frequent, the goal may be to identify a possible memory leak, or to tune the relevant jvm memory parameters. In short, the goal can be adjusted flexibly, but it must be clear.


3. Analyze


The analysis process is flexible; a thousand systems show a thousand kinds of performance behavior, and it is hard to enumerate them all. Here I will only talk about some common methods, tools, and ideas.


For CPU:


For cpu monitoring, linux provides two fairly easy-to-use tools: top and vmstat. I will not go into the two commands themselves here. For cpu, focus on four values: us (user), sy (system), wa (wait), id (idle). In theory they should add up to 100%, and any of the first three being too high may indicate a problem.
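For reference, top and vmstat derive these percentages from the counters in /proc/stat. A minimal sketch follows, using a captured (hypothetical) cpu line so it is self-contained; on a live box, feed it `head -1 /proc/stat` instead:

```shell
# Fields after "cpu": user nice system idle iowait irq softirq steal ...
# us = user+nice, sy = system, id = idle, wa = iowait.
echo "cpu 74608 2520 24433 1117073 6176 4054 0 0 0 0" |
  awk '{ total = $2+$3+$4+$5+$6+$7+$8+$9
         printf "us=%.1f sy=%.1f id=%.1f wa=%.1f\n",
                100*($2+$3)/total, 100*$4/total, 100*$5/total, 100*$6/total }'
```

For this sample the four values print as us=6.3 sy=2.0 id=90.9 wa=0.5; they fall slightly short of 100 only because irq/softirq are not shown.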







High us usually has two causes:

a. Code problems. For example, a time-consuming loop with no sleep, or poorly handled cpu-intensive computation (such as xml parsing, encryption/decryption, compression/decompression, or numeric computation).

b. Frequent gc. An easily overlooked cause: us tends to spike when gc is frequent, because garbage collection is itself a computation-heavy process. High cpu caused by frequent gc is usually accompanied by large memory fluctuations, so it is better to confirm and fix this problem from the memory side.




To locate the code responsible for high us:

a. Use the top command to find the pid of the process consuming too much us

b. Use top -Hp pid to find the corresponding thread tid

c. Use printf %x tid to convert the tid to hex (tid16)

d. jstack pid | grep -C 20 tid16 then shows that thread's stack
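The four steps chain together as below. The pid and tid are hypothetical placeholders; only step c actually executes here, since steps a, b, and d need a live Java process:

```shell
PID=12345                      # hypothetical: pid of the hot process from `top` (step a)
TID=12360                      # hypothetical: busiest thread from `top -Hp $PID` (step b)
TID16=$(printf '%x' "$TID")    # step c: jstack prints thread ids in hex (nid=0x...)
echo "grep for nid=0x$TID16"
# step d, on the live process:
#   jstack "$PID" | grep -C 20 "$TID16"
```

The hex conversion matters because jstack's `nid=` field is hexadecimal, while top shows decimal thread ids.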




High sy:

a. Too many context switches. This usually means the system has many threads that switch frequently. Since preemptive switches are comparatively reasonable in duration and count, high sy usually comes from threads voluntarily yielding the cpu, e.g. sleep, lock wait, or io wait.




High wa:

a. A large share of cpu time is spent waiting on io. Note the difference from the case above: high sy caused by io wait refers to io constantly waiting and waking, where the sheer volume causes many context switches; it emphasizes a dynamic process. High wa caused by io wait means that io-waiting threads make up a large share: whichever thread the cpu switches to is waiting on io, so wait dominates the total cpu time.




High id:

a. Many people think a high id is good. In a performance test, however, a high id means resources are not fully utilized or the load applied is insufficient, which is not a good thing.




For memory:

For a java application's memory, usually only the jvm memory matters, but in some special cases physical memory also needs attention. For jvm memory, the common tools are jstat, jmap, pidstat, vmstat, and top.
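A common first look is `jstat -gcutil <pid> 1000`. The sample line below is hypothetical jstat output with columns S0 S1 E O M YGC YGCT FGC FGCT GCT (exact columns vary by JDK version), parsed for old-gen occupancy and gc counts; on a live JVM, pipe in the real command instead:

```shell
# Hypothetical `jstat -gcutil <pid> 1000` line; $4 = old-gen %, $6 = young
# gc count, $8 = full gc count under the column layout assumed above.
echo "0.00 97.02 66.07 84.15 95.32 1311 9.331 4 1.246 10.577" |
  awk '{ printf "old=%.0f%% young_gc=%s full_gc=%s\n", $4, $6, $8 }'
```

A full gc count that climbs between samples, together with an old-gen percentage that never drops, is the usual signature of the abnormal gc discussed next.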




Abnormal gc:


a. A gc usually means some region has run out of space. Many abnormal gc situations are caused by holding unnecessary references without releasing them in time; places like a cache are easy to get wrong, leading to a memory leak that triggers abnormal gc.

b. The program's behavior may be normal, but unsuitable gc parameters cause abnormal gc. This usually requires tuning the gc parameters or the heap/generation size parameters.

c. Situations in which a full gc occurs:


  • The permanent generation is full

  • The old generation is full

  • The average size promoted to the old generation by minor gc is larger than the remaining space in the old generation

  • promotion failed or concurrent mode failure during a CMS gc




OOM:

a. OOM often goes hand in hand with abnormal gc. It is singled out here because its harm is greater: abnormal gc at worst means collection runs too often or fails to reclaim memory, but there is at least some buffer time, whereas an OOM is a serious problem. For how the various types of OOM are distinguished and how each occurs, see the reference, which is a fairly comprehensive summary. For common OOMs, the cause can usually be pointed out at a glance.

b. heap: too many objects created, too many useless references held (a leak), or too little heap allocated. Use jmap to find the distribution of objects in memory, and ps to find the process and its initial memory configuration.

c. stack: incorrect recursive calls.

d. perm: too many packages loaded at startup, or too little memory allocated.

e. off-heap memory: allocated ByteBuffers that are never released.
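For the heap case in item b, `jmap -histo:live <pid>` ranks live objects by footprint. The two lines below are hypothetical jmap output, reduced to the columns that matter when hunting a leak:

```shell
# Hypothetical first rows of `jmap -histo:live <pid>`:
# rank, instance count, bytes, class name.
printf '%s\n' \
  '   1:       1846132      88614336  [C' \
  '   2:        421003      43522912  java.util.HashMap$Node' |
  awk '{ printf "%s instances=%s bytes=%s\n", $4, $2, $3 }'
```

A class whose instance count keeps growing across repeated snapshots, while full gcs are running, is the prime leak suspect.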




For IO:

IO is divided into network IO and file IO. Useful tools for network IO are sar and netstat; netstat is a remarkably powerful command that helps troubleshoot many problems. For file io, the tools are pidstat and iostat.
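When wa is high, `iostat -x` shows which device is saturated. The line below is a hypothetical device row; the positions of await and %util differ across iostat versions, so the field numbers here are an assumption for one classic layout (Device rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util):

```shell
# Hypothetical `iostat -x` device row, extracting await ($10) and %util ($12)
# under the column layout assumed above.
echo "sda 0.00 12.00 1.00 85.00 16.00 4720.00 110.14 2.30 26.70 5.20 44.80" |
  awk '{ printf "dev=%s await=%sms util=%s%%\n", $1, $10, $12 }'
```

A %util near 100 together with a rising await means the device is the bottleneck.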




File IO:

a. Technically, for heavy file IO the measures to take are asynchrony and batching: handling writes asynchronously shaves peaks and accumulates a buffer, while batching keeps disk seeks sequential and therefore faster.
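The batching half of that idea can be sketched in a few lines: accumulate records in a buffer and flush once every N records, instead of issuing one write per record (N=4 and the seq input are arbitrary placeholders for real record traffic):

```shell
# Buffer records and emit them in batches of N; the "-- flush --" marker
# stands in for one physical write.
seq 1 10 |
  awk -v N=4 '{ buf = buf $0 "\n"; c++
                if (c == N) { printf "%s-- flush --\n", buf; buf = ""; c = 0 } }
              END { if (c) printf "%s-- flush --\n", buf }'
```

Ten records become three writes instead of ten; on a real disk each flush is one sequential write rather than many scattered ones.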






Network IO:

a. Lots of TIME_WAIT. According to the TCP protocol, the side that actively initiates the close, after closing its own end and then receiving the close request from the passive side, moves into TIME_WAIT and waits 2MSL, so that its final ack has time to reach the peer. Finding lots of TIME_WAIT on a server means the server is actively closing connections. When would a server actively close connections? Very likely when the client forgets to close them. A typical case is forgetting to close jdbc connections: the database server may then end up with a large number of connections in TIME_WAIT.


b. Lots of CLOSE_WAIT. The passive side enters CLOSE_WAIT after receiving the close from the side that actively closes the connection. If the passive side then hangs and never performs its half of the close, CLOSE_WAIT piles up. What can cause that hang? A few examples. Take the forgotten database connections just mentioned: on the application server, a flood of browser requests comes in and the servlet threads hang because no pooled connection is available. The browsers eventually time out and send close requests, but since the servlet threads are hung, the application server never sends its own close back, so it accumulates lots of CLOSE_WAIT. Another example is an httpClient pitfall: inputStream.close() is not performed until response.getEntity() is called, so returning before calling response.getEntity() leaves the connection stuck. (See the reference for this example.)
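Both symptoms are easy to spot by counting connection states, e.g. `netstat -ant | awk '{s[$NF]++} END {for (k in s) print k, s[k]}'` on a live box. Here, hypothetical netstat rows stand in for real output so the sketch runs anywhere:

```shell
# Hypothetical `netstat -ant` rows; the last field is the connection state.
printf '%s\n' \
  'tcp 0 0 10.0.0.1:3306 10.0.0.2:52100 TIME_WAIT' \
  'tcp 0 0 10.0.0.1:3306 10.0.0.2:52101 TIME_WAIT' \
  'tcp 0 0 10.0.0.1:8080 10.0.0.3:40210 CLOSE_WAIT' \
  'tcp 0 0 10.0.0.1:8080 10.0.0.4:40311 ESTABLISHED' |
  awk '{ state[$NF]++ } END { for (s in state) print s, state[s] }' |
  sort
```

A count dominated by TIME_WAIT points at this host actively closing connections; one dominated by CLOSE_WAIT points at hung threads that never finish their close.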




References:

  • Performance tuning ideas

  • Performance monitoring commands under linux

  • Troubleshooting high JVM CPU usage

  • java troubleshooting tools

  • jvm parameter tuning

  • java/linux system tuning tools

  • Some ideas on gc optimization

  • Ideas and steps for performance optimization

  • A performance tuning guide

  • Introduction to JVM performance tuning

  • JVM performance tuning

  • Tomcat performance optimization

