[Repost] GC analysis: locating a performance problem caused by excessive GC

Symptom: one interface tops out at only 7 TPS on a 4-core machine, while CPU utilization is already above 90%.

Locating the problem:

1. Run the top command to check CPU usage and find the offending process ID.
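
The original screenshots are not reproduced here; a minimal command-line sketch of this step (process names shown by top will vary):

    # Show per-process CPU usage; press 'P' to sort by CPU and note the PID of the busy java process
    top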


2. Run top -H -p <pid> to view the threads of that process; four threads show high CPU usage, adding up to more than 100%.
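
A sketch of this step; the actual PID is only visible in the original screenshot, so <pid> is a placeholder:

    # -H lists the individual threads of the process instead of the process as a whole
    top -H -p <pid>
    # Four threads stay at the top of the %CPU column, together using well over 100%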


3. To inspect a specific thread, first convert its decimal thread ID to hexadecimal with printf "%x\n" 6007; the result is 1777.
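
The conversion itself can be reproduced; 6007 is the decimal thread ID taken from the top -H output above:

    # Convert the decimal thread ID to hex so it can be matched against jstack's nid field
    printf "%x\n" 6007
    # -> 1777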

4. Run jstack <pid> | grep <hex thread id> to see what each thread is doing. Analyzing the four high-CPU threads one by one shows that all of them are the four GC threads shown below, so the performance problem is preliminarily attributed to GC.
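
A sketch of how a thread is matched in the thread dump, using the hex ID from the previous step; the exact thread names depend on the collector in use:

    # Take a thread dump and look for the thread whose nid matches the hex ID
    jstack <pid> | grep -i 1777
    # GC worker threads show up with names such as "GC task thread#..." for Parallel GC,
    # or "Gang worker#..." for other collectors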


5. Set up Java VisualVM and observe GC behavior. The results show no Full GC problem and no memory leak, which narrows the problem down to the young generation.
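
VisualVM is a GUI, so there is little to script; the only command-line part is launching it and, for a remote service, exposing JMX. The port and flags below are an assumption about how the monitoring was set up, not taken from the original:

    # Launch the VisualVM that ships with the JDK
    jvisualvm
    # For a remote JVM, the service can be started with JMX enabled, e.g.:
    #   -Dcom.sun.management.jmxremote.port=9010
    #   -Dcom.sun.management.jmxremote.ssl=false
    #   -Dcom.sun.management.jmxremote.authenticate=false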


6. Run jstat -gcutil <pid> to view detailed GC statistics; the Eden space fills up roughly every 5 seconds.
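
A sketch of the jstat invocation; the sampling interval is an assumption, not taken from the original:

    # Print GC utilisation every 1000 ms; watch how fast the E (Eden) column climbs back towards 100%
    jstat -gcutil <pid> 1000
    # Columns include S0 S1 E O ... YGC YGCT FGC FGCT GCT (exact set depends on the JDK version)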


7. Check the GC log: minor GCs are very frequent, and, more importantly, each minor GC takes a long time, with user time exceeding 500 ms (a few milliseconds up to a few tens of milliseconds would be normal). So the problem is basically located: the performance issue comes from minor GCs that are both too frequent and too slow. There are two initial guesses: either the Eden space is too small, or the objects being allocated are too large. Start with the simpler check and enlarge the Eden space first.
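
If GC logging is not already enabled, flags along these lines (HotSpot, JDK 8 style; an assumption about the environment) produce the kind of log being read here:

    # JDK 8 style GC logging flags
    -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/path/to/gc.log
    # Each minor GC entry then ends with user/sys/real times; user times above 0.5 s
    # per collection are what stood out in this case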


8. Check the JVM configuration. The young-generation settings consist of just these few parameters, and the configured Eden space is quite small; the development team's configuration does not look reasonable. So remove those three young-generation JVM parameters, fall back to the JVM defaults, restart the service, and verify the new configuration.
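
The actual three parameters are only visible in the original screenshot; the ones below are typical young-generation settings of this kind and are purely illustrative:

    # Hypothetical young-generation settings of the sort that were removed
    -Xmn256m -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=15
    # Dropping them lets the JVM size the young generation from its defaults instead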


9. After the restart, run jstat again: GC frequency drops by half, but unfortunately GC time doubles and TPS does not change at all. So the problem really has little to do with the JVM configuration; attention needs to turn to object size.


10. Look at the thread information again, find the stack frames that belong to the project's own code, and narrow the problem down to a specific method.
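
A sketch of how the project code can be picked out of a thread dump; the package name is hypothetical:

    # Dump all threads and look for frames from the project's own packages
    jstack <pid> > threads.txt
    grep -n -A 20 "com.example.lesson" threads.txt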


11. Find that code: it is a select operation, and the method simply returns the result of the select.


12. Continue drilling down to the specific SQL.


13. Check the result set of this SQL: it returns more than 30,000 rows. At this point the root cause is basically confirmed: the returned list is too large, so the Eden space fills up very quickly and is reclaimed slowly, garbage collection misbehaves, and GC consumes a large amount of CPU. The end result is exactly the observed symptom: only 7 TPS while the CPU is maxed out.
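
One way to confirm the size of the result set from the command line; the real SQL is only in the original screenshot, so the table, column, and id below are hypothetical:

    # Hypothetical check of the result-set size; real table/column names differ
    mysql -u <user> -p -e "SELECT COUNT(*) FROM student_lesson WHERE lesson_id = 123"
    # In this case a single lesson id matched more than 30,000 rows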


Summary: because the performance-test data was generated by ourselves, our first reaction was that our test data was wrong. After checking again, the data turned out to be fine: the where condition of this query is the lesson (class session) ID, and it is normal for one lesson to have tens of thousands of students. Under normal usage this table is queried together with a student ID, so the result never exceeds about ten rows and there is no problem. But for convenience the developer reused an existing method, and that is how this problem appeared.

Reflection: this problem could actually have been located through the slow-query log. However, this project runs on Alibaba Cloud machines and operations would not grant us the necessary permissions, so we had to monitor slow queries with the MONyog tool, which in practice turned out to be awkward to use. There is also a matter of experience: our data volume is not particularly large, with most tables under a million rows, and in my past experience any query that hits an index does not become a slow SQL, so for most SQL statements I only ran EXPLAIN to confirm the index was used. Another reason is that this project delivered too many interfaces at once; with no time to think things through, I ran the load tests with a "get a result first" attitude, and this problem was only investigated after all the interfaces had been tested. Once I saw that the performance problem was caused by oversized objects, I remembered that there was indeed one SQL that returned a lot of data. After this experience, I will pay attention to the size of SQL result sets when tuning test scripts.


PS: there is another way to locate this kind of problem: use jmap -histo:live 10270 > 2.txt to dump a histogram of the live objects in the heap. You can see directly which objects are large and then check in the code what those objects are, which is more direct and convenient.
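
The jmap command as quoted, with a little more context (10270 is the PID given in the original; the interpretation in the comments is general, not taken from the post):

    # Histogram of live objects (note: -histo:live triggers a full GC first), written to a file
    jmap -histo:live 10270 > 2.txt
    # The top of the histogram lists instance counts and byte totals per class,
    # so an oversized result list shows up immediately as a huge count of one entity class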


Author: 幻天行. Source: https://www.cnblogs.com/huantianxing/p/8137378.html. The copyright of this article is shared by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise this notice must be retained and a link to the original must be given in a prominent place on the article page.
