A Near-Miss JVM Optimization Experience

Background

There are two production servers on Alibaba Cloud, purchased during the same period, with identical CPU, memory, and hard disk configurations. The specific configuration is as follows:

(Figure: server configuration)

Since the two servers have identical hardware and software configurations and run the same program, the Nginx round-robin policy gives each a weight of 1; that is, traffic is split evenly between the two machines.

Once, during a routine system check, I used PinPoint to look at the servers' "Heap Usage" and found that one system was running Full GC very frequently, roughly one Full GC every five minutes, which startled me.

Full GCs this frequent pause business processing, so the real-time availability of the system drops sharply.

I checked the Tomcat configuration (Tomcat 8.5.28) and found that no JVM memory settings had been made in Tomcat at all; everything was running with the defaults.
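
For reference, the place such settings would normally live, and a way to inspect the defaults, is sketched below; this assumes a standard Tomcat layout (the bin/setenv.sh convention and the grep pattern are mine, not taken from these servers):

# bin/setenv.sh is the usual place to define JVM memory options for Tomcat;
# on these machines it contained nothing like the following line:
# export JAVA_OPTS="-Xms1024m -Xmx1024m"

# show the heap defaults the JVM actually chose (values depend on the machine's RAM)
java -XX:+PrintFlagsFinal -version | grep -Ei 'initialheapsize|maxheapsize'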


GC Data

During peak traffic, I observed the "Heap Usage" of nodes A and B in PinPoint over the following periods.

3-hour chart:

 

(Figure: PinPoint heap usage for node B over 3 hours)

The chart shows that system B had 22 Full GCs within three hours, roughly one Full GC every 8 minutes.

Each Full GC takes about 150 ms, so over those three hours system B spent roughly 3300 ms with operation suspended.

From the chart, the maximum heap space is about 890 MB, yet Full GCs occur when only around 200 MB of heap is in use; from the standpoint of resource utilization, that usage rate is far too low.

 

(Figure: PinPoint heap usage for node A over 3 hours)

The chart shows that system A had 0 Full GCs within three hours, i.e. no pauses at all.

For those three hours the system kept processing business without a single pause. The total heap space is about 1536 MB, and the heap space currently in use is over 500 MB.

6-hour chart:

 

(Figure: PinPoint heap usage for node B over 6 hours)

As in the 3-hour chart, system B had a number of Full GCs over these 6 hours, and as with the 3-hour statistics, every Full GC occurred while less than 200 MB of heap space was in use.

 

(Figure: PinPoint heap usage for node A over 6 hours)

The chart shows that system A had 0 Full GCs within 6 hours. Excellent.

12-hour chart:

 

(Figure: PinPoint heap usage for node B over 12 hours)

The chart shows that system B again had a number of Full GCs within 12 hours. There are relatively few Full GCs on the left side because our business is concentrated during the day; even though the evening is off-peak, non-business time, Full GCs still occurred.

 

(Figure: PinPoint heap usage for node A over 12 hours)

The chart shows that system A had 0 Full GCs within 12 hours. Excellent.

GC Logs

Now let's look at the gc.log files, since both servers write detailed GC logs. First, system B's Full GC log.

(Figure: Full GC entries in system B's gc.log)

The figure above shows only "[Full GC (Ergonomics)" entries because the "[GC (Allocation Failure)" entries have already been removed, which makes the log easier to read and analyze.
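
For reference, a filtered view like this can be produced with a simple grep; a small sketch, where the input file matches the -Xloggc path used on these servers and the output file name is just illustrative:

# keep only the Full GC lines to make the log easier to scan
grep "Full GC" gc.log > gc-full-only.log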

Let's take the last Full GC entry in the GC log file.

2018-12-24T15:52:11.402+0800: 447817.937: [Full GC (Ergonomics) [PSYoungGen: 480K->0K(20992K)] [ParOldGen: 89513K->69918K(89600K)] 89993K->69918K(110592K), [Metaspace: 50147K->50147K(1095680K)], 0.1519366 secs] [Times: user=0.21 sys=0.00, real=0.15 secs]

From it we can work out the following:

  • Heap size: 110592K = 108 MB
  • Old generation size: 89600K = 87.5 MB
  • Young generation size: 20992K = 20.5 MB
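
As a quick consistency check, the young and old generations add up to the total heap: 20992K + 89600K = 110592K, i.e. 20.5 MB + 87.5 MB = 108 MB.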

Analysis: this Full GC was triggered because the space occupied by objects in the old generation had reached the old generation's capacity. The relevant log entry is shown below; the format is used-before -> used-after (capacity), so the old generation was at 89513K of its 89600K capacity before the collection.

[ParOldGen: 89513K->69918K(89600K)]

The root cause is that the space allocated to the old generation is far too small to meet the system's business needs.

As a result the old generation keeps filling up, and every time it fills up a Full GC is triggered. Because the old generation is small, each Full GC is also relatively short.

System A's log contains only 2 Full GCs, and both occurred at system startup:

7.765: [Full GC (Metadata GC Threshold) [PSYoungGen: 18010K->0K(458752K)] [ParOldGen: 15142K->25311K(1048576K)] 33153K->25311K(1507328K), [Metaspace: 34084K->34084K(1081344K)], 0.0843090 secs] [Times: user=0.14 sys=0.00, real=0.08 secs]

From it we can work out the following:

  • Heap size: 1507328K = 1472 MB
  • Old generation size: 1048576K = 1024 MB
  • Young generation size: 458752K = 448 MB

Analysis: system A only saw two Full GCs, both at startup, and they were triggered by "Metadata GC Threshold" rather than by heap space.

Although after a week of observation system A has had no further Full GC, once a Full GC does happen it will take comparatively long.

On other systems we have seen that with a 1024 MB old generation, a Full GC lasts roughly 90 ms.

So we can see that a bigger heap is not always better; or rather, with the UseParallelOldGC collector, more heap space is not automatically better.

 

Analysis and Optimization

Overall analysis:

  • System B's Full GCs are far too frequent because its heap is only about 108 MB (old generation about 87.5 MB), which is nowhere near enough for the system's memory needs during peak periods
  • Because ParOldGen (the old generation) keeps getting exhausted, Full GC events keep happening
  • System A's initial heap (Xms) and maximum heap (Xmx) are both 1536 MB, which is more than enough for the memory demand during business peaks

Optimization strategy:

  • For system B, first enlarge the heap, i.e. increase the heap space by setting the Xms and Xmx values. Set both Xms and Xmx directly to 1024 MB.
  • The reason for setting the initial heap (Xms) straight to the maximum (Xmx) is that it stops the JVM from repeatedly requesting more memory while running; the memory is allocated once at startup, which improves efficiency.
  • The value of 1024 MB for Xms (the heap size) follows the JDK's own recommendation, which can be obtained with the command below (see the check after this list):
  • java -XX:+PrintCommandLineFlags -version
  • System B's JVM startup parameters are therefore set as follows:
  • export JAVA_OPTS="-server -Xms1024m -Xmx1024m -XX:+UseParallelOldGC -verbose:gc -Xloggc:../logs/gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
  • System A's JVM parameters stay unchanged so that we can observe how it runs, namely:
  • export JAVA_OPTS="-server -Xms1536m -Xmx1536m -XX:+UseParallelOldGC -verbose:gc -Xloggc:../logs/gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

 

  • Nodes A and B thus run with two different sets of JVM parameters, in order to verify whether A's or B's parameters better suit the actual situation.
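
As a sanity check on the change, the JDK-suggested defaults can be compared with the flags the restarted JVM is actually using; a sketch, where <pid> is a placeholder for the Tomcat process id:

# defaults the JDK would pick on this host (printed values vary with physical RAM)
java -XX:+PrintCommandLineFlags -version

# after restarting Tomcat with the new JAVA_OPTS, confirm the flags really in effect
jcmd <pid> VM.flags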


Origin: blog.csdn.net/weixin_45132238/article/details/93886737