A near-miss JVM tuning experience

Background

Our production environment runs on two Alibaba Cloud servers. They were purchased at the same time and have identical CPU, memory, and disk configurations. The specific configuration is as follows:


Because the two servers have identical hardware and software and run the same program, both are given weight = 1 in the Nginx round-robin policy, so platform traffic is split evenly between the two machines.

Once, during a routine system check, I used PinPoint to look at the servers' "Heap Usage" and found that one system was running Full GC very frequently, roughly once every five minutes. That startled me.

Full GCs this frequent pause the system's business processing and greatly reduce the availability of a real-time system.

I checked the Tomcat (8.5.28) configuration and found that no JVM memory settings had been made in Tomcat at all; everything was running with the defaults.
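With no memory settings configured, the heap limits are whatever the JVM picks for itself. A minimal sketch of how one might print the effective values from inside a running JVM, using the standard `java.lang.management` API. The class name `HeapDefaults` and its helpers are illustrative and not part of the original setup:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: prints the heap limits the current JVM is actually using.
final class HeapDefaults {

    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // MemoryUsage reports -1 when a value is undefined, so handle that case explicitly.
    static String format(long bytes) {
        return bytes < 0 ? "undefined" : toMiB(bytes) + "M";
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("initial heap   : " + format(heap.getInit()));
        System.out.println("committed heap : " + format(heap.getCommitted()));
        System.out.println("used heap      : " + format(heap.getUsed()));
        System.out.println("maximum heap   : " + format(heap.getMax()));
    }
}
```

Running it once with no flags and once with explicit `-Xms`/`-Xmx` values makes the difference between the defaults and a tuned configuration visible.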


GC data

During peak traffic, I used PinPoint's "Heap Usage" view to observe nodes A and B over the following time periods.

3-hour chart:


In the chart above, system B ran 22 Full GCs within three hours, about one every 8 minutes.

Each Full GC takes about 150 ms, so over the three hours system B was paused for about 3300 ms in total.
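The two figures above are simple arithmetic on the counts read off the chart. A small sketch that reproduces them, where the class and method names are illustrative and the inputs are the 22 Full GCs, 150 ms average pause, and 3-hour window quoted in the text:

```java
// Hypothetical helper reproducing the pause arithmetic from the text.
final class GcPauseMath {

    // Total stop-the-world time, assuming every Full GC takes the same average pause.
    static long totalPauseMs(int fullGcCount, long avgPauseMs) {
        return fullGcCount * avgPauseMs;
    }

    // Average number of minutes between two consecutive Full GCs in the window.
    static double minutesBetween(int fullGcCount, double windowHours) {
        return windowHours * 60.0 / fullGcCount;
    }

    public static void main(String[] args) {
        int fullGcCount = 22;      // from the 3-hour chart of system B
        long avgPauseMs = 150;     // approximate duration of one Full GC
        double windowHours = 3.0;  // observation window

        System.out.println("total pause (ms): " + totalPauseMs(fullGcCount, avgPauseMs));
        System.out.printf("minutes between Full GCs: %.1f%n",
                minutesBetween(fullGcCount, windowHours));
    }
}
```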

The chart shows a maximum heap of about 890M, yet Full GC is triggered when heap usage is only about 200M. From a resource-usage point of view, the utilization is far too low.


In the chart above, system A ran 0 Full GCs within three hours, that is, it had no Full GC pauses at all.

For the whole three hours the system kept processing business without pausing. The total heap is about 1536M, and current heap usage is above 500M.

6-hour chart:


In the chart above, system B ran N Full GCs within 6 hours. As in the 3-hour statistics, every Full GC occurred while heap usage was below 200M.


In the 6-hour chart, system A ran 0 Full GCs. Excellent.

12-hour chart:


In the chart above, system B ran N Full GCs within 12 hours. There are fewer Full GCs on the left side of the chart because our business is concentrated in the daytime; even so, Full GCs still occur at night, outside business peak hours.


In the 12-hour chart, system A ran 0 Full GCs. Excellent.

GC logs

Next, look at the gc.log files. Both servers write detailed GC logs, so let's look at system B's Full GC log first.


The figure above shows only the "[Full GC (Ergonomics)" entries. The "GC (Allocation Failure" entries have been filtered out, which makes the log easier to observe and analyze.

Finally, we pick one Full GC entry from the GC log file.

2018-12-24T15:52:11.402+0800: 447817.937: [Full GC (Ergonomics) [PSYoungGen: 480K->0K(20992K)] [ParOldGen: 89513K->69918K(89600K)] 89993K->69918K(110592K), [Metaspace: 50147K->50147K(1095680K)], 0.1519366 secs] [Times: user=0.21 sys=0.00, real=0.15 secs]

The following information can be calculated:

  • Heap size : 110592K = 108M
  • Old generation Size : 89600K = 87.5M
  • New Generation Size : 20992K = 20.5M
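The sizes above can be read straight out of the log entry. A minimal sketch that extracts the per-generation numbers with a regular expression and converts the capacities from K to M. The class `FullGcLine` and its methods are illustrative, and the pattern only covers the `PSYoungGen`/`ParOldGen` fields of the ParallelOldGC log format shown in this article:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the generation fields of a ParallelOldGC Full GC log entry.
final class FullGcLine {

    // Matches e.g. "[ParOldGen: 89513K->69918K(89600K)]".
    private static final Pattern GEN = Pattern.compile(
            "\\[(PSYoungGen|ParOldGen): (\\d+)K->(\\d+)K\\((\\d+)K\\)\\]");

    // Returns generation name -> {usedBeforeK, usedAfterK, capacityK}.
    static Map<String, long[]> parse(String line) {
        Map<String, long[]> generations = new LinkedHashMap<>();
        Matcher m = GEN.matcher(line);
        while (m.find()) {
            generations.put(m.group(1), new long[] {
                    Long.parseLong(m.group(2)),
                    Long.parseLong(m.group(3)),
                    Long.parseLong(m.group(4)) });
        }
        return generations;
    }

    // Converts kibibytes to mebibytes.
    static double toMiB(long kib) {
        return kib / 1024.0;
    }

    public static void main(String[] args) {
        String line = "2018-12-24T15:52:11.402+0800: 447817.937: [Full GC (Ergonomics) "
                + "[PSYoungGen: 480K->0K(20992K)] [ParOldGen: 89513K->69918K(89600K)] "
                + "89993K->69918K(110592K), [Metaspace: 50147K->50147K(1095680K)], "
                + "0.1519366 secs] [Times: user=0.21 sys=0.00, real=0.15 secs]";

        Map<String, long[]> g = parse(line);
        long youngK = g.get("PSYoungGen")[2];
        long oldK = g.get("ParOldGen")[2];

        System.out.println("New generation : " + toMiB(youngK) + "M");
        System.out.println("Old generation : " + toMiB(oldK) + "M");
        // The heap capacity in the log equals the sum of the two generations.
        System.out.println("Heap           : " + toMiB(youngK + oldK) + "M");
    }
}
```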

Analysis: This Full GC was triggered because the objects in the old generation had filled it to capacity.

[ParOldGen: 89513K->69918K(89600K)]

The cause is that the space allocated to the old generation is too small to meet the needs of the business.

As a result the old generation fills up frequently, and each time it is full a Full GC follows. Because the old generation is relatively small, each Full GC is also relatively short.

System A's log contains only 2 Full GCs, and both occurred while the system was starting up:

7.765: [Full GC (Metadata GC Threshold) [PSYoungGen: 18010K->0K(458752K)] [ParOldGen: 15142K->25311K(1048576K)] 33153K->25311K(1507328K), [Metaspace: 34084K->34084K(1081344K)], 0.0843090 secs] [Times: user=0.14 sys=0.00, real=0.08 secs]

You can get the following information:

  • Heap size : 1507328K = 1472M
  • Old generation Size : 1048576K = 1024M
  • New Generation Size : 458752K = 448M

Analysis: System A ran Full GC only twice, at startup, and the cause was "Metadata GC Threshold" rather than a shortage of heap space.

Even after a week of observation, system A still had no Full GC. When one does occur, however, it will take longer.

On other systems with a 1024M old generation, we have seen Full GC take about 90 ms.

So the heap is not simply "the bigger the better", or at least not under the UseParallelOldGC collector.


Analysis and Optimization

Overall analysis:

  • System B runs Full GC too often because its heap is only about 108M (with an old generation of about 87.5M), which cannot meet the system's memory demand during peak periods
  • Because ParOldGen (the old generation) is frequently exhausted, Full GC events occur
  • System A's initial heap (Xms) and maximum heap (Xmx) are both 1536M, which fully meets the memory needs of the business peak

Optimization Strategy:

  • Increase system B's heap by raising the Xms and Xmx values. Both are set directly to 1024M.
  • The initial heap (Xms) is set equal to the maximum heap (Xmx) so that the JVM does not have to keep requesting more memory at runtime. The memory is allocated once at startup, which improves efficiency.
  • Xms is set to 1024M following the JDK's own suggested value, which was obtained with the command:
    java -XX:+PrintCommandLineFlags -version
  • The adjusted JVM parameters for system B are:
    export JAVA_OPTS="-server -Xms1024m -Xmx1024m -XX:+UseParallelOldGC -verbose:gc -Xloggc:../logs/gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
  • System A's JVM parameters stay unchanged so that we can keep observing how it runs:
    export JAVA_OPTS="-server -Xms1536m -Xmx1536m -XX:+UseParallelOldGC -verbose:gc -Xloggc:../logs/gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
  • Nodes A and B therefore run with two different sets of JVM parameters, in order to verify whether A's or B's settings suit the actual workload better.
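Besides PinPoint and gc.log, the two parameter sets can be compared from inside the JVM itself. A minimal sketch using the standard `GarbageCollectorMXBean` API to print the cumulative collection count and time per collector. The class name `GcCounters` is illustrative, and how and where it would be called inside the application is an assumption left open here:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports cumulative GC counts and times for the running JVM.
final class GcCounters {

    // One summary line per collector; count and time are totals since JVM start.
    static String describe(GarbageCollectorMXBean gc) {
        return gc.getName()
                + ": count=" + gc.getCollectionCount()
                + ", timeMs=" + gc.getCollectionTime();
    }

    public static void main(String[] args) {
        // With -XX:+UseParallelOldGC there is one bean for young collections
        // and one for old-generation (full) collections.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(describe(gc));
        }
    }
}
```

Sampling these counters on both nodes at the start and end of the same peak window gives the number of collections and the total pause time under each configuration.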



Reproduced from: https://juejin.im/post/5d0b568e6fb9a07ee742e535


Origin blog.csdn.net/weixin_33682719/article/details/93176829