A record of an online GC tuning

I recently completed a round of JVM tuning on an online system and wrote this post to record the tuning process and compare the data before and after tuning. If you spot an error, please message me privately to point it out.

1. Tuning background

      1. Recently I have been receiving feedback from users that the system often freezes. Monitoring the backend logs turned up no relevant abnormal information, so I guessed it was a GC problem.

2. Tuning process

      Since the project is a ZF project, there are relatively few users at night and on weekends, so the tuning work was done in those time windows.

      1. First, configure remote JMX connection parameters on the server so that local tools can connect and see how the application is actually running. The parameters are as follows.

Add the following to catalina.sh:

         JAVA_OPTS="$JAVA_OPTS -Djava.rmi.server.hostname=xxx.xx.xx(the server's IP) -Dcom.sun.management.jmxremote"
         JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote.port=8550"
         JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote.ssl=false"
         JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote.authenticate=false"

        Then connect remotely from a local machine with jconsole. On the Memory tab I found that the server's maximum heap was only 8 GB, while the server has 32 GB of physical memory. My guess was that no GC-related parameters had been configured at all and everything was running on defaults. If so, major surgery would be needed: change the GC strategy and adjust the GC parameters. We are running JDK 1.8, whose default collector is Parallel GC. Given that the server has 32 GB of memory, the plan was to switch to G1 GC and raise the minimum and maximum heap sizes at the same time.
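With the JMX options above in place, a local jconsole can attach to the remote JVM roughly like this (a sketch; the host placeholder matches the one configured in catalina.sh):

         jconsole xxx.xx.xx:8550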

        Log in to the server and run jps to find the PID of the application, then run jmap -heap <pid> to view the GC-related parameters. This confirmed my guess: Parallel GC with a maximum heap of 8 GB.
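For reference, a minimal sketch of those two commands (the PID below is just an illustration):

         # list running JVMs with their main classes to find the application's PID
         jps -l
         # print heap configuration and usage for that JVM (JDK 8 syntax)
         jmap -heap 12345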

        Use jstat -gcutil <pid> 1000 1000 to sample the GC statistics (one sample every 1000 ms, 1000 samples).
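A sketch of the command, with the meaning of the columns it prints on JDK 8:

         # sample GC utilization every 1000 ms, 1000 times
         # S0/S1 = survivor spaces, E = eden, O = old gen, M = metaspace, CCS = compressed class space
         # YGC/YGCT = young GC count/time, FGC/FGCT = full GC count/time, GCT = total GC time
         jstat -gcutil 12345 1000 1000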

web server 1:

dubbo server 2:

Because the deployment is distributed there are 4 servers in total. The two web servers and the two dubbo servers show similar GC records, with the dubbo servers having somewhat fewer FGCs, but the numbers are clearly abnormal.

2. The first improvement

        Simply change the GC strategy and increase the heap memory:

         JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC -Xms20g -Xmx20g"
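One way to sanity-check that the new options actually took effect after restarting Tomcat (a sketch; the PID is illustrative):

         # print the flags the running JVM picked up; UseG1GC and the heap size flags should appear
         jcmd 12345 VM.flags
         # or, equivalently on JDK 8
         jinfo -flags 12345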

The GC record for one day is as follows

web server 1:

After the memory increase, the dubbo servers were completely normal and generated no FGCs.

3. The second improvement

        The web servers still had more than 500 FGCs per day. The system felt no different from before the adjustment, and there were many momentary 503s (recovering immediately). These were all reported by testers; users had not complained yet, which was already worrying enough, so I optimized again and adjusted the parameters as follows:

         JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC -Xms20g -Xmx20g -XX:MaxGCPauseMillis=100"

The new parameter -XX:MaxGCPauseMillis sets a target of 100 milliseconds for each GC stop-the-world pause. In practice the JVM cannot hit 100 ms on every collection; it can only try to approach that target. The next day I monitored the GC records at intervals throughout the day. Since the dubbo servers had produced no FGCs, I left those two alone for the time being and only optimized the two web servers. The adjustment was finished at 10 o'clock that night, and the GC records after the adjustment are as follows.

Record at 11 o'clock that night:

Record at 8 am the next day:

Record at 10 am the next day:

FGCs have started to appear, and I am beginning to panic.

Record at 11 am the next day:

The FGC count continues to grow.

Record at 4 pm the next day:

Record at 11 pm the next day:

        With this adjustment the total YGC time and FGC time improved greatly compared with last time, but testers still occasionally reported 503s, and about 100 FGCs occurred between 4 pm and 11 pm. That is unbearable and clearly abnormal, so I prepared to turn on GC logging and analyze what was causing the GCs.

4. The third improvement

       JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC -Xms20g -Xmx20g -XX:MaxGCPauseMillis=100 -XX:+PrintGCDetails -Xloggc:/data/apache-tomcat-9.0.39/logs/gc.txt"
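After restarting, a quick way to confirm the log is actually being written (the path matches the -Xloggc target above):

         tail -f /data/apache-tomcat-9.0.39/logs/gc.txt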

I analyzed the generated log files with https://blog.gceasy.io/ and found that the FGCs were being triggered by System.gc() calls (the same conclusion can be cross-checked with a quick grep on the server, sketched below). I asked a few experienced people for advice, and they suggested disabling explicit System.gc(). Hence the fourth improvement.
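A quick cross-check that can be done directly on the server, assuming the usual JDK 8 log format where the GC cause is printed in parentheses:

         # count full GCs whose recorded cause is an explicit System.gc() call
         grep -c "Full GC (System.gc())" /data/apache-tomcat-9.0.39/logs/gc.txt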

5. The fourth improvement

      JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC -Xms20g -Xmx20g -XX:MaxGCPauseMillis=100 -XX:+DisableExplicitGC -XX:+PrintGCDetails -Xloggc:/data/apache-tomcat-9.0.39/logs/gc.txt"

Everything was normal the next day and no 503s occurred. I checked the GC records many times during the day and no FGC occurred. Only one screenshot was kept, as follows:

Record at 4 pm the next day:

It was finally normal. Incidentally, the other dubbo server was also switched to this configuration. After monitoring for more than half a month, no FGC happened again, and this GC tuning was successfully completed.

3. After tuning

        Remember to turn off GC log printing afterwards, since it puts extra load on the server. I received server warnings every day for two consecutive weeks without making the connection; only after stumbling across a blog post did I learn that printing GC logs increases the server's memory consumption, so I turned GC log printing off and things finally went quiet (the final set of options, with the log flags removed, is sketched at the end of this post). That wraps up this round of tuning. The recent GC situation is shown below. This tuning is specific to this project and these servers; please adjust other projects according to your own situation. It is for reference only.

web server:

dubbo server:
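For reference, a sketch of the final JAVA_OPTS with the GC-log flags removed, based only on the options used above:

         JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC -Xms20g -Xmx20g -XX:MaxGCPauseMillis=100 -XX:+DisableExplicitGC"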

 

 
