Troubleshooting a Java Program That Exits Abnormally


Foreword

Our project is currently in the development stage, and the front-end team kept reporting that the service was unreachable. After every restart, the service exited abnormally again within a short time. After researching the problem online and working through the checks below, the abnormal exit was finally resolved.


1. Abnormal exit information

The Java program runs on JDK 11 on a local test server. The server is generously provisioned, with hundreds of gigabytes of memory. After startup, the program would sometimes exit abnormally within a few minutes, and at other times within an hour. The program's output log contained no exception information.

2. Troubleshooting steps

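1. Output a heap dump file

Enable a heap dump in the usual way by adding these startup parameters:

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./dump.hprof

However, when the process actually crashed, no dump file was produced. These options only take effect when the JVM itself throws a java.lang.OutOfMemoryError, so a process that is terminated from outside leaves no dump behind.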
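Before drawing conclusions from a missing dump file, it can help to confirm that the running JVM actually picked up the dump options. The following is a minimal sketch that reads them through HotSpotDiagnosticMXBean. It assumes a HotSpot-based JVM, and the class name DumpFlagCheck is made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: prints the heap dump options the current JVM is running with. */
final class DumpFlagCheck {

    /** Returns the current value of a HotSpot VM option, e.g. "true" or "false" for boolean flags. */
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Throws IllegalArgumentException if the option name is unknown to this JVM.
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = " + vmOption("HeapDumpOnOutOfMemoryError"));
        System.out.println("HeapDumpPath = " + vmOption("HeapDumpPath"));
    }
}
```

If the first value prints as true and there is still no dump after a crash, the process most likely did not die from a Java-level OutOfMemoryError, which points toward the checks in the next steps.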

2. Check whether the Linux system actively kills the process

The command is as follows:

egrep -i 'killed process' /var/log/messages

If this returns log entries mentioning "Out of memory", you can be fairly sure that the machine ran short of memory and the kernel killed the process. (On Debian-based systems the equivalent log is typically /var/log/syslog.)


You can also check the kernel log. Sometimes Linux itself, Java, or another process on the system shows unexplained behavior such as a sudden hang or restart, and nothing in the application logs explains it. In that case, suspect a hardware or kernel-level cause and inspect the kernel ring buffer with dmesg:

dmesg | grep java

The output is as follows:

[5673702.665338] Out of memory: Kill process 29953 (java) score 431 or sacrifice child
[5673702.665338] Killed process 29953, UID 500, (java) total-vm:9805316kB, anon-rss:2344496kB, file-rss:128kB

Output like this confirms that the kernel's OOM killer terminated the process.
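When many such entries have to be reviewed, the "Killed process" line can be picked apart programmatically. The sketch below parses the sample line shown above. The class OomKillerLog and its regular expression are illustrative assumptions, and the exact log format varies between kernel versions, so the pattern may need adjusting.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for OOM-killer "Killed process" lines from dmesg or /var/log/messages. */
final class OomKillerLog {

    /** The fields extracted from one "Killed process" line. */
    static final class Kill {
        final long pid;
        final String name;
        final long totalVmKb;
        final long anonRssKb;

        Kill(long pid, String name, long totalVmKb, long anonRssKb) {
            this.pid = pid;
            this.name = name;
            this.totalVmKb = totalVmKb;
            this.anonRssKb = anonRssKb;
        }
    }

    // Assumed format: "Killed process <pid> ... (<name>) ... total-vm:<n>kB, anon-rss:<n>kB"
    private static final Pattern KILLED = Pattern.compile(
            "Killed process (\\d+).*?\\((\\S+?)\\).*?total-vm:(\\d+)kB, anon-rss:(\\d+)kB");

    /** Returns the parsed kill record, or empty if the line is not a "Killed process" line. */
    static Optional<Kill> parse(String line) {
        Matcher m = KILLED.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Kill(
                Long.parseLong(m.group(1)),
                m.group(2),
                Long.parseLong(m.group(3)),
                Long.parseLong(m.group(4))));
    }

    public static void main(String[] args) {
        String sample = "[5673702.665338] Killed process 29953, UID 500, (java) "
                + "total-vm:9805316kB, anon-rss:2344496kB, file-rss:128kB";
        parse(sample).ifPresent(k -> System.out.println(
                "pid=" + k.pid + " name=" + k.name + " anon-rss=" + (k.anonRssKb / 1024) + " MiB"));
    }
}
```

The anon-rss value is the resident anonymous memory of the process at the time it was killed, which is a useful number to compare against the configured heap size.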

3. The JVM itself crashes

When the JVM itself crashes and the process disappears, it writes a fatal error log named hs_err_pid<pid>.log, where <pid> is the process ID. This file records key information about the crash, and analyzing it usually reveals the cause.
By default the file is created in the working directory of the process. The location can also be set with a Java startup parameter:

-XX:ErrorFile=/var/log/hs_err_pid.log

If this file exists, analyze its contents in detail.
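To check quickly whether any crash log was left behind, the working directory can be scanned for files matching the default name. This is a small sketch under the assumption that the default naming is in use; CrashLogFinder is a made-up helper, not a JDK tool.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: lists HotSpot fatal error logs in a directory. */
final class CrashLogFinder {

    /** Returns all files in dir whose name matches hs_err_pid*.log, sorted by path. */
    static List<Path> find(Path dir) throws IOException {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "hs_err_pid*.log")) {
            for (Path p : stream) {
                found.add(p);
            }
        }
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Scans the directory given as the first argument, or the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : find(dir)) {
            System.out.println(p + " (" + Files.size(p) + " bytes)");
        }
    }
}
```

An empty result, as in the case described here, suggests the JVM did not crash on its own and was instead terminated from outside.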

4. JVM parameter adjustment

To watch the JVM and its threads in real time with jvisualvm (distributed separately as VisualVM since JDK 9), add the following options to the startup parameters to enable remote JMX access:

-Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Djava.rmi.server.hostname=192.168.128.250

The live JVM status showed that no heap size had been configured for the program, so the JVM was running with its default heap size. After setting the heap parameters explicitly, the program ran for an extended period without exiting abnormally:

-Xms8g -Xmx8g -Xmn2g -XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=512m 

Of course, adjust the values of these parameters to suit your own situation.

Summary
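To confirm which heap limits are actually in effect, whether defaults or explicit -Xms/-Xmx values, the JVM can report them itself. The following is a minimal sketch using the standard MemoryMXBean; the class name HeapSettings and the toMiB helper are invented for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: prints the heap sizes the current JVM is running with. */
final class HeapSettings {

    /** Converts a byte count to whole mebibytes (integer division, rounds toward zero). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getInit() and getMax() return -1 when the value is undefined.
        System.out.println("heap init      = " + toMiB(heap.getInit()) + " MiB");
        System.out.println("heap used      = " + toMiB(heap.getUsed()) + " MiB");
        System.out.println("heap committed = " + toMiB(heap.getCommitted()) + " MiB");
        System.out.println("heap max       = " + toMiB(heap.getMax()) + " MiB");
    }
}
```

Running this once without heap options and once with -Xms8g -Xmx8g makes the difference between the default and the configured heap size visible on the machine in question.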

Use the jstat tool that ships with the JDK (for example, jstat -gcutil <pid> 1000) to observe the JVM's object allocation rate, the count, frequency, and duration of young GCs (YGC), how many objects are promoted to the old generation after each YGC, the count, frequency, and duration of full GCs (FGC), and the total GC time. This gives a first picture of the program's memory usage. Combined with the checks above, it is usually enough to find the cause of an abnormal exit.

Appendix: JVM parameters commonly used with G1
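Similar counters are also available from inside the process, which is handy when jstat cannot be attached. The sketch below reads collection counts and accumulated times from the standard GarbageCollectorMXBean interface. It is an in-process approximation of what jstat reports, not a replacement for it, and the class name GcSnapshot is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: summarizes GC counts and times for the current JVM. */
final class GcSnapshot {

    /** Returns one line per collector plus a final line with the total GC time. */
    static String describe() {
        StringBuilder sb = new StringBuilder();
        long totalMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report the value.
            long count = Math.max(0, gc.getCollectionCount());
            long ms = Math.max(0, gc.getCollectionTime());
            totalMs += ms;
            sb.append(gc.getName())
              .append(": count=").append(count)
              .append(", time=").append(ms).append(" ms\n");
        }
        sb.append("total GC time=").append(totalMs).append(" ms");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Calling describe() at regular intervals and comparing successive values gives the GC frequency and the time spent per interval.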

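# Minimum and maximum heap size 4g, young generation 2g
-Xms4g -Xmx4g -Xmn2g
# Metaspace: initial GC threshold 128m, maximum 320m
-XX:MetaspaceSize=128m
-XX:MaxMetaspaceSize=320m
# Enable remote debugging (on JDK 9+, address=9555 listens on localhost only; use address=*:9555 to accept remote connections)
-Xdebug -Xrunjdwp:transport=dt_socket,address=9555,server=y,suspend=n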

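# Use the G1 garbage collector, which balances low latency against high throughput; the maximum pause time and the young generation size can be tuned to improve throughput and free up CPU resources
-XX:+UseG1GC
# Maximum pause time target, default 200 ms
-XX:MaxGCPauseMillis=200
# Region size, must be a power of 2
-XX:G1HeapRegionSize=2m
# Target of 8 mixed collections per cycle; each collection is bounded by MaxGCPauseMillis and may not reclaim all garbage in one pass, so more passes reclaim more thoroughly
-XX:G1MixedGCCountTarget=8
# Stop mixed collections once the reclaimable space is below 10% of the heap, default 5
-XX:G1HeapWastePercent=10
# Young generation size bounds as a percentage of the heap; defaults are 5% minimum and 60% maximum (experimental options, so they must be unlocked first)
-XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=10 -XX:G1MaxNewSizePercent=50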

-XX:SurvivorRatio=8 
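# Print GC activity to the console
-verbose:gc
# Write the GC log to a file (JDK 8 syntax; JDK 9+ replaced these logging flags with unified logging, e.g. -Xlog:gc*:file=./logs/job_execute_gc.log)
-Xloggc:./logs/job_execute_gc.log
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime
# Produces more detailed logs of Survivor space usage
-XX:+PrintAdaptiveSizePolicy
# Since JDK 1.6, OmitStackTraceInFastThrow is enabled by default in server mode: when the JVM detects the same exception being thrown repeatedly, it omits the stack trace after a number of occurrences. The leading minus sign disables this so that full stack traces are always kept
-XX:-OmitStackTraceInFastThrow
-XX:-UseLargePages
# Locations of the configuration files to load (a Spring Boot program argument, not a JVM option)
--spring.config.location=classpath:/,classpath:/config/,file:./,file:./config/,file:/home/mall-job/conf/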

#--- JVM parameters currently used by the distributed task scheduler; -Xmn2g and -XX:MaxGCPauseMillis=400 set the young generation size and raise the pause time target to improve throughput ---
-Xms4g -Xmx4g -Xmn2g -Xdebug -Xrunjdwp:transport=dt_socket,address=9555,server=y,suspend=n -XX:+UseG1GC -XX:MaxGCPauseMillis=400 -XX:G1HeapRegionSize=2m -XX:G1MixedGCCountTarget=8 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -Xloggc:/logs/execute/mall-job-execute-gc.log

Tip: For more content, please visit Clang's Blog: https://www.clang.asia

Origin blog.csdn.net/u012899618/article/details/125326993