JVM fatal error log (hs_err_pid.log) analysis (reprint)

Reprinted from: https://my.oschina.net/xionghui/blog/498785?p=1

 

When the JVM hits a fatal error, it generates an error file named hs_err_pid<pid>.log, which records important information about what caused the JVM to crash. By analyzing this file you can locate the root cause of the crash and fix it, improving system stability. By default the file is written to the working directory, but the path can be set with a JVM parameter (introduced in JDK 6), where %p is replaced by the process ID:

-XX:ErrorFile=./hs_err_pid%p.log

The file contains the following key categories of information:

  • log header

  • Information about the thread that caused the crash

  • All thread information

  • Safepoint and lock information

  • heap information

  • native code cache

  • compilation events

  • gc-related records

  • jvm memory map

  • jvm startup parameters

  • server information

Let's walk through a sample crash file step by step, so that you can analyze one easily when you run into a crash in the future.

log header

The log header contains summary information that briefly describes the cause of the crash. A crash can have many causes. Common ones include bugs in the JVM itself, application errors, improper JVM parameter configuration, insufficient server resources, and JNI call errors.

Now refer to the following description:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fb8b18fdc6c, pid=191899, tid=140417770411776
#
# JRE version: Java(TM) SE Runtime Environment (7.0_55-b13) (build 1.7.0_55-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.55-b03 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J  org.apache.http.impl.cookie.BestMatchSpec.formatCookies(Ljava/util/List;)Ljava/util/List;
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
#

An important piece of information here is "SIGSEGV (0xb)": a segmentation fault, meaning the process accessed an invalid memory address. Such a fault is typically raised in native code (for example JNI code) rather than by an ordinary Java-level error; if the application does not call JNI code itself, the error is likely to come from code generated by JIT dynamic compilation. SIGSEGV is the signal name, 0xb is the signal number (11), pc=0x00007fb8b18fdc6c is the value of the program counter, pid=191899 is the process ID, and tid=140417770411776 is the thread ID.

PS: Besides "SIGSEGV (0xb)", another common description is "EXCEPTION_ACCESS_VIOLATION", the Windows counterpart of an invalid memory access; when it is raised while the JVM's own code is executing, the crash is often caused by a JVM bug. Another common description is "EXCEPTION_STACK_OVERFLOW", which indicates a stack overflow, often caused by overly deep recursion in the application.
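The fields of the signal line described above can be pulled out with a regular expression. This is a minimal sketch that assumes the line layout of the JDK 7 sample shown here; the names SIGNAL_RE and parse_signal_line are made up for illustration.

```python
import re

# Signal line copied from the log header above.
HEADER_LINE = "#  SIGSEGV (0xb) at pc=0x00007fb8b18fdc6c, pid=191899, tid=140417770411776"

# Assumed layout: "<signal> (<hex number>) at pc=<hex>, pid=<int>, tid=<int>".
SIGNAL_RE = re.compile(
    r"#\s+(?P<signal>\w+) \((?P<number>0x[0-9a-fA-F]+)\) at pc=(?P<pc>0x[0-9a-fA-F]+), "
    r"pid=(?P<pid>\d+), tid=(?P<tid>\d+)"
)

def parse_signal_line(line):
    """Return signal name, signal number, pc, pid and tid as a dict, or None."""
    m = SIGNAL_RE.match(line)
    if m is None:
        return None
    return {
        "signal": m.group("signal"),
        "number": int(m.group("number"), 16),
        "pc": int(m.group("pc"), 16),
        "pid": int(m.group("pid")),
        "tid": int(m.group("tid")),
    }

info = parse_signal_line(HEADER_LINE)
print(info["signal"], info["number"], hex(info["pc"]), info["pid"], info["tid"])
```

Other JDK versions may print some fields differently, so the pattern may need adjusting for them.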

Another important piece of information is:

# Problematic frame:

# J org.apache.http.impl.cookie.BestMatchSpec.formatCookies(Ljava/util/List;)Ljava/util/List;

This identifies the code the JVM was executing when the crash occurred: "J" indicates a compiled Java frame, and the rest of the line is the method being executed. Besides "J", the frame type may also be "C", "j", "V" or "v". The types are:

  • C: Native C frame

  • j: Interpreted Java frame

  • V: VM frame

  • v: VM-generated stub frame

  • J: Other frame types, including compiled Java frames
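The frame type letters listed above can be turned into a small lookup table. The following sketch assumes the "Problematic frame" entry has the form "# <letter>  <frame text>"; FRAME_TYPES and describe_frame are illustrative names, not part of any JDK tool.

```python
import re

# Frame type letters as listed above.
FRAME_TYPES = {
    "C": "native C frame",
    "j": "interpreted Java frame",
    "V": "VM frame",
    "v": "VM-generated stub frame",
    "J": "other frame types, including compiled Java frames",
}

def describe_frame(line):
    """Split a frame entry into (type description, frame text), or return None."""
    m = re.match(r"#?\s*([CjVvJ])\s+(.*)", line)
    if m is None:
        return None
    return FRAME_TYPES[m.group(1)], m.group(2).strip()

frame = describe_frame(
    "# J  org.apache.http.impl.cookie.BestMatchSpec.formatCookies(Ljava/util/List;)Ljava/util/List;"
)
print(frame[0])
print(frame[1])
```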

Combined with the earlier analysis of "SIGSEGV (0xb)", we can now conclude that the error was most likely caused by JIT dynamic compilation.

Looking this up in the bug database shows:

The crash is caused by a JDK JIT compiler optimization, bug ID 8021898, which the official site describes as follows:

The JIT compiler optimization leads to a SIGSEGV or an NullPointerException at a place it must not happen.

This bug exists in JDK versions 1.7.0_25 through 1.7.0_55 and is fixed from 1.7.0_60 onward, so the crash can be resolved by upgrading the JDK. For details, see http://bugs.java.com/view_bug.do?bug_id=8021898 .

At this point, the problem has been analyzed, but we can go a step further and analyze other information.

Information about the thread that caused the crash

Next in the file come the details of the thread that caused the crash, along with its stack information:

Current thread (0x00007fb7b4014800):  JavaThread "catalina-exec-251" daemon [_thread_in_Java, id=205044, stack(0x00007fb58f435000,0x00007fb58f536000)]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x0000003f96dc9c6c

The above shows that the thread that caused the error is at 0x00007fb7b4014800 (a pointer) and that its type is JavaThread, meaning a Java thread. Other possible thread types are:

  • VMThread: the internal thread of the jvm

  • CompilerThread: performs JIT compilation, compiling code and unloading classes at run time. The JVM usually starts several of these threads, numbered sequentially in the thread name, for example CompilerThread1

  • GCTaskThread: the thread that executes gc

  • WatcherThread: the JVM's periodic task scheduler, a singleton. It is used frequently inside the JVM, for example for periodic memory monitoring and JVM health monitoring, and it is also involved when we run commands such as jstat to check GC status

  • ConcurrentMarkSweepThread: when the JVM uses CMS GC, it creates this thread (CMST) to perform the collection. When CMST is created, it also creates and starts a SurrogateLockerThread (SLT for short), which then waits. When CMST starts a GC, it sends a message to SLT asking it to acquire the global lock (Lock) of the Java-level Reference object

Next, "catalina-exec-251" is the thread name (threads with the catalina prefix are generally started by Tomcat), and "daemon" means it is a daemon thread. "_thread_in_Java" is the current state of the thread: it is executing interpreted or compiled Java code. The possible states are:

  • _thread_uninitialized: the thread has not been created; this only appears in cases of memory corruption

  • _thread_new: the thread has been created, but not started yet

  • _thread_in_native: The thread is executing native code. A crash in this state usually points to a problem in the native code.

  • _thread_in_vm: The thread is executing virtual machine code

  • _thread_in_Java: The thread is executing interpreted or compiled Java code

  • _thread_blocked: the thread is blocked

  • ..._trans: any state ending in _trans means the thread is in an intermediate state, switching to another state
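The pieces of the "Current thread" line (thread address, type, name, daemon flag, state, id and stack range) can be separated with a regular expression. This is a minimal sketch that assumes the layout of the sample line above; THREAD_RE and parse_current_thread are illustrative names.

```python
import re

# "Current thread" line copied from the sample log.
CURRENT_THREAD = (
    'Current thread (0x00007fb7b4014800):  JavaThread "catalina-exec-251" daemon '
    '[_thread_in_Java, id=205044, stack(0x00007fb58f435000,0x00007fb58f536000)]'
)

THREAD_RE = re.compile(
    r'\((?P<addr>0x[0-9a-f]+)\):\s+(?P<type>\w+) "(?P<name>[^"]*)"(?P<daemon> daemon)? '
    r'\[(?P<state>\w+), id=(?P<id>\d+), stack\((?P<lo>0x[0-9a-f]+),(?P<hi>0x[0-9a-f]+)\)\]'
)

def parse_current_thread(line):
    """Return the fields of the line as a dict, or None if it does not match."""
    m = THREAD_RE.search(line)
    if m is None:
        return None
    lo, hi = int(m.group("lo"), 16), int(m.group("hi"), 16)
    return {
        "type": m.group("type"),
        "name": m.group("name"),
        "daemon": m.group("daemon") is not None,
        "state": m.group("state"),
        "id": int(m.group("id")),
        "stack_kb": (hi - lo) // 1024,  # size of the stack address range
    }

thread = parse_current_thread(CURRENT_THREAD)
print(thread)
```

For the sample line, the two stack addresses are 0x101000 bytes apart, so the stack range is 1028 KB.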

Finally, "id=205044" is the thread ID, and stack(0x00007fb58f435000,0x00007fb58f536000) is the address range of the thread's stack.

The line "siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x0000003f96dc9c6c" describes the unexpected signal that terminated the virtual machine. On Linux, si_errno and si_code identify the exception (here SEGV_MAPERR, an address not mapped to any object, with si_addr giving the faulting address); on Windows, an ExceptionCode is shown instead.

All thread information

Next comes the list of all threads:

Java Threads: ( => current thread )
  0x00007fb798015800 JavaThread "catalina-exec-280" daemon [_thread_blocked, id=206093, stack(0x00007fb58d718000,0x00007fb58d819000)]
  0x00007fb7a4016800 JavaThread "catalina-exec-279" daemon [_thread_blocked, id=206091, stack(0x00007fb58d819000,0x00007fb58d91a000)]
  ... ...(omitted)
  
  Other Threads:
  0x00007fb8b4231000 VMThread [stack: 0x00007fb854eb6000,0x00007fb854fb7000] [id=192015]
  0x00007fb8b4321000 WatcherThread [stack: 0x00007fb835e6c000,0x00007fb835f6d000] [id=192414]

The fields are the same as those described above; "_thread_blocked" means the thread is blocked.
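With hundreds of threads in the list, a quick tally by state shows what the JVM was doing at the time of the crash. This is a small sketch that assumes each Java thread entry carries its state as the first item inside the square brackets, as in the sample; count_states is an illustrative name.

```python
import re
from collections import Counter

# Two entries copied from the sample thread list.
THREAD_LIST = '''\
Java Threads: ( => current thread )
  0x00007fb798015800 JavaThread "catalina-exec-280" daemon [_thread_blocked, id=206093, stack(0x00007fb58d718000,0x00007fb58d819000)]
  0x00007fb7a4016800 JavaThread "catalina-exec-279" daemon [_thread_blocked, id=206091, stack(0x00007fb58d819000,0x00007fb58d91a000)]
'''

def count_states(text):
    """Count Java threads per state (_thread_blocked, _thread_in_Java, ...)."""
    return Counter(re.findall(r"JavaThread .*?\[(\w+),", text))

states = count_states(THREAD_LIST)
for state, count in states.most_common():
    print(state, count)
```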

Safepoint and lock information

Next is the safepoint and lock information:

VM state:not at safepoint (normal execution)

VM Mutex/Monitor currently owned by a thread: None

The safepoint state here is normal execution. The possible descriptions are:

  • not at safepoint: normal execution

  • at safepoint: all threads are blocked in the virtual machine, waiting for a special virtual machine operation to complete

  • synchronizing: a special virtual machine operation is required, and the virtual machine is waiting for all threads to block

The lock line shows that no thread currently owns a virtual machine lock. A Mutex is a lock internal to the virtual machine, whereas a Monitor is the synchronized lock associated with a Java object.

heap information

Next is the heap information:

Heap
 par new generation   total 2293760K, used 1537284K [0x00000006f0000000, 0x0000000790000000, 0x0000000790000000)
  eden space 1966080K,  78% used [0x00000006f0000000, 0x000000074dc97aa8, 0x0000000768000000)
  from space 327680K,   0% used [0x0000000768000000, 0x00000007680a9580, 0x000000077c000000)
  to   space 327680K,   0% used [0x000000077c000000, 0x000000077c000000, 0x0000000790000000)
 concurrent mark-sweep generation total 1572864K, used 49449K [0x0000000790000000, 0x00000007f0000000, 0x00000007f0000000)
 concurrent-mark-sweep perm gen total 262144K, used 49857K [0x00000007f0000000, 0x0000000800000000, 0x0000000800000000)
 
 Card table byte_map: [0x00007fb8b8fa8000,0x00007fb8b9829000] byte_map_base: 0x00007fb8b5828000

The heap information covers the young generation, the old generation and the permanent generation. The generation names (par new generation, concurrent mark-sweep generation) show that the CMS garbage collector is in use.

The "Card table" line that follows refers to the card table, a data structure the JVM maintains to record where object references have changed, so that the GC has less memory to scan when looking for roots.
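To judge at a glance whether any generation was close to full, the "total ... used ..." figures can be converted to percentages. This is a minimal sketch; the sample lines are copied from the heap section with the address ranges dropped for brevity, and heap_usage is an illustrative name.

```python
import re

# Generation summary lines from the heap section (address ranges omitted).
HEAP_LINES = '''\
 par new generation   total 2293760K, used 1537284K
 concurrent mark-sweep generation total 1572864K, used 49449K
 concurrent-mark-sweep perm gen total 262144K, used 49857K
'''

GEN_RE = re.compile(r"^\s*(?P<name>.+?)\s+total (?P<total>\d+)K, used (?P<used>\d+)K", re.M)

def heap_usage(text):
    """Map each generation name to its used percentage."""
    return {
        m.group("name"): 100.0 * int(m.group("used")) / int(m.group("total"))
        for m in GEN_RE.finditer(text)
    }

usage = heap_usage(HEAP_LINES)
for name, pct in usage.items():
    print(f"{name}: {pct:.1f}% used")
```

In this sample the old generation is only about 3% used and the permanent generation about 19%, which supports the reading that memory pressure was not the problem.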

native code cache

Next is the native code cache information:

Code Cache  [0x00007fb8b1000000, 0x00007fb8b1a60000, 0x00007fb8b4000000)
 total_blobs=3580 nmethods=3111 adapters=421 free_code_cache=38857Kb largest_free_block=39469312

This is the region of memory that holds compiled native code. Note that it holds native code, unlike PermGen (the permanent generation), which stores metadata for the JVM and Java classes.
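The three addresses on the "Code Cache" line can be turned into sizes. The sketch below assumes they are, in order, the start of the cache, the end of the part currently committed, and the end of the reserved range; that reading is an assumption about this JDK 7 output, and code_cache_sizes is an illustrative name.

```python
import re

CODE_CACHE_LINE = "Code Cache  [0x00007fb8b1000000, 0x00007fb8b1a60000, 0x00007fb8b4000000)"

def code_cache_sizes(line):
    """Return committed and reserved code cache sizes in KB (assumed address order)."""
    low, high, limit = (int(x, 16) for x in re.findall(r"0x[0-9a-f]+", line))
    return {
        "committed_kb": (high - low) // 1024,
        "reserved_kb": (limit - low) // 1024,
    }

sizes = code_cache_sizes(CODE_CACHE_LINE)
print(sizes)
```

Under that assumption the sample shows 10624 KB committed out of 49152 KB (48 MB) reserved, which is in line with free_code_cache=38857Kb on the next line of the log.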

compilation events

Next is the record of recent JIT compilation events:

Compilation events (10 events):
Event: 110587.798 Thread 0x00007fb8b425a800 3338             java.util.HashSet::remove (20 bytes)
Event: 110587.804 Thread 0x00007fb8b425a800 nmethod 3338 0x00007fb8b168a9d0 code [0x00007fb8b168ab60, 0x00007fb8b168afa8]
... ...(omitted)
Event: 112147.387 Thread 0x00007fb8b425a800 3342             org.apache.http.impl.cookie.BestMatchSpec::formatCookies (116 bytes)
Event: 112147.465 Thread 0x00007fb8b425a800 nmethod 3342 0x00007fb8b18fcd50 code [0x00007fb8b18fd1a0, 0x00007fb8b18ff338]

As you can see, the log records the 10 most recent compilation events, and they include the compilation of org.apache.http.impl.cookie.BestMatchSpec::formatCookies; this is consistent with the earlier conclusion.
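The link between the crash and the JIT-compiled method can be checked numerically: the pc from the log header should fall inside the code range of one of the compiled methods. This sketch assumes the "code [start, end]" pair on an nmethod event is the address range of the generated code; method_containing is an illustrative name.

```python
import re

CRASH_PC = 0x00007fb8b18fdc6c  # pc value from the log header

# Compilation events copied from the sample log.
EVENTS = '''\
Event: 110587.798 Thread 0x00007fb8b425a800 3338             java.util.HashSet::remove (20 bytes)
Event: 110587.804 Thread 0x00007fb8b425a800 nmethod 3338 0x00007fb8b168a9d0 code [0x00007fb8b168ab60, 0x00007fb8b168afa8]
Event: 112147.387 Thread 0x00007fb8b425a800 3342             org.apache.http.impl.cookie.BestMatchSpec::formatCookies (116 bytes)
Event: 112147.465 Thread 0x00007fb8b425a800 nmethod 3342 0x00007fb8b18fcd50 code [0x00007fb8b18fd1a0, 0x00007fb8b18ff338]
'''

START_RE = re.compile(r"Thread 0x[0-9a-f]+ +(\d+) +(\S+) \(\d+ bytes\)")
NMETHOD_RE = re.compile(r"nmethod (\d+) 0x[0-9a-f]+ code \[(0x[0-9a-f]+), (0x[0-9a-f]+)\]")

def method_containing(pc, events):
    """Return the compiled method whose code range contains pc, or None."""
    names = {cid: name for cid, name in START_RE.findall(events)}
    for cid, lo, hi in NMETHOD_RE.findall(events):
        if int(lo, 16) <= pc < int(hi, 16):
            return names.get(cid)
    return None

print(method_containing(CRASH_PC, EVENTS))
```

For the sample data this prints org.apache.http.impl.cookie.BestMatchSpec::formatCookies, the same method named in the problematic frame.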

gc-related records

Next is the GC execution record:

GC Heap History (10 events):
Event: 110665.975 GC heap before
{Heap before GC invocations=255 (full 31):
 par new generation   total 2293760K, used 1966777K [0x00000006f0000000, 0x0000000790000000, 0x0000000790000000)
  eden space 1966080K, 100% used [0x00000006f0000000, 0x0000000768000000, 0x0000000768000000)
  from space 327680K,   0% used [0x0000000768000000, 0x00000007680ae480, 0x000000077c000000)
  to   space 327680K,   0% used [0x000000077c000000, 0x000000077c000000, 0x0000000790000000)
 concurrent mark-sweep generation total 1572864K, used 49237K [0x0000000790000000, 0x00000007f0000000, 0x00000007f0000000)
 concurrent-mark-sweep perm gen total 262144K, used 49856K [0x00000007f0000000, 0x0000000800000000, 0x0000000800000000)
Event: 110665.981 GC heap after
Heap after GC invocations=256 (full 31):
 par new generation   total 2293760K, used 693K [0x00000006f0000000, 0x0000000790000000, 0x0000000790000000)
  eden space 1966080K,   0% used [0x00000006f0000000, 0x00000006f0000000, 0x0000000768000000)
  from space 327680K,   0% used [0x000000077c000000, 0x000000077c0ad6f8, 0x0000000790000000)
  to   space 327680K,   0% used [0x0000000768000000, 0x0000000768000000, 0x000000077c000000)
 concurrent mark-sweep generation total 1572864K, used 49237K [0x0000000790000000, 0x00000007f0000000, 0x00000007f0000000)
 concurrent-mark-sweep perm gen total 262144K, used 49856K [0x00000007f0000000, 0x0000000800000000, 0x0000000800000000)
}
... ...(omitted)

The history holds the 10 most recent GC events, each giving the state of the heap before or after a collection. The "invocations=256 (full 31)" header shows 256 collections so far, 31 of them full GCs. There is no sign of problems such as insufficient memory.
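The before/after pair of a GC event can be summarized in a few numbers. This is a minimal sketch using the young generation lines of the event above (address ranges dropped for brevity); gc_summary is an illustrative name.

```python
import re

# One before/after pair from the GC heap history (address ranges omitted).
GC_SNIPPET = '''\
{Heap before GC invocations=255 (full 31):
 par new generation   total 2293760K, used 1966777K
Heap after GC invocations=256 (full 31):
 par new generation   total 2293760K, used 693K
}
'''

def gc_summary(text):
    """Return the latest GC counters and the young generation space freed."""
    counts = [(int(n), int(f)) for n, f in re.findall(r"invocations=(\d+) \(full (\d+)\)", text)]
    used = [int(k) for k in re.findall(r"par new generation\s+total \d+K, used (\d+)K", text)]
    return {
        "invocations": counts[-1][0],
        "full": counts[-1][1],
        "young_freed_kb": used[0] - used[-1],
    }

summary = gc_summary(GC_SNIPPET)
print(summary)
```

Here the collection frees almost the whole young generation (1966084 KB) while old generation usage stays at 49237K, the pattern of a healthy minor GC.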

jvm memory map

The following is the library information loaded by the jvm:

Dynamic libraries:
00400000-00401000 r-xp 00000000 08:02 39454583                           /home/service/jdk1.7.0_55/bin/java
00600000-00601000 rw-p 00000000 08:02 39454583                           /home/service/jdk1.7.0_55/bin/java
013cd000-013ee000 rw-p 00000000 00:00 0                                  [heap]
6f0000000-800000000 rw-p 00000000 00:00 0 
3056400000-3056416000 r-xp 00000000 08:02 57409539                       /lib64/libgcc_s-4.4.7-20120601.so.1
3056416000-3056615000 ---p 00016000 08:02 57409539                       /lib64/libgcc_s-4.4.7-20120601.so.1
3056615000-3056616000 rw-p 00015000 08:02 57409539                       /lib64/libgcc_s-4.4.7-20120601.so.1
353be00000-353be20000 r-xp 00000000 08:02 57409933                       /lib64/ld-2.12.so
353c01f000-353c020000 r--p 0001f000 08:02 57409933                       /lib64/ld-2.12.so
353c020000-353c021000 rw-p 00020000 08:02 57409933                       /lib64/ld-2.12.so
... ...(omitted)

This is the list of virtual memory regions at the time the virtual machine crashed. It tells you which libraries were in use at the time of the crash and where they were mapped, as well as the heap, stack and guard page regions. Take the first item in the list as an example:

  • 00400000-00401000: address range of the memory region

  • r-xp: Permission, r/w/x/p/s means read/write/execute/private/shared respectively

  • 00000000: the offset within the file

  • 08:02: major and minor ID of the device on which the file is located

  • 39454583: inode number

  • /home/service/jdk1.7.0_55/bin/java: file location
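The fields described above can be split programmatically. This sketch assumes whitespace-separated fields in the order shown, with the path missing for anonymous mappings; parse_map_line is an illustrative name.

```python
def parse_map_line(line):
    """Split one memory map entry into its fields."""
    # Fields: address range, permissions, offset, device, inode, optional path.
    parts = line.split(None, 5)
    start, end = (int(x, 16) for x in parts[0].split("-"))
    return {
        "start": start,
        "end": end,
        "size": end - start,
        "perms": parts[1],
        "offset": int(parts[2], 16),
        "device": parts[3],
        "inode": int(parts[4]),
        "path": parts[5].strip() if len(parts) > 5 else None,
    }

java_bin = parse_map_line(
    "00400000-00401000 r-xp 00000000 08:02 39454583                           /home/service/jdk1.7.0_55/bin/java"
)
anon = parse_map_line("6f0000000-800000000 rw-p 00000000 00:00 0 ")
print(java_bin["path"], java_bin["size"])
print(anon["size"] // (1024 * 1024), "MB")
```

The anonymous region 6f0000000-800000000 is 4352 MB, which equals the 4096 MB heap plus the 256 MB permanent generation reported in the heap section.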

     

jvm startup parameters

The following is the jvm startup parameter information:

VM Arguments:
jvm_args: -Djava.util.logging.config.file=/home/service/tomcat7007-account-web/conf/logging.properties -Xmx4096m -Xms4096m -Xmn2560m -XX:SurvivorRatio=6 -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:/home/work/webdata/logs/tomcat7007-account-web/develop/gc.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/work/webdata/logs/tomcat7007-account-web/develop/ -Dtomcatlogdir=/home/work/webdata/logs/tomcat7007-account-web/develop -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=7407 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.endorsed.dirs=/home/service/tomcat7007-account-web/endorsed -Dcatalina.base=/home/service/tomcat7007-account-web -Dcatalina.home=/home/service/tomcat7007-account-web -Djava.io.tmpdir=/home/service/tomcat7007-account-web/temp 
java_command: org.apache.catalina.startup.Bootstrap start
Launcher Type: SUN_STANDARD

Environment Variables:
JAVA_HOME=/home/service/jdk1.7.0_55
PATH=/opt/zabbix/bin:/opt/zabbix/sbin:/home/service/jdk1.7.0_55/bin:/home/work/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/work/bin
SHELL=/bin/bash

The first part lists the JVM startup arguments, and the second part lists the system environment variables.
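The startup arguments can be cross-checked against the heap section. With -Xmn giving the young generation size and -XX:SurvivorRatio giving the ratio of eden to one survivor space, the young generation is split into ratio + 2 parts. This sketch assumes -Xmn is given in megabytes, as in this log; young_gen_layout is an illustrative name.

```python
import re

# Relevant part of jvm_args from the sample log.
JVM_ARGS = "-Xmx4096m -Xms4096m -Xmn2560m -XX:SurvivorRatio=6 -XX:PermSize=256m -XX:MaxPermSize=256m"

def young_gen_layout(args):
    """Compute eden and survivor sizes in KB from -Xmn (in MB) and -XX:SurvivorRatio."""
    xmn_mb = int(re.search(r"-Xmn(\d+)m", args).group(1))
    ratio = int(re.search(r"-XX:SurvivorRatio=(\d+)", args).group(1))
    # Young generation = eden + 2 survivor spaces, with eden = ratio * survivor.
    survivor_kb = xmn_mb * 1024 // (ratio + 2)
    eden_kb = survivor_kb * ratio
    return {"eden_kb": eden_kb, "survivor_kb": survivor_kb}

layout = young_gen_layout(JVM_ARGS)
print(layout)
```

The result, 1966080 KB of eden and 327680 KB per survivor space, matches the "eden space" and "from space" sizes printed in the heap section, so the JVM was running with the configured layout.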

server information

Next is the server information:

/proc/meminfo:
MemTotal:       65916492 kB
MemFree:        14593468 kB
Buffers:          222452 kB
Cached:         28502452 kB
SwapTotal:             0 kB
SwapFree:              0 kB
... ...(omitted)
/proc/cpuinfo:
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 62
model name	: Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
stepping	: 4
... ...(omitted)

 

The first part is the memory information; focus mainly on the swap figures to see whether swap space is in use. The second part is the CPU information.
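The swap check can be done by reading the /proc/meminfo section into a dictionary. This is a minimal sketch over the lines shown in the sample; parse_meminfo is an illustrative name.

```python
# /proc/meminfo lines copied from the sample log.
MEMINFO = '''\
MemTotal:       65916492 kB
MemFree:        14593468 kB
Buffers:          222452 kB
Cached:         28502452 kB
SwapTotal:             0 kB
SwapFree:              0 kB
'''

def parse_meminfo(text):
    """Map each meminfo key to its numeric value (kB for the keys shown here)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values

mem = parse_meminfo(MEMINFO)
swap_used_kb = mem["SwapTotal"] - mem["SwapFree"]
print("swap used:", swap_used_kb, "kB")
print("free memory:", mem["MemFree"] // 1024, "MB")
```

In this sample SwapTotal is 0, so no swap is configured at all, and about 14 GB of memory is still free.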
