Still reading the logs to troubleshoot a Java OOM? Brother, you're doing it wrong: the correct way is to analyze the dump file

When your application throws an OOM, do you still troubleshoot it by reading the log? (If that finds the problem, it is most likely a lucky coincidence.) The correct way to troubleshoot is to analyze the dump file. Do you know why?

OOM exception --intsmaze

To start with, I used to locate almost every OOM I hit during development by reading the log. Many OOMs are caused by an infinite loop, or by a query that returns a huge result set without paging, and the exception log does let you find those quickly. But that is not the right way to do it: it just happens that the stack trace printed in the log points at the offending code.
Many bloggers say that OOMs should be located by analyzing the dump file, which left me deeply puzzled: why analyze a dump when reading the log solves the problem? I couldn't find a satisfying answer online. I asked many developers around me, and all they said was that dumps are for performance analysis and logs are for troubleshooting errors. After thinking it over several times it finally clicked, so I'm writing down the reason here.

The right way: dump file analysis --intsmaze

Please see the big screen:

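package cn.intsmaze.dump;

import java.util.ArrayList;

public class OOMDump {

    static class OOMIntsmaze {
        // each instance holds a 64 KB byte array
        public byte[] placeholder = new byte[64 * 1024];
    }

    // Adds num OOMIntsmaze objects to the list, printing the index of each one
    public static void fillHeap(ArrayList<OOMIntsmaze> list, int num) throws Exception {

        for (int i = 0; i < num; i++) {
            list.add(new OOMIntsmaze());
            System.out.println(i);
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<OOMIntsmaze> list = new ArrayList<OOMIntsmaze>();
        fillHeap(list, 131);
        // keep the JVM alive so a tool such as jvisualvm or jmap can attach to it
        Thread.sleep(20000000);
    }
}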

We configure the JVM parameters as follows: -Xmx10M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=d://
With fillHeap(list,131) the program runs normally; with fillHeap(list,132) it throws an OOM:

130
java.lang.OutOfMemoryError: Java heap space
Dumping heap to d://\java_pid10000.hprof ...
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at cn.intsmaze.dump.OOMDump$OOMIntsmaze.<init>(OOMDump.java:27)
	at cn.intsmaze.dump.OOMDump.fillHeap(OOMDump.java:34)
	at cn.intsmaze.dump.OOMDump.main(OOMDump.java:47)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
Heap dump file created [10195071 bytes in 0.017 secs]
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
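The 131/132 cutoff is easier to see with some quick arithmetic. Below is a small sketch (the class `HeapBudget` is hypothetical, not part of the original code) that compares the 64 KB payloads against the limit the JVM reports through `Runtime.getRuntime().maxMemory()`. The exact cutoff depends on the garbage collector and how it splits the heap into generations, and `maxMemory()` can report slightly less than `-Xmx`, so treat the output as a rough guide.

```java
public class HeapBudget {

    // Same payload size as OOMIntsmaze.placeholder: 64 KiB per instance
    static final long PAYLOAD_BYTES = 64 * 1024;

    // Bytes held by the byte[] payloads of n instances (ignores object and array headers)
    static long payloadBytes(int n) {
        return n * PAYLOAD_BYTES;
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory();
        for (int n : new int[] {131, 132}) {
            long bytes = payloadBytes(n);
            System.out.printf("%d objects -> %.2f MiB of payload, %.1f%% of maxMemory (%.2f MiB)%n",
                    n, bytes / 1048576.0, 100.0 * bytes / max, max / 1048576.0);
        }
    }
}
```

Run with the same -Xmx10M, 131 objects alone already account for 8,585,216 bytes (about 8.19 MiB) of payload, which leaves very little room in a 10 MB heap for anything else.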

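From the exception log we can see that the problem comes from the code
at cn.intsmaze.dump.OOMDump.fillHeap(OOMDump.java:34)
list.add(new OOMIntsmaze());
The log shows exactly where it broke and I fixed it right away, so why would I bother with a dump? Am I crazy?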

Not so fast, young man. Suppose the main method looks like this, and we run it:

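    // Replacement main for OOMDump; also needs: import java.util.HashMap; import java.util.Map;
    public static void main(String[] args) throws Exception {
        ArrayList<OOMIntsmaze> list = new ArrayList<OOMIntsmaze>();
        fillHeap(list, 131);   // 131 elements still fit in a 10 MB heap
        Map<String, OOMIntsmaze> map = new HashMap<String, OOMIntsmaze>();
        map.put("LIUYANG", new OOMIntsmaze());   // the OOM is thrown here
        map.put("intsmaze", new OOMIntsmaze());
        Thread.sleep(20000000);
    }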

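This time the exception log tells us the OOM was caused by map.put("LIUYANG",new OOMIntsmaze());. Yet when we look at that code, the map has had only one entry inserted and there is no infinite loop anywhere. How could that possibly cause an OOM? Live long enough and you'll see everything.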

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at cn.intsmaze.dump.OOMDump$OOMIntsmaze.<init>(OOMDump.java:27)
	at cn.intsmaze.dump.OOMDump.main(OOMDump.java:49)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)

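With the heap capped at 10 MB, we already saw that adding 132 elements to the list triggers an OOM. Here we add 131 elements to the list and then insert into the map, and the very first map insertion fails. This time the OOM log points to the map insertion as the cause.
But that is not what really happened. Just reading the code, the map only ever gets 2 entries, so how could it be the culprit? The map insertion simply ran at the moment the JVM heap hit its limit.
So to find the root cause, you need the dump file: it shows how much memory each object was holding at the moment of the OOM.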
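Moreover, in a real application the map.put() code won't sit right next to the list.add() code as in the example above; it will live in a different package, in a different business flow. At that point, locating the problem from the log is basically impossible.
So why do most people still read the log when they hit an OOM, and still land on the right spot? Only because a loop like the list.add one keeps allocating over and over, so it is by far the most likely place for the error to be thrown.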
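Besides waiting for an OOM, you can take a heap dump of a live JVM yourself (for example while the `Thread.sleep` in the example keeps it running) and open it in a heap analyzer such as Eclipse MAT or VisualVM to see which objects actually hold the memory. A minimal sketch, assuming a HotSpot-based JDK where `com.sun.management.HotSpotDiagnosticMXBean` is available; the class name `DumpNow` and the output path are illustrative. From outside the process, `jmap -dump:live,format=b,file=heap.hprof <pid>` does the same job.

```java
import com.sun.management.HotSpotDiagnosticMXBean;

import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class DumpNow {

    // Writes an .hprof heap dump of the current JVM to the given file.
    // The file must not exist yet, and newer JDKs reject names without the ".hprof" suffix.
    static void dumpHeap(Path file, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // liveOnly = true dumps only objects that are still reachable
        bean.dumpHeap(file.toString(), liveOnly);
    }

    public static void main(String[] args) throws IOException {
        // A fresh temp directory guarantees the dump file does not exist yet
        Path file = Files.createTempDirectory("dump").resolve("manual.hprof");
        dumpHeap(file, true);
        System.out.println("Heap dump written to " + file + " (" + Files.size(file) + " bytes)");
    }
}
```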

The correct approach --intsmaze

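So, to head off trouble before it happens, always configure the JVM startup flag HeapDumpOnOutOfMemoryError during development.
The -XX:+HeapDumpOnOutOfMemoryError flag makes the JVM dump a snapshot of the current heap when an out-of-memory error occurs, so you can analyze it after the fact.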
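To confirm the flag actually took effect on a running JVM, you can query it through the HotSpot diagnostic MXBean. A sketch, assuming a HotSpot JVM; there `HeapDumpOnOutOfMemoryError` and `HeapDumpPath` are manageable flags, so they can also be switched on at runtime (which is what `jinfo -flag +HeapDumpOnOutOfMemoryError <pid>` does from outside the process). The class `DumpFlags` and its helpers are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;

import java.lang.management.ManagementFactory;

public class DumpFlags {

    private static final HotSpotDiagnosticMXBean BEAN =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);

    // Returns the current value of a VM flag as a string, e.g. "true" or "false"
    static String flag(String name) {
        VMOption option = BEAN.getVMOption(name);
        return option.getValue();
    }

    // Turns on heap dumps on OOM at runtime; pass null to keep the current HeapDumpPath
    static void enableHeapDumpOnOom(String dumpPath) {
        BEAN.setVMOption("HeapDumpOnOutOfMemoryError", "true");
        if (dumpPath != null) {
            BEAN.setVMOption("HeapDumpPath", dumpPath);
        }
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = " + flag("HeapDumpOnOutOfMemoryError"));
        System.out.println("HeapDumpPath = '" + flag("HeapDumpPath") + "'");
        if (!Boolean.parseBoolean(flag("HeapDumpOnOutOfMemoryError"))) {
            System.out.println("Warning: no heap dump will be written on OOM");
        }
    }
}
```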

No log, no dump --intsmaze

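Sometimes our application dies without printing anything to the log and without producing a dump file. In that case it is almost always Linux itself that killed our application process.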

Check the /var/log/messages file (on Debian/Ubuntu the equivalent is /var/log/syslog)

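messages is the core system log file. It contains the boot messages from system startup as well as other status messages while the system is running. In messages you will see entries like these: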

out of memory:kill process 8398(java) score 699 or sacrifice child
killed process 8398,UID 505,(java) total-vm:2572232kB,anon-rss:1431292kB,file-rss:908kB
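If you have to dig through a large messages file, a small filter helps. The sketch below is a hypothetical helper (`OomKillScanner`) whose regex is written against the lines quoted above; kernel versions word these messages differently (newer ones print "Killed process" directly), so adjust the pattern to what your system actually logs. On many systems `dmesg` or `journalctl -k` shows the same kernel messages.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OomKillScanner {

    // Matches "out of memory:kill process 8398(java) score 699 ..." case-insensitively,
    // with optional spaces, so "Out of memory: Kill process 8398 (java) score 699" matches too
    private static final Pattern KILL = Pattern.compile(
            "(?i)out of memory:\\s*kill process (\\d+)\\s*\\(([^)]*)\\)\\s*score (\\d+)");

    // One OOM-killer decision: the chosen pid, its command name and its score
    static final class Kill {
        final int pid;
        final String command;
        final int score;

        Kill(int pid, String command, int score) {
            this.pid = pid;
            this.command = command;
            this.score = score;
        }

        @Override
        public String toString() {
            return "pid=" + pid + " command=" + command + " score=" + score;
        }
    }

    // Returns every OOM-killer decision found in the given log lines
    static List<Kill> scan(List<String> lines) {
        List<Kill> kills = new ArrayList<>();
        for (String line : lines) {
            Matcher m = KILL.matcher(line);
            if (m.find()) {
                kills.add(new Kill(Integer.parseInt(m.group(1)), m.group(2), Integer.parseInt(m.group(3))));
            }
        }
        return kills;
    }

    public static void main(String[] args) throws IOException {
        // Reading /var/log/messages usually requires root
        String path = args.length > 0 ? args[0] : "/var/log/messages";
        // ISO-8859-1 never fails to decode, so stray non-UTF-8 bytes don't abort the scan
        List<String> lines = Files.readAllLines(Paths.get(path), StandardCharsets.ISO_8859_1);
        for (Kill k : scan(lines)) {
            System.out.println(k);
        }
    }
}
```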

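The OOM killer is a protection mechanism in Linux. It is triggered when the remaining memory is no longer enough for the system to run normally. When it runs, it finds the process with the highest score on the system and kills it.
Here we can see that the Java process was indeed killed by the Linux OOM killer.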
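The selection rule above (the highest score gets killed) can be modeled in a few lines. This is a simplified sketch, not the kernel's algorithm: the kernel computes the score itself (mainly from the process's memory usage plus its oom_score_adj bias) and skips some processes entirely, whereas here the scores are simply read from /proc/<pid>/oom_score and taken as given. `OomVictim` is a hypothetical name, and `readScores` only works on Linux.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

public class OomVictim {

    // Picks the pid with the highest score, mirroring "kill the highest-scoring process"
    static OptionalInt pickVictim(Map<Integer, Integer> scoreByPid) {
        return scoreByPid.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(e -> OptionalInt.of(e.getKey()))
                .orElse(OptionalInt.empty());
    }

    // Reads /proc/<pid>/oom_score for every numeric entry under /proc
    static Map<Integer, Integer> readScores() throws IOException {
        Map<Integer, Integer> scores = new HashMap<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(Paths.get("/proc"), "[0-9]*")) {
            for (Path dir : dirs) {
                try {
                    byte[] raw = Files.readAllBytes(dir.resolve("oom_score"));
                    int score = Integer.parseInt(new String(raw, StandardCharsets.US_ASCII).trim());
                    scores.put(Integer.parseInt(dir.getFileName().toString()), score);
                } catch (IOException | NumberFormatException e) {
                    // the process exited meanwhile, or the entry is not a pid directory: skip it
                }
            }
        }
        return scores;
    }

    public static void main(String[] args) throws IOException {
        Map<Integer, Integer> scores = readScores();
        pickVictim(scores).ifPresent(pid ->
                System.out.println("Highest oom_score: pid " + pid + " (score " + scores.get(pid) + ")"));
    }
}
```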

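Our application and its logs can only record out-of-memory errors that happen inside the JVM. If the heap size configured for the JVM is more than the operating system can actually give it, the OS kills the process outright, and the JVM gets no chance to record anything.
Linux keeps an OOM score for every process in the file /proc/<pid>/oom_score, for example /proc/8398/oom_score. If you don't want a process to be killed, set its /proc/<pid>/oom_adj to -17. On newer kernels oom_adj is deprecated in favor of /proc/<pid>/oom_score_adj, where -1000 has the same effect; lowering either value requires root privileges.
For more on the Linux OOM killer mechanism, go search Baidu yourself.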
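To check where a given process (by default the JVM itself, via /proc/self) stands, you can read those files directly. A sketch assuming a Linux kernel that exposes /proc/<pid>/oom_score and oom_score_adj; the legacy oom_adj file may be absent on some systems, so every value is optional. The class `OomAdjCheck` is hypothetical. Changing the value is done by writing to the file as root, e.g. `echo -1000 > /proc/<pid>/oom_score_adj`; keep in mind that exempting the JVM only shifts the kill to some other process and does not fix the memory problem.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalInt;

public class OomAdjCheck {

    // Reads a single integer from a /proc file, or empty if the file is missing or unreadable
    static OptionalInt readInt(Path file) {
        try {
            String text = new String(Files.readAllBytes(file), StandardCharsets.US_ASCII).trim();
            return OptionalInt.of(Integer.parseInt(text));
        } catch (IOException | NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    // Exempt from the OOM killer: oom_score_adj == -1000, or legacy oom_adj == -17
    static boolean isExempt(OptionalInt scoreAdj, OptionalInt legacyAdj) {
        return (scoreAdj.isPresent() && scoreAdj.getAsInt() == -1000)
                || (legacyAdj.isPresent() && legacyAdj.getAsInt() == -17);
    }

    static String show(OptionalInt v) {
        return v.isPresent() ? Integer.toString(v.getAsInt()) : "n/a";
    }

    public static void main(String[] args) {
        String pid = args.length > 0 ? args[0] : "self"; // "self" means this JVM
        Path proc = Paths.get("/proc", pid);
        OptionalInt score = readInt(proc.resolve("oom_score"));
        OptionalInt scoreAdj = readInt(proc.resolve("oom_score_adj"));
        OptionalInt legacyAdj = readInt(proc.resolve("oom_adj"));
        System.out.println("oom_score=" + show(score) + " oom_score_adj=" + show(scoreAdj)
                + " oom_adj=" + show(legacyAdj));
        System.out.println(isExempt(scoreAdj, legacyAdj)
                ? "exempt from the OOM killer" : "can be chosen by the OOM killer");
    }
}
```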

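The very best approach: first size the JVM heap so that the JVM's own OOM happens before the operating system's OOM killer steps in, then set the runtime flags so that a heap dump file is written when the OOM occurs.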
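One way to act on the first half of that advice is a startup sanity check that compares the maximum heap with the machine's physical memory. A sketch under stated assumptions: it uses `com.sun.management.OperatingSystemMXBean` (HotSpot-specific; `getTotalPhysicalMemorySize()` is deprecated since JDK 14 in favor of `getTotalMemorySize()` but still works), and the 70% ceiling is an arbitrary illustrative number, not a recommendation from this post. The heap is not the JVM's only memory (metaspace, thread stacks, direct buffers and the code cache live outside it), and other processes on the box need memory too, which is why the limit must leave headroom.

```java
import java.lang.management.ManagementFactory;

public class HeapHeadroom {

    // Hypothetical policy: the heap may use at most this fraction of physical RAM
    static final double MAX_HEAP_FRACTION = 0.70;

    // True if maxHeapBytes stays within the given fraction of physical memory
    static boolean leavesHeadroom(long maxHeapBytes, long physicalBytes, double fraction) {
        return maxHeapBytes <= (long) (physicalBytes * fraction);
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        long physical = os.getTotalPhysicalMemorySize();
        System.out.printf("max heap %d MiB, physical RAM %d MiB%n", maxHeap >> 20, physical >> 20);
        if (!leavesHeadroom(maxHeap, physical, MAX_HEAP_FRACTION)) {
            System.out.println("Warning: -Xmx is close to physical RAM; the Linux OOM killer may strike first");
        }
    }
}
```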

Which out-of-memory errors produce a dump file --intsmaze

All aboard: does every kind of JVM out-of-memory error produce a dump? https://blog.csdn.net/stevendbaguo/article/details/51366181


Origin blog.csdn.net/hbly979222969/article/details/110676490