Troubleshooting an OOM in a Containerized Spring Boot Program

Background

The operations team reported that a containerized Java program runs into an OOM after a period of time and has to be restarted; the problem recurs at an interval of roughly two days.

Investigation

1. Check the logs

Since the program is deployed in a container, I logged on to the host and ran docker logs ContainerId to check the output log; nothing abnormal was found. I then used docker stats to check the resources used by the container; it stayed within the allocated 2G, and nothing abnormal showed up there either.

2. Missing tools

To take a closer look inside the container, I first used docker ps to find the Java program's ContainerId, then ran docker exec -it ContainerId /bin/bash to enter the container. Once inside, I tried to use jmap, jstack and the other JVM analysis commands to diagnose the issue, only to find that the commands do not exist, as shown below:

bash: jstack: command not found
bash: jmap: command not found
bash: jps: command not found
bash: jstat: command not found

Then it dawned on me: the image was probably built with a slimmed-down JDK that does not ship these JVM analysis tools. That did not stop the analysis, though; this is where the docker cp command comes in handy. Its job is to copy files between a container and the host. The idea here is to copy a full JDK into the container and run the JVM analysis commands from it. Usage for reference:

Usage:  docker cp [OPTIONS] CONTAINER:SRC_PATH DEST_PATH|-
        docker cp [OPTIONS] SRC_PATH|- CONTAINER:DEST_PATH [flags]

With the JVM tools in place, the analysis could continue.

3. Investigate the GC situation

Check the GC activity with jstat:

 bin/jstat -gcutil 1 1s

[Figure: jstat -gcutil output]

Nothing looks wrong here; full GCs are infrequent. Next, look at which objects occupy the memory. Since we are inside the container, the Java process has PID 1, so execute the following command:

bin/jmap -histo 1 |more 

ByteBuffer objects occupy the most memory, which is the first anomaly.
[Figure: jmap -histo output with ByteBuffer at the top]

4. Examine the thread snapshot

Take a thread snapshot with jstack:
 bin/jstack -l 1 > thread.txt
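
As a side note, when copying a JDK into the container is not convenient, a comparable thread snapshot can also be produced from inside the application with the standard ThreadMXBean API. A minimal sketch (the output format differs slightly from jstack's):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Dumps all live threads of the current JVM, including lock information.
// Meant to be called from inside the running application (for example behind
// a debug endpoint), since it only sees its own JVM.
public class InProcessThreadDump {
    public static String dumpThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            sb.append(info);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpThreads());
    }
}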

Download the snapshot for analysis. Here I recommend an online thread-dump analysis website:

https://gceasy.io

[Figure: thread report from the analysis site]

After uploading the snapshot, I found that nearly 2,000 threads had been created, most of them in TIMED_WAITING state, which felt like getting closer to the truth. Clicking into the details revealed a large number of kafka-producer-network-thread | producer-X threads (with older Kafka versions you would see many ProducerSendThread threads instead; this was verified later). These threads are created by Kafka producers, as shown in the producer send model:

[Figure: Kafka producer send model]

From the producer send model we know that the sender thread mainly does two things: first, it fetches the Kafka cluster metadata, which is shared with the producer; second, it takes the data the producer has appended to the local message queue and sends it to the remote cluster. The local message queue is backed by Java NIO ByteBuffers.

So here is the second anomaly: far too many Kafka producers were being created.

Since I did not have the business code at hand, I decided to write a demo program to test this idea: every two seconds, create a new producer object and send the current time to Kafka. To make observation easier, a JMX port is specified at startup so that jconsole can be used to watch threads and memory.
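
A minimal sketch of such a demo, assuming kafka-clients is on the classpath (the broker address, topic name and class name are placeholders, and the producer is deliberately never closed):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Demo sketch: deliberately creates a new producer every two seconds and never
// closes it, to reproduce the thread and ByteBuffer growth.
public class KafkaMultipleProducer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        while (true) {
            // Anti-pattern under test: one producer per send. Each instance starts
            // its own kafka-producer-network-thread and allocates its own buffer pool.
            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            producer.send(new ProducerRecord<>("test-topic", String.valueOf(System.currentTimeMillis())));
            Thread.sleep(2000);
        }
    }
}

Package the demo and start it with JMX remote monitoring enabled: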

nohup java -Djava.rmi.server.hostname=ip \
 -Dcom.sun.management.jmxremote.port=18099 \
 -Dcom.sun.management.jmxremote.rmi.port=18099 \
 -Dcom.sun.management.jmxremote.ssl=false \
 -Dcom.sun.management.jmxremote.authenticate=false \
 -jar com.hyq.kafkaMultipleProducer-1.0.0.jar 2>&1 &

After connecting with jconsole, I could see the thread count increasing steadily and memory usage climbing as well, as shown in the chart below:

[Figure: jconsole showing thread count and heap usage climbing]
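
When opening a JMX port is not possible, the same two curves (thread count and heap usage) can also be sampled in-process with the standard MXBeans. A small sketch:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Periodically prints the live thread count and heap usage of the current JVM,
// mirroring the two curves watched in jconsole.
public class HeapAndThreadSampler {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            int threads = ManagementFactory.getThreadMXBean().getThreadCount();
            MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            System.out.printf("threads=%d heapUsed=%dMB heapMax=%dMB%n",
                    threads, heap.getUsed() / (1024 * 1024), heap.getMax() / (1024 * 1024));
            Thread.sleep(5000);
        }
    }
}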

Reviewing the cause of the failure

At this point the analysis is essentially conclusive: the business code must be creating Producer objects in a loop.
In the Kafka producer send model, Java NIO ByteBuffers are used to hold the message data. Creating ByteBuffers is expensive, and although the BufferPool is designed for reuse, it cannot cope with a fresh producer (and its buffers) being created for every message sent. This is why jmap showed ByteBuffer occupying the most memory.
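
The usual remedy is to create a single producer and reuse it for the lifetime of the application, since KafkaProducer is thread-safe. A minimal sketch (class, broker and topic names are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Remedy sketch: one shared, long-lived producer instead of one per send.
public class SharedProducerHolder {
    private static final Producer<String, String> PRODUCER = createProducer();

    private static Producer<String, String> createProducer() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        Producer<String, String> producer = new KafkaProducer<>(props);
        // Close the single instance once, when the JVM shuts down.
        Runtime.getRuntime().addShutdownHook(new Thread(producer::close));
        return producer;
    }

    // KafkaProducer is thread-safe, so all business code can share this instance.
    public static void send(String message) {
        PRODUCER.send(new ProducerRecord<>("test-topic", message));
    }
}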

Summary

In day-to-day fault diagnosis, the tools that ship with the JDK go a long way toward locating the problem. A few other points worth knowing:

The meaning of the class names displayed by jmap -histo:

[C stands for char[]
[S stands for short[]
[I stands for int[]
[B stands for byte[]
[[I stands for int[][]
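
These are the JVM's internal class names for array types, which can be confirmed directly in Java:

// Prints the JVM-internal names that jmap -histo uses for array classes.
public class ArrayClassNames {
    public static void main(String[] args) {
        System.out.println(new char[0].getClass().getName());   // [C
        System.out.println(new short[0].getClass().getName());  // [S
        System.out.println(new int[0].getClass().getName());    // [I
        System.out.println(new byte[0].getClass().getName());   // [B
        System.out.println(new int[0][0].getClass().getName()); // [[I
    }
}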

If the exported dump file is too large, MAT can be uploaded to the server and the analysis run there; once it finishes, download the analysis report to view it locally. The command is:

./mat/ParseHeapDump.sh active.dump org.eclipse.mat.api:suspects \
  org.eclipse.mat.api:overview org.eclipse.mat.api:top_components

Several ways to trigger a full GC promptly:

1) Call System.gc() or Runtime.getRuntime().gc();

2) Run jmap -histo:live or jmap -dump:live. When these commands execute, the JVM first triggers a GC and then gathers the statistics;

3) When the old generation runs out of memory.
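
For completeness, the programmatic route from item 1 looks like this (both calls are only hints to the JVM and are ignored when -XX:+DisableExplicitGC is set):

// Ask the JVM for a full GC; this is a request, not a guarantee.
public class RequestFullGc {
    public static void main(String[] args) {
        System.gc();
        Runtime.getRuntime().gc(); // equivalent call through the Runtime object
    }
}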
 


Source: www.cnblogs.com/hyq0823/p/11564168.html