Troubleshooting: spring kafka memory leak

Phenomenon: A few days after the service went live, it stopped responding and stopped writing logs. The logs written before the hang contain no exceptions.
Guess: The process is still alive, the nginx logs look normal, and the database is fine, so a memory leak in the service itself is suspected.
Troubleshooting ideas
1. Troubleshooting in production must not disturb normal service. When the problem occurred, operations restarted the service as quickly as possible, so the evidence was lost. As a stopgap, a shell script restarts the service on a schedule (multiple instances serve traffic, so there is no interruption).
2. Next, use jmap to watch the object counts in memory; it is a problem if the counts only grow and never drop. In parallel, a separate test environment was set up to run the service and reproduce the problem.
3. Taking the test environment where the problem was reproduced as an example, first find the process ID (assume it is 1302).

(1) Use jmap -heap 1302 to look at the heap first.
Attaching to process ID 1302, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.131-b11

using thread-local object allocation.
Parallel GC with 8 thread(s)

Heap Configuration:
   MinHeapFreeRatio         = 0
   MaxHeapFreeRatio         = 100
   MaxHeapSize              = 4164943872 (3972.0MB)
   NewSize                  = 87031808 (83.0MB)
   MaxNewSize               = 1388314624 (1324.0MB)
   OldSize                  = 175112192 (167.0MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 0 (0.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 9437184 (9.0MB)
   used     = 9100328 (8.678749084472656MB)
   free     = 336856 (0.32125091552734375MB)
   96.43054538302951% used
From Space:
   capacity = 2621440 (2.5MB)
   used     = 0 (0.0MB)
   free     = 2621440 (2.5MB)
   0.0% used
To Space:
   capacity = 3145728 (3.0MB)
   used     = 0 (0.0MB)
   free     = 3145728 (3.0MB)
   0.0% used
PS Old Generation
   capacity = 2776629248 (2648.0MB)
   used     = 2776562952 (2647.9367752075195MB)
   free     = 66296 (0.06322479248046875MB)
   99.99761235677944% used

26258 interned Strings occupying 3126264 bytes.

It can be seen that the PS Old Generation is essentially 100% full. With no room left to promote objects, the JVM ends up running back-to-back Full GCs, which is exactly why the program stops responding.
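If the JDK tools are available on the host, this can be double-checked with jstat (an extra verification step, not part of the original investigation):

jstat -gcutil 1302 5000

Each sampled line should show the old-generation occupancy (the O column) pinned near 100% while the Full GC count (FGC) keeps climbing.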

(2) Next, print a histogram of the objects occupying the heap; since the output can be large, redirect it to a file.
jmap -histo -F ${pid} > /xxx/logs/jmap.log 

The content is as follows:

Attaching to process ID 1302, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.131-b11
Object Histogram:

num       #instances    #bytes  Class description
--------------------------------------------------------------------------
1:              7393486 177443664       java.lang.String
2:              3653036 146121440       org.apache.kafka.clients.consumer.ConsumerRecord
3:              3653040 87672960        java.util.concurrent.LinkedBlockingQueue$Node
 

A quick look shows that the number of ConsumerRecord instances is huge, far more than the business logic could ever hold at once, and the count never goes down, so the records are not being garbage collected. (Note: in the online environment it is worth keeping an eye on these object counts; a count that only grows and is never reclaimed is a warning sign.)

(3) Check where this object is used in the project code. The investigation found no reference problem in our own code: the only place ConsumerRecord appears is the Kafka consumer Listener, and the object itself is created and handed to us by spring-kafka. So the suspicion shifted to a bug in the third-party spring-kafka library.
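For context, the listener involved is an ordinary spring-kafka listener. The sketch below is purely illustrative (the class name, topic, and payload types are made up, not the actual project code); the point is that a listener like this keeps no reference to the record after the method returns, so any ConsumerRecord still reachable afterwards must be held by the framework, not by our code.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class OrderEventListener {

    // spring-kafka creates the ConsumerRecord and passes it in; the listener
    // reads the payload and drops the reference when the method returns.
    @KafkaListener(topics = "order-events")
    public void onMessage(ConsumerRecord<String, String> record) {
        String payload = record.value();
        // ... business processing of the payload; the record is not cached anywhere ...
    }
}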

(4) Further investigation turned up these two issues in the spring-kafka project:

https://github.com/spring-projects/spring-kafka/pull/162

https://github.com/spring-projects/spring-kafka/issues/161

Reading the issues and checking version numbers confirmed that the spring-kafka jar we depend on predates the fix, so the bug does affect us; the deduction holds together. It also matches the histogram above: the LinkedBlockingQueue$Node count tracks the ConsumerRecord count almost one-for-one, meaning the consumed records were sitting in the listener container's internal queue. In our case the queue was neither particularly small nor particularly large, and the records held in it were simply never released, so they could never be garbage collected.
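To make the failure mode concrete, here is a small self-contained illustration of the pattern behind this kind of leak (this is not spring-kafka's actual implementation, just the general shape of the problem): one thread hands records to another through a LinkedBlockingQueue, and when the producing side outpaces the consuming side on an effectively unbounded queue, the queued records pile up on the heap, much like the ConsumerRecord / LinkedBlockingQueue$Node pair in the histogram above.

import java.util.concurrent.LinkedBlockingQueue;

public class HandoffLeakDemo {

    public static void main(String[] args) throws InterruptedException {
        // Unbounded hand-off queue between a fast producer and a slow consumer.
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();

        Thread producer = new Thread(() -> {
            long i = 0;
            while (true) {
                queue.offer("record-" + i++); // never blocks on an unbounded queue
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String record = queue.take(); // drain one record
                    Thread.sleep(5);              // processing lags behind production
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();

        // Watching this process with jmap -histo shows the String and
        // LinkedBlockingQueue$Node counts climbing steadily, just like the leak above.
        producer.join();
    }
}

Capping the hand-off queue and applying back-pressure on the producing side is the general cure for this pattern.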

(5) Upgrade the dependency to a version that contains the fix, redeploy to production or the test environment, and keep observing the heap usage.
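A simple way to verify the fix after redeploying (again just a suggestion, not part of the original write-up) is to sample the histogram for the suspect class from time to time:

jmap -histo 1302 | grep ConsumerRecord

On a fixed build the instance count should fluctuate and drop back after GC instead of climbing monotonically as it did above.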

References:

Use of the JDK's built-in tools: http://blog.csdn.net/fenglibing/article/details/6411924

JVM memory leak detection tool - jmap: http://www.wujianjun.org/2016/09/21/jvm-tools-jmap/

Fixing sun.jvm.hotspot.debugger.DebuggerException: Can't attach to the process: https://zenidas.wordpress.com/recipes/fixing-sun-jvm-hotspot-debugger-debuggerexception-cant-attach-to-the-process/

Using off-heap memory: http://www.raychase.net/1526
