JVM tuning: memory leak analysis

1. Introduction

      We all know that a serious memory leak causes an out-of-memory error, and an out-of-memory error eventually crashes the program. A few days ago this problem nearly gave me an endocrine disorder: every night I went to bed worrying that a WeChat message or a phone call would arrive saying the project was down. The more I say about it, the more it hurts, so let's get straight to the topic.

2. Problem description

1. Deployment environment

      Linux.

2. Problem discovery

        When the project first went live, responses were slow and some requests even returned 502 directly. My first suspicion was the database, so I installed the MySQL client on the machine, connected to the online database (connection command: mysql -h (ip) -u (user name) -p (password)), picked a table, and ran some queries; query performance was fine. With the database ruled out, I checked the logs and found they were full of broken pipe exceptions (the client sent a request and the server did not respond for a long time). Looking more closely, almost every one of these exceptions appeared right after a java.lang.OutOfMemoryError. I quickly ran the top command, and the situation was as expected: CPU usage had soared above 90%.
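A quick sketch of those two checks (the host, credentials and table name below are placeholders, not values from the post):

```shell
# Rule out the database: connect to the online MySQL instance and time a simple query.
mysql -h <ip> -u <username> -p
#   mysql> SELECT COUNT(*) FROM <some_table>;   -- came back quickly, so the DB looked fine

# Then check the machine itself.
top    # CPU had climbed above 90%
```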

     Useful interactive keys in top:

   f or F: add or remove fields from the current display.

   o or O: change the display order of the fields.

   l: toggle the load average and uptime line.

   m: toggle the memory information lines.

   t: toggle the tasks and CPU state lines.

   c: toggle between the command name and the full command line.

   M: sort by resident memory size.

   P: sort by CPU usage percentage.

   T: sort by running time / cumulative time.

  3. Memory leak analysis process

      CPU usage stayed high the whole time, so the first step was to use top to find out which threads were eating it.

   (1) Find the PID of the Java process with jps, ps, or a similar tool;

   (2) Run top -p <pid> and press H (show all threads of the current process) to find the thread whose CPU usage is pinned at 100%. Note that the thread ID shown by top is decimal and needs to be converted to hexadecimal, because jstack prints thread IDs (nid) in hex;
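A minimal sketch of steps (1) and (2); the PID and thread ID are placeholders for whatever your own tools report:

```shell
jps -l                 # find the PID of the Java process
top -Hp <pid>          # per-thread view of that process
                       # (same as top -p <pid> followed by pressing H)
printf '%x\n' <tid>    # convert the busy thread's decimal ID to hex for jstack
```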

In this case, eight of the high-CPU threads turned out to be doing garbage collection:

   (3) Use jstack -l <pid> | grep <converted hexadecimal number> to see what the thread is doing. In the dump a thread is typically in one of three states: RUNNABLE (executing), BLOCKED (blocked) or WAITING (waiting). If the result looks like this: "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x0000000053293800 nid=0x*** runnable, the thread is a mark-sweep GC thread, which tells you the JVM is busy with garbage collection. The JVM divides memory into regions according to object lifetime: a young generation and an old generation. Most objects in the young generation are small and short-lived, so the JVM collects it frequently; surviving objects are copied from the "from" survivor space to the "to" survivor space (a copying algorithm), so a collection only costs the copying of the few objects that survive. Objects in the old generation have a much higher survival rate and there is no spare space to copy them into, so a "mark-compact" style algorithm is used for that region instead.
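Continuing the sketch, using the hex thread ID obtained in the previous step:

```shell
# Print the matching thread plus a few context lines of its stack.
jstack -l <pid> | grep -A 10 <hex_tid>
# A hit such as "Concurrent Mark-Sweep GC Thread" ... nid=0x<hex_tid> runnable
# means the busy thread is a CMS collector thread, i.e. the JVM is stuck doing GC.
```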

(4) Use jmap -heap <pid> to check heap usage. The old generation was stretched to 99% full, so it is not hard to see why the program responded slowly or hung outright: the CPU was busy with garbage collection and had no time for anything else;
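The heap check itself; jstat is not mentioned in the original, but it is a common companion command for watching GC pressure over time:

```shell
jmap -heap <pid>            # heap configuration and usage; here the old generation was ~99% full
jstat -gcutil <pid> 1000    # optional: print GC utilization every second to see how
                            # fast the old generation refills and how often full GCs run
```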

(5) The out-of-memory error could also simply mean the machine had not been given enough memory, so I allocated 8 GB of heap (-Xmx8g), only to find that not long after the program started, the CPU was back above 90%. That made it clear memory size was not the real problem at all: if the garbage is never collected, more memory will just get consumed as well. So I started optimizing the code.
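For reference, one way the larger heap could be passed at startup, assuming the application runs as a plain jar; the jar name, dump path and the heap-dump-on-OOM flags are my own additions, not something the post specifies:

```shell
java -Xms8g -Xmx8g \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/data/dumps \
     -jar app.jar
```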

(6) Code optimization? Where do you start? You have to find out which objects are not being reclaimed and which objects take up the most space. So I dumped a heap file from the server and analyzed it (command: jmap -dump:live,format=b,file=head.xx <pid>). The file may be quite large; if you download it, it is best to compress it first (command: tar -czf dump.tar.gz dump.xx). If you do not download it, you can also analyze it in place with the jhat command. (Note: if you do that, check whether the machine has enough free memory, otherwise it may hang.)
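The commands from this step collected in one place (file names are placeholders; the jhat memory flag and the port are just that tool's standard option and default, not details from the post):

```shell
jmap -dump:live,format=b,file=head.xx <pid>   # dump only live objects to a binary heap file
tar -czf dump.tar.gz head.xx                  # compress before copying it off the server
# Or analyze on the server (mind the machine's free memory first):
jhat -J-Xmx6g head.xx                         # then browse http://<server>:7000
```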

(7) Once the file is downloaded, a dedicated memory analysis tool is needed. Two are worth mentioning. The first is the tool bundled with JDK 1.6 and above, jvisualvm, but it struggles with a heap file of around 5 GB: the wait is long and the results are not intuitive. So the second is recommended: Eclipse's Memory Analyzer (MAT) plugin (download: http://www.eclipse.org/downloads/download.php?file=/mat/1.8.1/rcp/MemoryAnalyzer-1.8.1.20180910-win32.win32.x86_64.zip). If the heap file to open is very large, you need to raise MAT's own memory limit (the -Xmx parameter in MemoryAnalyzer.ini);
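The relevant part of MemoryAnalyzer.ini is the -Xmx line after -vmargs, roughly like this (the launcher lines above -vmargs vary per install and the 6g value is just an example):

```
-vmargs
-Xmx6g
```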

(8) After opening the dump, it turned out that a large proportion of the objects from the user table had not been reclaimed. My first thought was that many fields in the user table were useless but had never been removed, and that references from other places were keeping the objects alive, so I removed all the useless fields and redeployed. A while later it died again, and I was on the verge of collapse...

(9) Use jmap -histo <pid> to check a histogram of heap usage. Running it showed that objects from the user table were still occupying a huge amount of memory; the problem was still not solved...
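The histogram check; the :live variant (not used in the post) triggers a full GC first so that only reachable objects are counted:

```shell
jmap -histo <pid> | head -n 20        # top classes by instance count and bytes
jmap -histo:live <pid> | head -n 20   # same, but only objects that survive a full GC
```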

(10) With no better idea, I wrote a test of my own. You won't get collected? We'll see about that. I overrode the finalize() method (the hook the collector calls just before reclaiming an object) in the user class, called JPA's save method, and found that instances of this class were indeed never collected. Closer study showed that the class was using Hibernate's caching. Could the cache be the cause? (A cache is a data structure used to quickly look up the results of operations that have already been performed: when an operation is expensive and will be invoked many times, the usual practice is to cache the results for commonly used inputs so the cached data can be reused on the next call. Caches are usually filled dynamically, and if a cache is configured incorrectly and used heavily, the consequence is an out-of-memory error, so the memory the cache consumes has to be balanced against the speed of retrieving data.) So I removed the cache, tested again, and found that the objects were finally being reclaimed.
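A minimal sketch of the kind of probe this step describes. The real User entity, its fields and its exact cache configuration are not shown in the post, so everything below is illustrative; note also that finalize() is deprecated in newer JDKs and is only used here as a crude "was I ever collected?" signal:

```java
import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
// The second-level cache setting that was eventually removed might have looked
// roughly like this (exact strategy and region are unknown):
// @org.hibernate.annotations.Cache(usage = org.hibernate.annotations.CacheConcurrencyStrategy.READ_WRITE)
public class User {

    @Id
    private Long id;

    // Crude leak probe: if this message never appears after saving users,
    // dropping all references and forcing a GC, something (here, the Hibernate
    // cache) is still holding the instances.
    @Override
    protected void finalize() {
        System.out.println("User " + id + " was garbage collected");
    }
}
```

The test itself would then save a few users through the JPA save method, null out the references, call System.gc(), and watch whether the message ever shows up.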

4. Summary

      Thinking it over afterwards: there were more than 100,000 users and even more sessions online, and because of the cache the user objects were never reclaimed. They just kept piling up until memory overflowed.

     I thought the problem was finally solved, but the next day I was told the project had hung again, and my heart was in my throat once more. I quickly logged on to the machine and checked top: the CPU was normal. Then I checked the bandwidth. It turned out the bandwidth was simply insufficient...

     

 
