Online problem location: memory bottleneck

Memory problems are more troublesome than CPU problems, and there are more scenarios: mainly OOM, GC issues, and off-heap memory. Generally speaking, we first use the free command to get an overall picture of memory usage.

free

free shows memory usage, including physical memory, swap space (swap), and the kernel buffer/cache.

free -h -s 3 outputs the memory status every three seconds in human-readable units. The command looks like this:

[1014154@cc69dd4c5-4tdb5 ~]$ free
              total        used        free      shared  buff/cache   available
Mem:      119623656    43052220    45611364     4313760    30960072    70574408
Swap:             0           0           0
[1014154@cc69dd4c5-4tdb5 ~]$ free -h -s 3
              total        used        free      shared  buff/cache   available
Mem:           114G         41G         43G        4.1G         29G         67G
Swap:            0B          0B          0B

              total        used        free      shared  buff/cache   available
Mem:           114G         41G         43G        4.1G         29G         67G
Swap:            0B          0B          0B
  • Mem: physical memory usage.

  • Swap: swap space usage.

  • total: the total physical memory / swap space available to the system.

  • used: physical memory / swap space already in use.

  • free: physical memory / swap space not yet used.

  • shared: the amount of physical memory being shared.

  • buff/cache: the amount of physical memory used by buffers and cache.

  • available: the amount of physical memory still usable by applications, i.e. available memory from the application's point of view; available ≈ free + buffer + cache.

Swap space

Swap space is an area on disk. When the system's physical memory is tight, Linux moves rarely accessed data from memory out to swap, so that more physical memory is available to serve each process; when the system needs content stored on swap, that data is loaded back into memory. These two operations are commonly called swapping out and swapping in. Swap space can alleviate a memory shortage to some extent, but it requires reading and writing disk, so its performance is not high.

vmstat (recommended)

vmstat (Virtual Memory Statistics) is a common Linux tool for monitoring memory. It can monitor the overall state of the operating system's virtual memory, processes, CPU, and so on. Recommended.

vmstat 5 3 means sample every 5 seconds, for a total of three samples.

[1014154@cc69dd4c5-4tdb5 ~]$ vmstat 5 3
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 8  0      0 45453212 374768 30763728    0    0    14    99    1    1 11 10 78  0  1
10  0      0 45489232 374768 30763360    0    0     2  1275 95118 97908 13 11 75  0  1
 6  0      0 45452908 374768 30765148    0    0     0  3996 89924 92073 12 10 78  0  1

procs

  • r: the number of processes running or waiting for a CPU time slice (i.e. how many processes are actually contending for CPU). If this value exceeds the number of CPUs for a long time, the CPU is insufficient and needs to be increased.

  • b: the number of processes waiting for resources, such as I/O or memory swap.

memory

  • swpd: the amount of memory swapped out to the swap area, i.e. the used size of virtual memory (in KB). If it is greater than 0, the machine's physical memory is insufficient; if the cause is not a memory leak in the program, you should upgrade the memory or migrate memory-hungry tasks to other machines.

  • free: the physical memory currently free.

  • buff: the buffer size, generally used to buffer reads and writes to block devices.

  • cache: the cache size, generally used by the file system to cache frequently accessed files. A large cache value means many files are cached; if bi in the io section stays small at the same time, the file system cache is working efficiently.

swap

  • si: the amount of data read from disk into memory per second, i.e. swapped in from the swap area. If this value is greater than 0, physical memory is insufficient or there is a memory leak; find the memory-consuming process and deal with it.

  • so: the amount of data written from memory to disk per second, i.e. swapped out from memory to the swap area.

Note: under normal circumstances, si and so are both 0. If si and so are non-zero for a long time, system memory is insufficient and needs to be increased.

io

  • bi: the total amount of data read from block devices (disk reads), in KB/s.

  • bo: the total amount of data written to block devices (disk writes), in KB/s.

Note: if bi + bo is large and wa is also large, it indicates a disk I/O bottleneck.

system

  • in: the number of device interrupts per second observed in the time interval.

  • cs: the number of context switches per second. This value should be as small as possible; if it is too large, consider lowering the number of threads or processes. For example, when load testing web servers such as apache and nginx with thousands or tens of thousands of concurrent requests, we tune the peak number of processes or threads downward, pressure testing until cs reaches a relatively small value; that number of processes and threads is a fairly appropriate setting. The same goes for system calls: every call into a system function enters kernel space and causes a context switch, which is very resource intensive, so frequent system calls should be avoided. Too many context switches means the CPU spends most of its time switching contexts instead of doing real work, so the CPU is not being used effectively, which is undesirable.

Note: the larger these two values, the more CPU time is consumed by the kernel.

CPU

  • us: the percentage of CPU time consumed by user processes. The higher us is, the more CPU user processes consume; if it stays above 50% for a long time, consider optimizing the program or its algorithms.

  • sy: the percentage of CPU time consumed by the kernel. Generally, us + sy should be below 80%; if it is above 80%, there may be a CPU bottleneck.

  • id: the percentage of time the CPU is idle.

  • wa: the percentage of CPU time spent waiting for I/O. The higher wa is, the more serious the I/O wait. As a rule of thumb, wa should stay below 20%; above that, the I/O wait is serious. The cause may be a large number of random disk reads and writes, or a disk bandwidth bottleneck (mainly block operations).

sar

sar is similar to free; sar -r 3 outputs memory information every three seconds:

[root@localhost ~]# sar -r 3
Linux 3.10.0-1062.el7.x86_64 (localhost.localdomain)    04/28/2020      _x86_64_        (2 CPU)

15:40:10  kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
15:40:13     106800   1314960     92.49      2144    573248   4110864    116.82    563664    498888        36
15:40:16     106816   1314944     92.49      2144    573248   4110864    116.82    563668    498888        36
15:40:19     106816   1314944     92.49      2144    573248   4110864    116.82    563668    4988

In-heap memory

Most memory problems are still in-heap memory problems, which on the surface divide mainly into OOM and StackOverflow.

1. OOM

OOM means insufficient memory in the JVM; it can be roughly divided into the following types:

  • Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread

This means there is not enough memory to allocate a Java thread stack. It is basically caused by problematic thread-pool code, such as forgetting to shut a pool down, so first look for the problem at the code level, using jstack or jmap. If everything is normal, you can reduce the size of a single thread stack by specifying -Xss so that more threads fit. In addition, at the system level, the OS thread limits can be raised by modifying nofile and nproc in /etc/security/limits.conf;
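For illustration, a minimal Java sketch (class and names made up for this example, not from the original article) that reproduces this error by endlessly creating threads that never exit:

// Each iteration starts a thread that parks forever, so native thread
// resources are exhausted and the JVM eventually fails with
// "unable to create new native thread".
public class NativeThreadOom {
    public static void main(String[] args) {
        for (int i = 0; ; i++) {
            new Thread(() -> {
                try {
                    Thread.sleep(Long.MAX_VALUE); // never exits, never releases its stack
                } catch (InterruptedException ignored) {
                }
            }).start();
            System.out.println("created thread " + i);
        }
    }
}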


  • Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

This means heap memory usage has reached the maximum value set by -Xmx; it should be the most common OOM error. The solution is still to look in the code first: if there is a memory leak, use jstack and jmap to locate the problem. If everything is normal, expand the heap by raising -Xmx;
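A minimal sketch of the typical cause, assuming the usual pattern of a static collection that keeps growing (names are illustrative); running it with a small heap such as -Xmx32m shows the error quickly:

import java.util.ArrayList;
import java.util.List;

// Objects added to a static list stay reachable forever, so the heap
// fills up and the JVM throws "Java heap space".
public class HeapSpaceOom {
    private static final List<byte[]> LEAK = new ArrayList<>();

    public static void main(String[] args) {
        while (true) {
            LEAK.add(new byte[1024 * 1024]); // 1 MB per iteration, never released
        }
    }
}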

  • Caused by: java.lang.OutOfMemoryError: Metaspace

This means metaspace usage has reached the maximum set by -XX:MaxMetaspaceSize. The troubleshooting idea is the same as above; the limit can be adjusted through -XX:MaxMetaspaceSize (before Java 8, the permanent generation was sized with -XX:MaxPermSize);

2. StackOverflow

Stack memory overflow; this is the one everyone has seen most often.

Exception in thread "main" java.lang.StackOverflowError

This indicates that the memory required by the thread stack exceeds the -Xss value. Check the code first; the stack size can be adjusted with -Xss, but setting it too large can in turn contribute to OOM.
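The classic trigger is unbounded recursion; a minimal sketch (illustrative names, not from the original):

// Every call adds a stack frame until the -Xss limit is hit.
public class StackOverflowDemo {
    static long depth = 0;

    static void recurse() {
        depth++;
        recurse(); // no termination condition
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            System.out.println("overflow at depth " + depth);
        }
    }
}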

3. Use jmap to locate code memory leaks

For the code-level investigation of OOM and StackOverflow above, we generally use jmap -dump:format=b,file=filename pid to export a heap dump file.


Use MAT (Eclipse Memory Analyzer Tool) to import the dump file for analysis. For memory leaks we can generally select Leak Suspects directly, and MAT gives suggestions about the leak. You can also choose Top Consumers to view the largest-object report. Thread-related questions can be analyzed by selecting thread overview. In addition, the Histogram class overview lets you analyze things slowly by yourself; you can find MAT tutorials online.


In daily development, memory leaks in code are quite common and quite hidden, and developers need to pay attention to details. For example: creating a new object for every request, resulting in massive repeated object creation; file streams opened but not properly closed; improper manual triggering of gc; unreasonable ByteBuffer cache allocation. All of these can cause OOM in code.
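A small sketch of the unclosed-stream case just mentioned (method names are illustrative); try-with-resources closes the stream even on the exception path:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class StreamLeak {
    // Leaky version: if readLine() throws, close() is never reached.
    static String firstLineLeaky(String path) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader(path));
        String line = reader.readLine();
        reader.close(); // skipped when an exception is thrown above
        return line;
    }

    // Fixed version: the reader is closed automatically, even on exceptions.
    static String firstLineSafe(String path) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            return reader.readLine();
        }
    }
}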

On the other hand, we can specify -XX:+HeapDumpOnOutOfMemoryError in the startup parameters to save a dump file automatically on OOM.

4. GC issues and threads

GC problems affect not only the CPU but also memory, and the troubleshooting ideas are the same. Generally, first use jstat to check the generational changes, such as whether the youngGC or fullGC counts are too high, or whether indicators such as EU and OU grow abnormally.

If there are too many threads and gc cannot keep up, it can also cause OOM, mostly the "unable to create new native thread" mentioned earlier. Besides analyzing the dump file in detail with jstack, we generally look at the overall thread count first, via pstree -p pid | wc -l.


Or you can directly count the entries under /proc/pid/task, whose number equals the thread count.


Off-heap memory

If you encounter an off-heap memory overflow, that is really unfortunate. First of all, off-heap memory overflow shows up as fast growth of physical resident memory. Whether an error is reported depends on how the memory is used: if it is caused by Netty, an OutOfDirectMemoryError may appear in the error log; if DirectByteBuffer is used directly, it reports OutOfMemoryError: Direct buffer memory.

Off-heap memory overflow is often related to the use of NIO. Generally, we first use pmap to view the memory segments occupied by the process: pmap -x pid | sort -rn -k3 | head -30, which shows the 30 largest memory segments of the given pid in descending order. You can run the command again after a while to observe memory growth, or compare against a normal machine to find suspicious memory segments.


If we identify a suspicious memory segment, we can dump it with gdb: gdb --batch --pid {pid} -ex "dump memory filename.dump {start address} {start address + block size}"


After obtaining the dump file, you can view it with hexdump: hexdump -C filename | less, although most of what you see will be binary gibberish.

NMT (Native Memory Tracking) is a HotSpot feature introduced in Java 7u40. Together with the jcmd command, it shows the composition of native memory in detail. You need to add -XX:NativeMemoryTracking=summary or -XX:NativeMemoryTracking=detail to the startup parameters, which incurs a slight performance loss.

Generally, for the situation where off-heap memory grows slowly until it blows up, first set a baseline: jcmd pid VM.native_memory baseline.


Then wait for a while, observe the memory growth, and do a summary- or detail-level diff with jcmd pid VM.native_memory detail.diff (or summary.diff).


You can see that the memory analyzed by jcmd is very detailed, covering heap, threads, and gc (so the other memory exceptions mentioned above can actually also be analyzed with NMT). Here we focus on the growth of Internal memory: if the increase is very obvious, there is a problem.

At the detail level, the diff also shows the growth of specific memory segments.

In addition, at the system level, we can use the strace command to monitor memory allocation: strace -f -e "brk,mmap,munmap" -p pid. The allocation information here mainly includes the pid and the memory address.


But in fact, the operations above rarely pinpoint the problem. The key is to look at the error stack in the log, find the suspicious object, figure out its reclamation mechanism, and then analyze that object. For example, memory allocated by DirectByteBuffer requires a full GC or a manual System.gc() to be reclaimed (so it is best not to use -XX:+DisableExplicitGC). We can track the memory of DirectByteBuffer objects and manually trigger a full GC via jmap -histo:live pid to see whether the off-heap memory gets reclaimed. If it is reclaimed, the off-heap memory quota itself is probably set too small and can be adjusted with -XX:MaxDirectMemorySize. If nothing changes, use jmap to analyze the objects that cannot be collected and their reference relationships with DirectByteBuffer.
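To make the DirectByteBuffer behavior concrete, a minimal sketch (illustrative names, not from the original); run with e.g. -XX:MaxDirectMemorySize=64m to see "Direct buffer memory" quickly:

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Direct buffers live off-heap; their native memory is only freed when the
// owning Java object becomes unreachable and is garbage collected. Keeping
// the objects in a list means even a full GC cannot reclaim the memory.
public class DirectBufferOom {
    private static final List<ByteBuffer> HELD = new ArrayList<>();

    public static void main(String[] args) {
        while (true) {
            HELD.add(ByteBuffer.allocateDirect(1024 * 1024)); // 1 MB off-heap each
        }
    }
}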

GC issues

In-heap memory leaks are always accompanied by GC anomalies. However, GC issues are not related only to memory; they may also cause a series of complications such as CPU load and network problems. They are just more closely tied to memory, so we summarize GC-related issues separately here.

In the CPU chapter, we introduced using jstat to obtain the current GC generational information. More often, we use the GC log to troubleshoot, adding -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps to the startup parameters to turn on GC logging.

The meaning of common Young GC and Full GC logs will not be repeated here.

From the gc log, we can roughly infer whether youngGC and fullGC are too frequent or take too long, and prescribe the right medicine. We analyze the G1 garbage collector below; it is also recommended that you use G1 (-XX:+UseG1GC).

youngGC too frequent

Frequent youngGC usually means many short-lived small objects. First consider whether the Eden area / young generation is set too small, and see whether the problem can be solved by adjusting parameters such as -Xmn and -XX:SurvivorRatio. If the parameters are normal but the young gc frequency is still too high, use jmap and MAT to further examine a dump file.
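For reference, a sketch of the "many short-lived small objects" pattern that drives frequent youngGC (illustrative code, not from the original):

// Each iteration allocates temporary objects that die immediately,
// filling Eden quickly and triggering frequent young GCs.
public class YoungGcChurn {
    public static void main(String[] args) {
        long sink = 0;
        while (true) {
            String tmp = "request-" + System.nanoTime(); // throwaway string
            byte[] buf = new byte[4096];                 // throwaway buffer
            sink += tmp.length() + buf.length;           // keep the JIT from eliding the work
        }
    }
}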

youngGC takes too long

For long pauses, it depends on which part of the GC log is time-consuming. Taking the G1 log as an example, focus on phases such as Root Scanning, Object Copy, and Ref Proc. If Ref Proc takes long, pay attention to reference-related objects; if Root Scanning takes long, pay attention to the number of threads and cross-generation references; for Object Copy, pay attention to object lifetimes. Time-consumption analysis also requires horizontal comparison, that is, comparison with other projects or with normal time periods; for example, if Root Scanning grows noticeably compared to the normal period, it usually means there are too many threads.

Trigger fullGC

G1 mostly runs mixedGC, which can be investigated in the same way as youngGC. When fullGC is triggered, there is usually a problem: G1 degenerates to the Serial collector to complete the garbage cleanup, with pause times reaching seconds, which more or less brings the service to its knees.

The reasons for fullGC may include the following, as well as some ideas for parameter adjustment:

  • Concurrent phase failure: during the concurrent marking phase, the old generation fills up before the mixedGC starts, and G1 abandons the marking cycle. In this case, you may need to increase the heap size or adjust the number of concurrent marking threads with -XX:ConcGCThreads;

  • Promotion failure: there is not enough memory for surviving/promoted objects during GC, triggering a full GC. Here you can increase the reserved memory percentage with -XX:G1ReservePercent, lower -XX:InitiatingHeapOccupancyPercent to start marking earlier, or increase the number of marking threads with -XX:ConcGCThreads;

  • Large object allocation failure: a large object cannot find suitable region space for allocation and triggers a full GC. In this case, increase the memory or increase -XX:G1HeapRegionSize (see the sketch after this list);

  • The program actively calls System.gc(): don't just write it casually.
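As mentioned in the large-object item above, here is a sketch of such an allocation under G1 (illustrative, assuming default sizing rules): an object larger than half of -XX:G1HeapRegionSize is allocated as a humongous object in contiguous regions, so a fragmented heap can fail the allocation and force a fullGC.

// With e.g. -Xmx64m -XX:+UseG1GC -XX:G1HeapRegionSize=1m, a 16 MB array is
// humongous (larger than half a region) and needs 16+ contiguous free regions.
public class HumongousAlloc {
    public static void main(String[] args) {
        byte[] big = new byte[16 * 1024 * 1024];
        System.out.println("allocated " + big.length + " bytes");
    }
}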

In addition, we can configure -XX:HeapDumpPath=/xxx/dump.hprof in the startup parameters to dump fullGC-related files, and use jinfo to enable heap dumps before and after fullGC:

jinfo -flag +HeapDumpBeforeFullGC pid 

jinfo -flag +HeapDumpAfterFullGC pid

This yields two dump files. Comparing them, and focusing mainly on the problem objects dropped by gc, lets us locate the problem.

