Troubleshooting high server CPU usage caused by a Java program

Original address: https://www.jianshu.com/p/3667157d63bb

 

1. Symptoms

Colleagues in customer service reported that the platform was running slowly and web pages were freezing badly. Restarting the system several times did not help. Checking the server with the top command showed that CPU usage was abnormally high.

2. Locating the high CPU usage

2.1. Locating the problem process

Running top shows that the process with PID 14063 is consuming a large amount of resources: its CPU usage is as high as 776.1%, and its memory usage reaches 29.8%.


[ylp@ylp-web-01 ~]$ top
top - 14:51:10 up 233 days, 11:40,  7 users,  load average: 6.85, 5.62, 3.97
Tasks: 192 total,   2 running, 190 sleeping,   0 stopped,   0 zombie
%Cpu(s): 97.3 us,  0.3 sy,  0.0 ni,  2.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 16268652 total,  5114392 free,  6907028 used,  4247232 buff/cache
KiB Swap:  4063228 total,  3989708 free,    73520 used.  8751512 avail Mem 
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                               

14063 ylp       20   0 9260488 4.627g  11976 S 776.1 29.8 117:41.66 java    

2.2. Locating the problem threads

Use the ps -mp <pid> -o THREAD,tid,time command to view the process's threads; several of them show high CPU usage:


[ylp@ylp-web-01 ~]$ ps -mp 14063 -o THREAD,tid,time
USER     %CPU PRI SCNT WCHAN  USER SYSTEM   TID     TIME
ylp       361   -    - -         -      -     - 02:05:58
ylp       0.0  19    - futex_    -      - 14063 00:00:00
ylp       0.0  19    - poll_s    -      - 14064 00:00:00
ylp      44.5  19    - -         -      - 14065 00:15:30
ylp      44.5  19    - -         -      - 14066 00:15:30
ylp      44.4  19    - -         -      - 14067 00:15:29
ylp      44.5  19    - -         -      - 14068 00:15:30
ylp      44.5  19    - -         -      - 14069 00:15:30
ylp      44.5  19    - -         -      - 14070 00:15:30
ylp      44.5  19    - -         -      - 14071 00:15:30
ylp      44.6  19    - -         -      - 14072 00:15:32
ylp       2.2  19    - futex_    -      - 14073 00:00:46
ylp       0.0  19    - futex_    -      - 14074 00:00:00
ylp       0.0  19    - futex_    -      - 14075 00:00:00
ylp       0.0  19    - futex_    -      - 14076 00:00:00
ylp       0.7  19    - futex_    -      - 14077 00:00:15

The output shows that threads 14065 through 14072 all have very high CPU usage, around 44.5% each.

2.3. Viewing the problem thread's stack

Take the thread with TID 14065 as an example. To inspect its stack, first convert the thread ID to hexadecimal using the printf "%x\n" <tid> command:


[ylp@ylp-web-01 ~]$ printf "%x\n" 14065
36f1
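The same conversion can be done from Java if that is more convenient; the printf invocation above is just the shell equivalent of Integer.toHexString. A minimal sketch:

```java
public class TidToHex {
    public static void main(String[] args) {
        int tid = 14065; // the busy thread's TID taken from the ps output
        // jstack reports native thread IDs as lowercase-hex "nid" values,
        // so convert the decimal TID to lowercase hex before grepping:
        String nid = Integer.toHexString(tid);
        System.out.println(nid); // prints 36f1
    }
}
```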

Then use jstack to print the thread's stack, in the form jstack <pid> | grep <tid-in-hex> -A 30:


[ylp@ylp-web-01 ~]$ jstack 14063 |grep 36f1 -A 30
"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007fa35001e800 nid=0x36f1 runnable 
"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007fa350020800 nid=0x36f2 runnable 
"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007fa350022800 nid=0x36f3 runnable 
"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007fa350024000 nid=0x36f4 runnable 
"GC task thread#4 (ParallelGC)" prio=10 tid=0x00007fa350026000 nid=0x36f5 runnable 
"GC task thread#5 (ParallelGC)" prio=10 tid=0x00007fa350028000 nid=0x36f6 runnable 
"GC task thread#6 (ParallelGC)" prio=10 tid=0x00007fa350029800 nid=0x36f7 runnable 
"GC task thread#7 (ParallelGC)" prio=10 tid=0x00007fa35002b800 nid=0x36f8 runnable 
"VM Periodic Task Thread" prio=10 tid=0x00007fa3500a8800 nid=0x3700 waiting on condition 

JNI global references: 392

The output shows that these threads are the JVM's GC threads (ParallelGC workers). At this point it is fairly safe to conclude that the GC threads are running continuously because of insufficient memory or a memory leak, and that this is what drives CPU usage so high.
So the next step is to locate the memory problem.

3. Locating the memory problem

3.1. Viewing the process's memory with jstat -gcutil


[ylp@ylp-web-01 ~]$ jstat -gcutil 14063 2000 10

  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT   
  0.00   0.00 100.00  99.99  26.31     42   21.917   218 1484.830 1506.747
  0.00   0.00 100.00  99.99  26.31     42   21.917   218 1484.830 1506.747
  0.00   0.00 100.00  99.99  26.31     42   21.917   219 1496.567 1518.484
  0.00   0.00 100.00  99.99  26.31     42   21.917   219 1496.567 1518.484
  0.00   0.00 100.00  99.99  26.31     42   21.917   219 1496.567 1518.484
  0.00   0.00 100.00  99.99  26.31     42   21.917   219 1496.567 1518.484
  0.00   0.00 100.00  99.99  26.31     42   21.917   219 1496.567 1518.484
  0.00   0.00 100.00  99.99  26.31     42   21.917   220 1505.439 1527.355
  0.00   0.00 100.00  99.99  26.31     42   21.917   220 1505.439 1527.355
  0.00   0.00 100.00  99.99  26.31     42   21.917   220 1505.439 1527.355

The output shows that the Eden space (E) is 100% full and the old generation (O) is 99.99% full. The number of Full GCs has reached 220 and keeps climbing; Full GCs are both frequent and long, averaging about 6.8 seconds each (1505.439 s / 220). From this it is reasonable to conclude that there is a problem in the application code, most likely somewhere that creates objects excessively.
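The average Full GC pause quoted above comes directly from the jstat columns, cumulative Full GC time divided by the number of Full GCs (FGCT / FGC). As a quick check:

```java
public class GcStats {
    public static void main(String[] args) {
        double fgct = 1505.439; // FGCT: cumulative Full GC time in seconds
        int fgc = 220;          // FGC: number of Full GC events
        double avgPause = fgct / fgc;
        // An average pause of several seconds means the application is
        // effectively frozen for most of its wall-clock time.
        System.out.printf("average Full GC pause: %.2f s%n", avgPause);
    }
}
```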

3.2. Analyzing the stack

Dump the process's stack with jstack:


[ylp@ylp-web-01 ~]$ jstack 14063 >>jstack.out

After copying jstack.out from the server to a local machine, search it in an editor for frames that reference the project's packages and whose thread state is RUNNABLE. The screenshot shows that line 447 of ActivityUtil.java is calling HashMap.put().

[screenshot of the jstack output omitted]

3.3. Locating the code

Open the project and go to line 477 of the ActivityUtil class. The code is as follows:

[screenshot of the ActivityUtil source omitted]

After checking with the colleagues responsible for this code, it turns out that it reads configuration from the database and loops once for every value of the retain field; inside the loop it keeps putting entries into a HashMap.
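The original source is not reproduced here, but the pattern described, a loop bounded by a database value that keeps putting entries into a HashMap, can be sketched roughly as follows (the class, method, and field names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

public class ActivityUtilSketch {
    // Hypothetical reconstruction of the problematic pattern:
    // "retain" comes from a database configuration row and nothing
    // bounds it, so a huge value makes the map grow until the heap
    // fills up and the GC threads spin continuously.
    static Map<Integer, String> buildRetainMap(int retain) {
        Map<Integer, String> map = new HashMap<>();
        for (int i = 0; i < retain; i++) {
            map.put(i, "activity-" + i); // one new entry per iteration, never evicted
        }
        return map;
    }

    public static void main(String[] args) {
        // With a sane retain value this is harmless...
        System.out.println(buildRetainMap(3).size()); // prints 3
        // ...but with the huge retain value found in the database,
        // the same loop exhausts the heap.
    }
}
```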

Querying the configuration in the database shows that the number of retain values is huge:

[screenshot of the database query result omitted]

At this point, the problem is located.

 
