JVM: Online service CPU is maxed out, how to troubleshoot (3)

0 Preface

Some time ago we ran into an online issue where a regular expression drove the CPU to 100%. The problem was not located right away, so I am recording it here, and at the same time I have systematically organized the ideas and methods for troubleshooting a maxed-out CPU, as a reference for others.

1. Causes of a maxed-out CPU

We first need to understand why a CPU can spike to 100%, so that we can troubleshoot in the right direction:

  • Increased concurrency: this kind of cause occurs relatively often; a sudden rise in concurrency leaves the online servers short of resources and drives CPU usage up.
  • Compute-intensive functions: also known as compute-intensive tasks, i.e. functions that consume a lot of CPU, such as complex data processing, image processing, encryption, etc.
  • Loops and recursion: caused by sloppy code, such as an accidental infinite loop or excessively deep recursion, which keeps consuming resources and never releases the thread.
  • Resource contention: multiple threads or processes compete for the same resource, for example lock contention or connection pool contention.
  • Service failure: CPU usage rises because a third-party component fails. The probability of this is relatively small, because a production environment usually has monitoring deployed and a component failure will normally trigger an alert immediately.
  • Hardware failure: CPU usage soars because of a hardware fault in the CPU itself.

Of the causes above, the ones that actually occur often in software production are increased concurrency, compute-intensive functions, loops and recursion, and resource contention; the other situations are relatively rare, or when they do occur they are beyond what a developer can intervene in. So this article focuses mainly on these four causes.

2. Solutions

  • Increased concurrency:

First of all, a CPU spike caused by this is not a problem at the code level but at the architecture level. Since the concurrency itself has grown, the better measures are to urgently add server resources or to limit the concurrency (rate limiting).

  • Compute-intensive functions:

There are two main ways to deal with a CPU spike caused by this kind of problem. One is to first locate which function is occupying a large amount of CPU (how to locate it is explained below), then evaluate whether the calculation really needs that many resources and optimize it; if the calculation itself is genuinely that complex, then consider adding server resources.

  • Loops and recursion:

Infinite loops and excessively deep recursion are things we definitely want to avoid, but locating them is the hard part, which we explain below. Once located, the code must be reworked to eliminate them.

  • Resource contention:

Resource contention should be analyzed according to which resource is actually being fought over. For lock contention, for example, take measures to reduce it: use fewer or finer-grained locks, switch to optimistic locking, use the JUC (java.util.concurrent) synchronization components, and so on.
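As a minimal sketch of that last point (my own illustration, not code from the original article), replacing a synchronized counter with a java.util.concurrent atomic class removes the lock entirely and lets threads update the value via CAS instead of blocking each other:

 import java.util.concurrent.atomic.AtomicLong;

 public class CounterExample {

     // Heavily contended version: every increment fights for the same monitor lock.
     static class LockedCounter {
         private long value;
         public synchronized void increment() { value++; }
         public synchronized long get() { return value; }
     }

     // Contention-friendly version: AtomicLong uses CAS instead of a lock,
     // so threads do not block each other while incrementing.
     static class CasCounter {
         private final AtomicLong value = new AtomicLong();
         public void increment() { value.incrementAndGet(); }
         public long get() { return value.get(); }
     }

     public static void main(String[] args) throws InterruptedException {
         CasCounter counter = new CasCounter();
         Runnable task = () -> { for (int i = 0; i < 1_000_000; i++) counter.increment(); };
         Thread t1 = new Thread(task);
         Thread t2 = new Thread(task);
         t1.start(); t2.start();
         t1.join(); t2.join();
         System.out.println(counter.get()); // 2000000
     }
 }

Whether this helps depends on where the contention actually is; the same idea applies to choosing LongAdder, ReentrantReadWriteLock, or concurrent collections over coarse synchronized blocks.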

3. Locating the maxed-out CPU problem

First we must locate the problem. The core of it is finding the threads with high CPU usage and the code they are executing. As with an OOM problem, once the offending code is located, fixing it is the easy part.

Here, so that everyone can feel what the actual troubleshooting looks like, I have written some code that drives the CPU up; let's run it and troubleshoot together. If you want to follow along, you can download the source code from the following Git repository:

https://gitee.com/wuhanxue/wu_study/tree/master/demo/cpu_oom_demo

1. Run the project

 java -jar cpu_oom_demo-0.0.1-SNAPSHOT.jar


2. Call the interface that drives the CPU up, to simulate a maxed-out CPU

http://192.168.244.14:8080/cpu/build?time=-1

3. Use the top command to observe the resource usage of each process


If your server has multiple cores, note that the maximum occupancy is 100% per core. For example, on a 4-core processor the CPU figure in top can reach 400%; if the current figure is only 100%, the machine is still far from its limit.

Here I am running on a single-core virtual machine, so CPU usage tops out at 100%, and you can see it has soared to 99%. The PID of the process with the highest CPU usage is 2127.

4. If multiple Java programs are deployed on the same server, you can confirm which service a process ID corresponds to with the jinfo or jps command

jps -l


5. Use top -Hp to check the resource usage of the threads inside that process

top -Hp 2127


We can see that the thread with the highest CPU usage has thread ID 2140, and the second highest is 2141.

6. Next we examine the stack information of this thread. This can be done with the jstack command, but the thread IDs (nid) in its output are printed in hexadecimal, so we first need to convert the thread ID to hexadecimal.

There are several ways to convert the decimal ID to hexadecimal:


  • Linux command conversion
printf '%x\n' 2140

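  • If a Java environment is handy, Integer.toHexString does the same conversion (this alternative is my own addition, not from the original article):

 // Convert the decimal thread ID reported by top to the hex nid used by jstack
 System.out.println(Integer.toHexString(2140)); // prints 85c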

Before using the jstack tool, let's take a look at its parameters:

jstack [options] <pid>
options:
-F: force a thread dump (use when jstack <pid> does not respond)
-l: print additional information about locks
-m: print mixed stack information, including both Java and native frames
-h: print help information

Executing jstack 2127 directly prints the whole process's thread dump, and there is so much output that it is hard to spot the information you want.


So we filter by the thread ID and look at the stack of the thread with the highest CPU usage; grep -A 50 prints the 50 lines that follow the matched keyword.

 jstack 2127 | grep -A 50 85c

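The original screenshot is not reproduced here, but the matching entry in a jstack dump roughly has the following shape (illustrative only, not the actual capture from this run; thread name, class, and line details are placeholders), where nid=0x85c is the hexadecimal thread ID we just computed and the "at" frames point into the regex engine and our own cpuBuild method:

 "http-nio-8080-exec-1" #26 daemon prio=5 os_prio=0 tid=0x00007f... nid=0x85c runnable
    java.lang.Thread.State: RUNNABLE
         at java.util.regex.Pattern$Curly.match(Pattern.java:...)
         at java.util.regex.Matcher.match(Matcher.java:...)
         at java.util.regex.Matcher.matches(Matcher.java:...)
         at com.example.cpudemo.CpuController.cpuBuild(CpuController.java:...)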

From the stack information above we can see that the problem lies in a regular expression call, and the method is located as cpuBuild; from there we can go straight to that spot in the code and adjust it.

Through local testing we can tell that the cause is the regular expression originally written there: when it is matched against long strings it runs for a very long time, the thread is never released, and the CPU stays high.

Of course, you can also see that we wrote an infinite loop here; the purpose is simply to simulate a high volume of concurrent user requests. If you don't want to write an infinite loop, you can use JMeter to make concurrent calls, which simulates a maxed-out CPU just as well.
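For reference, here is a minimal sketch of what such an endpoint might look like (my own reconstruction under assumptions, not the actual code from the repository; the package, class name, and the exact regular expression are placeholders, and time=-1 is assumed to mean "loop forever"):

 import org.springframework.web.bind.annotation.GetMapping;
 import org.springframework.web.bind.annotation.RequestParam;
 import org.springframework.web.bind.annotation.RestController;

 @RestController
 public class CpuController {

     // Called as /cpu/build?time=-1 in the example above (assumed semantics)
     @GetMapping("/cpu/build")
     public String cpuBuild(@RequestParam long time) {
         // A regex with nested quantifiers can backtrack catastrophically
         // on long non-matching input, pinning a CPU core for a long time.
         String regex = "(a+)+b";
         StringBuilder sb = new StringBuilder();
         for (int i = 0; i < 35; i++) {
             sb.append('a');
         }
         String input = sb.toString();

         long deadline = System.currentTimeMillis() + time;
         while (time < 0 || System.currentTimeMillis() < deadline) {
             input.matches(regex); // each call burns CPU inside the regex engine
         }
         return "done";
     }
 }

The key point is the nested quantifier (a+)+ combined with input that cannot match: the regex engine backtracks exponentially, so even a single request can keep a core busy for a very long time.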

We can also count the threads by grepping for the Thread keyword, as a rough statistic of the concurrency:

jstack -l 2127 | grep 'java.lang.Thread.State' | wc -l
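As an in-process alternative (my own addition, not from the original article), the JDK's ThreadMXBean can report the live thread count programmatically; note that it reports the JVM it runs in, so it would have to be exposed from inside the service itself:

 import java.lang.management.ManagementFactory;
 import java.lang.management.ThreadMXBean;

 public class ThreadCount {
     public static void main(String[] args) {
         // Number of live threads (daemon and non-daemon) in the current JVM
         ThreadMXBean threads = ManagementFactory.getThreadMXBean();
         System.out.println("Live threads: " + threads.getThreadCount());
     }
 }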

If you want to export the stack information to a file for offline viewing, jstack can do that as well:

jstack 2127 > 2127.log



To solve this kind of CPU problem, once the offending code has been located, it is up to each of us to optimize that code.

Summary

To sum up, we walked through the ideas and steps for locating a CPU spike using the jstack tool and other commands; I hope this helps you in real production.


Origin blog.csdn.net/qq_24950043/article/details/130172750