[Production troubleshooting] Java service CPU at 100% because of an infinite loop: locating the offending code

One of our production servers triggered an Alibaba Cloud CPU alarm: usage had soared to 100%. Our business volume is not very large, and the CPU is normally very stable.
My gut reaction was that somebody's code had a bug, an infinite loop that kept the CPU busy.
The project runs on Tomcat 8; the service is packaged as a WAR file and dropped straight into Tomcat to run.

Without further ado, on to the topic.

1. Log in to the server
ssh ...

2. See which process is using the resources
Run the top command to view per-process resource usage. The process with the highest usage is easy to spot here. In a case like this the %CPU column will typically show a value above 100% (on a multi-core machine, top counts each core as 100%). Either way, look for the process with the highest usage and note its PID.

3. Check the threads inside the process
We have found the process, so now we have to look inside it and see which thread is eating the resources.
top -H -p 17123    # 17123 is the PID

4. Find the thread with the highest resource usage and convert its thread ID to hexadecimal
printf "%x\n" 15568
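For reference, the same conversion can be sketched in Java. This is only an illustration: the class and method names are hypothetical and not from the original post. It also builds the nid=0x... token, which is how thread dumps from JDK 8-era HotSpot JVMs label a thread's native ID.

```java
final class ThreadIdHex {
    // Converts the decimal thread ID shown by `top -H` into lowercase
    // hexadecimal, the same result as printf "%x\n".
    static String toHex(int decimalThreadId) {
        return Integer.toHexString(decimalThreadId);
    }

    // Builds the token to search for in a thread dump, e.g. "nid=0x3cd0".
    // Assumption: the dump prints native thread IDs as nid=0x<hex>.
    static String nidToken(int decimalThreadId) {
        return "nid=0x" + toHex(decimalThreadId);
    }

    public static void main(String[] args) {
        System.out.println(toHex(15568));    // prints 3cd0
        System.out.println(nidToken(15568)); // prints nid=0x3cd0
    }
}
```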

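5. Use jstack to see where the thread's code is running
On JDK 8-era HotSpot JVMs, jstack lists each thread's native ID in hexadecimal (the nid=0x... field on the thread's header line), which is why the ID was converted in step 4.
jstack 16466 | grep 47f8 -A 100    # 16466 is the PID, 47f8 is the thread ID in hexadecimal; substitute your own values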
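grep -A 100 always prints a fixed 100 lines after the match, so the output usually runs on into the following threads. As a rough sketch of a more precise filter, the Java below extracts only the matching thread's entry. It assumes entries in the dump are separated by blank lines and that the native ID appears as nid=0x<hex>. The class JstackThreadFilter and its methods are hypothetical helpers, not part of the JDK or of the original post.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

final class JstackThreadFilter {
    // Returns the header line and stack frames of the thread whose header
    // contains "nid=0x<hexId>", or an empty list if there is no such thread.
    static List<String> entryFor(List<String> dumpLines, String hexId) {
        String token = "nid=0x" + hexId.toLowerCase();
        List<String> entry = new ArrayList<>();
        boolean inside = false;
        for (String line : dumpLines) {
            if (!inside) {
                if (hasToken(line, token)) {
                    inside = true;
                    entry.add(line);
                }
            } else if (line.trim().isEmpty()) {
                break; // a blank line ends this thread's entry
            } else {
                entry.add(line);
            }
        }
        return entry;
    }

    // True if the token occurs and is followed by whitespace or end of line,
    // so that 0x47f8 does not also match 0x47f80.
    private static boolean hasToken(String line, String token) {
        int start = line.indexOf(token);
        if (start < 0) {
            return false;
        }
        int end = start + token.length();
        return end == line.length() || Character.isWhitespace(line.charAt(end));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: jstack <PID> | java JstackThreadFilter <hex thread id>");
            return;
        }
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        entryFor(lines, args[0]).forEach(System.out::println);
    }
}
```

Once compiled, it could be used as jstack 16466 | java JstackThreadFilter 47f8, again with your own PID and thread ID.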

6. Locate the line of code in the infinite loop and review the code to work out why it never terminates. Of course, once the problem has been found, the first thing to do is stop the service so that it no longer burns resources and production can return to normal quickly. Then fix the bug promptly and, once the fix has been tested, release and restart the service.

Because this was a production incident, the screenshots that accompanied this post were recreated afterwards by the author in a simulation. The project contained an algorithm that, given a specific number, had to produce a set of numbers satisfying certain rules.
A colleague had written a method for this and called it in a while loop, exiting the loop only once a satisfactory result was obtained.

The risk is that the number is entered by the user, and you never know what a user will enter. Some inputs may never produce a suitable result, in which case the code stays in the while loop forever.

Later, the author applied a temporary fix by adding a counter. The counter starts at 100,000 and is decremented by 1 on every iteration. If no suitable result has been found after 100,000 iterations, the loop exits and a friendly message tells the user that the operation failed.
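The counter fix might look roughly like the sketch below. The real business algorithm is not shown in this post, so the attempt is passed in as a Supplier. BoundedRetry, firstResult and the message text are hypothetical names chosen for illustration.

```java
import java.util.Optional;
import java.util.function.Supplier;

final class BoundedRetry {
    // Upper bound on iterations, matching the 100,000 described above.
    static final int MAX_ATTEMPTS = 100_000;

    // Calls attempt until it yields a result or the counter reaches zero.
    // An empty Optional means "no suitable result within the limit".
    static <T> Optional<T> firstResult(Supplier<Optional<T>> attempt, int maxAttempts) {
        int remaining = maxAttempts;
        while (remaining > 0) {
            Optional<T> result = attempt.get();
            if (result.isPresent()) {
                return result;
            }
            remaining--; // one fewer try left; guarantees the loop terminates
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for a user input that can never produce a valid result.
        Supplier<Optional<String>> neverSucceeds = () -> Optional.empty();
        Optional<String> result = firstResult(neverSucceeds, MAX_ATTEMPTS);
        // Instead of spinning forever, report a friendly failure to the user.
        System.out.println(result.orElse("Operation failed, please try a different number."));
    }
}
```

Returning an empty Optional rather than throwing keeps the "no result" case an ordinary outcome that the caller can turn into a user-facing message.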

This article is about troubleshooting the CPU spike, not about whether a better algorithm exists for that business scenario. Thank you for reading.

The following are problems you may run into when executing the commands above.

1. Unable to open socket file: target process not responding or HotSpot VM not loaded
Some posts online say this is a problem with the tmp directory.
The actual cause in this case: the process jstack is attaching to does not belong to root, the current user, but to tomcat. jstack has to be run as the user that owns the process, so switch to tomcat first by executing su tomcat.

2. This account is currently not available
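This error may occur when switching to the tomcat user. In that case, execute the following command:
cat /etc/passwd | grep tomcat
You will find that the login shell configured for tomcat is /sbin/nologin,
so run vi /etc/passwd, change /sbin/nologin to /bin/bash, save and exit, then switch to tomcat again and it works. If you would rather not give the tomcat account a login shell, running the command directly as that user should also work, for example sudo -u tomcat jstack 16466.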

Origin blog.csdn.net/cainiao1412/article/details/98885073