Five steps to troubleshoot 100% CPU usage on Linux

This article was converted with Jianyue SimpRead; the original was published at www.toutiao.com.

Have you ever run into this situation? CPU usage on a Linux server hits 100% and stays there, seriously affecting the business system. Your team lead asks you to track down the cause. If you don't know where to start, that can be a bit embarrassing.

The troubleshooting approach breaks down into five steps. Once you know them, simply follow the steps in order: the right method helps you locate and solve the problem quickly.

  1. Use top to locate the PID of the application process
  2. Use top -Hp [pid] to find the tid of the thread in that process that uses the most CPU
  3. Use printf "%x\n" [tid] to convert the tid to hexadecimal
  4. Use jstack [pid] | grep -A 10 [hexadecimal tid] to print the thread's stack information
  5. Analyze the problem based on the stack information

Let's walk through a practical example. First, create a test project with the code below, package it, upload it to a test server, and start the service (java -jar demo-0.0.1-SNAPSHOT.jar).

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
/**
 * CPU usage test method
 * @author: 八零后琐话
 * @date: 2023-06-03
 */
@RequestMapping("/api")
@RestController
public class CpuTestController {
    @GetMapping("/cpu/{count}")
    public long cpuRunning(@PathVariable("count") long count) {
        long result = 0;
        // i is an int: it overflows before it can reach a count above Integer.MAX_VALUE
        for (int i = 0; i < count; i++) {
            result++;
        }
        return result;
    }
}

Call the endpoint:

> curl http://localhost:9999/api/cpu/10000000000

Within a short while, CPU usage spikes to 100%. Next, let's follow the five-step method to find the cause.
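Why does a single request keep a core busy for good? In cpuRunning, the counter i is an int but count is a long, and 10000000000 is larger than Integer.MAX_VALUE (2147483647). When i reaches Integer.MAX_VALUE, i++ silently wraps it around to Integer.MIN_VALUE, and because i is widened to long for the comparison, i < count is still true, so the loop never ends. The small standalone program below (the class and method names are ours, not part of the demo project) demonstrates the wrap-around:

```java
public class OverflowDemo {
    /** Returns an int counter's value after one increment from the given start. */
    static int incrementFrom(int start) {
        int i = start;
        i++; // int arithmetic wraps silently on overflow; no exception is thrown
        return i;
    }

    public static void main(String[] args) {
        long count = 10_000_000_000L; // the value passed in the curl request
        int wrapped = incrementFrom(Integer.MAX_VALUE);
        System.out.println("Integer.MAX_VALUE + 1 = " + wrapped); // prints -2147483648
        // i is widened to long for the comparison, so a negative i is still < count
        System.out.println("still looping? " + (wrapped < count)); // prints true
    }
}
```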

Step 1: Find the PID of the process that consumes the most CPU

// Run top, then press Shift+P to sort by CPU usage
> top 

From the figure, we can see that the process with PID 11168 has the highest CPU usage: a full 100%.
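top is the quickest way to do this by hand. If you would rather script step 1 from Java, the sketch below approximates it with the standard ProcessHandle API (Java 9+). It samples each process's cumulative CPU time twice, one second apart, and ranks processes by the difference, which is roughly what top's %CPU column shows. The class and method names are our own assumptions, and processes whose CPU time the OS does not report to the JVM are simply skipped.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TopProcesses {
    /** Cumulative CPU time (nanoseconds) per PID, for processes whose CPU time is reported. */
    static Map<Long, Long> snapshot() {
        Map<Long, Long> cpu = new HashMap<>();
        ProcessHandle.allProcesses().forEach(p ->
            p.info().totalCpuDuration().ifPresent(d -> cpu.put(p.pid(), d.toNanos())));
        return cpu;
    }

    /** PIDs ordered by CPU time consumed between two snapshots, highest first. */
    static List<Long> busiest(Map<Long, Long> before, Map<Long, Long> after, int n) {
        return after.keySet().stream()
            .filter(before::containsKey) // skip processes that started during the interval
            .sorted((a, b) -> Long.compare(after.get(b) - before.get(b),
                                           after.get(a) - before.get(a)))
            .limit(n)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Long, Long> before = snapshot();
        Thread.sleep(1000); // sampling interval, comparable to top's refresh delay
        Map<Long, Long> after = snapshot();
        for (long pid : busiest(before, after, 5)) {
            // CPU nanoseconds used in about 1 s, as a percentage of one core
            double pct = (after.get(pid) - before.get(pid)) / 1e9 * 100;
            System.out.printf("PID %d: %.1f%%%n", pid, pct);
        }
    }
}
```

Sampling twice matters: totalCpuDuration is cumulative since the process started, so ranking a single snapshot would favor long-running processes over the one spiking right now.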

Step 2: Find the thread tid that consumes the most CPU

// Run top -Hp [pid] to list the threads (tids) of the application process
// Press Shift+P to sort by CPU usage
> top -Hp 11168

From the figure, we can see that thread 11196 has the highest CPU usage, at 99.9%.
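top -Hp gets its per-thread numbers from counters Linux exposes under /proc/[pid]/task/[tid]/stat. The Java 11+ sketch below reads those files directly: it takes the PID from step 1 as its first argument, samples each thread's utime + stime (fields 14 and 15, in clock ticks, usually 100 per second) twice one second apart, and prints the five threads whose counters grew the most. It is Linux-only, the class name is ours, and it is an illustration rather than a replacement for top.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

public class ThreadCpuFromProc {
    /**
     * Returns utime + stime (clock ticks) from one /proc/[pid]/task/[tid]/stat line.
     * The thread name (field 2) is in parentheses and may contain spaces or ')',
     * so only the text after the last ')' is split.
     */
    static long cpuTicks(String statLine) {
        String rest = statLine.substring(statLine.lastIndexOf(')') + 2);
        String[] f = rest.split(" ");
        // f[0] is field 3 (state), so field N is f[N - 3]: utime = f[11], stime = f[12]
        return Long.parseLong(f[11]) + Long.parseLong(f[12]);
    }

    /** Maps each thread id (tid) of the process to its cumulative CPU ticks. */
    static Map<Long, Long> threadTicks(long pid) throws IOException {
        Map<Long, Long> ticks = new HashMap<>();
        try (Stream<Path> tasks = Files.list(Path.of("/proc", String.valueOf(pid), "task"))) {
            for (Path task : (Iterable<Path>) tasks::iterator) {
                try {
                    long tid = Long.parseLong(task.getFileName().toString());
                    ticks.put(tid, cpuTicks(Files.readString(task.resolve("stat"))));
                } catch (IOException e) {
                    // the thread exited between listing and reading; ignore it
                }
            }
        }
        return ticks;
    }

    public static void main(String[] args) throws Exception {
        long pid = Long.parseLong(args[0]); // the PID found in step 1
        Map<Long, Long> before = threadTicks(pid);
        Thread.sleep(1000);
        Map<Long, Long> after = threadTicks(pid);
        after.entrySet().stream()
            .filter(e -> before.containsKey(e.getKey()))
            .sorted((a, b) -> Long.compare(b.getValue() - before.get(b.getKey()),
                                           a.getValue() - before.get(a.getKey())))
            .limit(5)
            .forEach(e -> System.out.printf("tid %d (0x%x): %d ticks%n",
                e.getKey(), e.getKey(), e.getValue() - before.get(e.getKey())));
    }
}
```

With Java 11 single-file launch you could run it as java ThreadCpuFromProc.java 11168; the tids it prints are the same ones top -Hp shows, along with their hexadecimal form for the next step.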

Step 3: Convert the thread tid to hexadecimal

// printf "%x\n" [tid]  converts the tid to hexadecimal
> printf "%x\n" 11196
2bbc
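If you are scripting these steps in Java rather than typing them into a shell, Long.toHexString performs the same conversion as printf "%x". The helper below also builds the nid=0x... token that HotSpot's jstack prints in each thread header on versions such as JDK 8 and 11; the thread-dump format can vary between JDK versions, so check what your own jstack prints before relying on it. The class and method names are ours.

```java
public class TidToNid {
    /** Formats a Linux thread id as jstack's native-id token, e.g. 11196 -> "nid=0x2bbc". */
    static String toNid(long tid) {
        return "nid=0x" + Long.toHexString(tid); // lowercase hex, same as printf "%x"
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(11196)); // prints 2bbc
        System.out.println(toNid(11196));            // prints nid=0x2bbc
    }
}
```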

Step 4: View the stack information of the thread

// jstack [pid] | grep -A 10 [hexadecimal tid]  prints the stack information
> jstack 11168 | grep -A 10 2bbc

From the stack trace, it is easy to see that line 24 of CpuTestController is causing the problem.
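grep -A 10 2bbc is quick but blunt: it matches any line containing 2bbc (a memory address can contain it too) and always prints exactly 10 lines after the match, which may cut a deep stack short. When processing thread dumps in a program, a more precise approach is to match the whole nid token in the thread header and keep lines until the blank line that separates threads. The Java 11+ sketch below assumes the HotSpot jstack layout in which each thread starts with a line beginning with the quoted thread name; the class name is ours and the sample dump in main is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JstackThreadBlock {
    /**
     * Returns the lines of the thread whose header contains the exact token nid
     * (for example "nid=0x2bbc"), up to the blank line jstack prints between threads.
     */
    static List<String> findThread(String dump, String nid) {
        List<String> block = new ArrayList<>();
        boolean inBlock = false;
        for (String line : dump.split("\\R")) {
            // Thread header lines start with the quoted thread name
            if (!inBlock && line.startsWith("\"")
                    && Arrays.asList(line.trim().split("\\s+")).contains(nid)) {
                inBlock = true;
            }
            if (inBlock) {
                if (line.isBlank()) {
                    break; // end of this thread's stack
                }
                block.add(line);
            }
        }
        return block;
    }

    public static void main(String[] args) {
        // Hypothetical, abbreviated dump for illustration only (package name made up).
        // The "main" header contains 2bbc inside its tid address, so grep 2bbc would match it.
        String dump = String.join("\n",
            "\"main\" #1 prio=5 os_prio=0 tid=0x00007f0000002bbc nid=0x2b81 waiting on condition",
            "   java.lang.Thread.State: TIMED_WAITING (sleeping)",
            "",
            "\"http-nio-9999-exec-1\" #27 daemon prio=5 os_prio=0 tid=0x00007f0000003000 nid=0x2bbc runnable",
            "   java.lang.Thread.State: RUNNABLE",
            "\tat demo.CpuTestController.cpuRunning(CpuTestController.java:24)",
            "");
        findThread(dump, "nid=0x2bbc").forEach(System.out::println);
    }
}
```

Running main prints only the three lines of the http-nio-9999-exec-1 thread, skipping the main thread even though its address contains 2bbc.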

Step 5: Analyze the problem based on stack information

Open the corresponding code and take a look: sure enough, that is where the problem is. Now we can optimize the code logic.
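The article does not show its optimized code. One possible fix is sketched here as plain Java without Spring so it runs on its own (class and method names are ours). The loop version declares its counter as a long so it can actually reach count; the constant-time version returns the same value directly, because for a positive count the loop simply counts up to count, and for zero or a negative count it returns 0.

```java
public class CpuRunningFixed {
    /** Loop version with the overflow fixed: a long counter can actually reach count. */
    static long cpuRunningLoop(long count) {
        long result = 0;
        for (long i = 0; i < count; i++) {
            result++;
        }
        return result;
    }

    /** Constant-time version: same result as the loop, without the work. */
    static long cpuRunning(long count) {
        return Math.max(0L, count);
    }

    public static void main(String[] args) {
        System.out.println(cpuRunningLoop(1_000_000));   // prints 1000000
        System.out.println(cpuRunning(10_000_000_000L)); // prints 10000000000
    }
}
```

Even with a long counter, a request for 10000000000 iterations may still keep a core busy for a while, so in a real service you would also validate or cap count before doing the work.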


Besides the basic method above, there are also ready-made tools you can use directly, such as Alibaba's Arthas.

Arthas official website: https://arthas.aliyun.com/doc/

Arthas is an online monitoring and diagnosis tool. It gives a real-time, global view of application load, memory, GC, and thread status, and it can diagnose business problems without modifying application code: viewing the input parameters, return values, and exceptions of method calls, monitoring method execution time, inspecting class loading information, and more. This greatly improves the efficiency of troubleshooting in production.
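Arthas's thread command, for example thread -n 3, prints the busiest threads together with their stacks, which collapses steps 2 to 4 into a single command. To show the underlying idea, here is our own minimal in-process sketch (not Arthas's implementation) using the standard java.lang.management API: sample every thread's CPU time twice and print the stack of the thread whose CPU time grew the most. Note that it reports Java thread ids, which are not the Linux tids shown by top -Hp, and the spinner thread in main is a made-up stand-in for the runaway request.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

public class BusiestThread {
    /**
     * Samples per-thread CPU time twice and returns the Java thread id whose CPU time
     * grew the most during the interval, or -1 if CPU time is unavailable.
     */
    static long busiestThreadId(long intervalMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            return -1;
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id)); // nanoseconds, -1 if not alive
        }
        try {
            Thread.sleep(intervalMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        long bestId = -1;
        long bestDelta = -1;
        for (long id : mx.getAllThreadIds()) {
            Long start = before.get(id);
            long end = mx.getThreadCpuTime(id);
            if (start == null || start < 0 || end < 0) {
                continue; // thread started or ended during the interval
            }
            if (end - start > bestDelta) {
                bestDelta = end - start;
                bestId = id;
            }
        }
        return bestId;
    }

    public static void main(String[] args) {
        // Hypothetical busy thread standing in for the runaway request handler
        Thread spinner = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                // busy loop, like the unbounded for loop in cpuRunning
            }
        }, "spinner");
        spinner.setDaemon(true);
        spinner.start();

        long id = busiestThreadId(500);
        ThreadInfo info = id < 0 ? null
            : ManagementFactory.getThreadMXBean().getThreadInfo(id, 10); // up to 10 frames
        if (info != null) {
            System.out.println("Busiest thread: " + info.getThreadName());
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("\tat " + frame);
            }
        }
        spinner.interrupt();
    }
}
```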

When troubleshooting production issues, every second counts. So when writing code, aim for more than just delivering business features: pursue code quality and write efficient, elegant code.


Source: blog.csdn.net/caidingnu/article/details/131434245