CPU spikes? These 3 common scenarios will help you pinpoint the cause accurately

1 Commonly used load analysis methods

High CPU, high Load

  1. Use the top command to find the PID of the process with the highest CPU usage;

  2. Use top -Hp PID to find the thread TID with the highest CPU usage inside that process;

  3. Use printf %x TID to convert the most CPU-consuming thread's TID to hexadecimal;

  4. For Java programs, use jstack to print the thread stack and search it for that hex TID (the nid field);
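Assuming a Java process, the steps above can be chained into a small script. The PID and TID values here are placeholders; in practice they come from top and top -Hp.

```shell
# Sketch of the workflow above. pid/tid are hypothetical values;
# in real use, get pid from `top` and tid from `top -Hp $pid`.
pid=12345
tid=12360
hex_tid=$(printf '%x' "$tid")   # convert the TID to hex
echo "$hex_tid"                 # prints: 3048
# Then search the stack dump for that thread:
#   jstack "$pid" | grep -A 20 "nid=0x$hex_tid"
```

The hex value is what jstack reports in each thread's `nid=0x...` field, which is why the conversion step is needed.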

Low CPU, high Load

The cause can be summarized in one sentence: too many processes are waiting for disk I/O to complete, which makes the process queue excessively long, while very few processes are actually running on the CPU — so the load is high but CPU usage is low.

  • Use top to check the CPU's I/O wait time, i.e. the %wa column;

  • Use iostat -d -x -m 1 10 to check disk I/O status (install with yum install -y sysstat);

  • Use sar -n DEV 1 10 to check network I/O;

  • Use the following command to find processes occupying I/O:

ps -e -L h o state,cmd  | awk '{if($1=="R"||$1=="D"){print $0}}' | sort | uniq -c | sort -k 1nr
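The pipeline above is dense, so here is the same logic run against a fabricated `ps` sample so it can be tried anywhere; the process names are made up. In real use, replace the printf with `ps -e -L h o state,cmd` as shown above.

```shell
# Same pipeline as above, fed a fabricated `ps` sample for illustration.
printf '%s\n' \
  'S /usr/sbin/sshd' \
  'D [jbd2/vda1-8]' \
  'R stress --io 4' \
  'D [jbd2/vda1-8]' |
awk '$1=="R" || $1=="D"' |   # keep running (R) and uninterruptible (D) tasks
sort | uniq -c |             # count identical state+command pairs
sort -k 1nr                  # most frequent first
```

On this sample it prints the two D-state `jbd2` entries first with a count of 2, then the single R-state entry — exactly the "who is hogging I/O" ranking the command is meant to produce.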

2 Analysis of high CPU and high load situations

  • Use vmstat to view CPU load at the system dimension;

  • Use top to view CPU load at the process dimension;

2.1 Use vmstat to view CPU load at the system dimension

You can use vmstat to view the usage of CPU resources from the system dimension.

Format: vmstat -n 1 — the trailing 1 means the results refresh once per second (-n prints the header only once).

[root@VM-1-14-centos ~]# vmstat -n 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 250304 163472 2154300    0    0     1    16    0    4  1  0 98  0  0
 0  0      0 250412 163472 2154332    0    0     0     0  937 1439  1  1 99  0  0
 0  0      0 250428 163472 2154332    0    0     0     4  980 1329  0  0 100  0  0
 0  0      0 250444 163472 2154332    0    0     0     0  854 1227  0  0 99  0  0
 0  0      0 250444 163472 2154332    0    0     0    68  832 1284  0  1 99  1  0
 0  0      0 250016 163472 2154332    0    0     0     0  929 1389  1  1 99  0  0

Description of the main data columns in the returned results:

  • r : the number of threads waiting for CPU time. Since each CPU core can only run one thread at a time, a persistently large number usually means the system is slowing down.

  • b : the number of blocked processes, i.e. processes in uninterruptible sleep, usually waiting for I/O.

  • us : user CPU time. I once worked on a server that performed frequent encryption and decryption; us was close to 100 and the r run queue reached 80 (the machine was under stress testing and performed poorly).

  • sy : system CPU time. If it is too high, the kernel is spending a long time in system calls, for example due to frequent I/O operations.

  • wa : the percentage of CPU time spent waiting for I/O. A high value indicates serious I/O waits, which may be caused by a large number of random disk accesses or by a disk performance bottleneck.

  • id : the percentage of CPU time spent idle. If this value stays at 0 while sy is roughly twice us, the system is usually facing a shortage of CPU resources.

Frequently asked questions and solutions:

  • If r is often greater than 4 and id is often less than 40, it means that the CPU is under heavy load.

  • If si and so (swap-in/swap-out) are non-zero for long periods, memory is insufficient.

  • If bi/bo (disk blocks in/out) are often non-zero and the b queue is greater than 3, I/O performance is poor.
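These rules of thumb are easy to script. The sketch below replays a vmstat-style sample through awk and flags overloaded lines; the second sample row is fabricated to trigger the check, and in real use you would pipe `vmstat -n 1 10` in directly.

```shell
# Flag vmstat samples where r > 4 or id < 40 (rules of thumb above).
# The second sample row is fabricated; pipe real output with: vmstat -n 1 10
sample=' 1  0      0 250304 163472 2154300    0    0     1    16    0    4  1  0 98  0  0
 5  2      0 250412 163472 2154332    0    0     0     0  937 1439 40 25 35  0  0'
echo "$sample" | awk '{ r=$1; id=$15;   # r is column 1, id is column 15
  if (r > 4 || id < 40) print "overloaded: r=" r " id=" id }'
```

Only the fabricated second row is flagged (`overloaded: r=5 id=35`); the healthy first row, taken from the output above, passes silently.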

2.2 Use top to view CPU load at the process dimension

You can use top to view the usage of resources such as CPU and memory from the process perspective.

top - 19:49:59 up 36 days, 23:15,  3 users,  load average: 0.11, 0.04, 0.05
Tasks: 133 total,   1 running, 131 sleeping,   0 stopped,   1 zombie
%Cpu(s):  3.1 us,  3.1 sy,  0.0 ni, 93.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  3880188 total,   241648 free,  1320424 used,  2318116 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  2209356 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1793 mysql     20   0 1608796 236708   9840 S   6.7  6.1  83:36.23 /usr/sbin/mysqld
    1 root      20   0  125636   3920   2444 S   0.0  0.1   4:34.13 /usr/lib/systemd/systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.90 [kthreadd]
    4 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 [kworker/0:0H]
    6 root      20   0       0      0      0 S   0.0  0.0   0:15.46 [ksoftirqd/0]
    7 root      rt   0       0      0      0 S   0.0  0.0   0:12.02 [migration/0]

The third line of the default view shows the current overall CPU usage, and the per-process resource usage is listed below it.

In this view you can press P (uppercase) to sort the results in descending order of CPU usage and locate the processes consuming the most CPU. Then troubleshoot further using the system log and the program's own logs to determine why its CPU usage is so high.

3 Low CPU, high Load

Problem Description

No business program is running on the Linux system, yet when observed through top, as shown in the figure below, the CPU is largely idle while the load average is very high:

Problem Analysis

Low CPU with high load means too many processes are waiting for disk I/O to complete, which makes the wait queue too long and therefore the load too high; meanwhile the CPU is either idle or busy with other tasks. The specific scenarios are as follows:

Scenario 1: Too many disk read and write requests will lead to a lot of I/O waiting

As mentioned above, the CPU works much faster than the disk. When a process running on the CPU needs to access a disk file, the CPU asks the kernel to fetch the file from disk and switches to another process or goes idle; the original task transitions to uninterruptible sleep. When there are too many such read/write requests, too many processes sit in uninterruptible sleep, producing high load with low CPU usage.

Scenario 2: MySQL statements without indexes, or deadlocks

We all know MySQL data is stored on disk: to execute a SQL query, the data must first be loaded from disk into memory. With a large data set, a SQL statement without an index forces a scan over too many rows and blocks on I/O; a deadlocked statement also blocks on I/O. Either way, too many processes end up in uninterruptible sleep and the load climbs. To fix it, run show full processlist in MySQL to check which threads are waiting, then extract and optimize the offending statements.

Scenario 3: An external disk fails; a common case is an NFS mount whose NFS server has failed

For example, if the system mounts external storage such as an NFS share, there will often be many read/write requests against files stored on NFS. If the NFS server fails, those requests can never obtain their resources, so the processes stay in uninterruptible sleep, driving the load up.
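One way to probe a possibly hung mount without getting stuck waiting yourself is to wrap the access in timeout. This is a generic sketch, not from the original article; /tmp stands in for a real NFS mount point such as /mnt/nfs, and note that a stat already stuck in deep uninterruptible sleep may only die when the kernel lets it wake.

```shell
# Hypothetical liveness probe for a mount point that may be hung.
# `timeout` signals the stat if it does not answer within 3 seconds.
check_mount() {
    if timeout 3 stat -t "$1" > /dev/null 2>&1; then
        echo "$1 responsive"
    else
        echo "$1 unresponsive or missing"
    fi
}
check_mount /tmp    # replace /tmp with the real mount point, e.g. /mnt/nfs
```

On a healthy system this prints "/tmp responsive"; against a dead NFS server the branch flips after the timeout instead of hanging the shell indefinitely.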

Solution

  • Load average is a measure of system load: the higher the value, the longer the task queue and the more tasks waiting to run.

  • When this happens, it is often caused by processes stuck in the D state; you can check for them with the command ps -axjf.

  • D state means uninterruptible sleep. A process in this state cannot be killed and will not exit on its own; the only remedies are restoring the resource it is waiting on or rebooting the system.

Processes waiting for I/O are in uninterruptible sleep, i.e. the D state; knowing this, we can easily find the processes stuck waiting:

ps -e -L h o state,cmd  | awk '{if($1=="R"||$1=="D"){print $0}}' | sort | uniq -c | sort -k 1nr

Author: Honest1y
Source: https://juejin.cn/post/7016127914454286367
