CPU usage metrics
- user (usually abbreviated as us): CPU time spent in user mode. It excludes the nice time below but includes guest time.
- nice (usually abbreviated as ni): CPU time spent in user mode by low-priority processes, that is, processes whose nice value has been raised to between 1 and 19. The nice value ranges from -20 to 19; the larger the value, the lower the priority.
- system (often abbreviated as sys, shown as sy in top): CPU time spent in kernel mode.
- idle (usually abbreviated as id): idle CPU time. It excludes time spent waiting for I/O (iowait).
- iowait (often abbreviated as wa): CPU time spent waiting for I/O.
- irq (often abbreviated as hi): CPU time spent handling hardware interrupts.
- softirq (usually abbreviated as si): CPU time spent handling soft interrupts.
- steal (usually abbreviated as st): when the system runs inside a virtual machine, the CPU time taken by other virtual machines.
- guest (usually written as guest): CPU time spent running other operating systems through virtualization, that is, running virtual machines.
- guest_nice (often abbreviated as gnice): CPU time spent running virtual machines at low priority.
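As a rough illustration of how these fields combine into the percentages that tools report, the sketch below parses two snapshots of the aggregate cpu line from /proc/stat and computes each category's share of the elapsed time. The function names (parse_cpu_line, cpu_percentages) and the sample values are made up for this example, and it assumes the usual field order: user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu ...' line of /proc/stat into a dict of ticks."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels report fewer columns; treat the missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Return each category's share (in percent) of the time between two snapshots."""
    delta = {name: after[name] - before[name] for name in FIELDS}
    # guest and guest_nice are already counted inside user and nice,
    # so they are left out of the total to avoid double counting.
    total = sum(delta[name] for name in FIELDS[:8])
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    result = {name: 100.0 * delta[name] / total for name in FIELDS}
    # One common convention (an assumption here): everything except
    # idle and iowait counts as busy time.
    result["busy"] = 100.0 - result["idle"] - result["iowait"]
    return result


if __name__ == "__main__":
    # Hypothetical snapshots; on a real system, read /proc/stat twice,
    # about a second apart.
    first = parse_cpu_line("cpu 1000 0 500 8000 100 0 10 0 0 0")
    second = parse_cpu_line("cpu 1018 0 518 8960 102 0 12 0 0 0")
    usage = cpu_percentages(first, second)
    for name in ("user", "system", "idle", "iowait", "softirq", "busy"):
        print(f"{name:8s} {usage[name]:5.1f}%")
```

Because the counters are cumulative since boot, a single snapshot only gives the average since boot; the difference between two snapshots is what reflects current usage.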
How to check CPU usage
top
top - 15:45:59 up 364 days, 20:43, 0 users, load average: 0.00, 0.01, 0.00
Tasks: 139 total, 1 running, 95 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.8 us, 1.8 sy, 0.0 ni, 96.0 id, 0.2 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem : 3514764 total, 179812 free, 1061072 used, 2273880 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 2100148 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9951 ubuntu 20 0 989840 108760 36424 S 1.3 3.1 0:23.99 node
30257 root 20 0 588648 20356 4840 S 1.0 0.6 315:41.03 barad_agent
11399 root 20 0 1114904 151668 21872 S 0.7 4.3 535:40.93 YDService
9995 ubuntu 20 0 1039160 68076 33532 S 0.3 1.9 0:55.91 node
26555 ubuntu 20 0 108500 4476 3144 S 0.3 0.1 0:01.23 sshd
26615 ubuntu 20 0 978144 89548 38244 S 0.3 2.5 0:08.44 node
1 root 20 0 225544 7596 4920 S 0.0 0.2 19:35.32 systemd
The third line, %Cpu(s), shows the CPU usage of the whole system.
Note that top shows the average across all CPUs by default. Press 1 to switch to per-CPU usage.
Below the summary area is the real-time process list. Each process has a %CPU column showing its CPU usage. This is the sum of user-mode and kernel-mode CPU usage: the CPU used in the process's user space plus the kernel-space CPU executed on its behalf through system calls. Time spent waiting in the run queue is not counted. In a virtualized environment, it also includes the CPU used to run the virtual machine.
Note that top does not split a process's CPU usage into user mode and kernel mode. To see that breakdown for each process, use pidstat, a tool designed for analyzing per-process CPU usage.
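To make the user-mode and kernel-mode split concrete, here is a minimal sketch that derives both percentages from two samples of /proc/<pid>/stat. It assumes the field layout documented in proc(5), where utime and stime are fields 14 and 15 and are counted in clock ticks. The function names and sample lines are hypothetical, and the tick rate of 100 per second is only a typical value; on a real system it can be read with os.sysconf("SC_CLK_TCK").

```python
def parse_pid_stat(text):
    """Extract (utime, stime) in clock ticks from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so cut at the last ')' before splitting.
    rest = text[text.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11] and rest[12].
    return int(rest[11]), int(rest[12])


def process_cpu_percent(before, after, interval_seconds, ticks_per_second=100):
    """Return (user %, system %) of one CPU used between two samples."""
    user0, system0 = parse_pid_stat(before)
    user1, system1 = parse_pid_stat(after)
    scale = 100.0 / (ticks_per_second * interval_seconds)
    return (user1 - user0) * scale, (system1 - system0) * scale


if __name__ == "__main__":
    # Hypothetical samples taken one second apart (truncated after stime).
    sample_1 = "9951 (node) S 1 9951 9951 0 -1 4194304 100 0 0 0 1200 800"
    sample_2 = "9951 (node) S 1 9951 9951 0 -1 4194304 100 0 0 0 1201 802"
    user_pct, system_pct = process_cpu_percent(sample_1, sample_2, 1.0)
    print(f"user {user_pct:.2f}%  system {system_pct:.2f}%")
```

The percentages are relative to a single CPU, so a multithreaded process can exceed 100% on a multi-core machine.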
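How to check the CPU usage of each process
Use the pidstat command to view per-process CPU usage, including:
- user-mode CPU usage (%usr);
- kernel-mode CPU usage (%system);
- CPU usage for running virtual machines (%guest);
- percentage of CPU time the task spent waiting to run (%wait);
- total CPU usage (%CPU).
The Average section at the end of the output also reports the average of the 5 sets of data.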
# Output one set of data every second, 5 sets in total
pidstat 1 5
Linux 4.15.0-180-generic (VM-0-11-ubuntu) 08/04/2023 _x86_64_ (2 CPU)
03:48:38 PM UID PID %usr %system %guest %wait %CPU CPU Command
03:48:39 PM 500 9995 0.00 0.99 0.00 0.00 0.99 0 node
03:48:39 PM 0 11399 0.00 0.99 0.00 0.00 0.99 0 YDService
03:48:39 PM 0 11521 0.00 0.99 0.00 0.00 0.99 1 sh
03:48:39 PM 0 30257 0.00 0.99 0.00 0.00 0.99 0 barad_agent
03:48:39 PM UID PID %usr %system %guest %wait %CPU CPU Command
03:48:40 PM 500 9951 0.00 1.00 0.00 0.00 1.00 1 node
03:48:40 PM 0 11399 0.00 1.00 0.00 0.00 1.00 0 YDService
03:48:40 PM 500 16640 1.00 0.00 0.00 0.00 1.00 1 pidstat
03:48:40 PM 0 30257 1.00 0.00 0.00 0.00 1.00 0 barad_agent
03:48:40 PM UID PID %usr %system %guest %wait %CPU CPU Command
03:48:41 PM 500 9995 1.00 0.00 0.00 0.00 1.00 0 node
03:48:41 PM 500 16640 0.00 1.00 0.00 0.00 1.00 1 pidstat
03:48:41 PM UID PID %usr %system %guest %wait %CPU CPU Command
03:48:42 PM 111 8846 0.00 1.00 0.00 0.00 1.00 0 ntpd
03:48:42 PM 0 11399 1.00 0.00 0.00 0.00 1.00 0 YDService
03:48:42 PM UID PID %usr %system %guest %wait %CPU CPU Command
03:48:43 PM 0 7059 0.00 1.00 0.00 0.00 1.00 1 YDLive
03:48:43 PM 500 9995 0.00 1.00 0.00 0.00 1.00 0 node
03:48:43 PM 0 11399 0.00 1.00 0.00 0.00 1.00 0 YDService
03:48:43 PM 500 26615 1.00 0.00 0.00 0.00 1.00 0 node
Average: UID PID %usr %system %guest %wait %CPU CPU Command
Average: 0 7059 0.00 0.20 0.00 0.00 0.20 - YDLive
Average: 111 8846 0.00 0.20 0.00 0.00 0.20 - ntpd
Average: 500 9951 0.00 0.20 0.00 0.00 0.20 - node
Average: 500 9995 0.20 0.40 0.00 0.00 0.60 - node
Average: 0 11399 0.20 0.60 0.00 0.00 0.80 - YDService
Average: 0 11521 0.00 0.20 0.00 0.00 0.20 - sh
Average: 500 16640 0.20 0.20 0.00 0.00 0.40 - pidstat
Average: 500 26615 0.20 0.00 0.00 0.20 0.20 - node
Average: 0 30257 0.20 0.20 0.00 0.00 0.40 - barad_agent
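When the output is long, it can help to pull out the Average rows and rank them. The sketch below is one way to do that. It assumes the column layout shown above (including the %wait column, which not every sysstat version prints) and English locale output. The function names are hypothetical, and the sample is a shortened copy of the Average section.

```python
def parse_pidstat_average(output):
    """Collect the rows of the Average section from pidstat text output."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        # Keep only Average rows and skip the Average header line.
        if len(parts) < 10 or parts[0] != "Average:" or parts[1] == "UID":
            continue
        rows.append({
            "uid": int(parts[1]),
            "pid": int(parts[2]),
            "usr": float(parts[3]),
            "system": float(parts[4]),
            "guest": float(parts[5]),
            "wait": float(parts[6]),
            "cpu": float(parts[7]),
            "command": " ".join(parts[9:]),
        })
    return rows


def top_consumers(rows, count=3):
    """Return the rows with the highest average %CPU, largest first."""
    return sorted(rows, key=lambda row: row["cpu"], reverse=True)[:count]


SAMPLE = """\
Average: UID PID %usr %system %guest %wait %CPU CPU Command
Average: 500 9951 0.00 0.20 0.00 0.00 0.20 - node
Average: 500 9995 0.20 0.40 0.00 0.00 0.60 - node
Average: 0 11399 0.20 0.60 0.00 0.00 0.80 - YDService
Average: 0 30257 0.20 0.20 0.00 0.00 0.40 - barad_agent
"""

if __name__ == "__main__":
    for row in top_consumers(parse_pidstat_average(SAMPLE)):
        print(row["pid"], row["command"], row["usr"], row["system"], row["cpu"])
```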
Which function is using the CPU
Use the system's perf tool.
There are two common ways to use perf to analyze CPU performance problems.
perf top
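The first common usage is perf top. Like top, it shows in real time the functions or instructions that consume the most CPU clock, so it can be used to find hot functions. The interface looks like this: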
sudo perf top
Samples: 3K of event 'cpu-clock', Event count (approx.): 624550087
Overhead Shared Object Symbol
4.77% [kernel] [k] _raw_spin_unlock_irqrestore
4.10% perf [.] __symbols__insert
3.19% perf [.] d_print_comp_inner
2.92% perf [.] rb_next
2.48% [kernel] [k] __softirqentry_text_start
2.45% [kernel] [k] __do_page_fault
2.12% [kernel] [k] finish_task_switch
1.62% [kernel] [k] do_syscall_64
1.58% [kernel] [k] clear_page_erms
1.33% [kernel] [k] unmap_page_range
1.33% [kernel] [k] flush_tlb_mm_range
1.20% perf [.] d_print_comp
1.03% [kernel] [k] filemap_map_pages
0.99% [kernel] [k] copy_pte_range
0.94% [kernel] [k] kallsyms_expand_symbol.constprop.1
0.94% libc-2.27.so [.] cfree
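The first line contains three pieces of data: the number of samples (Samples), the event type (event), and the total event count (Event count). In this example, perf collected about 3K samples of the cpu-clock event, and the total event count is 624550087.
Pay particular attention to the number of samples. If it is too small (say, only a dozen or so), the ranking and percentages below have little practical value.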
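Below that is tabular data. Each row contains four fields:
- Overhead: the share of this symbol's performance events among all samples, shown as a percentage.
- Shared Object: the dynamic shared object (DSO) that contains the function or instruction, such as the kernel, a process name, a dynamic library name, or a kernel module name.
- The marker in front of the symbol: the type of the shared object. [.] means a user-space executable or dynamic library, while [k] means kernel space.
- Symbol: the symbol name, that is, the function name. When the function name is unknown, a hexadecimal address is shown instead.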
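Taking the output above as an example, the symbol that consumes the most CPU clock is the kernel function _raw_spin_unlock_irqrestore at only 4.77%, followed by the perf tool's own __symbols__insert at 4.10%. Percentages this small suggest that the system has no CPU performance problem.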
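Per-symbol percentages can hide how much a whole binary contributes. As a small illustration, the sketch below sums the Overhead column per shared object for the rows shown above. It assumes the text was copied from the perf top screen in the three-column layout shown and that shared object names contain no spaces; the function name is hypothetical.

```python
from collections import defaultdict


def overhead_by_object(report):
    """Sum the Overhead percentages of perf top rows per shared object."""
    totals = defaultdict(float)
    for line in report.splitlines():
        parts = line.split()
        # Data rows look like: "4.77% [kernel] [k] symbol_name"
        if len(parts) < 4 or not parts[0].endswith("%"):
            continue
        totals[parts[1]] += float(parts[0].rstrip("%"))
    return dict(totals)


SAMPLE = """\
Overhead Shared Object Symbol
4.77% [kernel] [k] _raw_spin_unlock_irqrestore
4.10% perf [.] __symbols__insert
3.19% perf [.] d_print_comp_inner
2.92% perf [.] rb_next
2.48% [kernel] [k] __softirqentry_text_start
2.45% [kernel] [k] __do_page_fault
2.12% [kernel] [k] finish_task_switch
1.62% [kernel] [k] do_syscall_64
1.58% [kernel] [k] clear_page_erms
1.33% [kernel] [k] unmap_page_range
1.33% [kernel] [k] flush_tlb_mm_range
1.20% perf [.] d_print_comp
1.03% [kernel] [k] filemap_map_pages
0.99% [kernel] [k] copy_pte_range
0.94% [kernel] [k] kallsyms_expand_symbol.constprop.1
0.94% libc-2.27.so [.] cfree
"""

if __name__ == "__main__":
    totals = overhead_by_object(SAMPLE)
    for name, percent in sorted(totals.items(), key=lambda item: -item[1]):
        print(f"{percent:6.2f}%  {name}")
```

Only the rows visible on screen are summed, so the totals are lower bounds for each shared object.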
perf record & perf report
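The second common usage is perf record together with perf report. perf top shows the system's performance information in real time, but it does not save the data, so it cannot be used for offline or later analysis. perf record saves the data, and perf report then parses and displays what was saved.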
sudo perf record
^C[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 1.373 MB perf.data (19206 samples) ]
sudo perf report
Samples: 19K of event 'cpu-clock', Event count (approx.): 4801500000
Overhead Command Shared Object Symbol
96.64% swapper [kernel.kallsyms] [k] native_safe_halt
0.14% swapper [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.11% swapper [kernel.kallsyms] [k] __softirqentry_text_start
0.09% swapper [kernel.kallsyms] [k] finish_task_switch
0.07% barad_agent python [.] PyEval_EvalFrameEx
0.05% barad_agent [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.04% YDService [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.04% barad_agent [kernel.kallsyms] [k] __do_page_fault
0.04% barad_agent [kernel.kallsyms] [k] copy_page
0.03% barad_agent [kernel.kallsyms] [k] copy_pte_range
0.03% barad_agent [kernel.kallsyms] [k] unmap_page_range
0.03% barad_agent python [.] lookdict_string
0.03% node [kernel.kallsyms] [k] copy_pte_range
In practice, we also often add the -g option to perf top and perf record to sample call graphs, which makes it easier to analyze performance problems along the call chain.
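Summary
CPU usage is the most intuitive and most commonly used system performance metric, and usually the first one we look at when troubleshooting performance problems. So we should be familiar with what it means, and especially with the different kinds of CPU usage: user (%user), nice (%nice), system (%system), I/O wait (%iowait), hardware interrupt (%irq), and soft interrupt (%softirq). For example:
- High user and nice CPU means user-mode processes are using a lot of CPU, so focus on the performance of those processes.
- High system CPU means kernel mode is using a lot of CPU, so focus on kernel threads or system calls.
- High I/O wait CPU means a lot of time is spent waiting for I/O, so check whether the storage system has an I/O problem.
- High soft interrupt or hardware interrupt CPU means interrupt handlers are using a lot of CPU, so focus on the interrupt service routines in the kernel.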
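The rules of thumb above can be sketched as a small lookup that takes a breakdown of CPU percentages and returns where to look first. This is only an illustration: the function name, the grouping of categories, and the 30% threshold are assumptions made for this example, not established cut-offs.

```python
HINTS = {
    "user+nice": "user-mode processes: look at the busiest processes",
    "system": "kernel mode: look at kernel threads and system calls",
    "iowait": "I/O wait: check the storage system for I/O problems",
    "irq+softirq": "interrupts: look at interrupt handlers in the kernel",
}


def triage(usage, threshold=30.0):
    """Return (category, hint) for the dominant category, or None.

    usage maps names such as "user" or "iowait" to percentages.
    The threshold is an arbitrary value chosen for illustration.
    """
    groups = {
        "user+nice": usage.get("user", 0.0) + usage.get("nice", 0.0),
        "system": usage.get("system", 0.0),
        "iowait": usage.get("iowait", 0.0),
        "irq+softirq": usage.get("irq", 0.0) + usage.get("softirq", 0.0),
    }
    name = max(groups, key=groups.get)
    if groups[name] < threshold:
        return None
    return name, HINTS[name]


if __name__ == "__main__":
    # Hypothetical breakdown of a busy system.
    print(triage({"user": 70.0, "nice": 5.0, "system": 10.0, "iowait": 2.0}))
    # Values from the top output shown earlier: nothing stands out.
    print(triage({"user": 1.8, "system": 1.8, "iowait": 0.2, "softirq": 0.2}))
```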
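When CPU usage rises, you can use tools such as top and pidstat to identify the source of the CPU performance problem, then use tools such as perf to find the specific function that causes it.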