1. top
top is the most commonly used tool to view system resource usage, including CPU, memory and other resources.
The main focus here is on CPU resources.
1.1 /proc/loadavg
The load average is read from /proc/loadavg, for example:
9.53 9.12 8.37 3/889 28165
The first three numbers are the load averages over the last 1, 5, and 15 minutes: the average number of processes that are running or ready to run (on Linux, processes in uninterruptible sleep are counted as well).
In the fourth field, the numerator is the number of currently runnable scheduling entities (processes and threads), and the denominator is the total number of scheduling entities on the system.
The last number is the PID of the most recently created process.
top displays the first three numbers of /proc/loadavg.
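As a minimal sketch of the field layout described above, the following Python splits a /proc/loadavg line into named values. The names LoadAvg, parse_loadavg, and read_loadavg are illustrative choices, not part of any tool mentioned here.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    load1: float    # 1-minute load average
    load5: float    # 5-minute load average
    load15: float   # 15-minute load average
    runnable: int   # numerator of the fourth field
    total: int      # denominator of the fourth field
    last_pid: int   # fifth field


def parse_loadavg(text: str) -> LoadAvg:
    """Parse one line in the /proc/loadavg format."""
    one, five, fifteen, entities, last_pid = text.split()
    runnable, total = entities.split("/")
    return LoadAvg(float(one), float(five), float(fifteen),
                   int(runnable), int(total), int(last_pid))


def read_loadavg(path: str = "/proc/loadavg") -> LoadAvg:
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_loadavg(f.read())


if __name__ == "__main__":
    # Uses the sample line from this article, so it runs on any platform.
    print(parse_loadavg("9.53 9.12 8.37 3/889 28165"))
```

On a Linux machine, read_loadavg() returns the live values.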
1.2 Use of top
When starting top, you can specify the update interval with -d.
In the interactive view, press H to toggle the display of threads, and press 1 to toggle per-CPU usage.
top -H -b -d 1 -n 200 > top.txt samples once per second, 200 times in total, in batch mode with thread details, and saves the output to top.txt.
top takes its samples from /proc/stat and /proc/<pid>/stat. For a detailed introduction to these two files, please refer to: /proc/stat and /proc/<pid>/stat.
The meaning of the CPU information is as follows:
us means user. It counts time spent in user-space processes with a nice value less than or equal to 0, that is, priority 100~120.
ni means nice. It counts time spent in user-space processes with a nice value greater than 0, that is, priority 121~139.
sy means system. It counts time spent in kernel mode, excluding interrupts.
id means idle. It counts the time the CPU spent idle.
wa means iowait. It counts time spent waiting for IO.
hi means hardware interrupt. It counts time spent servicing hardware interrupts.
si means software interrupt. It counts time spent servicing software interrupts.
st means steal. It counts time taken away from this virtual machine by the hypervisor.
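The percentages above can be approximated from two samples of the aggregate cpu line in /proc/stat. The sketch below is only an illustration of that idea, not top's actual implementation; the function names and the sample counter values in the usage note are made up for the example.

```python
# Short names in the order the counters appear on the "cpu" line of /proc/stat:
# user, nice, system, idle, iowait, irq, softirq, steal.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate 'cpu' line of /proc/stat into tick counters."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Only the first eight counters are used; the trailing guest counters
    # are ignored in this sketch.
    return dict(zip(FIELDS, map(int, parts[1:9])))


def cpu_percentages(before: dict, after: dict) -> dict:
    """Turn the difference between two samples into percentages."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in deltas.items()}


def read_cpu_line(path: str = "/proc/stat") -> dict:
    """Read the live counters (Linux only); the first line is the aggregate."""
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

For a live reading, call read_cpu_line(), sleep for the desired interval, call it again, and pass both results to cpu_percentages(). With the hypothetical samples "cpu 100 10 50 800 20 5 15 0 0 0" and "cpu 150 10 70 820 25 5 20 0 0 0", the deltas sum to 100 ticks, so us is 50% and id is 20%.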
2. perf
"The introduction and use of system-level performance analysis tool perf" covers perf in detail; here we focus on CPU usage.
sudo perf top -s comm shows the share of samples attributed to each running process.
Unlike top, which splits time into idle, system, and user, the proportion shown here is each process's share of the total running time.
To analyze offline, record samples with sudo perf record, then view them with sudo perf report -s comm.
3. sar, ksar
sar stands for System Activity Report. It can observe current system activity in real time and can also generate reports from historical records.
To use sar, install it with sudo apt install sysstat and then configure sysstat.
sar records the statistics, and ksar renders the recorded information as graphs.
ksar can be downloaded from: https://github.com/vlsi/ksar/releases .
sudo gedit /etc/default/sysstat--------------------------------Change ENABLED="false" to ENABLED="true".
sudo gedit /etc/cron.d/sysstat---------------------------------Modify the sar sampling period and other settings.
sudo /etc/init.d/sysstat restart-------------------------------Restart the sar service.
/var/log/sysstat/----------------------------------------------sar log storage directory.
Use sar to dump the statistics it has recorded so far into the file sar.txt.
LC_ALL=C sar -A > sar.txt
PS: Running sar -A directly, without LC_ALL=C, produces output that ksar cannot display correctly.
Start ksar with java -jar ksar.jar, then choose Data->Load from text file... and select the saved sar.txt file.
ksar then displays the corresponding charts.
You can also record over a period of time by giving sar a sampling interval and a sample count.
When the commands below are prefixed with LC_ALL=C and their output is saved to a file, ksar can display all of them graphically.
sar 1 100-------------------------------------Aggregate statistics for all CPUs
sar -P ALL 1 100------------------------------Aggregate and per-CPU statistics
sar -B 1 100----------------------------------Paging statistics
sar -b 1 100----------------------------------IO and transfer rate statistics
sar -d 1 100----------------------------------Block device activity statistics
sar -F 1 100----------------------------------Mounted file system statistics
sar -r ALL------------------------------------Detailed memory usage statistics
sar -S----------------------------------------Swap space usage statistics
sar -w----------------------------------------Process creation and context switch statistics
sar -W----------------------------------------Swap-in and swap-out statistics
For more details, see "How To Create sar Graphs With kSar To Identifying Linux Bottlenecks" and "Collect and report Linux System Activity Information with sar".
4. mpstat
mpstat stands for Multiprocessor Statistics. Run without arguments, mpstat displays averages of all values since the system booted.
A common invocation is shown below: -P ALL monitors all CPUs and also breaks the figures down per CPU; 10 is the sampling interval in seconds; 20 is the number of samples.
mpstat -P ALL 10 20
The results are as follows:
Linux 4.13.0-36-generic (xxx) 2018-08-13 _x86_64_ (4 CPU)
11:01:09 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
11:01:19 all 3.44 6.26 5.15 0.13 0.00 0.20 0.00 0.00 0.00 84.82
11:01:19 0 3.09 13.46 3.29 0.00 0.00 0.10 0.00 0.00 0.00 80.06
11:01:19 1 4.41 3.11 5.02 0.00 0.00 0.60 0.00 0.00 0.00 86.86
11:01:19 2 2.96 0.20 9.29 0.00 0.00 0.10 0.00 0.00 0.00 87.45
11:01:19 3 3.32 7.95 3.12 0.50 0.00 0.00 0.00 0.00 0.00 85.11
11:01:19 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
11:01:29 all 3.65 6.09 5.08 0.00 0.00 0.25 0.00 0.00 0.00 84.93
11:01:29 0 3.92 11.07 4.63 0.00 0.00 0.20 0.00 0.00 0.00 80.18
11:01:29 1 4.39 1.90 3.49 0.00 0.00 0.80 0.00 0.00 0.00 89.42
11:01:29 2 3.35 0.10 10.14 0.00 0.00 0.00 0.00 0.00 0.00 86.41
11:01:29 3 2.91 11.26 2.21 0.00 0.00 0.00 0.00 0.00 0.00 83.62
%usr is time spent in user-space processes, and %nice is time spent in user-space processes with a nice value greater than 0.
%sys is kernel-space time, %iowait is I/O wait time, %irq is hardware interrupt time, %soft is software interrupt time, %steal is time taken away by the hypervisor, and %idle is idle time; %guest and %gnice are both time spent running virtual machines.
5. uptime
uptime is a simple way to get the total running time of the system and the load averages over the last 1, 5, and 15 minutes.
uptime obtains this information from /proc/uptime and /proc/loadavg.
The value before up is the current system time; the value after up is how long the system has been running.
The three values after load average are the 1-minute, 5-minute, and 15-minute load averages.
11:15:41 up 82 days, 20:34, 8 users, load average: 0.28, 0.40, 0.43
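To illustrate where the "up 82 days, 20:34" part comes from, here is a small sketch that converts the first number in /proc/uptime (seconds since boot) into days, hours, and minutes. The function names are hypothetical, and the formatting only approximates what uptime prints.

```python
def parse_uptime(text: str) -> float:
    """Return the uptime in seconds from /proc/uptime content.

    The file holds two numbers; the first is seconds since boot.
    """
    return float(text.split()[0])


def format_uptime(seconds: float) -> str:
    """Render seconds as 'D days, H:MM', roughly like uptime does."""
    minutes = int(seconds) // 60
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    prefix = f"{days} days, " if days else ""
    return f"{prefix}{hours}:{mins:02d}"


def read_uptime(path: str = "/proc/uptime") -> float:
    """Read the live value (Linux only)."""
    with open(path) as f:
        return parse_uptime(f.read())
```

For example, 82 days, 20 hours, and 34 minutes is 82*86400 + 20*3600 + 34*60 = 7158840 seconds, which format_uptime renders as "82 days, 20:34".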
6. vmstat
vmstat is mainly a tool for monitoring system memory usage, but it also contains some CPU-related information.
For example, vmstat 5 5 samples every 5 seconds, 5 times in total. The results are as follows:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 472576 228688 559092 1061756 0 0 9 39 1 0 8 4 87 0 0
1 0 472576 228184 559100 1061756 0 0 0 13 1532 3395 10 6 84 0 0
1 0 472576 229308 559100 1061616 0 0 0 0 1446 3449 10 5 85 0 0
0 0 472576 229592 559108 1061616 0 0 0 6 1419 3474 10 5 85 0 0
1 0 472576 229804 559108 1061616 0 0 0 0 1446 3439 10 5 85 0 0
The columns fall into 6 groups: procs (processes), memory, swap, io, system (interrupts and context switches), and cpu.
For a more detailed explanation, see the reference document: "Linux Performance Measurements using vmstat"
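When vmstat output is saved for later analysis, it helps to key each value by its column name. The following is a minimal sketch that does this for the output format shown above; parse_vmstat and SAMPLE are illustrative names, and SAMPLE is just the first rows of the output from this section.

```python
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 472576 228688 559092 1061756 0 0 9 39 1 0 8 4 87 0 0
1 0 472576 228184 559100 1061756 0 0 0 13 1532 3395 10 6 84 0 0
"""


def parse_vmstat(output: str) -> list:
    """Turn vmstat text output into one dict per sample row."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    # Line 0 is the group banner (procs, memory, ...); line 1 has the column names.
    names = lines[1].split()
    return [dict(zip(names, map(int, ln.split()))) for ln in lines[2:]]


if __name__ == "__main__":
    for row in parse_vmstat(SAMPLE):
        # Busy CPU share is everything that is not idle.
        print(row["r"], row["cs"], 100 - row["id"])
```

Note that the first data row of vmstat reports averages since boot, so analysis of current activity usually starts from the second row.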
7. pidstat
pidstat is mainly used to monitor the system resource usage of all processes or of specified processes.
7.1 Check CPU usage
Run without an interval, pidstat reports statistics accumulated since system boot. When an interval is given, each report covers the time since the previous report. You can obtain the statistics you need by specifying the sampling interval and the number of samples.
pidstat -p ALL---------------------------Display statistics for all processes, including idle ones.
pidstat -p ALL -t------------------------Display statistics in more detail, down to the thread level.
pidstat [option] interval [count]--------Sampling interval and number of samples.
In addition, you can use -p with a PID to obtain statistics for a specific process.
pidstat can also report memory usage statistics with -r and IO usage statistics with -d.
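Per-process CPU time of the kind pidstat reports is available in /proc/<pid>/stat, mentioned earlier as one of top's sampling sources. The sketch below extracts the user and kernel tick counters from one such line; it is an illustration under the field layout documented in proc(5), not pidstat's own code, and all function names are hypothetical.

```python
import os


def parse_pid_stat(text: str) -> tuple:
    """Return (comm, utime_ticks, stime_ticks) from /proc/<pid>/stat content."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split at the last ')' rather than on whitespace.
    head, _, tail = text.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = tail.split()
    # fields[0] is the state (field 3 in proc(5)); utime and stime are
    # fields 14 and 15, which land at indexes 11 and 12 here.
    return comm, int(fields[11]), int(fields[12])


def clock_ticks_per_second() -> int:
    """Ticks per second used for utime/stime (Unix only)."""
    return os.sysconf("SC_CLK_TCK")


def cpu_seconds(utime: int, stime: int, hz: int) -> float:
    """Convert tick counters into seconds of CPU time."""
    return (utime + stime) / hz
```

Sampling the same process twice and dividing the difference in cpu_seconds by the elapsed wall time gives a CPU usage figure for that interval.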
7.2 View memory usage
The results of pidstat -p ALL -r are as follows:
15:18:21 UID PID minflt/s majflt/s VSZ RSS %MEM Command
15:18:21 0 1 0.02 0.00 185316 3028 0.08 systemd
15:18:21 0 2 0.00 0.00 0 0 0.00 kthreadd
15:18:21 0 4 0.00 0.00 0 0 0.00 kworker/0:0H
15:18:21 0 6 0.00 0.00 0 0 0.00 mm_percpu_wq
15:18:21 0 7 0.00 0.00 0 0 0.00 ksoftirqd/0
15:18:21 0 8 0.00 0.00 0 0 0.00 rcu_sched
minflt/s: The number of minor page faults per second. A minor fault is a page fault that can be resolved by mapping the virtual address to a page already in physical memory, without reading from disk.
majflt/s: The number of major page faults per second. A major fault occurs when the page behind the virtual address has to be loaded from disk, for example from swap. Major faults generally appear when memory is tight.
VSZ: The virtual memory used by the process (in kB).
RSS: The physical memory used by the process (in kB).
%MEM: The percentage of physical memory used by the process.
Command: The command that started the process.
7.3 View disk usage
The results of pidstat -p ALL -d are as follows:
15:20:40 UID PID kB_rd/s kB_wr/s kB_ccwr/s iodelay Command
15:20:40 0 1 -1.00 -1.00 -1.00 243523129 systemd
15:20:40 0 2 -1.00 -1.00 -1.00 0 kthreadd
15:20:40 0 4 -1.00 -1.00 -1.00 0 kworker/0:0H
15:20:40 0 6 -1.00 -1.00 -1.00 0 mm_percpu_wq
15:20:40 0 7 -1.00 -1.00 -1.00 714512328679 ksoftirqd/0
15:20:40 0 8 -1.00 -1.00 -1.00 417757303594 rcu_sched
kB_rd/s: The amount of data the process reads from disk per second (in kB).
kB_wr/s: The amount of data the process writes to disk per second (in kB).
kB_ccwr/s: The amount of data per second (in kB) whose write to disk was cancelled by the process.
Command: The command that started the process.
8. time
The time command reports the elapsed time and CPU time consumed by a specified program.
For example, running time cksum nomachine_6.0.80_1.exe twice gives the following results.
2401940638 32606752 nomachine_6.0.80_1.exe
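real 0m0.263s-----------------Total elapsed time; 0.263-0.094-0.011=0.158 is time spent off the CPU, here mostly waiting for IO.
user 0m0.094s-----------------Time spent in user mode.
sys 0m0.011s------------------Time spent in kernel mode.
2401940638 32606752 nomachine_6.0.80_1.exe
real 0m0.098s-----------------On the second run, the IO wait has almost disappeared.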
user 0m0.097s
sys 0m0.000s
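The same real/user/sys breakdown can be reproduced from Python. This is a rough sketch using only the standard library; time_command and off_cpu are hypothetical helper names, and treating real - user - sys as IO wait is an approximation, since the remainder also includes time spent waiting for a CPU.

```python
import os
import subprocess
import sys
import time


def time_command(argv: list) -> dict:
    """Run argv and return real, user, and sys seconds, like the time command."""
    t0 = os.times()
    w0 = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    w1 = time.perf_counter()
    t1 = os.times()
    return {
        "real": w1 - w0,
        # CPU time of waited-for child processes, in user and kernel mode.
        "user": t1.children_user - t0.children_user,
        "sys": t1.children_system - t0.children_system,
    }


def off_cpu(real: float, user: float, sys_: float) -> float:
    """Time not accounted for by CPU usage, e.g. waiting for IO."""
    return real - user - sys_


if __name__ == "__main__":
    # Time a trivial child process: the current Python interpreter doing nothing.
    result = time_command([sys.executable, "-c", "pass"])
    print(result, off_cpu(result["real"], result["user"], result["sys"]))
```

Applied to the first run above, off_cpu(0.263, 0.094, 0.011) gives the 0.158 seconds attributed to IO wait.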
9. cpustat
Install it with sudo apt install cpustat. The output of cpustat -T -D -x is as follows.
Load Avg 0.66 0.54 0.49, Freq Avg. 1.46 GHz, 4 CPUs online------------------------------Load average, average CPU frequency, etc.
3791.1 Ctxt/s, 1709.9 IRQ/s, 1800.0 softIRQ/s, 0.0 new tasks/s, 1 running, 0 blocked----Context switches, hardware interrupts, software interrupts, and other statistics.
%CPU %USR %SYS PID S CPU Time Task-------------------------------------------CPU usage, split into user space and kernel space, etc.
25.74 25.74 0.00 11435 R 3 2.29w /usr/bin/python3
15.84 15.84 0.00 9445 S 0 1.49w /usr/lib/xorg/Xorg
10.89 9.90 0.99 2722 S 1 1.05w compiz
7.92 0.00 7.92 32352 S 2 16.60s [kworker/2:1]
0.99 0.00 0.99 32397 R 1 0.01s cpustat
0.99 0.99 0.00 11046 S 2 16.20h compiz
0.99 0.99 0.00 1317 S 0 8.76h /usr/NX/bin/nxnode.bin
0.99 0.00 0.99 10293 S 1 1.24m [kworker/1:2]
64.36 53.47 10.89 Total
Load Avg 0.66 0.54 0.49, Freq Avg. 1.75 GHz, 4 CPUs online
2834.8 Ctxt/s, 1190.9 IRQ/s, 1183.3 softIRQ/s, 0.0 new tasks/s, 4 running, 0 blocked
%CPU %USR %SYS PID S CPU Time Task
25.76 25.76 0.00 11435 R 3 2.29w /usr/bin/python3
18.18 18.18 0.00 9445 S 0 1.49w /usr/lib/xorg/Xorg
7.58 7.58 0.00 2722 S 1 1.05w compiz
6.06 0.00 6.06 32352 S 2 16.64s [kworker/2:1]
1.52 0.00 1.52 32397 R 1 0.02s cpustat
1.52 0.00 1.52 8 S 0 3.00h [rcu_sched]
1.52 0.00 1.52 18409 S 0 1.16m update-notifier
62.12 51.52 10.61 Total
Distribution of CPU utilisation (per Task):
% CPU Utilisation Count (%)
0.00 - 1.97 706 98.88
1.97 - 3.94 0 0.00
3.94 - 5.91 0 0.00
5.91 - 7.88 2 0.28
7.88 - 9.85 0 0.00
9.85 - 11.82 0 0.00
11.82 - 13.79 1 0.14
13.79 - 15.76 0 0.00
15.76 - 17.73 1 0.14
17.73 - 19.70 1 0.14
19.70 - 21.67 0 0.00
21.67 - 23.64 0 0.00
23.64 - 25.61 2 0.28
25.61 - 27.57 0 0.00
27.58 - 29.54 0 0.00
29.55 - 31.51 0 0.00
31.52 - 33.48 0 0.00
33.48 - 35.45 0 0.00
35.45 - 37.42 0 0.00
37.42 - 39.39 1 0.14
Distribution of CPU utilisation (per CPU):----------------------------------------------Per-CPU usage, split into user space and kernel space.
CPU# USR% SYS%
0 17.37 1.20
1 8.98 2.40
2 0.60 7.19
3 25.75 0.00
10. htop
htop works much like top but is easier to read. Press F5 in the interface to switch to the tree view, which shows the threads within each process and the parent-child relationships between processes.
11. atop
atop is a tool for monitoring system resources and processes. It sorts the processes in the list in descending order by CPU usage, and each process contains information such as CPU, memory, disk, and network status. Its function is similar to top and htop.
12. glances
glances is a reporting tool written in Python with functions similar to nmon. It can report statistics on CPU, memory, network, disk, and processes. Apart from reporting statistics, glances does not offer any other features. While the program is running, press "h" to display the help page.
13. nmon
nmon is a very easy-to-use tool that can monitor CPU, memory, network, and disk usage as well as the process list on one screen. Like other report-only tools, it cannot manage processes or customize the report display. In addition, it can save data to a spreadsheet file.
13. pcp-gui
Performance Co-Pilot, PCP for short, is a system performance and analysis framework. It organizes data from multiple hosts and analyzes it in real time to help you identify abnormal performance patterns. It also provides APIs for you to design your own monitoring and reporting solutions.
Install pcp related tools.
sudo apt install pcp pcp-gui
In the GUI, choose File->Open View and select the view to open, such as CPU, Disk, or Memory.
14. collectl, colplot
14.1 Use of collectl
collectl is a capable utility with a rich set of command-line options. You can use it to collect performance data describing the current state of the system.
Unlike most other system monitoring tools, collectl is not restricted to a small set of system metrics. It can collect information about many different types of system resources, such as cpu, disk, memory, network, sockets, tcp, inodes, infiniband, lustre, nfs, processes, quadrics, slabs, buddyinfo, etc.
collectl can also stand in for common tools such as top, vmstat, ps, and iotop.
Install collectl:
sudo apt-get install collectl
The use of collectl is very simple. By default, collectl displays cpu, disk, and network information.
#<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#cpu sys inter ctxsw KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut
0 0 162 460 0 0 0 0 0 0 0 0
1 0 308 820 0 0 36 1 0 0 0 0
1 0 572 2022 0 0 36 2 0 0 0 0
0 0 270 728 0 0 0 0 0 0 0 0
collectl can also display information for more subsystems through the -s option. Where an option has an uppercase counterpart, the uppercase form shows more detailed, per-device statistics.
b - buddy info (memory fragmentation)
c - aggregate statistics for all CPUs; C - statistics for each CPU.
d - aggregate disk statistics for the whole system; D - statistics for each disk.
f - NFS V3 data
i - inodes and file system
j - interrupts per CPU; J - details for each interrupt.
l - Lustre
m - memory usage of the whole system; M - memory usage by node.
n - network usage of the whole system; N - network usage for each interface.
s - sockets
t - TCP
x - interconnect
y - statistics for all slabs (system object caches); Y - details for each slab.
collectl --all displays statistics for all subsystems, including cpu, interrupts, memory, disk, network, TCP, sockets, file system, and NFS.
#<----CPU[HYPER]-----><-----------------Int------------------><-----------------Memory-----------------><----------Disks-----------><----------Network----------><-------TCP--------><------Sockets-----><----Files---><------NFS Totals------>
#cpu sys inter ctxsw Cpu0 Cpu1 Cpu2 Cpu3 Cpu4 Cpu5 Cpu6 Cpu7 Free Buff Cach Inac Slab Map Fragments KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut IP Tcp Udp Icmp Tcp Udp Raw Frag Handle Inodes Reads Writes Meta Comm
5 1 749 2738 79 83 67 126 289 57 87 47 4G 107M 1G 640M 151M 1G nlsrkjebaas 0 0 0 0 0 0 0 0 0 2 0 0 1138 0 1 0 11648 71267 0 0 0 0
1 0 276 1323 22 8 12 37 76 19 33 72 4G 107M 1G 640M 151M 1G nlsrkjebaas 0 0 56 13 0 0 0 0 0 0 0 0 1138 0 1 0 11648 71264 0 0 0 0
1 0 298 1336 40 9 26 31 75 31 34 49 4G 107M 1G 640M 151M 1G olsrkjebaas 0 0 24 5 0 0 0 0 0 0 0 0 1138 0 1 0 11648 71256 0 0 0 0
collectl --top can replace the top command:
# TOP PROCESSES sorted by time (counters are /sec) 12:11:40
# PID User PR PPID THRD S VSZ RSS CP SysT UsrT Pct AccuTime RKB WKB MajF MinF Command
14557 al 20 7305 0 R 75M 28M 4 0.02 0.05 7 00:00.47 0 0 0 0 /usr/bin/perl
6985 al 20 1 36 S 1G 181M 3 0.01 0.03 4 01:48.14 0 4 0 1 /opt/google/chrome/chrome
7255 al 20 7000 21 S 955M 215M 1 0.00 0.04 4 01:30.44 0 0 0 1999 /opt/google/chrome/chrome
8006 al 20 7000 17 S 923M 135M 0 0.01 0.03 4 01:24.67 0 0 0 0 /opt/google/chrome/chrome
7294 al 20 2415 3 S 710M 60M 7 0.01 0.01 2 00:12.79 0 0 0 4 /usr/bin/python
collectl --vmstat can replace the vmstat command:
#procs ---------------memory (KB)--------------- --swaps-- -----io---- --system-- ----cpu-----
# r b swpd free buff cache inact active si so bi bo in cs us sy id wa
2 0 0 4634M 108M 1535M 642M 481M 0 0 0 132 594 2523 2 0 96 0
0 0 0 4631M 108M 1539M 642M 481M 0 0 0 0 1006 5308 4 1 93 0
0 0 0 4623M 108M 1547M 642M 481M 0 0 0 48 564 2572 2 0 96 0
collectl -c1 -sZ -i:1 can replace the ps command.
Combined with tools for processing and analyzing data (such as colmux, colgui, and colplot), collectl can also produce visual graphs.
14.2 Use of colplot
colplot is part of the collectl tool set and displays the data collected by collectl graphically in a browser.
An introduction to colplot is available here, and the source code can be downloaded from collectl-utils.
After unpacking the downloaded archive, install colplot with sudo ./INSTALL.
Restart the apache service after installation:
sudo systemctl reload apache2
sudo systemctl restart apache2
Enter http://127.0.0.1/colplot/ in the browser to use colplot.
Use Change Dir to select the directory that holds the data saved by collectl -P, then set the plot details, such as which subsystems to display and the plot size.
Finally, click Generate Plot to view the results.
Reference documents: "Collectl: The Almighty Champion of Linux Performance Monitoring", "Collectl Documentation", "Collectl Examples-An Awesome Performance Analysis Tool in Linux"
0. Other
munin, rrdtool