Introduction to Linux CPU usage monitoring tools

Table of Contents

1. top

1.1 /proc/loadavg

1.2 Use of top

2. perf

3. sar, ksar

4. mpstat

5. uptime

6. vmstat

7. pidstat

7.1 Check CPU usage

7.2 View memory usage

7.3 View disk usage

8. time

9. cpustat

10. htop

11. atop

12. glances

13. nmon

13. pcp-gui

14. collectl, colplot

14.1 Use of collectl

14.2 Use of colplot

0. Other


1. top

top is the most commonly used tool to view system resource usage, including CPU, memory and other resources.

The main focus here is on CPU resources.

1.1 /proc/loadavg

The load average shown by top is read from /proc/loadavg, for example:

9.53 9.12 8.37 3/889 28165

The first three numbers are the load averages over the last 1, 5, and 15 minutes. Each is an exponentially damped average of the number of tasks that are running or ready to run, plus tasks in uninterruptible sleep (usually waiting for disk I/O).

The fourth field has the form runnable/total. The first part is the number of currently runnable kernel scheduling entities (processes and threads), and the second is the total number of scheduling entities on the system.

The last number is the PID of the process most recently created on the system.

top displays the first three numbers as its load average.
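The fields described above can be parsed as follows. This is a minimal Python sketch. The names LoadAvg, parse_loadavg, read_loadavg, and load_per_cpu are illustrative and do not belong to any tool. It also divides the averages by the CPU count, which makes values easier to compare across machines.

```python
import os
from typing import NamedTuple


class LoadAvg(NamedTuple):
    avg1: float     # 1-minute load average
    avg5: float     # 5-minute load average
    avg15: float    # 15-minute load average
    runnable: int   # currently runnable scheduling entities (numerator)
    total: int      # scheduling entities that exist (denominator)
    last_pid: int   # PID of the most recently created process


def parse_loadavg(text: str) -> LoadAvg:
    """Parse one line in /proc/loadavg format."""
    a1, a5, a15, ratio, last_pid = text.split()
    runnable, total = ratio.split("/")
    return LoadAvg(float(a1), float(a5), float(a15),
                   int(runnable), int(total), int(last_pid))


def read_loadavg(path: str = "/proc/loadavg") -> LoadAvg:
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_loadavg(f.read())


def load_per_cpu(load: LoadAvg, ncpu: int) -> tuple:
    """Divide the three averages by the number of CPUs."""
    return (load.avg1 / ncpu, load.avg5 / ncpu, load.avg15 / ncpu)


if __name__ == "__main__":
    sample = parse_loadavg("9.53 9.12 8.37 3/889 28165")
    print(sample)
    print(load_per_cpu(sample, os.cpu_count() or 1))
```

Here is how to read the result. On a 4-CPU machine, a 1-minute load of 9.53 is roughly 2.4 per CPU. That suggests tasks are queuing for the CPU, or are blocked on disk I/O, since uninterruptible tasks are counted too.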

1.2 Use of top

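When starting top, you can set the refresh interval with -d.

While top is running, press H to toggle the display of individual threads, and press 1 to show the usage of each CPU separately.

The command top -H -b -d 1 -n 200 > top.txt does the following:
- runs in batch mode (-b);
- samples once per second (-d 1) for 200 iterations (-n 200);
- shows individual threads (-H);
- saves the output to top.txt.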

top takes its samples from /proc/stat (system-wide CPU time) and /proc/<pid>/stat (per-process CPU time). For a detailed introduction to these two files, see: /proc/stat and /proc/<pid>/stat.

The CPU fields in top's summary area have the following meanings:

us (user): time running user-space processes with a nice value less than or equal to 0, that is, with a priority of 100 to 120.

ni (nice): time running user-space processes with a nice value greater than 0, that is, with a priority of 121 to 139.

sy (system): time running in kernel mode, excluding interrupt handling.

id (idle): time the CPU was idle.

wa (iowait): time the CPU was idle while waiting for I/O to complete.

hi (hardware interrupt): time spent servicing hardware interrupts.

si (software interrupt): time spent servicing software interrupts.

st (steal): time stolen from this virtual machine by the hypervisor, that is, involuntary wait by the virtual CPU while the hypervisor was servicing another virtual processor.
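top computes these percentages by taking two samples of the cumulative tick counters on a "cpu" line of /proc/stat and working out how the difference is split. The following is a minimal Python sketch of that calculation, not top's actual code. It assumes the column order documented in proc(5): user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice. The function names are hypothetical.

```python
import time

# Column order of the "cpu" lines in /proc/stat, labelled as in top.
# The guest and guest_nice columns that follow are skipped because the
# kernel already counts that time in user and nice.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line: str):
    """Split a /proc/stat "cpu"/"cpuN" line into its name and 8 counters."""
    parts = line.split()
    counters = [int(v) for v in parts[1:1 + len(FIELDS)]]
    counters += [0] * (len(FIELDS) - len(counters))  # very old kernels lack some columns
    return parts[0], counters


def cpu_percentages(before: str, after: str) -> dict:
    """Percentage of each state between two samples of the same cpu line."""
    name_a, a = parse_cpu_line(before)
    name_b, b = parse_cpu_line(after)
    if name_a != name_b:
        raise ValueError(f"lines are for different CPUs: {name_a} vs {name_b}")
    deltas = [y - x for x, y in zip(a, b)]
    total = sum(deltas)
    if total <= 0:
        return {label: 0.0 for label in FIELDS}
    return {label: d * 100.0 / total for label, d in zip(FIELDS, deltas)}


def sample_proc_stat(interval: float = 1.0, path: str = "/proc/stat") -> dict:
    """Read the aggregate "cpu" line twice, interval seconds apart (Linux only)."""
    def first_line() -> str:
        with open(path) as f:
            return f.readline()
    before = first_line()
    time.sleep(interval)
    return cpu_percentages(before, first_line())
```

Because the counters are cumulative since boot, a single read of /proc/stat only gives averages since boot. The difference between two reads is what reflects current usage.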

 

 

 

2. perf

The article "The introduction and use of system-level performance analysis tool perf" covers perf in detail; here we focus on CPU usage.

sudo perf top -s comm shows, live, what share of the samples each command (process name) on the system accounts for.

Unlike top, which splits CPU time into idle, system, user, and so on, perf reports each process's share of the total samples collected.

To analyze offline, first record samples with sudo perf record. For example, sudo perf record -a -- sleep 10 samples all CPUs for 10 seconds. Then view the per-process breakdown with sudo perf report -s comm.

 

 

3. sar 、 ksar

sar stands for System Activity Report. It can show current system activity in real time and can also produce reports from historical data.

sar is part of the sysstat package. Install it with sudo apt install sysstat, then configure sysstat as shown below.

sar records the statistics, and ksar plots the recorded data as graphs.

ksar can be downloaded from: https://github.com/vlsi/ksar/releases

sudo gedit /etc/default/sysstat  # change ENABLED="false" to ENABLED="true"

sudo gedit /etc/cron.d/sysstat  # adjust the sar collection interval and other settings

sudo /etc/init.d/sysstat restart  # restart the sysstat service

/var/log/sysstat/  # directory where the sar data files are stored

To save all statistics recorded so far in today's data file to sar.txt:

LC_ALL=C sar -A > sar.txt

Note: LC_ALL=C is needed here. If the output of sar -A is saved without it under a non-English locale, ksar cannot display it properly.

Run java -jar ksar.jar, choose Data -> Load from text file..., and select the saved sar.txt file.


sar can also sample live statistics over a period of time. Specify the sampling interval (in seconds) and the number of samples.

If any of the following commands is prefixed with LC_ALL=C and its output is saved to a file, the file can be displayed graphically in ksar.

sar 1 100  # CPU statistics aggregated over all CPUs

sar -P ALL 1 100  # aggregate statistics plus statistics for each CPU

sar -B 1 100  # paging statistics

sar -b 1 100  # I/O and transfer rate statistics

sar -d 1 100  # activity for each block device

sar -F 1 100  # statistics for mounted file systems

sar -r ALL  # detailed memory utilization statistics

sar -S  # swap space utilization statistics

sar -w  # task creation and context switching statistics

sar -W  # swapping statistics (pages swapped in and out)
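If you collect several of these reports from a script, the LC_ALL=C prefix has to be applied to each sar process. The following is a minimal Python sketch under that assumption. It only builds the argument list and environment and then calls sar through subprocess. The helper names are hypothetical, and sar must be installed for save_sar_report to work.

```python
import os
import subprocess


def sar_command(options, interval=None, count=None) -> list:
    """Build a sar argument list, e.g. (["-P", "ALL"], 1, 100)."""
    cmd = ["sar", *options]
    if interval is not None:
        cmd.append(str(interval))
        if count is not None:
            cmd.append(str(count))
    return cmd


def c_locale_env() -> dict:
    """Copy of the current environment with LC_ALL=C, so ksar can parse the output."""
    env = dict(os.environ)
    env["LC_ALL"] = "C"
    return env


def save_sar_report(path, options, interval=None, count=None) -> None:
    """Run sar under the C locale and write its output to path."""
    with open(path, "w") as out:
        subprocess.run(sar_command(options, interval, count),
                       env=c_locale_env(), stdout=out, check=True)


# Equivalent to: LC_ALL=C sar -P ALL 1 100 > sar_cpu.txt
# save_sar_report("sar_cpu.txt", ["-P", "ALL"], 1, 100)
```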

 

For more details, see "How To Create sar Graphs With kSar To Identifying Linux Bottlenecks" and "Collect and report Linux System Activity Information with sar".

 

4. mpstat

mpstat stands for Multiprocessor Statistics. Without arguments, mpstat reports averages since system boot.

A common invocation is shown below. Its arguments are:
- -P ALL reports every CPU individually as well as the average over all CPUs (-P followed by a CPU number reports only that CPU);
- 10 is the sampling interval in seconds;
- 20 is the number of samples.

mpstat -P ALL 10 20

The results are as follows:

Linux 4.13.0-36-generic (xxx)     2018-08-13     _x86_64_    (4 CPU)

11:01:09  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
11:01:19  all    3.44    6.26    5.15    0.13    0.00    0.20    0.00    0.00    0.00   84.82
11:01:19    0    3.09   13.46    3.29    0.00    0.00    0.10    0.00    0.00    0.00   80.06
11:01:19    1    4.41    3.11    5.02    0.00    0.00    0.60    0.00    0.00    0.00   86.86
11:01:19    2    2.96    0.20    9.29    0.00    0.00    0.10    0.00    0.00    0.00   87.45
11:01:19    3    3.32    7.95    3.12    0.50    0.00    0.00    0.00    0.00    0.00   85.11

11:01:19  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
11:01:29  all    3.65    6.09    5.08    0.00    0.00    0.25    0.00    0.00    0.00   84.93
11:01:29    0    3.92   11.07    4.63    0.00    0.00    0.20    0.00    0.00    0.00   80.18
11:01:29    1    4.39    1.90    3.49    0.00    0.00    0.80    0.00    0.00    0.00   89.42
11:01:29    2    3.35    0.10   10.14    0.00    0.00    0.00    0.00    0.00    0.00   86.41
11:01:29    3    2.91   11.26    2.21    0.00    0.00    0.00    0.00    0.00    0.00   83.62

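%usr is time running user-space processes, and %nice is time running user-space processes with a nice value greater than 0.

The remaining columns mean:
- %sys: time in kernel space, excluding interrupts.
- %iowait: time the CPUs were idle while the system had outstanding disk I/O.
- %irq: time servicing hardware interrupts.
- %soft: time servicing software interrupts.
- %steal: involuntary wait while the hypervisor was servicing another virtual processor.
- %guest and %gnice: time spent running virtual processors (%gnice for niced guests).
- %idle: idle time with no outstanding disk I/O.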
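When comparing many intervals, it can help to load mpstat -P ALL output into a table. The following is a minimal Python sketch; its function names are hypothetical. It assumes that every data row ends with the same columns as the preceding header line, which is why it reads values from the end of each row. That way the timestamp (one token, or two with AM/PM) does not matter. The same layout is used by sar -P ALL, so it may work there too, but that is untested.

```python
def parse_mpstat(text: str) -> list:
    """Parse mpstat -P ALL output into one dict per (interval, CPU) row."""
    rows, columns = [], None
    for line in text.splitlines():
        tokens = line.split()
        if "%idle" in tokens:
            # Header line: the metric columns are everything after "CPU".
            columns = tokens[tokens.index("CPU") + 1:]
            continue
        if columns is None or len(tokens) < len(columns) + 1:
            continue  # banner line, blank line, or too short to be a data row
        cpu = tokens[-len(columns) - 1]
        try:
            values = [float(v) for v in tokens[-len(columns):]]
        except ValueError:
            continue  # not a data row
        rows.append({"CPU": cpu, **dict(zip(columns, values))})
    return rows


def busiest_cpu(rows: list) -> dict:
    """Row with the lowest %idle among individual CPUs (excluding "all")."""
    return min((r for r in rows if r["CPU"] != "all"), key=lambda r: r["%idle"])
```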

 

5. uptime

uptime is a simple way to see how long the system has been running and the load averages over the last 1, 5, and 15 minutes.

uptime reads this information from /proc/uptime and /proc/loadavg.

In the sample output below, the time before "up" is the current time, and the value after "up" is how long the system has been running.

Next come the number of logged-in users and, after "load average:", the 1-, 5-, and 15-minute load averages.

11:15:41 up 82 days, 20:34,  8 users,  load average: 0.28, 0.40, 0.43
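/proc/uptime holds two numbers. The first is the seconds since boot. The second is the idle time, summed over all CPUs. The following is a minimal Python sketch that formats the first number in the same days/HH:MM style as uptime and derives the overall idle share from the second. The helper names are hypothetical, and the formatting is only an approximation of uptime's output (for example, uptime prints "min" for durations under an hour).

```python
def parse_proc_uptime(text: str):
    """Return (uptime_seconds, idle_seconds) from /proc/uptime."""
    up, idle = text.split()[:2]
    return float(up), float(idle)


def format_uptime(seconds: float) -> str:
    """Format seconds roughly like uptime does, e.g. '82 days, 20:34'."""
    minutes = int(seconds) // 60
    days, minutes = divmod(minutes, 24 * 60)
    hours, minutes = divmod(minutes, 60)
    clock = f"{hours}:{minutes:02d}"
    if days == 0:
        return clock
    return f"{days} day{'s' if days != 1 else ''}, {clock}"


def idle_fraction(up: float, idle: float, ncpu: int) -> float:
    """Share of total CPU time spent idle since boot.

    The idle field is summed over all CPUs, so divide by uptime * ncpu.
    """
    return idle / (up * ncpu)
```

For the sample above, 82 days and 20:34 correspond to 7158840 seconds of uptime.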

 

6. vmstat

vmstat is mainly a tool for monitoring system memory usage, but it also contains some CPU-related information.

vmstat 5 5 takes 5 samples at 5-second intervals. The first line reports averages since the last boot, and each following line covers the preceding interval. The results are as follows:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0 472576 228688 559092 1061756    0    0     9    39    1    0  8  4 87  0  0
 1  0 472576 228184 559100 1061756    0    0     0    13 1532 3395 10  6 84  0  0
 1  0 472576 229308 559100 1061616    0    0     0     0 1446 3449 10  5 85  0  0
 0  0 472576 229592 559108 1061616    0    0     0     6 1419 3474 10  5 85  0  0
 1  0 472576 229804 559108 1061616    0    0     0     0 1446 3439 10  5 85  0  0

The columns fall into six groups: procs, memory, swap, io, system (interrupts and context switches), and cpu. The cpu columns us, sy, id, wa, and st have the same meaning as in top, except that us also includes nice time.

For a more detailed explanation, see "Linux Performance Measurements using vmstat".
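To average the CPU columns over a run, the output can be keyed by the column-name line. The following is a minimal Python sketch; the function names are hypothetical. It assumes the default vmstat layout, where the column-name line starts with "r" and every sample line is all integers. It skips the first sample in the average because that sample covers the time since boot.

```python
def parse_vmstat(text: str) -> list:
    """Turn vmstat output into one {column: int} dict per sample line."""
    header, samples = None, []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "r":              # column-name line (may repeat)
            header = tokens
        elif header and tokens[0].isdigit():
            samples.append(dict(zip(header, map(int, tokens))))
    return samples


def cpu_summary(samples: list) -> dict:
    """Average us/sy/id/wa, skipping the first (since-boot) sample if possible."""
    rows = samples[1:] or samples
    return {k: sum(r[k] for r in rows) / len(rows) for k in ("us", "sy", "id", "wa")}
```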

 

7. pidstat

pidstat monitors the system resources used by all processes or by specified processes.

7.1 Check CPU usage

Without an interval (or with an interval of 0), pidstat reports statistics accumulated since system boot. To get per-interval statistics, specify a sampling interval and, optionally, the number of samples.

pidstat -p ALL  # statistics for all processes, including those with no activity

pidstat -p ALL -t  # also show statistics for each thread

pidstat [options] interval [count]  # sample every interval seconds, count times

To monitor one process, pass its PID with -p, for example pidstat -p <pid> 1.

pidstat also reports memory usage statistics with -r and I/O usage statistics with -d.
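Per-process %CPU, as shown by pidstat and top, is derived from the utime and stime fields (fields 14 and 15, in clock ticks) of /proc/<pid>/stat. The following is a minimal Python sketch of that calculation, based on the field layout in proc(5), not on either tool's source. The function names are hypothetical. The command name in field 2 can contain spaces, so the sketch splits the line at the last ")".

```python
import os
import time


def parse_pid_stat(text: str) -> dict:
    """Extract selected fields from a /proc/<pid>/stat line."""
    head, _, rest = text.rpartition(")")
    pid, comm = head.split(" (", 1)
    fields = rest.split()            # fields[0] is field 3 (state)
    return {
        "pid": int(pid),
        "comm": comm,
        "state": fields[0],
        "minflt": int(fields[7]),    # field 10: minor faults
        "majflt": int(fields[9]),    # field 12: major faults
        "utime": int(fields[11]),    # field 14: user time, clock ticks
        "stime": int(fields[12]),    # field 15: kernel time, clock ticks
    }


def cpu_percent(before: str, after: str, interval: float, clk_tck: int) -> float:
    """%CPU between two samples: CPU ticks used / ticks available in one CPU."""
    a, b = parse_pid_stat(before), parse_pid_stat(after)
    ticks = (b["utime"] + b["stime"]) - (a["utime"] + a["stime"])
    return ticks * 100.0 / (clk_tck * interval)


def sample_pid(pid: int, interval: float = 1.0) -> float:
    """Sample /proc/<pid>/stat twice on a live Linux system."""
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return cpu_percent(before, after, interval, os.sysconf("SC_CLK_TCK"))
```

As in pidstat's default mode, the result is not divided by the number of CPUs, so a multithreaded process can exceed 100%.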

 

7.2 View memory usage

The results of pidstat -p ALL -r are as follows:

15:18:21   UID       PID  minflt/s  majflt/s     VSZ     RSS   %MEM  Command
15:18:21     0         1      0.02      0.00  185316    3028   0.08  systemd
15:18:21     0         2      0.00      0.00       0       0   0.00  kthreadd
15:18:21     0         4      0.00      0.00       0       0   0.00  kworker/0:0H
15:18:21     0         6      0.00      0.00       0       0   0.00  mm_percpu_wq
15:18:21     0         7      0.00      0.00       0       0   0.00  ksoftirqd/0
15:18:21     0         8      0.00      0.00       0       0   0.00  rcu_sched

minflt/s: minor page faults per second. These are faults that did not require loading a page from disk, for example because the page was already in memory and only needed to be mapped.

majflt/s: major page faults per second. These are faults that required loading a page from disk, for example from swap or from a file that is not in the page cache. They are common when memory is tight.

VSZ: virtual memory size of the process, in kB.

RSS: resident set size, the non-swapped physical memory used by the process, in kB.

%MEM: the process's share of available physical memory.

Command: the command name of the process.
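%MEM relates RSS to total physical memory, which is how ps(1) documents its own %mem. The following is a minimal Python sketch that assumes pidstat uses the same ratio with MemTotal from /proc/meminfo. The helper names are hypothetical. As an illustration, a hypothetical MemTotal of 4,000,000 kB and systemd's RSS of 3028 kB give about 0.08%, which is consistent with the sample row above.

```python
def mem_total_kb(meminfo_text: str) -> int:
    """Return MemTotal (kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def mem_percent(rss_kb: int, total_kb: int) -> float:
    """%MEM as resident set size relative to total physical memory."""
    return rss_kb * 100.0 / total_kb
```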

7.3 View disk usage

The results of pidstat -p ALL -d are as follows:

15:20:40   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
15:20:40     0         1     -1.00     -1.00     -1.00 243523129  systemd
15:20:40     0         2     -1.00     -1.00     -1.00       0  kthreadd
15:20:40     0         4     -1.00     -1.00     -1.00       0  kworker/0:0H
15:20:40     0         6     -1.00     -1.00     -1.00       0  mm_percpu_wq
15:20:40     0         7     -1.00     -1.00     -1.00 714512328679  ksoftirqd/0
15:20:40     0         8     -1.00     -1.00     -1.00 417757303594  rcu_sched

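kB_rd/s: kilobytes per second that the process caused to be read from disk.

kB_wr/s: kilobytes per second that the process caused, or will cause, to be written to disk.

kB_ccwr/s: kilobytes per second whose writing to disk was cancelled by the process. This can happen when it truncates dirty page cache.

iodelay: block I/O delay of the process, in clock ticks. It includes time spent waiting for synchronous block I/O and swap-in to complete.

Command: the command name of the process.

A value of -1.00, as in the sample above, means the statistic was not available. Typically this is because pidstat was run without the privileges needed to read another user's /proc/<pid>/io.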
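The disk figures come from the read_bytes, write_bytes, and cancelled_write_bytes counters in /proc/<pid>/io. The following is a minimal Python sketch that turns two readings of that file into kB/s values comparable to pidstat -d. The helper names are hypothetical, and reading the file for another user's process needs the appropriate privileges.

```python
def parse_pid_io(text: str) -> dict:
    """Parse /proc/<pid>/io ("name: value" per line) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            stats[name.strip()] = int(value)
    return stats


def io_rates(before: str, after: str, interval: float) -> dict:
    """kB/s figures comparable to pidstat -d, from two samples of /proc/<pid>/io."""
    a, b = parse_pid_io(before), parse_pid_io(after)
    keys = {"kB_rd/s": "read_bytes",
            "kB_wr/s": "write_bytes",
            "kB_ccwr/s": "cancelled_write_bytes"}
    return {label: (b[k] - a[k]) / 1024 / interval for label, k in keys.items()}
```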

 

8. time

The time command measures the elapsed time and CPU time of a given program.

For example, running time cksum nomachine_6.0.80_1.exe twice gives the following results.

2401940638 32606752 nomachine_6.0.80_1.exe

real    0m0.263s  # total elapsed time; 0.263 - 0.094 - 0.011 = 0.158 s was spent off the CPU, mostly waiting for I/O
user    0m0.094s  # time in user mode
sys     0m0.011s  # time in kernel mode

2401940638 32606752 nomachine_6.0.80_1.exe

real    0m0.098s  # second run: the I/O wait has essentially disappeared, because the file is now in the page cache
user    0m0.097s
sys     0m0.000s
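The real/user/sys split, and the off-CPU arithmetic above, can be reproduced programmatically. The following is a minimal Python sketch built on resource.getrusage(RUSAGE_CHILDREN), which is Unix only. It is an illustration, not how the time command is implemented. The name time_command is hypothetical.

```python
import resource
import subprocess
import sys
import time


def time_command(argv: list) -> dict:
    """Run argv to completion and report real/user/sys seconds like time(1)."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    sys_time = after.ru_stime - before.ru_stime
    return {
        "real": real,
        "user": user,
        "sys": sys_time,
        # Wall-clock time not spent on a CPU (I/O waits, sleeping, waiting to be
        # scheduled). Only meaningful for single-threaded programs, whose CPU
        # time cannot exceed the elapsed time.
        "off_cpu": max(0.0, real - user - sys_time),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 time_sketch.py cksum nomachine_6.0.80_1.exe
    for key, value in time_command(sys.argv[1:]).items():
        print(f"{key:8} {value:.3f}s")
```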

 

9.cpustat

Install it with sudo apt install cpustat. The output of cpustat -T -D -x is as follows.

Load Avg 0.66 0.54 0.49, Freq Avg. 1.46 GHz, 4 CPUs online  # load averages, average CPU frequency, etc.
3791.1 Ctxt/s, 1709.9 IRQ/s, 1800.0 softIRQ/s, 0.0 new tasks/s, 1 running, 0 blocked  # context switches, hardware and software interrupts, and other statistics
  %CPU   %USR   %SYS   PID S  CPU   Time Task  # per-task CPU usage, split into user space and kernel space
 25.74  25.74   0.00 11435 R    3  2.29w /usr/bin/python3
 15.84  15.84   0.00  9445 S    0  1.49w /usr/lib/xorg/Xorg
 10.89   9.90   0.99  2722 S    1  1.05w compiz
  7.92   0.00   7.92 32352 S    2 16.60s [kworker/2:1]
  0.99   0.00   0.99 32397 R    1  0.01s cpustat
  0.99   0.99   0.00 11046 S    2 16.20h compiz
  0.99   0.99   0.00  1317 S    0  8.76h /usr/NX/bin/nxnode.bin
  0.99   0.00   0.99 10293 S    1  1.24m [kworker/1:2]
 64.36  53.47  10.89 Total

Load Avg 0.66 0.54 0.49, Freq Avg. 1.75 GHz, 4 CPUs online
2834.8 Ctxt/s, 1190.9 IRQ/s, 1183.3 softIRQ/s, 0.0 new tasks/s, 4 running, 0 blocked
  %CPU   %USR   %SYS   PID S  CPU   Time Task
 25.76  25.76   0.00 11435 R    3  2.29w /usr/bin/python3
 18.18  18.18   0.00  9445 S    0  1.49w /usr/lib/xorg/Xorg
  7.58   7.58   0.00  2722 S    1  1.05w compiz
  6.06   0.00   6.06 32352 S    2 16.64s [kworker/2:1]
  1.52   0.00   1.52 32397 R    1  0.02s cpustat
  1.52   0.00   1.52     8 S    0  3.00h [rcu_sched]
  1.52   0.00   1.52 18409 S    0  1.16m update-notifier
 62.12  51.52  10.61 Total

Distribution of CPU utilisation (per Task):
% CPU Utilisation   Count   (%)
  0.00 -   1.97       706  98.88
  1.97 -   3.94         0   0.00
  3.94 -   5.91         0   0.00
  5.91 -   7.88         2   0.28
  7.88 -   9.85         0   0.00
  9.85 -  11.82         0   0.00
 11.82 -  13.79         1   0.14
 13.79 -  15.76         0   0.00
 15.76 -  17.73         1   0.14
 17.73 -  19.70         1   0.14
 19.70 -  21.67         0   0.00
 21.67 -  23.64         0   0.00
 23.64 -  25.61         2   0.28
 25.61 -  27.57         0   0.00
 27.58 -  29.54         0   0.00
 29.55 -  31.51         0   0.00
 31.52 -  33.48         0   0.00
 33.48 -  35.45         0   0.00
 35.45 -  37.42         0   0.00
 37.42 -  39.39         1   0.14

Distribution of CPU utilisation (per CPU):  # usage of each CPU, split into user space and kernel space
 CPU#   USR%   SYS%
    0  17.37   1.20
    1   8.98   2.40
    2   0.60   7.19
    3  25.75   0.00
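The per-task distribution above has buckets 1.97 wide, up to a top value of 39.39 (39.39 / 20 is about 1.97). That suggests the range from 0 to the highest observed per-task usage is split into 20 equal buckets. This is an inference from the sample output, not from cpustat's documentation. The following minimal Python sketch builds a histogram in that style; the function name is hypothetical.

```python
def utilisation_histogram(values: list, buckets: int = 20) -> list:
    """Split 0..max(values) into equal-width buckets and count values in each.

    Returns (low, high, count, percent) tuples. The last bucket is closed so
    that the maximum value is counted.
    """
    if not values:
        return []
    top = max(values)
    width = top / buckets if top > 0 else 1.0
    counts = [0] * buckets
    for v in values:
        counts[min(int(v / width), buckets - 1)] += 1
    total = len(values)
    return [(i * width, (i + 1) * width, c, c * 100.0 / total)
            for i, c in enumerate(counts)]
```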

 

10.  htop

htop is similar to top but easier to read. Press F5 to switch to the tree view. It shows the parent-child relationships between processes, including the threads within each process.

 

 

11. atop

atop is a tool for monitoring system resources and processes. It lists processes in descending order of CPU usage and shows each process's CPU, memory, disk, and network usage. It is similar to top and htop.

 

12. glances

glances is a reporting tool written in Python, similar to nmon. It reports statistics on CPU, memory, network, disk, and processes, and apart from reporting it offers no other functions. While it is running, press h to display the help page.

 

 

13. nmon

nmon is an easy-to-use tool that shows CPU, memory, network, and disk usage and the process list on one screen. Like other report-only tools, it cannot manage processes or customize the report display. It can also save data to a file for analysis in a spreadsheet.

 

 

13. pcp-gui

Performance Co-Pilot, PCP for short, is a system performance and analysis framework. It organizes data from multiple hosts and analyzes it in real time to help you identify abnormal performance patterns. It also provides APIs for you to design your own monitoring and reporting solutions.

Install PCP and its GUI tools:

sudo apt install pcp pcp-gui

In the PCP chart tool (pmchart, provided by pcp-gui), use File -> Open View to select a view such as CPU, Disk, or Memory.

 

14. collectl、colplot

14.1 Use of collectl

collectl is an excellent utility with a rich set of command-line options. You can use it to collect performance data that describes the current state of the system.

Unlike most other monitoring tools, collectl is not limited to a few system metrics. It can collect information about many types of system resources, such as CPU, disk, memory, network, sockets, TCP, inodes, InfiniBand, Lustre, NFS, processes, Quadrics, slabs, and buddyinfo.

collectl can also replace common tools such as top, vmstat, ps, and iotop.

Install collectl:

sudo apt-get install collectl

collectl is simple to use. By default, it displays CPU, disk, and network information:

#<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut 
   0   0   162    460      0      0      0      0      0      0      0       0 
   1   0   308    820      0      0     36      1      0      0      0       0 
   1   0   572   2022      0      0     36      2      0      0      0       0 
   0   0   270    728      0      0      0      0      0      0      0       0 

 

collectl can also report on more subsystems, selected with -s followed by one or more letters. Where a letter has an uppercase counterpart, the uppercase form gives more detailed (per-device) statistics.

b - buddy info (memory fragmentation)

c - statistics aggregated over all CPUs; C - statistics for each CPU

d - disk totals for the whole system; D - statistics for each disk

f - NFS V3 data

i - inodes and file systems

j - interrupt counts for each CPU; J - detailed counts for each interrupt

l - Lustre

m - memory usage of the whole system; M - memory usage by NUMA node

n - network usage of the whole system; N - network usage for each network interface

s - sockets

t - TCP

x - interconnect

y - usage summary for all slabs (kernel object caches); Y - detailed information for each slab

collectl --all displays statistics for all subsystems, including CPU, interrupts, memory, disk, network, TCP, sockets, file systems, and NFS.

#<----CPU[HYPER]-----><-----------------Int------------------><-----------------Memory-----------------><----------Disks-----------><----------Network----------><-------TCP--------><------Sockets-----><----Files---><------NFS Totals------>
#cpu sys inter  ctxsw Cpu0 Cpu1 Cpu2 Cpu3 Cpu4 Cpu5 Cpu6 Cpu7 Free Buff Cach Inac Slab  Map   Fragments KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut   IP  Tcp  Udp Icmp  Tcp  Udp  Raw Frag Handle Inodes  Reads Writes Meta Comm 
   5   1   749   2738   79   83   67  126  289   57   87   47   4G 107M   1G 640M 151M   1G nlsrkjebaas      0      0      0      0      0      0      0       0    0    2    0    0 1138    0    1    0  11648  71267      0      0    0    0 
   1   0   276   1323   22    8   12   37   76   19   33   72   4G 107M   1G 640M 151M   1G nlsrkjebaas      0      0     56     13      0      0      0       0    0    0    0    0 1138    0    1    0  11648  71264      0      0    0    0 
   1   0   298   1336   40    9   26   31   75   31   34   49   4G 107M   1G 640M 151M   1G olsrkjebaas      0      0     24      5      0      0      0       0    0    0    0    0 1138    0    1    0  11648  71256      0      0    0    0 

collectl --top can replace the top command:

# TOP PROCESSES sorted by time (counters are /sec) 12:11:40
# PID  User     PR  PPID THRD S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF Command
14557  al       20  7305    0 R   75M   28M  4  0.02  0.05   7  00:00.47    0    0    0    0 /usr/bin/perl 
 6985  al       20     1   36 S    1G  181M  3  0.01  0.03   4  01:48.14    0    4    0    1 /opt/google/chrome/chrome 
 7255  al       20  7000   21 S  955M  215M  1  0.00  0.04   4  01:30.44    0    0    0 1999 /opt/google/chrome/chrome 
 8006  al       20  7000   17 S  923M  135M  0  0.01  0.03   4  01:24.67    0    0    0    0 /opt/google/chrome/chrome 
 7294  al       20  2415    3 S  710M   60M  7  0.01  0.01   2  00:12.79    0    0    0    4 /usr/bin/python 

collectl --vmstat can replace the vmstat command:

#procs ---------------memory (KB)--------------- --swaps-- -----io---- --system-- ----cpu-----
# r  b   swpd   free   buff  cache  inact active   si   so    bi    bo   in    cs us sy  id wa
  2  0      0  4634M   108M  1535M   642M   481M    0    0     0   132  594  2523  2  0  96  0
  0  0      0  4631M   108M  1539M   642M   481M    0    0     0     0 1006  5308  4  1  93  0
  0  0      0  4623M   108M  1547M   642M   481M    0    0     0    48  564  2572  2  0  96  0

collectl -c1 -sZ -i:1 can replace the ps command.

 

Together with tools that process and analyze its data (such as colmux, colgui, and colplot), collectl can also present its data graphically.

14.2 Use of colplot

colplot is part of the collectl tool set. It plots the data collected by collectl in a web browser.

An introduction to colplot is available online, and its source code can be downloaded as part of collectl-utils.

Extract the downloaded colplot archive and run sudo ./INSTALL to install it.

After installation, reload or restart the Apache service:

sudo systemctl reload apache2

sudo systemctl restart apache2

Open http://127.0.0.1/colplot/ in a browser to use colplot.

Use Change Dir to select the directory holding the data saved with collectl -P. Then set the plot details, such as which subsystems to display and the plot size.

Finally, click Generate Plot to view the results.

 

Reference documents: "Collectl: The Almighty Champion of Linux Performance Monitoring", "Collectl Documentation", "Collectl Examples - An Awesome Performance Analysis Tool in Linux"

 

0. Other

munin, rrdtool

 

 


Source: blog.csdn.net/whatday/article/details/114702969