How to monitor the many parameters of a Linux server? Master these Linux monitoring commands and leave work early!

1. CPU

cat /proc/cpuinfo
# Number of physical CPUs (sockets)
cat /proc/cpuinfo | grep 'physical id' | sort | uniq | wc -l
# Number of cores per physical CPU
cat /proc/cpuinfo | grep 'core id' | sort | uniq | wc -l
# Number of logical CPUs
cat /proc/cpuinfo | grep 'processor' | sort | uniq | wc -l
# mpstat: per-CPU utilization statistics
mpstat
mpstat 2 10   # every 2 seconds, 10 reports
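The pipelines above count distinct values, so the core id count gives cores per socket only because core ids restart from 0 on every socket. A minimal sketch that counts sockets, total physical cores (distinct physical id / core id pairs) and logical CPUs in one go, assuming an x86-style /proc/cpuinfo layout (some ARM kernels omit physical id); cpu_summary is a hypothetical helper name:

```shell
# cpu_summary: summarize CPU topology from a cpuinfo-format file (hypothetical helper).
# Assumes x86-style entries: "processor", "physical id" and "core id" lines per CPU.
cpu_summary() {
  local f="${1:-/proc/cpuinfo}"
  local sockets cores logical
  # sockets: distinct "physical id" values
  sockets=$(grep '^physical id' "$f" | sort -u | wc -l)
  # physical cores: distinct (physical id, core id) pairs, since core ids repeat per socket
  cores=$(awk -F': *' '/^physical id/ { p = $2 } /^core id/ { print p "-" $2 }' "$f" | sort -u | wc -l)
  # logical CPUs: one "processor" entry per hardware thread
  logical=$(grep -c '^processor' "$f")
  echo "sockets=$sockets cores=$cores logical=$logical"
}
```

Called with no argument it reads /proc/cpuinfo. lscpu reports the same figures as Socket(s), Core(s) per socket and CPU(s), which is usually the easier way to check.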

2. Memory

cat /proc/meminfo
free -gt
df -hT
du -csh ./*

OS IPC shared memory/queue:

ipcs   # shared memory segments, message queues, semaphores

We usually need to monitor memory usage; commonly used commands are free, vmstat, top, dstat -m, and so on.

2.1 free

> free -h
             total       used       free     shared    buffers     cached
Mem:          7.7G       6.2G       1.5G        17M        33M       184M
-/+ buffers/cache:       6.0G       1.7G
Swap:          24G       581M        23G
What each row means:

First line, Mem:

  • total: total memory, 7.7G; the physical memory size, i.e. the memory actually installed in the machine
  • used: used memory, 6.2G; this value includes buffers and cached in addition to the memory actually used by applications
  • free: free memory, 1.5G; the size of unused memory
  • shared: the size of shared memory, 17M
  • buffers: the memory occupied by buffers, 33M
  • cached: the memory occupied by the cache, 184M

The values satisfy:

total = used + free

The second line, -/+ buffers/cache, shows the memory actually used by applications:

  • The first value, used - buffers - cached, is the memory actually used by applications (6.0G)
  • The second value, free + buffers + cached, is the memory that can theoretically be used (1.7G)

The sum of these two values is also total (6.0G + 1.7G = 7.7G).

The third line, Swap, shows the usage of the swap partition: total, used, and free.
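The -/+ buffers/cache arithmetic can be reproduced directly from /proc/meminfo, whose values are in kB. A minimal sketch, assuming the MemTotal, MemFree, Buffers and Cached fields; app_memory is a hypothetical helper name:

```shell
# app_memory: reproduce the "-/+ buffers/cache" row of the older free(1)
# from a meminfo-format file (values in kB). Hypothetical helper name.
app_memory() {
  awk '
    /^MemTotal:/ { total = $2 }
    /^MemFree:/  { free = $2 }
    /^Buffers:/  { buffers = $2 }
    /^Cached:/   { cached = $2 }   # anchored, so SwapCached: is not matched
    END {
      used = total - free
      # first value: used - buffers - cached; second value: free + buffers + cached
      printf "app_used_kB=%d app_avail_kB=%d\n", used - buffers - cached, free + buffers + cached
    }' "${1:-/proc/meminfo}"
}
```

Newer procps-ng versions of free no longer print this row; they show an available column based on the kernel's MemAvailable estimate, which is usually the better figure to watch.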

cache

cache means cached file data. When the system reads a file, it first reads the data from the hard disk into memory. Because the hard disk is much slower than memory, this step is time-consuming.

To improve efficiency, Linux keeps files it has read cached in memory (the principle of locality), and the cache is not released automatically even after the program exits. Therefore, after a program performs a large number of file reads, you will see memory usage go up.

When other programs need memory, Linux releases these unused caches according to its cache replacement policy (such as LRU) so that other programs can use the memory. You can also release the cache manually:

sync; echo 1 > /proc/sys/vm/drop_caches   # as root; 1 = page cache, 2 = dentries and inodes, 3 = both
buffer

Now consider writing files from memory to the hard disk. Because the hard disk is slow, if the system had to wait for the data to be written before continuing, efficiency would be very low and programs would run slowly. This is why buffers exist.

When data needs to be written to the hard disk, it is first put into a buffer. Writing into the buffer is fast, so the program can continue with other work while the disk picks up the data from the buffer in the background and saves it. This improves read and write efficiency.

For example, when copying a particularly large file from the computer to a USB flash drive, the system may still report that the drive is busy even after the copy appears to have finished. This is caused by the buffer: the copy program has put all the data into the buffer, but not all of it has been written to the drive yet.

Similarly, you can use the sync command to manually flush the buffered contents to disk:

> sync --help

Usage: sync [OPTION] [FILE]...
Synchronize cached writes to persistent storage

If one or more files are specified, sync only them,
or their containing file systems.

  -d, --data             sync only file data, no unneeded metadata
  -f, --file-system      sync the file systems that contain the files
      --help     display this help and exit
      --version  output version information and exit

GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
Full documentation at: <http://www.gnu.org/software/coreutils/sync>
or available locally via: info '(coreutils) sync invocation'
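Data that has been written but not yet flushed shows up as Dirty in /proc/meminfo, so the USB-drive effect described above can be observed directly. A rough sketch, assuming a directory on a disk-backed file system; dirty_kb and demo_sync are hypothetical helper names, and the exact numbers depend on the kernel's writeback timing and on other processes writing at the same time:

```shell
# dirty_kb: print the Dirty: value (kB) from a meminfo-format file (hypothetical helper)
dirty_kb() {
  awk '/^Dirty:/ { print $2 }' "${1:-/proc/meminfo}"
}

# demo_sync: write a file, then compare Dirty before and after sync (hypothetical helper).
# Pass a directory on a disk-backed file system (defaults to the current directory).
demo_sync() {
  local f
  f=$(mktemp -p "${1:-.}")
  dd if=/dev/zero of="$f" bs=1M count=64 status=none   # 64 MiB, lands in the page cache first
  echo "Dirty before sync: $(dirty_kb) kB"
  sync                                                 # flush dirty data to the disk
  echo "Dirty after sync:  $(dirty_kb) kB"
  rm -f "$f"
}
```

Running demo_sync in a directory on the disk you care about should show Dirty drop back toward zero after sync.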
swap partition

The swap partition is an important part of implementing virtual memory. Swap uses part of the hard disk as memory: running programs use physical memory, and memory that is not in use is moved to the hard disk, which is called swap out. Moving memory from the swap partition on the hard disk back into physical memory is called swap in.

The swap partition logically expands the memory space, but it can also slow the system down, because the hard disk reads and writes far more slowly than memory. Linux moves infrequently used memory into the swap partition.
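How much swap is in use at a given moment can be read from the SwapTotal and SwapFree fields of /proc/meminfo. A minimal sketch under that assumption; swap_usage is a hypothetical helper name:

```shell
# swap_usage: report used swap (kB and percent) from a meminfo-format file (hypothetical helper)
swap_usage() {
  awk '
    /^SwapTotal:/ { t = $2 }
    /^SwapFree:/  { f = $2 }
    END {
      if (t == 0) { print "no swap configured"; exit }
      printf "swap_used_kB=%d (%.1f%%)\n", t - f, 100 * (t - f) / t
    }' "${1:-/proc/meminfo}"
}
```

This only shows how much is swapped out; whether swapping is actively hurting performance is better judged from the si and so columns of vmstat.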

The difference between cache and buffer
  • cache: the page cache, a cache of the file system; file-level data is cached in the page cache
  • buffer: the buffer cache, a cache of disk blocks; data read or written directly on the disk is cached in the buffer cache

Simply put: the page cache caches file data, and the buffer cache caches disk data. When files are accessed through a file system, the data is cached in the page cache; when the disk is read or written directly with a tool such as dd, the data is cached in the buffer cache.

2.2 vmstat

vmstat (Virtual Memory Statistics) reports on the overall state of the system, including processes, virtual memory, disk, interrupts and CPU activity:

> vmstat --help

Usage:
 vmstat [options] [delay [count]]

Options:
 -a, --active           active/inactive memory
 -f, --forks            number of forks since boot
 -m, --slabs            slabinfo
 -n, --one-header       do not redisplay header
 -s, --stats            event counter statistics
 -d, --disk             disk statistics
 -D, --disk-sum         summarize disk statistics
 -p, --partition <dev>  partition specific statistics
 -S, --unit <char>      define display unit
 -w, --wide             wide output
 -t, --timestamp        show timestamp

 -h, --help     display this help and exit
 -V, --version  output version information and exit
Source | WeChat official account: 网络技术干货圈
For more details see vmstat(8).

> vmstat -SM 1 100 # 1 = refresh interval (seconds), 100 = number of reports, units in MB

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0    470    188   1154    0    0     0     4    3    0  0  0 99  0  0
 0  0      0    470    188   1154    0    0     0     0  112  231  1  1 98  0  0
 0  0      0    470    188   1154    0    0     0     0   91  176  0  0 100  0  0
 0  0      0    470    188   1154    0    0     0     0  118  229  1  0 99  0  0
 0  0      0    470    188   1154    0    0     0     0   78  156  0  0 100  0  0
 0  0      0    470    188   1154    0    0     0    64   84  186  0  1 97  2  0
processes
  • r column: the number of processes running or waiting for a CPU time slice. If this value stays above the number of CPUs for a long time, CPU resources are insufficient and you can consider adding CPUs
  • b column: the number of processes blocked waiting for a resource, such as I/O or memory swapping
memory
  • swpd column: the amount of memory swapped out to the swap partition. If swpd is non-zero or even relatively large, but si and so stay at 0 for a long time, system performance is not affected for the time being
  • free column: the current amount of free physical memory
  • buff column: the size of the buffer cache; block device reads and writes are generally buffered here
  • cache column: the size of the page cache, which is generally used as the file system cache; frequently accessed files are cached here. If cache is relatively large, many files are cached, and if bi is relatively small at the same time, the file system is working efficiently
swap
  • si column: swap in, memory moved from the swap partition back into physical memory
  • so column: swap out, unused memory moved out to the swap partition on the hard disk
io
  • bi column: the amount of data read from block devices (disk reads), in KB/s (vmstat counts 1024-byte blocks per second)
  • bo column: the amount of data written to block devices (disk writes), in KB/s (vmstat counts 1024-byte blocks per second)

A reference threshold for bi + bo is 1000: if it exceeds 1000 and wa is also relatively large, disk I/O is a performance bottleneck for the system

system
  • inColumn: Indicates the number of device interrupts per second observed in a certain time interval
  • csColumn: Indicates the number of context switches generated per second

The larger the above two values, the more CPU time the kernel consumes

cpu
  • us column: the percentage of CPU time spent in user processes. A high us value means user processes are consuming a lot of CPU time; if it stays above 50% for a long time, consider optimizing the program
  • sy column: the percentage of CPU time spent in the kernel. A high sy value means the kernel is consuming a lot of CPU time; if us + sy exceeds 80%, CPU resources are insufficient
  • id column: the percentage of time the CPU was idle
  • wa column: the percentage of CPU time spent waiting for I/O. The higher wa, the more serious the I/O wait; if wa exceeds 20%, I/O wait is serious
  • st column: CPU steal time, relevant for virtual machines
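The rules of thumb above can be applied mechanically to vmstat output. A minimal sketch, assuming the default 17-column layout shown earlier (r b swpd free buff cache si so bi bo in cs us sy id wa st); the thresholds come from the text, "wa relatively large" is taken to mean wa > 20, and vmstat_flags is a hypothetical helper name:

```shell
# vmstat_flags: flag vmstat rows using the thresholds from the text (hypothetical helper).
# Optional argument: CPU count to compare r against (defaults to nproc).
vmstat_flags() {
  local ncpu="${1:-$(nproc)}"
  awk -v ncpu="$ncpu" '
    $1 !~ /^[0-9]+$/ { next }          # skip the two header lines
    {
      n++; msg = ""
      if ($1 > ncpu)                   msg = msg " cpu-queue"        # r above CPU count
      if ($13 + $14 > 80)              msg = msg " cpu-busy"         # us + sy above 80%
      if ($16 > 20)                    msg = msg " io-wait"          # wa above 20%
      if ($9 + $10 > 1000 && $16 > 20) msg = msg " disk-bottleneck"  # bi + bo above 1000 with high wa
      if ($7 > 0 || $8 > 0)            msg = msg " swapping"         # si or so non-zero
      print "sample " n ":" (msg == "" ? " ok" : msg)
    }'
}
```

Usage: vmstat 2 10 | vmstat_flags. The first vmstat row reports averages since boot, so it is usually best ignored.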

3. Network

3.1 Interface

ifconfig
iftop
ethtool

3.2 Ports

# Ports
netstat -ntlp # TCP
netstat -nulp # UDP
netstat -nxlp # UNIX
netstat -nalp # show not only listening ports but also connections in other states
lsof -p <PID> -P
lsof -i :5900
sar -n DEV 1  # network traffic
ss
ss -s
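To pull just the listening TCP port numbers out of ss, a short sketch assuming the column layout of ss -ltn (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port); listening_ports is a hypothetical helper name:

```shell
# listening_ports: print distinct listening TCP ports from `ss -ltn` output on stdin (hypothetical helper)
listening_ports() {
  # skip the header, take the Local Address:Port column ($4),
  # keep the text after the last ':' so IPv6 addresses like [::]:22 also work
  awk 'NR > 1 { n = split($4, a, ":"); print a[n] }' | sort -un
}
```

Usage: ss -ltn | listening_ports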

3.3 tcpdump

sudo tcpdump -i any -XNnvvv
sudo tcpdump -i any udp -XNnvvv
sudo tcpdump -i any udp port 20112 -XNnvvv
sudo tcpdump -i any udp port 20112 and ip[0x1f:02]=0x4e91 -XNnvvv
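In these commands, -X prints packet contents in hex and ASCII, -n disables address and port name resolution, -N drops domain qualification from host names, and -vvv gives the most verbose decoding. The filter ip[0x1f:02]=0x4e91 uses pcap-filter's proto[offset:size] syntax: it compares the 2 bytes starting at offset 0x1f of the IP header with 0x4e91. A small arithmetic sketch of where those bytes fall, assuming an IPv4 header without options (20 bytes) followed by a UDP header (8 bytes); what the value means depends on the application protocol, which the commands do not state:

```shell
# Decode the byte offset used in: ip[0x1f:02]=0x4e91
# Assumption: IPv4 header with no options (20 bytes) followed by a UDP header (8 bytes).
off=$((0x1f))                          # offset from the start of the IP header: 31
ip_hdr=20
udp_hdr=8
payload_off=$((off - ip_hdr - udp_hdr))
echo "ip[$off:2] = UDP payload bytes $payload_off and $((payload_off + 1)) (counting from 0)"
echo "value compared: $((0x4e91)) (0x4e91)"
```

If the IP header carries options, the IHL field is larger than 5 and the fixed offset no longer points at the same payload bytes.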

3.4 nethogs

Monitor the network traffic of each process

nethogs

4. I/O performance

iotop
iostat
iostat -kx 2
vmstat -SM
vmstat 2 10
dstat
dstat --top-io --top-bio
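Tools such as iostat derive their throughput figures from counters in /proc/diskstats. A minimal sketch that computes per-device read and write rates from two snapshots, roughly what iostat -k shows as kB_read/s and kB_wrtn/s; it assumes field 6 is sectors read and field 10 is sectors written, with diskstats sectors being 512 bytes, and disk_rates is a hypothetical helper name:

```shell
# disk_rates: per-device kB/s from two /proc/diskstats snapshots (hypothetical helper).
# Arguments: first snapshot file, second snapshot file, seconds between them.
disk_rates() {
  local before="$1" after="$2" interval="$3"
  awk -v t="$interval" '
    NR == FNR { rd[$3] = $6; wr[$3] = $10; next }   # first file: remember the counters
    ($3 in rd) {
      # sectors are 512 bytes, so sectors / 2 = kB
      printf "%s read_kB/s=%.1f write_kB/s=%.1f\n", $3,
        ($6 - rd[$3]) / 2 / t, ($10 - wr[$3]) / 2 / t
    }' "$before" "$after"
}
```

Usage: cat /proc/diskstats > /tmp/d1; sleep 2; cat /proc/diskstats > /tmp/d2; disk_rates /tmp/d1 /tmp/d2 2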

5. Process

top
top -H
htop
ps auxf
ps -eLf # show threads
ls /proc/<PID>/task
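Each thread of a process has its own directory under /proc/<PID>/task, and the kernel also reports the count in the Threads: line of /proc/<PID>/status. A minimal sketch of both approaches; thread_count and threads_from_status are hypothetical helper names:

```shell
# thread_count: count thread directories under /proc/<PID>/task (hypothetical helper)
thread_count() {
  ls "/proc/$1/task" | wc -l
}

# threads_from_status: read the Threads: line of /proc/<PID>/status (hypothetical helper)
threads_from_status() {
  awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}
```

ps -o nlwp= -p <PID> prints the same number using procps.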

5.1 top

The most commonly used of these is top. Its interactive help screen (press h inside top) looks like this:

Help for Interactive Commands - procps version 3.2.8
Window 1:Def: Cumulative mode Off.  System: Delay 3.0 secs; Secure mode Off.

  Z,B       Global: 'Z' change color mappings; 'B' disable/enable bold
  l,t,m     Toggle Summaries: 'l' load avg; 't' task/cpu stats; 'm' mem info
  1,I       Toggle SMP view: '1' single/separate states; 'I' Irix/Solaris mode

  f,o     . Fields/Columns: 'f' add or remove; 'o' change display order
  F or O  . Select sort field
  <,>     . Move sort field: '<' next col left; '>' next col right
  R,H     . Toggle: 'R' normal/reverse sort; 'H' show threads
  c,i,S   . Toggle: 'c' cmd name/line; 'i' idle tasks; 'S' cumulative time
  x,y     . Toggle highlights: 'x' sort field; 'y' running tasks
  z,b     . Toggle: 'z' color/mono; 'b' bold/reverse (only if 'x' or 'y')
  u       . Show specific user only
  n or #  . Set maximum tasks displayed

  k,r       Manipulate tasks: 'k' kill; 'r' renice
  d or s    Set update interval
  W         Write configuration file
  q         Quit
          ( commands shown with '.' require a visible task display window ) 
Press 'h' or '?' for help with Windows,
any other key to continue
  • 1: Display the usage of each CPU
  • c: show the full path of the process
  • H: show threads
  • P: Sort - CPU Usage
  • M: sort - memory usage
  • R: reverse order
  • Z: Change color mappings
  • B: Disable/enable bold
  • l: Toggle load avg
  • t: Toggle task/cpu stats
  • m: Toggle mem info
us - Time spent in user space
sy - Time spent in kernel space
ni - Time spent running niced user processes (User defined priority)
id - Time spent in idle operations
wa - Time spent on waiting on IO peripherals (eg. disk)
hi - Time spent handling hardware interrupt routines. (Whenever a peripheral unit wants attention from the CPU, it literally pulls a line to signal the CPU to service it)
si - Time spent handling software interrupt routines. (a piece of code, calls an interrupt routine...)
st - Time spent on involuntary waits by virtual cpu while hypervisor is servicing another processor (stolen from a virtual machine)
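These percentages come from tick counters in the aggregate cpu line of /proc/stat, sampled twice. A minimal sketch of the calculation, assuming the field order user, nice, system, idle, iowait, irq, softirq, steal (guest time is already included in user and nice, so it is left out of the total); cpu_pct is a hypothetical helper name:

```shell
# cpu_pct: top-style CPU percentages from two "cpu ..." lines of /proc/stat (hypothetical helper)
cpu_pct() {
  local a="$1" b="$2"   # two samples of the aggregate "cpu" line, taken some time apart
  awk -v a="$a" -v b="$b" 'BEGIN {
    split(a, x); split(b, y)
    # fields 2..9: user nice system idle iowait irq softirq steal (clock ticks)
    for (i = 2; i <= 9; i++) { d[i] = y[i] - x[i]; total += d[i] }
    if (total == 0) { print "no ticks elapsed"; exit }
    printf "us=%.1f sy=%.1f ni=%.1f id=%.1f wa=%.1f hi=%.1f si=%.1f st=%.1f\n",
      100*d[2]/total, 100*d[4]/total, 100*d[3]/total, 100*d[5]/total,
      100*d[6]/total, 100*d[7]/total, 100*d[8]/total, 100*d[9]/total
  }'
}
```

Usage: s1=$(grep '^cpu ' /proc/stat); sleep 1; s2=$(grep '^cpu ' /proc/stat); cpu_pct "$s1" "$s2"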

5.2 lsof

lsof -P -p 123

6. Performance testing

stress --cpu 8 \
       --io 4  \
       --vm 2  \
       --vm-bytes 128M \
       --timeout 60s

time <command>   # measure the real, user and sys time of a command

7. Users

w
whoami

8. System Status

uptime
htop
vmstat
mpstat
dstat

9. Hardware equipment

lspci
lscpu
lsblk
lsblk -fm # show file systems and permissions
lshw -c display
dmidecode

10. File system

# Mounts
mount
umount
cat /etc/fstab
# LVM
pvdisplay
pvs
lvdisplay
lvs
vgdisplay
vgs
df -hT
lsof

11. Kernel, interrupt

cat /proc/modules
sysctl -a | grep ...
cat /proc/interrupts

12. System log, kernel log

dmesg
less /var/log/messages
less /var/log/secure
less /var/log/auth.log

13. cron scheduled tasks

crontab -l
crontab -l -u nobody
# View the cron jobs of all users
sudo find /var/spool/cron/ | sudo xargs cat

14. Debugging tools

14.1 perf

14.2 strace

The strace command prints system calls and signals:

strace -p <PID>
strace -p 5191 -f
strace -e trace=signal -p 5191

-e trace=open
-e trace=file
-e trace=process
-e trace=network
-e trace=signal
-e trace=ipc
-e trace=desc
-e trace=memory

14.3 ltrace

The ltrace command prints calls into dynamic (shared) library functions:

ltrace -p <PID>
ltrace -S # also trace system calls

15. Scenarios

Scenario 1: After connecting to the server

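w       # show logged-in users, their login IPs, what they are running, etc.
last    # see who logged in to the server recently and when it rebooted
uptime  # time since boot, logged-in users, load average
history # view the command history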

Scenario 2: What information is in the /proc directory

cat /proc/...

cgroups
cmdline
cpuinfo
crypto
devices
diskstats
filesystems
iomem
ioports
kallsyms
meminfo
modules
partitions
uptime
version
vmstat

Scenario 3: Executing commands in the background

nohup <command> &> some.log &   # &> sends both stdout and stderr to some.log (bash)
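A background job is easier to manage if you keep its PID. A minimal sketch, where sleep 30 and job.log are placeholders for your own command and log file; it uses the POSIX form > file 2>&1, which works in shells that lack &>:

```shell
# Start a placeholder long-running job in the background, immune to hangups.
nohup sleep 30 > job.log 2>&1 &
job_pid=$!                               # PID of the most recent background job
echo "started PID $job_pid"
# kill -0 sends no signal; it only checks whether the process still exists
if kill -0 "$job_pid" 2>/dev/null; then
  echo "still running"
fi
```

Later, kill "$job_pid" stops it, and tail -f job.log follows its output.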

More commands, by category

# Overview
top
htop
glances
dstat & sar
mpstat
# Performance analysis
perf
# Processes
ps
pstree -p
pgrep
pkill
pidof
Ctrl+z & jobs & fg
# Network
ip
ifconfig
dig
ping
traceroute
iftop
pingtop
nload
netstat
vnstat
slurm
scp
tcpdump
# Disk I/O
iotop
iostat
# Virtual machines
virt-top
# Users
w
whoami
# Uptime
uptime
# Disks
du
df
lsblk
# Permissions
chown
chmod
# Services
systemctl list-unit-files
# Finding files
find
locate
# Performance testing
time


Origin blog.csdn.net/weixin_43025343/article/details/132269629