1. CPU
cat /proc/cpuinfo
# Number of physical CPUs (sockets)
cat /proc/cpuinfo | grep 'physical id' | sort | uniq | wc -l
# Number of cores per CPU
cat /proc/cpuinfo | grep 'core id' | sort | uniq | wc -l
# Number of logical CPUs
cat /proc/cpuinfo | grep 'processor' | sort | uniq | wc -l
# mpstat
mpstat
mpstat 2 10
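The three /proc/cpuinfo pipelines above can be folded into a single pass. The following is a minimal sketch rather than a definitive tool: the function name cpu_counts is hypothetical, and it assumes the x86 layout of /proc/cpuinfo, in which each processor block lists "physical id" before "core id". On architectures that omit those fields, sockets and cores print as 0. Note that it counts distinct (physical id, core id) pairs, that is, physical cores across all sockets, whereas the 'core id' pipeline counts distinct core IDs only.

```shell
# cpu_counts [FILE]
# Hypothetical helper: summarize sockets, physical cores and logical CPUs.
# FILE defaults to /proc/cpuinfo (x86 field layout assumed).
cpu_counts() {
    awk -F '[ \t]*:[ \t]*' '
        $1 == "processor"   { logical++ }                 # one block per logical CPU
        $1 == "physical id" { sock = $2; sockets[sock] = 1 }
        $1 == "core id"     { cores[sock "," $2] = 1 }    # core IDs repeat across sockets
        END {
            ns = 0; for (s in sockets) ns++
            nc = 0; for (c in cores) nc++
            printf "sockets=%d cores=%d logical=%d\n", ns, nc, logical
        }' "${1:-/proc/cpuinfo}"
}

# Usage on a live system:
#   cpu_counts
```

If logical is larger than cores, the machine most likely has hyper-threading (SMT) enabled.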
2. Memory
cat /proc/meminfo
free -gt
df -hT
du -csh ./*
OS IPC shared memory/queue:
ipcs #(shmems, queues, semaphores)
Memory usage needs to be monitored regularly. Commonly used commands include free, vmstat, top, and dstat -m.
2.1 free
> free -h
             total       used       free     shared    buffers     cached
Mem:          7.7G       6.2G       1.5G        17M        33M       184M
-/+ buffers/cache:       6.0G       1.7G
Swap:          24G       581M        23G
The meaning of each row:
The first line, Mem:
- total: total memory, 7.7G. This is the physical memory size, the actual memory of the machine
- used: used memory, 6.2G. This value includes buffers and cached as well as the memory actually used by applications
- free: free memory, 1.5G. This is the unused memory size
- shared: the size of the shared memory, 17M
- buffers: the memory occupied by buffers, 33M
- cached: the memory occupied by the cache, 184M
These values satisfy: total = used + free
The second line, -/+ buffers/cache, shows the memory actually used by applications:
- The first value is used - buffers - cached, the memory actually used by applications
- The second value is free + buffers + cached, the memory that can theoretically be made available
The sum of these two values is also total.
The third line, Swap, shows the usage of the swap partition: total, used, and free.
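The arithmetic behind the -/+ buffers/cache row can be reproduced from /proc/meminfo. This is a minimal sketch under stated assumptions: the function name mem_summary is hypothetical, values are in kB as /proc/meminfo reports them, and it follows the older free layout shown above (newer versions of free merge the two columns into buff/cache and add an available column, so their numbers will not match exactly).

```shell
# mem_summary [FILE]
# Hypothetical helper: recompute the "-/+ buffers/cache" figures in kB.
# FILE defaults to /proc/meminfo.
mem_summary() {
    awk '
        $1 == "MemTotal:" { total = $2 }
        $1 == "MemFree:"  { free = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached = $2 }   # exact match, so SwapCached is ignored
        END {
            used = total - free                      # total = used + free
            app_used = used - buffers - cached       # memory actually used by applications
            reclaimable = free + buffers + cached    # memory that can theoretically be used
            printf "used=%d app_used=%d reclaimable=%d\n", used, app_used, reclaimable
        }' "${1:-/proc/meminfo}"
}

# Usage on a live system:
#   mem_summary
```

As in the free output, app_used and reclaimable add up to the total.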
cache
cache refers to the file cache. When the system reads a file, it first has to read the data from the hard disk into memory. Because the hard disk is much slower than memory, this step is time-consuming.
To improve efficiency, Linux keeps the files it has read cached in memory (the locality principle), and the cache is not released automatically even after the program exits. This is why memory usage rises after a program performs a large number of file reads.
When other programs need memory, Linux releases these unused caches according to its own caching policy (such as LRU) so that they can use it. You can also release the cache manually (as root):
echo 1 > /proc/sys/vm/drop_caches
buffer
Consider writing files from memory to the hard disk. Because the hard disk is slow, a program that had to wait for every write to finish before continuing would run very inefficiently. This is why the buffer exists.
Data that needs to be written to the hard disk is first placed in the buffer. Writing to the buffer is fast, so the program can continue with other work while the data in the buffer is written to the hard disk in the background, which improves read and write efficiency.
For example, when copying a particularly large file to a USB flash drive, the system may still report that the drive is in use even though the copy appears to have finished. The buffer is the reason: the copy program has put the data in the buffer, but not all of it has been written to the flash drive yet.
Similarly, you can use the sync command to flush the buffered content manually:
> sync --help
Usage: sync [OPTION] [FILE]...
Synchronize cached writes to persistent storage
If one or more files are specified, sync only them,
or their containing file systems.
  -d, --data             sync only file data, no unneeded metadata
  -f, --file-system      sync the file systems that contain the files
      --help             display this help and exit
      --version          output version information and exit
GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
Full documentation at: <http://www.gnu.org/software/coreutils/sync>
or available locally via: info '(coreutils) sync invocation'
swap partition
The swap partition is an important concept in implementing virtual memory. Swap uses part of the space on the hard disk as memory: running programs use physical memory, and memory that is not in active use is moved to the hard disk, which is called swap out. Moving memory from the swap partition on the hard disk back into physical memory is called swap in.
The swap partition logically expands the memory space, but it also slows the system down, because hard disk reads and writes are very slow. Linux moves infrequently used memory to the swap partition.
The difference between cache and buffer
cache: the page cache in memory. It is the cache of the file system, and data handled at the file level is cached in the page cache.
buffer: the buffer cache in memory. It is the cache of disk blocks, and data read from or written to the disk directly is cached in the buffer cache.
Simply put: the page cache is used to cache file data, and the buffer cache is used to cache disk data. When a file system is involved, operations on files are cached in the page cache. If the disk is read or written directly with a tool such as dd, the data is cached in the buffer cache.
2.2 vmstat
vmstat (Virtual Memory Statistics) reports statistics on the overall state of the system, including processes, virtual memory, disk, interrupts, and CPU activity:
> vmstat --help
Usage:
vmstat [options] [delay [count]]
Options:
 -a, --active           active/inactive memory
 -f, --forks            number of forks since boot
 -m, --slabs            slabinfo
 -n, --one-header       do not redisplay header
 -s, --stats            event counter statistics
 -d, --disk             disk statistics
 -D, --disk-sum         summarize disk statistics
 -p, --partition <dev>  partition specific statistics
 -S, --unit <char>      define display unit
 -w, --wide             wide output
 -t, --timestamp        show timestamp
 -h, --help             display this help and exit
 -V, --version          output version information and exit
For more details see vmstat(8).
> vmstat -SM 1 100 # 1 is the refresh interval in seconds, 100 is the number of reports, -SM sets the unit to MB
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy  id wa st
 1  0      0    470    188   1154    0    0     0     4    3    0  0  0  99  0  0
 0  0      0    470    188   1154    0    0     0     0  112  231  1  1  98  0  0
 0  0      0    470    188   1154    0    0     0     0   91  176  0  0 100  0  0
 0  0      0    470    188   1154    0    0     0     0  118  229  1  0  99  0  0
 0  0      0    470    188   1154    0    0     0     0   78  156  0  0 100  0  0
 0  0      0    470    188   1154    0    0     0    64   84  186  0  1  97  2  0
procs (processes)
- r: the number of processes running or waiting for a CPU time slice. If this value stays above the number of CPUs for a long time, CPU resources are insufficient and you can consider adding CPUs
- b: the number of processes waiting for a resource, such as I/O or memory swapping
memory
- swpd: the amount of memory swapped out to the swap partition. If swpd is non-zero or even fairly large, but si and so stay at 0 for a long time, system performance is not affected for the time being
- free: the current amount of free physical memory
- buff: the size of the buffer cache, which generally buffers reads and writes to block devices
- cache: the size of the page cache, which is generally used as the file system cache; frequently accessed files are cached. A large cache value means that many files are cached. If bi in the io group is small at the same time, the file system is working efficiently
swap
- si: swap in, the amount of memory moved from the swap partition into physical memory
- so: swap out, the amount of unused memory moved to the swap partition on the hard disk
io
- bi: the amount of data read from block devices, that is, disk reads, in KB/s
- bo: the amount of data written to block devices, that is, disk writes, in KB/s
- bi+bo: the reference value used here is 1000. If the sum exceeds 1000 and wa is also high, disk I/O is a performance bottleneck
system
- in: the number of device interrupts per second observed during the interval
- cs: the number of context switches per second
The larger these two values are, the more CPU time the kernel consumes.
cpu
- us: the percentage of CPU time consumed by user processes. A high value means user processes are consuming a lot of CPU time. If it stays above 50% for a long time, consider optimizing the program
- sy: the percentage of CPU time consumed by the kernel. A high value means the kernel is consuming a lot of CPU time. If us+sy exceeds 80%, CPU resources are insufficient
- id: the percentage of time the CPU was idle
- wa: the percentage of CPU time spent waiting for I/O. The higher the value, the more severe the I/O wait. If it exceeds 20%, the I/O wait is serious
- st: CPU steal time, relevant for virtual machines
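The reference values in this section can be checked mechanically against vmstat output. The following is a rough sketch, not a monitoring tool: the function name vmstat_check and the message texts are hypothetical, the thresholds are simply the ones quoted above, and the column positions assume the default 17-column layout shown in the sample output.

```shell
# vmstat_check NCPU
# Hypothetical helper: read vmstat output on stdin and print a warning
# for each reference value from the text that a row exceeds.
vmstat_check() {
    awk -v ncpu="$1" '
        # Header lines do not start with a number; skip them.
        $1 !~ /^[0-9]+$/ { next }
        {
            r  = $1 + 0;  si = $7 + 0;  so = $8 + 0
            bi = $9 + 0;  bo = $10 + 0
            us = $13 + 0; sy = $14 + 0; wa = $16 + 0
            if (r > ncpu + 0)              print "run queue " r " exceeds " ncpu " CPUs"
            if (si > 0 || so > 0)          print "swap activity: si=" si " so=" so
            if (us + sy > 80)              print "CPU busy: us+sy=" (us + sy) "%"
            if (wa > 20)                   print "I/O wait high: wa=" wa "%"
            if (bi + bo > 1000 && wa > 20) print "possible disk bottleneck: bi+bo=" (bi + bo)
        }'
}

# Usage on a live system:
#   vmstat 1 10 | vmstat_check "$(nproc)"
```

The first data row of vmstat reports averages since boot, so warnings triggered by that row alone say little about the current load.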
3. Network
3.1 Interface
ifconfig
iftop
ethtool
3.2 Ports
# Ports
netstat -ntlp # TCP
netstat -nulp # UDP
netstat -nxlp # UNIX
netstat -nalp # Show connections in every state, not only listening ports
lsof -p <PID> -P
lsof -i :5900
sar -n DEV 1 # Network traffic
ss
ss -s
3.3 tcpdump
sudo tcpdump -i any -XNnvvv
sudo tcpdump -i any udp -XNnvvv
sudo tcpdump -i any udp port 20112 -XNnvvv
sudo tcpdump -i any -XNnvvv 'udp port 20112 and ip[0x1f:02]=0x4e91'
3.4 nethogs
Monitor the network traffic of each process
nethogs
4. I/O performance
iotop
iostat
iostat -kx 2
vmstat -SM
vmstat 2 10
dstat
dstat --top-io --top-bio
5. Process
top
top -H
htop
ps auxf
ps -eLf # Show threads
ls /proc/<PID>/task
5.1 top
top is the most commonly used of these commands. Its interactive help screen:
Help for Interactive Commands - procps version 3.2.8
Window 1:Def: Cumulative mode Off. System: Delay 3.0 secs; Secure mode Off.
Z,B Global: 'Z' change color mappings; 'B' disable/enable bold
l,t,m Toggle Summaries: 'l' load avg; 't' task/cpu stats; 'm' mem info
1,I Toggle SMP view: '1' single/separate states; 'I' Irix/Solaris mode
f,o . Fields/Columns: 'f' add or remove; 'o' change display order
F or O . Select sort field
<,> . Move sort field: '<' next col left; '>' next col right
R,H . Toggle: 'R' normal/reverse sort; 'H' show threads
c,i,S . Toggle: 'c' cmd name/line; 'i' idle tasks; 'S' cumulative time
x,y . Toggle highlights: 'x' sort field; 'y' running tasks
z,b . Toggle: 'z' color/mono; 'b' bold/reverse (only if 'x' or 'y')
u . Show specific user only
n or # . Set maximum tasks displayed
k,r Manipulate tasks: 'k' kill; 'r' renice
d or s Set update interval
W Write configuration file
q Quit
( commands shown with '.' require a visible task display window )
Press 'h' or '?' for help with Windows,
any other key to continue
1: display the usage of each CPU
c: toggle between the command name and the full command line
H: show threads
P: sort by CPU usage
M: sort by memory usage
R: reverse the sort order
Z: change color mappings
B: disable/enable bold
l: toggle load average
t: toggle task/CPU stats
m: toggle memory info
us - Time spent in user space
sy - Time spent in kernel space
ni - Time spent running niced user processes (User defined priority)
id - Time spent in idle operations
wa - Time spent waiting on I/O peripherals (e.g. disk)
hi - Time spent handling hardware interrupt routines. (Whenever a peripheral unit wants attention from the CPU, it literally pulls a line to signal the CPU to service it)
si - Time spent handling software interrupt routines. (a piece of code, calls an interrupt routine...)
st - Time spent on involuntary waits by virtual cpu while hypervisor is servicing another processor (stolen from a virtual machine)
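These percentages can be derived from two samples of the aggregate cpu line in /proc/stat, whose first eight counters are user, nice, system, idle, iowait, irq, softirq, and steal, in that order. The sketch below illustrates the calculation only. It is an assumption about how to reproduce the figures, not a description of how top is implemented, and the function name cpu_breakdown is hypothetical.

```shell
# cpu_breakdown "<first cpu line>" "<second cpu line>"
# Hypothetical helper: turn two samples of the "cpu" line from /proc/stat
# into the percentage breakdown that top labels us/ni/sy/id/wa/hi/si/st.
cpu_breakdown() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) prev[i] = $i; next }
        {
            total = 0
            for (i = 2; i <= 9; i++) { delta[i] = $i - prev[i]; total += delta[i] }
            if (total == 0) { print "no ticks elapsed"; exit 1 }
            # Field order in /proc/stat: user nice system idle iowait irq softirq steal
            split("us ni sy id wa hi si st", name, " ")
            for (i = 2; i <= 9; i++)
                printf "%s=%.1f%s", name[i - 1], 100 * delta[i] / total, (i < 9 ? " " : "\n")
        }'
}

# Usage on a live system (samples taken one second apart):
#   cpu_breakdown "$(head -n 1 /proc/stat)" "$(sleep 1; head -n 1 /proc/stat)"
```

Working with deltas matters because the counters in /proc/stat are cumulative since boot; a single sample only gives the long-term average.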
5.2 lsof
lsof -P -p 123
6. Performance testing
stress --cpu 8 \
--io 4 \
--vm 2 \
--vm-bytes 128M \
--timeout 60s
The time command
7. Users
w
whoami
8. System Status
uptime
htop
vmstat
mpstat
dstat
9. Hardware equipment
lspci
lscpu
lsblk
lsblk -fm # Show file systems and permissions
lshw -c display
dmidecode
10. File system
# Mounting
mount
umount
cat /etc/fstab
# LVM
pvdisplay
pvs
lvdisplay
lvs
vgdisplay
vgs
df -hT
lsof
11. Kernel, interrupt
cat /proc/modules
sysctl -a | grep ...
cat /proc/interrupts
12. System log, kernel log
dmesg
less /var/log/messages
less /var/log/secure
less /var/log/auth
13. cron scheduled tasks
crontab -l
crontab -l -u nobody
# View the cron jobs of all users
sudo find /var/spool/cron/ -type f -exec cat {} +
14. Debugging tools
14.1 perf
14.2 strace
The strace command prints system calls and signals:
strace -p <PID>
strace -p 5191 -f
strace -e trace=signal -p 5191
-e trace=open
-e trace=file
-e trace=process
-e trace=network
-e trace=signal
-e trace=ipc
-e trace=desc
-e trace=memory
14.3 ltrace
The ltrace command prints calls into dynamically linked libraries:
ltrace -p <PID>
ltrace -S # Also show system calls
15. Scenarios
Scenario 1: After connecting to the server
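w # Show the logged-in users, their login IPs, the processes they are running, etc.
last # See who logged in recently and when the server was rebooted
uptime # Uptime, number of logged-in users, load averages
history # View the command history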
Scenario 2: What information is in the /proc directory
cat /proc/...
cgroups
cmdline
cpuinfo
crypto
devices
diskstats
filesystems
iomem
ioports
kallsyms
meminfo
modules
partitions
uptime
version
vmstat
Scenario 3: Executing commands in the background
nohup <command> &>[some.log] &
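The &> redirection above is a bash shorthand. A portable variant can be sketched as follows; the function name run_in_background and the variable BG_PID are hypothetical names chosen for this example, and stdin is redirected from /dev/null so that nohup never tries to read from the terminal.

```shell
# run_in_background LOGFILE COMMAND [ARGS...]
# Hypothetical helper: start COMMAND detached from hangups, send stdout and
# stderr to LOGFILE, and leave the PID of the job in BG_PID.
run_in_background() {
    log_file=$1
    shift
    nohup "$@" >"$log_file" 2>&1 </dev/null &
    BG_PID=$!
}

# Usage:
#   run_in_background some.log <command>
#   echo "started as PID $BG_PID"
```

Keeping the PID makes it possible to check on the job later with ps -p "$BG_PID" or to stop it with kill "$BG_PID".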
some commands
# General
top
htop
glances
dstat & sar
mpstat
# Performance analysis
perf
# Processes
ps
pstree -p
pgrep
pkill
pidof
Ctrl+z & jobs & fg
# Network
ip
ifconfig
dig
ping
traceroute
iftop
pingtop
nload
netstat
vnstat
slurm
scp
tcpdump
# Disk I/O
iotop
iostat
# Virtual machines
virt-top
# Users
w
whoami
# Uptime
uptime
# Disk
du
df
lsblk
# Permissions
chown
chmod
# Services
systemctl list-unit-files
# Locating files
find
locate
# Performance testing
time