【Linux】监控系统的状态

1.w命令

w命令是一个很强大的命令，该命令显示的信息比较丰富。以下是我的虚拟机w命令的一个展示

从上图我们可以看到：

第一行从左面开始显示的信息依次为：时间、系统运行时间、登陆用户数、平均负载

第二行以及下面所有行告诉我们，当前登陆的用户以及从哪里登陆的

我们重点关注一下load average

第一个数值表示1分钟内系统的平均负载值；第二个数值表示5分钟系统的平均负载值；第三个数值表示的是系统15分钟内的平均负载值，这个值的意义是单位时间段内CPU进程活动数，这个值越大就说明服务器压力越大

一般情况下这个值只要不超过CPU数量就没关系，那么CPU数量如何查看呢？

2.查看CPU数量

Linux:/ # cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 78
model name	: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
stepping	: 3
microcode	: 186
cpu MHz		: 2400.000
cache size	: 3072 KB
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc up arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt xsave avx f16c rdrand hypervisor lahf_lm 3dnowprefetch ida arat epb pln pts dtherm fsgsbase smep
bogomips	: 4800.00
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

上述参数释意

processor　：系统中逻辑处理核的编号。对于单核处理器，则课认为是其CPU编号，对于多核处理器则可以是物理核、或者使用超线程技术虚拟的逻辑核
vendor_id　：CPU制造商
cpu family　：CPU产品系列代号
model　　　：CPU属于其系列中的哪一代的代号
model name：CPU属于的名字及其编号、标称主频
stepping   ：CPU属于制作更新版本
cpu MHz   ：CPU的实际使用主频
cache size   ：CPU二级缓存大小
physical id   ：单个CPU的标号（由于我的虚拟机只有一个CPU，所以无以下红色标记信息项）
siblings       ：单个CPU逻辑物理核数
core id        ：当前物理核在其所处CPU中的编号，这个编号不一定连续
cpu cores    ：该逻辑核所处CPU的物理核数
apicid          ：用来区分不同逻辑核的编号，系统中每个逻辑核的此编号必然不同，此编号不一定连续
fpu             ：是否具有浮点运算单元（Floating Point Unit）
fpu_exception ：是否支持浮点计算异常
cpuid level   ：执行cpuid指令前，eax寄存器中的值，根据不同的值cpuid指令会返回不同的内容
wp             ：表明当前CPU是否在内核态支持对用户空间的写保护（Write Protection）
flags          ：当前CPU支持的功能
bogomips   ：在系统内核启动时粗略测算的CPU速度（Million Instructions Per Second）
clflush size ：每次刷新缓存的大小单位
cache_alignment ：缓存地址对齐单位
address sizes     ：可访问地址空间位数

power management ：对能源管理的支持，有以下几个可选支持功能：

　　ts：　　temperature sensor

扫描二维码关注公众号，回复： 1772527 查看本文章

　　fid： frequency id control

　　vid：　 voltage id control

　　ttp：　 thermal trip

　　tm：

　　stc：

　　100mhzsteps：

　　hwpstate：

CPU信息中flags各项含义：

fpu： Onboard (x87) Floating Point Unit
vme： Virtual Mode Extension
de： Debugging Extensions
pse： Page Size Extensions
tsc： Time Stamp Counter: support for RDTSC and WRTSC instructions
msr： Model-Specific Registers
pae： Physical Address Extensions: ability to access 64GB of memory; only 4GB can be accessed at a time though
mce： Machine Check Architecture
cx8： CMPXCHG8 instruction
apic： Onboard Advanced Programmable Interrupt Controller
sep： Sysenter/Sysexit Instructions; SYSENTER is used for jumps to kernel memory during system calls, and SYSEXIT is used for jumps： back to the user code
mtrr： Memory Type Range Registers
pge： Page Global Enable
mca： Machine Check Architecture
cmov： CMOV instruction
pat： Page Attribute Table
pse36： 36-bit Page Size Extensions: allows to map 4 MB pages into the first 64GB RAM, used with PSE.
pn： Processor Serial-Number; only available on Pentium 3
clflush： CLFLUSH instruction
dtes： Debug Trace Store
acpi： ACPI via MSR
mmx： MultiMedia Extension
fxsr： FXSAVE and FXSTOR instructions
sse： Streaming SIMD Extensions. Single instruction multiple data. Lets you do a bunch of the same operation on different pieces of input： in a single clock tick.
sse2： Streaming SIMD Extensions-2. More of the same.
selfsnoop： CPU self snoop
acc： Automatic Clock Control
IA64： IA-64 processor Itanium.
ht： HyperThreading. Introduces an imaginary second processor that doesn’t do much but lets you run threads in the same process a bit quicker.
nx： No Execute bit. Prevents arbitrary code running via buffer overflows.
pni： Prescott New Instructions aka. SSE3
vmx： Intel Vanderpool hardware virtualization technology
svm： AMD “Pacifica” hardware virtualization technology
lm： “Long Mode,” which means the chip supports the AMD64 instruction set
tm： “Thermal Monitor” Thermal throttling with IDLE instructions. Usually hardware controlled in response to CPU temperature.
tm2： “Thermal Monitor 2″ Decrease speed by reducing multipler and vcore.
est： “Enhanced SpeedStep”

根据以上内容，我们则可以很方便的知道当前系统关于CPU、CPU的核数、CPU是否启用超线程等信息。

3.vmstat 监控系统的状态

输出字段释意

Procs

r: The number of processes waiting for run time.

等待运行的进程数。如果等待运行的进程数越多，意味着CPU非常繁忙。另外，如果该参数长期大于和等于逻辑cpu个数，则CPU资源可能存在较大的瓶颈。

b: The number of processes in uninterruptible sleep.

处在非中断睡眠状态的进程数。意味着进程被阻塞。主要是指被资源阻塞的进程对列数（比如IO资源、页面调度等），当这个值较大时，需要根据应用程序来进行分析，比如数据库产品，中间件应用等。

Memory

swpd: the amount of virtual memory used.

已使用的虚拟内存大小。如果虚拟内存使用较多，可能系统的物理内存比较吃紧，需要采取合适的方式来减少物理内存的使用。swapd不为0，并不意味物理内存吃紧，如果swapd没变化，si、so的值长期为0,这也是没有问题的

free: the amount of idle memory.

空闲的物理内存的大小

buff: the amount of memory used as buffers.

用来做buffer（缓存，主要用于块设备缓存）的内存数，单位：KB

cache: the amount of memory used as cache.

用来做cache（缓存，主要用于缓存文件）的内存，单位：KB

inact: the amount of inactive memory. (-a option)

inactive memory的总量

active: the amount of active memory. (-a option)

active memroy的总量。

Swap

si: Amount of memory swapped in from disk (/s).

从磁盘交换到内存的交换页数量，单位：KB/秒。

so: Amount of memory swapped to disk (/s).

从内存交换到磁盘的交换页数量，单位：KB/秒

内存够用的时候，这2个值都是0，如果这2个值长期大于0时，系统性能会受到影响，磁盘IO和CPU资源都会被消耗。

当看到空闲内存（free）很少的或接近于0时，就认为内存不够用了，这个是不正确的。不能光看这一点，还要结合si和so，

如果free很少，但是si和so也很少（大多时候是0），那么不用担心，系统性能这时不会受到影响的。

当内存的需求大于RAM的数量，服务器启动了虚拟内存机制，通过虚拟内存，可以将RAM段移到SWAP DISK的特殊磁盘段上，

这样会出现虚拟内存的页导出和页导入现象，页导出并不能说明RAM瓶颈，虚拟内存系统经常会对内存段进行页导出，

但页导入操作就表明了服务器需要更多的内存了，页导入需要从SWAP DISK上将内存段复制回RAM，导致服务器速度变慢。

bi: Blocks received from a block device (blocks/s).

每秒从块设备接收到的块数，单位：块/秒也就是读块设备。

bo: Blocks sent to a block device (blocks/s).

每秒发送到块设备的块数，单位：块/秒也就是写块设备。

System

in: The number of interrupts per second, including the clock.

每秒的中断数，包括时钟中断

cs: The number of context switches per second.

每秒的环境（上下文）切换次数。比如我们调用系统函数，就要进行上下文切换，而过多的上下文切换会浪费较多的cpu资源，这个数值应该越小越好。

CPU

These are percentages of total CPU time.

us: Time spent running non-kernel code. (user time, including nice time)

用户CPU时间(非内核进程占用时间)（单位为百分比）。 us的值比较高时，说明用户进程消耗的CPU时间多

sy: Time spent running kernel code. (system time)

系统使用的CPU时间（单位为百分比）。sy的值高时，说明系统内核消耗的CPU资源多，这并不是良性表现，我们应该检查原因。

id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.

空闲的CPU的时间(百分比)，在Linux 2.5.41之前，这部分包含IO等待时间。

wa: Time spent waiting for IO. Prior to Linux 2.5.41, shown as zero.

等待IO的CPU时间，在Linux 2.5.41之前，这个值为0 .这个指标意味着CPU在等待硬盘读写操作的时间，用百分比表示。wait越大则机器io性能就越差。说明IO等待比较严重，这可能由于磁盘大量作随机访问造成，也有可能磁盘出现瓶颈（块操作）。

st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.

以上参数中，我们经常关注r列，b列，和wa列，三列代表的意义已经在上边描述清楚。IO部分的bi、bo也需要参考，如果磁盘io压力较大，这两列的数值会比较高。

3.1 vmstat示例

每秒钟打印一次，共打印五次

Linux:/ # vmstat 1 5

或者

每秒钟打印一次，一直打印

Linux:/ # vmstat 1

按ctrl + c 可结束

4.top显示进程所占系统资源

这个命令用于动态的监控进程所占资，每3秒钟刷新一次，这个命令的特点是把所占用的系统资源(CPU、内存、磁盘IO)最高的进程放到前面，在top状态下，按下“shift+m”可以按照内存使用大小排序，按数字1可以列出各颗CPU的使用状态。