(Repost) vmstat explained in detail

You can view CPU usage with the vmstat command:

# vmstat 1 5

This samples the system status once per second, 5 times in total.

System Configuration: lcpu=16 mem=31488MB

kthr  memory         page                      faults             cpu
----- -------------- ------------------------- ------------------ -----------
 r  b     avm    fre  re  pi  po   fr   sr  cy    in     sy    cs us sy id wa
 4  2 3127273   3272   0   4   8  338  937   0  2357  22962  1561 13  1 83  3

The fields have the following meanings.

kthr: Kernel thread state changes per second over the sampling interval.

r: The number of threads in the run queue waiting to be executed. If this value is frequently very large (for example, 2 to 5 times the number of CPUs), many threads are waiting for a CPU and there may be a performance problem. Note that large application systems run many threads, so a single-digit run queue is normal. A run queue that is always 0 is not a requirement for adequate CPU capacity; in fact, a queue that is always 0 usually means the hardware is heavily over-provisioned for the workload (a "big horse pulling a small cart").
b: The number of threads in the wait queue. These threads are suspended because an IO operation (storage, network) has not completed; the CPU can switch to another thread in the meantime. If many threads (more than 2 to 5 times the number of CPUs) are in the b state, the system may have an IO bottleneck.
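
The rule of thumb for r and b can be written as a small check. This is only a minimal Python sketch: the function name `queue_pressure` and the default of 2 times the CPU count (the lower end of the 2 to 5 times range) are assumptions made for illustration, not part of vmstat.

```python
def queue_pressure(r: int, b: int, ncpu: int, factor: float = 2.0) -> dict:
    """Flag r and b values that are large relative to the CPU count.

    factor is the multiple of ncpu treated as "large" (assumed default: 2;
    the text suggests anything from 2 to 5).
    """
    limit = factor * ncpu
    return {
        "cpu_queue_high": r >= limit,  # many threads waiting for a CPU
        "io_queue_high": b >= limit,   # many threads blocked on IO
    }


# Values from the sample line above: r=4, b=2 on a system with lcpu=16.
print(queue_pressure(4, 2, 16))
```

With the sample values neither flag is set, which matches the point that a single-digit queue on a 16-CPU system is normal.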

memory: Information about the use of virtual and real memory. A virtual page is considered active if it has been accessed. A page is 4096 bytes.

avm: Active virtual memory, not available memory! This term is often misunderstood. It equals the physical memory currently in use plus the swap space in use, minus the physical memory used as file system cache. The unit is a 4 KB memory page.
fre: Free physical memory, in 4 KB pages.
pi: The number of memory pages read back in from disk swap space during the vmstat interval. Sustained values are usually a symptom of insufficient memory.
po: The number of memory pages swapped out to disk during the vmstat interval. The larger this value, the tighter memory is. Occasional pi and po activity does not indicate a problem; heavy file operations (file system backup/restore, tar, and so on) are often accompanied by a lot of pi and po, which is normal.
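
Because avm and fre are counted in 4 KB pages, converting them to megabytes makes the sample line easier to read. The following is a minimal Python sketch; the helper name `pages_to_mb` is made up for illustration.

```python
PAGE_SIZE_BYTES = 4096  # one memory page, as stated above


def pages_to_mb(pages: int) -> float:
    """Convert a count of 4 KB pages to megabytes (1 MB = 1024 * 1024 bytes)."""
    return pages * PAGE_SIZE_BYTES / (1024 * 1024)


# avm and fre from the sample line above.
print(f"avm = {pages_to_mb(3127273):.1f} MB")
print(f"fre = {pages_to_mb(3272):.1f} MB")
```

For the sample this gives roughly 12216 MB of active virtual memory and about 13 MB free, on a system configured with mem=31488MB.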

page: Information about page faults and paging activity. The values are averaged over the interval and given per second.

re: Pager input/output list. As a rule of thumb, when the pi/po ratio stays close to 1 for a long time and both pi and po are very large, the system may be thrashing: memory that has just been swapped out is needed again and must be swapped back in, and physical memory may be seriously insufficient.
fr: The number of physical memory pages freed during the interval, either because a program released them or because their contents were paged out to disk.
sr: The number of pages examined by the page search that runs when memory is requested and physical memory is short. The larger this value, the greater the demand for memory.
cy: The clock cycles consumed by page search and cleanup. The larger the value, the tighter memory is and the more time the operating system spends reclaiming memory. It may also mean that programs in the system are started and stopped too frequently. Either way, a large cy means that current memory usage and management need careful analysis.

faults: Trap and interrupt rates per second, averaged over the sampling interval.

in: The number of interrupts generated for any reason during the interval, for example expiry of a CPU time slice or a device IO interrupt. vmstat -i shows more detailed information about interrupts.
cs: Context switches. On a CPU, a context switch (cs) occurs when the thread about to run is not the thread currently running. Three situations cause a context switch: the current thread waits for a resource (completion of disk/network IO), sleeps voluntarily, or waits for a lock to be released; a higher-priority thread needs the CPU; or the thread uses up its 10 ms time slice. Time-slice expiry alone can produce about 100 thread switches per second on each CPU (one every 10 ms), so the cs count divided by the vmstat interval and then by the number of CPUs is one indicator of how busy the system is. Values of 10 to 20 times that baseline, that is, 1000 to 2000 context switches per CPU, are usually normal; higher values may indicate a CPU bottleneck. On P5 and P6 series CPUs, which support SMT with 2 threads per physical CPU, consider dividing the result by 2 or 1.5 before judging.
sy: (sy in the faults area) The number of system calls during the interval. A user program issues a system call to ask the kernel to perform an operation on its behalf, such as disk IO.
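
The cs arithmetic described above can be sketched as follows. This is a minimal Python illustration; the names `cs_per_cpu` and `TIME_SLICE_BASELINE` are hypothetical, and it assumes the interval is 1 second as in the `vmstat 1 5` example.

```python
TIME_SLICE_BASELINE = 100  # switches per second per CPU from 10 ms time slices


def cs_per_cpu(cs: float, interval_s: float, ncpu: int,
               smt_divisor: float = 1.0) -> float:
    """cs divided by the interval, then by the CPU count, then by an SMT divisor."""
    if interval_s <= 0 or ncpu <= 0 or smt_divisor <= 0:
        raise ValueError("interval_s, ncpu and smt_divisor must be positive")
    return cs / interval_s / ncpu / smt_divisor


# Sample line above: cs=1561 with lcpu=16.
rate = cs_per_cpu(1561, 1, 16)
print(f"{rate:.1f} cs per CPU, {rate / TIME_SLICE_BASELINE:.2f}x the baseline")

# Same sample with the SMT adjustment suggested above (divide by 2).
print(f"{cs_per_cpu(1561, 1, 16, smt_divisor=2):.1f} cs per CPU with SMT")
```

For the sample, the per-CPU figure stays below the 100-per-second baseline, far under the 1000 to 2000 range the text treats as still normal.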

cpu: Breakdown of CPU time by percentage.

us: The percentage of CPU time spent in user mode.
sy: (sy in the cpu area) The percentage of CPU time spent on system calls (kernel mode).
id: The percentage of time the CPU is idle.
wa: The percentage of time the CPU is idle while the system waits for disk IO.

On machines with shared CPUs (POWER5 or POWER6 models with the micro-partitioning feature are required), there are two more columns, pc and ec. pc is the number of physical CPUs actually consumed by this partition (it may be fractional). ec is the percentage of the partition's entitled CPU capacity that is actually consumed, in units of 1%; a value above 100 means the partition is temporarily using more CPU than its entitlement.

 

 ===============================================================


How do you identify the system bottleneck when the load average is high? Is it caused by insufficient CPU, slow IO, or insufficient memory?

One: View the system load with vmstat

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------

 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st

 0  0      0 496056 889316 4065748    0    0     9    41   55   51  0  0 99  1  0
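
To work with a line like this in a script, the values can be paired with the column names from the header. This is a minimal Python sketch; `LINUX_FIELDS` and `parse_vmstat_line` are hypothetical names, and the code assumes exactly the 17-column layout shown above.

```python
# Column names copied from the header line above.
LINUX_FIELDS = "r b swpd free buff cache si so bi bo in cs us sy id wa st".split()


def parse_vmstat_line(line: str) -> dict:
    """Map one vmstat data line to {column name: integer value}."""
    values = [int(v) for v in line.split()]
    if len(values) != len(LINUX_FIELDS):
        raise ValueError(f"expected {len(LINUX_FIELDS)} columns, got {len(values)}")
    return dict(zip(LINUX_FIELDS, values))


sample = " 0  0      0 496056 889316 4065748    0    0     9    41   55   51  0  0 99  1  0"
row = parse_vmstat_line(sample)
print(row["free"], row["cache"], row["id"])
```

Keying the values by column name avoids mixing up the two sy-like areas and keeps later checks (for example on wa or bi + bo) readable.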

1 : procs 

procs 

 r  b

 0  0

r : The number of processes that are running or waiting for a CPU time slice. If it stays above the number of CPU cores for a long time, CPU capacity is insufficient and should be increased.

b : The number of processes waiting for resources, such as IO or memory swapping.

2 : memory 

-----------memory----------

swpd   free   buff  cache  

 0    496056 889316 4065748

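swpd : The amount of memory swapped out to the swap area (in KB).

       If swpd is non-zero, or even fairly large (for example, over 100 MB), system performance is still normal as long as si and so stay at 0 over the long term.

free : The amount of memory currently on the free page list (in KB).

buff : The amount of memory used as buffer cache. Generally, only reads and writes to block devices need buffering.

cache : The amount of memory used as page cache, generally as the file system cache.

        If cache is large, many files are being served from the cache; if bi in the IO section is small at the same time, the file system is working efficiently.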

3 : swap 

---swap--

 si   so

 0    0 

si : The amount of memory swapped in from the swap area to memory.

so : The amount of memory swapped out from memory to the swap area.
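
The guidance that a non-zero swpd is harmless while si and so stay at 0 can be checked over a series of samples. This is a minimal Python sketch with hypothetical names (`swap_is_quiet`, `swap_verdict`); the sample values in the usage lines are invented for illustration.

```python
def swap_is_quiet(samples) -> bool:
    """samples: (si, so) pairs taken from consecutive vmstat lines."""
    return all(si == 0 and so == 0 for si, so in samples)


def swap_verdict(swpd_kb: int, samples) -> str:
    """Return "ok" when no swap traffic was seen, else "swapping".

    swpd_kb is deliberately not part of the decision: per the text, a
    non-zero swpd alone does not indicate a problem.
    """
    return "ok" if swap_is_quiet(samples) else "swapping"


# Invented example: 150 MB already in swap, but no swap traffic.
print(swap_verdict(150_000, [(0, 0), (0, 0), (0, 0)]))
# Invented example: nothing in swap yet, but pages are moving.
print(swap_verdict(0, [(0, 0), (12, 30)]))
```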

4 : IO 

-----io----

 bi    bo 

 9    41

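bi : The amount of data read in from block devices (disk reads), in KB per second.

bo : The amount of data written to block devices (disk writes), in KB per second.

Here we use 1000 as the reference value for bi + bo. If it exceeds 1000 and wa is also high, consider balancing the disk load; analyze this together with the iostat output.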
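
The bi + bo reference value can be combined with wa in a simple check. This is a minimal Python sketch; the function name `disk_load_suspect` is hypothetical, and using 30 as the cut-off for a "high" wa is an assumption borrowed from the wa reference value given in the cpu section.

```python
def disk_load_suspect(bi: int, bo: int, wa: int,
                      io_limit: int = 1000, wa_limit: int = 30) -> bool:
    """True when bi + bo exceeds io_limit and wa exceeds wa_limit.

    io_limit is the reference value from the text; wa_limit is an assumed
    threshold for "high" IO wait.
    """
    return (bi + bo) > io_limit and wa > wa_limit


# Sample line above: bi=9, bo=41, wa=1.
print(disk_load_suspect(9, 41, 1))
```

When the check fires, the next step suggested by the text is to look at iostat to see which disks carry the load.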

5 : system (shows the interrupts and context switches that occur during the sampling interval)

--system--

 in   cs

 55   51

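in : The number of device interrupts per second observed during the interval.

cs : The number of context switches per second. If cs is much higher than the disk IO and network packet rates, investigate further.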

6 : cpu (shows the CPU usage state)

 -----cpu------

us sy id wa st

 0  0 99  1  0

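us : The percentage of CPU time spent in user mode. A high us value means user processes consume a lot of CPU time; if it stays above 50% for a long time, consider optimizing the user programs.

sy : The percentage of CPU time spent by kernel processes. The reference value for us + sy is 80%; if us + sy exceeds 80%, the CPU may be insufficient.

wa : The percentage of CPU time spent waiting for IO. The reference value for wa is 30%; if wa exceeds 30%, IO wait is severe.

      This may be caused by heavy random disk access, or by a bandwidth bottleneck in the disk or disk controller (mainly block operations).

id : The percentage of time the CPU is idle.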
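
The three reference values in this section (us above 50%, us + sy above 80%, wa above 30%) can be applied together. This is a minimal Python sketch; the function name `cpu_findings` and the wording of the messages are made up for illustration, and a single sample is only a hint, since the text refers to values sustained over a long time.

```python
def cpu_findings(us: int, sy: int, wa: int) -> list:
    """Apply the reference values from the text to one vmstat sample."""
    findings = []
    if us > 50:
        findings.append("us > 50%: consider optimizing user programs")
    if us + sy > 80:
        findings.append("us + sy > 80%: CPU may be insufficient")
    if wa > 30:
        findings.append("wa > 30%: IO wait is severe")
    return findings


# Sample line above: us=0, sy=0, wa=1, so nothing is flagged.
print(cpu_findings(0, 0, 1))
# Invented busy sample for comparison.
print(cpu_findings(70, 15, 5))
```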


Origin http://10.200.1.11:23101/article/api/json?id=326557982&siteId=291194637