Detailed explanation of Linux vmstat command in practice

Reprinted: http://www.cnblogs.com/ggjucheng/archive/2012/01/05/2312625.html

The vmstat command is one of the most common Linux/Unix monitoring tools. It reports the state of the server at a given time interval, including CPU usage, memory usage, virtual memory swapping, and IO reads and writes. It is my favorite command for checking a Linux/Unix machine, for two reasons. First, it is available on both Linux and Unix. Second, unlike top, it shows the CPU, memory, and IO usage of the whole machine rather than the CPU and memory usage of each process (the two tools suit different scenarios).

vmstat is generally run with two numeric parameters: the first is the sampling interval in seconds, and the second is the number of samples, for example:

root@ubuntu:~# vmstat 2 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 0 3498472 315836 3819540 0 0 0 1 2 0 0 0 100 0

2 indicates that the server status is collected every two seconds, and 1 indicates that it is collected only once.
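To work with these numbers in a script, the output can be split into named columns. The following is a minimal sketch that parses the sample output above; the function name parse_vmstat and the SAMPLE constant are illustrative names, not part of vmstat.

```python
def parse_vmstat(text):
    """Turn vmstat text output into a list of {column_name: int} dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "procs":
            continue  # skip blank lines and the group-title line
        if fields[0] == "r":
            header = fields  # column-name line: r b swpd free ...
            continue
        if header is not None and len(fields) == len(header):
            rows.append(dict(zip(header, map(int, fields))))
    return rows


# Sample output copied from the "vmstat 2 1" run shown above.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 0 3498472 315836 3819540 0 0 0 1 2 0 0 0 100 0
"""

rows = parse_vmstat(SAMPLE)
print(rows[0]["free"], rows[0]["id"])  # 3498472 100
```

On a machine with vmstat installed, the text could instead come from subprocess.run(["vmstat", "2", "1"], capture_output=True, text=True).stdout. Because the columns are taken from the header line, the sketch should also cope with versions that print extra columns.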

In practice we usually monitor for a period of time and simply stop vmstat when we are done, for example:

root@ubuntu:~# vmstat 2  
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 0 3499840 315836 3819660 0 0 0 1 2 0 0 0 100 0
0 0 0 3499584 315836 3819660 0 0 0 0 88 158 0 0 100 0
0 0 0 3499708 315836 3819660 0 0 0 2 86 162 0 0 100 0
0 0 0 3499708 315836 3819660 0 0 0 10 81 151 0 0 100 0
1 0 0 3499732 315836 3819660 0 0 0 2 83 154 0 0 100 0

This means that vmstat collects data every 2 seconds and keeps collecting until I end the program. Here I ended it after 5 samples.
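One detail worth knowing when reading such a run: according to the vmstat manual page, the first report gives averages since the last reboot, and only the later reports cover the sampling interval. That matches the output above, where in and cs are 2 and 0 on the first line but around 85 and 155 afterwards. A small sketch that drops the first row before averaging (the helper names are illustrative):

```python
from statistics import mean

HEADER = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()

# Data rows copied from the "vmstat 2" run shown above.
LINES = [
    "1 0 0 3499840 315836 3819660 0 0 0 1 2 0 0 0 100 0",
    "0 0 0 3499584 315836 3819660 0 0 0 0 88 158 0 0 100 0",
    "0 0 0 3499708 315836 3819660 0 0 0 2 86 162 0 0 100 0",
    "0 0 0 3499708 315836 3819660 0 0 0 10 81 151 0 0 100 0",
    "1 0 0 3499732 315836 3819660 0 0 0 2 83 154 0 0 100 0",
]


def interval_samples(lines):
    """Parse the rows and drop the first one (averages since boot)."""
    rows = [dict(zip(HEADER, map(int, line.split()))) for line in lines]
    return rows[1:]


def column_mean(rows, column):
    """Average one column over the given rows."""
    return mean(row[column] for row in rows)


samples = interval_samples(LINES)
print(column_mean(samples, "in"))  # 84.5
print(column_mean(samples, "cs"))  # 156.25
```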

That completes the introduction to the command. Now let's go through the meaning of each column in practice.

r  The run queue, that is, the number of processes that are running or waiting for CPU time. The server I tested is fairly idle and is running almost nothing. When this value exceeds the number of CPUs, the CPU becomes a bottleneck. It is also related to the load shown by top: as a rule of thumb, a load above 3 is relatively high, above 5 is high, and above 10 is abnormal and means the server is in a dangerous state. The load in top is roughly the run queue averaged over time. A large run queue means the CPU is very busy, which generally shows up as high CPU usage.
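The rule "r greater than the number of CPUs" is easy to express in code. This is only a sketch of that rule of thumb; cpu_saturated is a made-up helper name, and it falls back to os.cpu_count() when no CPU count is passed in.

```python
import os


def cpu_saturated(run_queue, cpu_count=None):
    """Return True when the run queue exceeds the number of CPUs."""
    if cpu_count is None:
        # os.cpu_count() can return None, so fall back to a single CPU.
        cpu_count = os.cpu_count() or 1
    return run_queue > cpu_count


# r = 80 on a hypothetical 8-CPU machine is far beyond what it can run at once.
print(cpu_saturated(80, cpu_count=8))  # True
print(cpu_saturated(1, cpu_count=8))   # False
```

A single high sample is not conclusive; it is more meaningful to check whether r stays above the CPU count across several consecutive samples.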

b  The number of blocked processes, that is, processes in uninterruptible sleep, typically waiting for IO to complete.

swpd  The amount of virtual memory (swap) in use. If it is greater than 0, the machine has been short of physical memory. If a memory leak in a program is not the cause, you should add memory or move memory-hungry tasks to other machines.

free  The amount of free physical memory. My machine has 8G of memory in total, of which about 3415M is free.

buff  Memory that Linux/Unix uses as buffers, for example for directory contents and permissions. On my machine it takes up about 300M.

cache  Memory used to cache the files we open. On my machine it takes up about 3.6G (3819540 KB in the output above). This is a clever aspect of Linux/Unix: part of the otherwise free physical memory is used to cache files and directories to improve program performance, and when programs need memory, the buffers and cache are released quickly.
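The memory columns are easier to read after converting units. By default vmstat reports them in units of 1024 bytes, so dividing by 1024 gives MiB. A short sketch using the values from the first sample output (kib_to_mib is an illustrative helper name):

```python
def kib_to_mib(kib):
    """Convert a vmstat memory value (units of 1024 bytes) to MiB."""
    return kib / 1024


# Values from the "vmstat 2 1" sample output above.
free, buff, cache = 3498472, 315836, 3819540

for name, value in (("free", free), ("buff", buff), ("cache", cache)):
    print(f"{name:5s} {kib_to_mib(value):8.1f} MiB")

# Memory that is free or can be given back quickly (rough estimate only).
reclaimable = kib_to_mib(free + buff + cache)
```

Adding free, buff, and cache together is only a rough estimate of how much memory programs could still obtain, since not all cached pages can necessarily be released.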

si  The amount of memory swapped in from disk per second. If this value is greater than 0, physical memory is insufficient or memory is leaking; find the memory-hungry process and deal with it. My machine has plenty of memory and everything works fine.

so  The amount of memory swapped out to disk per second. If this value is greater than 0, the same applies as for si.
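Checking a series of samples for swap activity can be automated. The sketch below assumes each sample is a dict keyed by column name; the rows are made-up values for illustration, not output from a real machine.

```python
def swapping_samples(rows):
    """Return the indexes of samples that show swap-in or swap-out activity."""
    return [i for i, row in enumerate(rows) if row["si"] > 0 or row["so"] > 0]


# Hypothetical samples: the second and third show swap traffic.
rows = [
    {"si": 0, "so": 0},
    {"si": 12, "so": 0},
    {"si": 0, "so": 340},
    {"si": 0, "so": 0},
]

print(swapping_samples(rows))  # [1, 2]
```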

bi  The number of blocks received from block devices per second, that is, data read from them. Block devices here means all disks and other block devices on the system, and the default block size is 1024 bytes. My local machine has no IO activity, so it is always 0, but on a machine that was copying a large amount of data (2-3T) I have seen it reach 140000/s, which is a disk throughput of roughly 140M per second.

bo  The number of blocks sent to block devices per second. For example, when we write a file, bo will be greater than 0. bi and bo should generally be close to 0; otherwise IO is too frequent and needs tuning.
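The throughput figure quoted for the data-copying machine follows directly from the block size. A small sketch of the arithmetic, assuming the 1024-byte block size mentioned in the text (the function name is illustrative):

```python
BLOCK_SIZE = 1024  # bytes per block, as stated in the text


def blocks_to_throughput(blocks_per_second):
    """Convert a bi/bo value to (MB/s, MiB/s)."""
    bytes_per_second = blocks_per_second * BLOCK_SIZE
    return bytes_per_second / 1_000_000, bytes_per_second / (1024 * 1024)


mb, mib = blocks_to_throughput(140000)
print(f"{mb:.2f} MB/s, {mib:.2f} MiB/s")  # 143.36 MB/s, 136.72 MiB/s
```

So 140000 blocks per second is about 143 MB/s in decimal units, or about 137 MiB/s, which agrees with the "almost 140M per second" quoted above.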

in  The number of CPU interrupts per second, including clock interrupts.

cs  The number of context switches per second. Calling a system function, switching threads, and switching processes all involve a context switch. The smaller this value, the better. If it is too large, consider reducing the number of threads or processes. For example, with web servers such as apache and nginx we usually run performance tests with thousands or even tens of thousands of concurrent requests. The number of processes or threads for the web server can then be lowered step by step from its peak, repeating the stress test until cs falls to a relatively small value; the process and thread count at that point is a more appropriate setting. The same goes for system calls: every call to a system function takes our code into kernel space and causes a context switch, which is expensive, so frequent calls to system functions should be avoided. Too many context switches mean that much of the CPU time is wasted on switching, leaving less time for real work, so the CPU is not used effectively.
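On Linux the kernel exposes a cumulative context-switch counter as the ctxt line of /proc/stat, so a per-second rate like cs can be estimated from two snapshots taken a known interval apart. The sketch below works on text in that format; the BEFORE and AFTER snapshots are made-up values, and the helper names are illustrative.

```python
def read_ctxt(proc_stat_text):
    """Return the cumulative context-switch count from /proc/stat-style text."""
    for line in proc_stat_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "ctxt":
            return int(parts[1])
    raise ValueError("no ctxt line found")


def switches_per_second(before, after, interval_seconds):
    """Context switches per second between two snapshots."""
    return (read_ctxt(after) - read_ctxt(before)) / interval_seconds


# Hypothetical snapshots taken 2 seconds apart.
BEFORE = "cpu  100 0 50 1000 0 0 0 0 0 0\nctxt 500000\nprocesses 1200\n"
AFTER = "cpu  102 0 51 1197 0 0 0 0 0 0\nctxt 500316\nprocesses 1200\n"

rate = switches_per_second(BEFORE, AFTER, 2)
print(rate)  # 158.0
```

On a real Linux machine the two snapshots could be read from /proc/stat with a time.sleep() between them. Comparing the rate before and after changing the number of worker processes or threads is one way to follow the tuning approach described above.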

us  User CPU time. I once worked on a server that did frequent encryption and decryption, and I saw us close to 100 and the run queue r reach 80 (the machine was under a stress test and performing poorly).

sy  System CPU time. If it is too high, a lot of time is being spent in system calls, for example because of frequent IO operations.

id  Idle CPU time. Generally speaking, id + us + sy = 100 (with wa making up the remainder when there is IO wait). I usually think of id as the idle CPU percentage, us as the user CPU percentage, and sy as the system CPU percentage.

wa  CPU time spent waiting for IO.
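The four CPU columns are percentages of total CPU time, so together they should add up to about 100 (rounding can make the sum 99 or 101, and some vmstat versions print an additional st column for stolen time). A brief sketch that summarizes one sample; cpu_summary is an illustrative helper name.

```python
def cpu_summary(row):
    """Summarize the CPU columns of one vmstat sample (values are percentages)."""
    busy = row["us"] + row["sy"]
    io_wait = row.get("wa", 0)
    # "st" (stolen time) only exists in some versions, so default it to 0.
    total = busy + row["id"] + io_wait + row.get("st", 0)
    return {"busy": busy, "idle": row["id"], "io_wait": io_wait, "total": total}


# CPU columns from the idle machine in the sample output above.
idle_row = {"us": 0, "sy": 0, "id": 100, "wa": 0}
print(cpu_summary(idle_row))

# A made-up sample for a machine that is busy and waiting on disk.
busy_row = {"us": 55, "sy": 20, "id": 5, "wa": 20}
print(cpu_summary(busy_row)["busy"])  # 75
```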

