Frequent process context switching on a Linux host drives the load average too high


Symptoms

Recently I found a virtual machine host whose CPU was 95% idle and whose memory usage was not particularly high, yet its load average was very high.
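On Linux, the load average counts tasks that are runnable (state R) or in uninterruptible sleep (state D). It does not measure CPU utilization, which is how a mostly idle CPU can coexist with a high load. The minimal Python sketch below is an illustration, not the author's method. It reads /proc/loadavg and takes a one-off count of processes in R or D state. It assumes a Linux /proc filesystem and looks only at thread-group leaders, so the states of individual threads are not counted.

```python
import os


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into the 1-, 5- and 15-minute averages."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def parse_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain spaces
    or ')', so we locate the last ')' instead of splitting on whitespace.
    """
    return stat_line[stat_line.rfind(")") + 2]


def count_r_and_d():
    """Count processes currently in R (runnable) or D (uninterruptible) state."""
    counts = {"R": 0, "D": 0}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                state = parse_state(f.read())
        except OSError:  # the process exited between listdir() and open()
            continue
        if state in counts:
            counts[state] += 1
    return counts


if __name__ == "__main__":
    with open("/proc/loadavg") as f:
        one, five, fifteen = parse_loadavg(f.read())
    print(f"load average: {one} {five} {fifteen} (CPUs: {os.cpu_count()})")
    print("processes in R/D state right now:", count_r_and_d())
```

A single snapshot is noisy. Running it a few times while the load is high gives a better feel for whether runnable or blocked tasks dominate.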

Problem analysis

First, I used common tools such as top, free, ps, and iostat to check the host's CPU, memory, and I/O usage, and none of the three was high. Running vmstat 1 gave the following result:
[Figure: output of vmstat 1 (image not available)]

Judging from the vmstat output, block in (bi) and block out (bo) under io are not frequent, but interrupts per second (in) and context switches per second (cs) under system are extremely frequent, and this is what drives the load average so high. That locates the root cause in broad terms. The remaining question is which process is generating so many interrupts and context switches, and how.
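The in and cs columns of vmstat are derived from the cumulative intr and ctxt counters in /proc/stat, expressed as per-second deltas. The following Python sketch is a hypothetical helper, not part of the original analysis. It samples those two counters a few times to reproduce the two columns, and assumes a Linux /proc/stat that exposes both lines.

```python
import time


def parse_proc_stat(text):
    """Extract the cumulative interrupt and context-switch counters from /proc/stat."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "intr":
            counters["intr"] = int(fields[1])  # first value is the total over all IRQs
        elif fields[0] == "ctxt":
            counters["ctxt"] = int(fields[1])
    return counters


def read_proc_stat():
    with open("/proc/stat") as f:
        return parse_proc_stat(f.read())


def rates(before, after, elapsed):
    """Per-second rates between two samples, analogous to vmstat's in and cs."""
    return {key: (after[key] - before[key]) / elapsed for key in before}


if __name__ == "__main__":
    prev, t0 = read_proc_stat(), time.monotonic()
    for _ in range(3):
        time.sleep(1)
        cur, t1 = read_proc_stat(), time.monotonic()
        r = rates(prev, cur, t1 - t0)  # use the measured interval, not the nominal 1 s
        print(f"in={r.get('intr', 0):.0f}/s  cs={r.get('ctxt', 0):.0f}/s")
        prev, t0 = cur, t1
```

The numbers should track vmstat 1 closely. Watching them over time is a quick way to tell whether a fix actually lowers the switching rate.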
Prometheus monitoring confirmed this:
[Figure: Prometheus monitoring graph (image not available)]
Next, I ran pidstat -w 1, which prints per-process context switches every second. The output is shown below:
[Figure: output of pidstat -w 1 (image not available)]
The output lists cswch/s (voluntary context switches per second) and nvcswch/s (involuntary context switches per second) for each command, and vsftpd accounts for a large share of the context switches. There is still a big gap between the cs value from vmstat and the total shown here. This is because more than one vsftpd process runs on the host, and a single 1-second pidstat refresh does not display all of them. After collecting several samples with pidstat -w, the context switch counts of all the vsftpd processes added together came out close to the cs figure reported by vmstat.
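The summing step described above (adding up the context switches of every vsftpd process and comparing the total with vmstat's cs) can also be scripted. This Python sketch rests on stated assumptions and is not necessarily how pidstat works internally. It reads the voluntary_ctxt_switches and nonvoluntary_ctxt_switches fields from /proc/<pid>/status for every process named vsftpd, takes two snapshots about one second apart, and sums the per-second deltas. It sees only the main thread of each process and misses processes that start and exit between the two snapshots.

```python
import os
import time


def parse_status(text):
    """Pull the command name and context-switch counters out of /proc/<pid>/status."""
    info = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            info["name"] = value.strip()
        elif key == "voluntary_ctxt_switches":
            info["cswch"] = int(value)
        elif key == "nonvoluntary_ctxt_switches":
            info["nvcswch"] = int(value)
    return info


def snapshot(name):
    """Map pid -> (voluntary, involuntary) switch counts for processes called `name`.

    /proc/<pid>/status reports the counters of the main thread only.
    """
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/status") as f:
                info = parse_status(f.read())
        except OSError:  # the process exited while we were scanning
            continue
        if info.get("name") == name and "cswch" in info and "nvcswch" in info:
            result[int(entry)] = (info["cswch"], info["nvcswch"])
    return result


def total_rate(before, after, elapsed):
    """Sum per-second switch rates over the PIDs present in both snapshots."""
    cswch = nvcswch = 0
    for pid in before.keys() & after.keys():
        cswch += after[pid][0] - before[pid][0]
        nvcswch += after[pid][1] - before[pid][1]
    return cswch / elapsed, nvcswch / elapsed


if __name__ == "__main__":
    target = "vsftpd"  # process name to aggregate, as in the pidstat analysis
    prev, t0 = snapshot(target), time.monotonic()
    time.sleep(1)
    cur, t1 = snapshot(target), time.monotonic()
    cs, ncs = total_rate(prev, cur, t1 - t0)
    print(f"{target}: {len(cur)} processes, cswch/s={cs:.0f}, nvcswch/s={ncs:.0f}")
```

Adding the two printed rates gives a figure that can be compared directly with the cs column from vmstat.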

Postscript

I reported the results to the business team. Because the directory structure used by FTP was deep and held a relatively large number of files, the team backed up the old directory and recreated it as a single-level directory. After a period of observation, the load average had dropped.


Source: blog.csdn.net/qq_31555951/article/details/108802336