After reading this article, you won't need to be afraid of slow server response.

Recently the server has been responding very slowly: applications on it often time out and sometimes hang. Investigation showed that the server's I/O pressure was very high, with disk I/O utilization reaching 100%.
  
The root cause turned out to be online business code performing writes concurrently, which saturated the server's disk I/O. I am recording the process here so that you and I can solve this kind of problem quickly in the future.
  
Use the top command to view real-time system status information:

CPU status (Cpu(s)): share of CPU time used by user processes, by system processes, by user processes with an adjusted nice priority, idle share, etc.;

Memory status (Mem): total memory, used amount, free amount, etc.;

Swap partition status (Swap): total swap space, used amount, free amount, etc.;

The parameters in the CPU status line are:

us: ratio of CPU time spent in user mode

sy: ratio of CPU time spent in system (kernel) mode

ni: ratio of CPU time spent in user mode on processes with an adjusted nice priority

id: ratio of idle CPU time

wa: ratio of CPU time spent waiting for I/O to complete

hi: ratio of CPU time spent servicing hardware interrupts

si: ratio of CPU time spent servicing software interrupts

st: ratio of time stolen from this virtual machine by the hypervisor

The top output shows that the server's wa value (71.1%) is extremely high. As a rule of thumb, when I/O wait takes up more than 30% of CPU time, there is a problem with disk I/O.
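
The wa value can also be read from top non-interactively, which is convenient for scripts and alerts. The following is a minimal sketch, assuming the procps-ng summary layout ("%Cpu(s): ... 95.0 wa, ..."). The function name parse_iowait and the sample line are illustrative and do not come from the original session:

```shell
#!/bin/sh
# parse_iowait: read a top CPU summary line on stdin and print the "wa" value.
# Assumes the procps-ng layout, e.g. "%Cpu(s):  1.0 us, ..., 95.0 wa, ...".
parse_iowait() {
  awk -F',' '/Cpu\(s\)/ {
    for (i = 1; i <= NF; i++) {
      if ($i ~ /wa[ ]*$/) {      # the comma-separated field that ends in "wa"
        gsub(/[^0-9.]/, "", $i)  # keep only the number
        print $i
      }
    }
  }'
}

# Illustrative sample line (hypothetical values) so the sketch runs anywhere.
sample='%Cpu(s):  1.0 us,  4.0 sy,  0.0 ni,  0.0 id, 95.0 wa,  0.0 hi,  0.0 si,  0.0 st'
wa=$(printf '%s\n' "$sample" | parse_iowait)

# Compare against the 30% rule of thumb.
if awk -v wa="$wa" 'BEGIN { exit !(wa > 30) }'; then
  echo "iowait ${wa}% is above 30%: check disk I/O"
fi
```

On a live server, top -bn1 | parse_iowait would feed it the real summary line (-b is batch mode, -n1 limits top to one iteration).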

Next, use iostat and related commands for a more detailed analysis. iostat is part of the sysstat package; if it is not installed on the server, install it as follows:

[root@Mike-VM-Node-172_31_225_214 ~]# yum install sysstat
[root@Mike-VM-Node-172_31_225_214 ~]# iostat 
Linux 3.10.0-514.26.2.el7.x86_64 (Mike-VM-Node172_31_225_214.com)     11/03/2020     _x86_64_    (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.14    0.00    0.04    0.01    0.00   99.81

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
vda               0.44         1.38         4.59    1786837    5940236

[root@Mike-VM-Node-172_31_225_214 ~]#

Parameter Description:

%user: the percentage of time the CPU is in user mode

%nice: The percentage of time the CPU is in user mode running processes with an adjusted nice priority

%system: The percentage of time the CPU is in system mode

%iowait: the percentage of time the CPU waits for input and output to complete

%steal: The percentage of time the virtual CPU spends in involuntary wait while the hypervisor services another virtual processor

%idle: CPU idle time percentage

tps: The number of transfers per second issued to the device. One transfer is one I/O request; multiple logical requests may be merged into a single I/O request, so the size of a transfer is indeterminate

kB_read/s: The amount of data read from the device per second

kB_wrtn/s: The amount of data written to the device per second

kB_read: the total amount of data read

kB_wrtn: The total amount of data written. All of these amounts are in kilobytes

Use the iostat -x 1 10 command to view extended I/O statistics, sampled once per second for 10 reports.

[root@Mike-VM-Node-172_31_225_214 ~]# iostat -x 1 10
Linux 3.10.0-514.26.2.el7.x86_64 (Mike-VM-Node172_31_225_214.com)     11/03/2020     _x86_64_    (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.13    0.00    0.04    0.01    0.00   99.82

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.10    0.06    0.33     1.07     4.42    28.07     0.00   10.94   22.13    8.83   0.35   0.01

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.00    0.00    4.00   95.00    0.00    0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00    0.00 2140.00     0.00  8560.00     8.00    19.87    9.29    0.00    9.29   0.47 100.00

In the second report, %util is 100.00 and %idle is 0.00. The first report shows averages since boot, which is why it still shows %idle at 99.82.

When %util keeps rising, the disk is getting busier: I/O operations are more and more frequent and use more and more disk resources, which is consistent with adding threads that perform I/O.

If %util is already 100%, too many I/O requests are being generated, the I/O system is fully loaded, and the disk may be a bottleneck.

With %idle at 0.00 and %iowait at 95.00, the CPU spends almost all of its time waiting on I/O, so the I/O pressure has reached its limit. In this state requests generally spend more time waiting.
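
Scanning the %util column by eye gets tedious across many reports, so it can be filtered with awk. This is a minimal sketch, not part of the original session: the function name flag_busy_devices and the 90% threshold are assumptions, and the sample input is the iostat -x output shown above.

```shell
#!/bin/sh
# flag_busy_devices LIMIT: read iostat -x output on stdin and print
# "device %util" for each device row whose %util is at or above LIMIT.
flag_busy_devices() {
  awk -v limit="${1:-90}" '
    /^Device/ {                 # header row: remember width and %util column
      n = NF
      for (i = 1; i <= NF; i++) if ($i == "%util") col = i
      next
    }
    # device rows have the same number of fields as the header row
    col && NF == n && $col + 0 >= limit + 0 { print $1, $col }
  '
}

# Sample input: the two device reports from the session above.
flag_busy_devices 90 <<'EOF'
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.10    0.06    0.33     1.07     4.42    28.07     0.00   10.94   22.13    8.83   0.35   0.01
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00    0.00 2140.00     0.00  8560.00     8.00    19.87    9.29    0.00    9.29   0.47 100.00
EOF
```

Only the second vda row passes the filter. On a live server the same function can be fed directly, for example iostat -x 1 10 | flag_busy_devices 90.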

Parameter Description:

rrqm/s: The number of read requests merged per second, i.e. rmerge/s

wrqm/s: The number of write requests merged per second, i.e. wmerge/s

r/s: The number of read requests completed by the device per second, i.e. rio/s

w/s: The number of write requests completed by the device per second, i.e. wio/s

rkB/s: The number of kilobytes read per second. It is half of rsect/s because each sector is 512 bytes in size

wkB/s: The number of kilobytes written per second. It is half of wsect/s

avgrq-sz: The average size (in sectors) of the I/O requests issued to the device

avgqu-sz: The average I/O queue length

rsec/s: The number of sectors read per second, i.e. rsect/s

wsec/s: The number of sectors written per second, i.e. wsect/s

r_await: The average time (milliseconds) for each read request, including both the time spent waiting in the kernel queue and the time the disk takes to service the read

w_await: The average time (milliseconds) for each write request, including both the time spent waiting in the kernel queue and the time the disk takes to service the write

await: The average time (milliseconds) for each I/O request issued to the device, including queue time and service time

svctm: The average service time (milliseconds) per device I/O request

%util: The percentage of elapsed time during which I/O requests were being issued to the device, i.e. how busy the device is. It is not the percentage of CPU consumed by I/O
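
Several of these columns are related to one another, and checking those relationships is a quick way to confirm the numbers are being read correctly. Below is a small sketch using the saturated vda sample from the second report (w/s 2140.00, wkB/s 8560.00, await 9.29). The queue-length estimate uses Little's law, which is an assumption added here and not something the original states:

```shell
#!/bin/sh
# Values copied from the second iostat -x report for vda.
iops=2140.00      # r/s + w/s (r/s is 0.00 in that sample)
kb_per_s=8560.00  # rkB/s + wkB/s
await_ms=9.29     # await

# Average request size in 512-byte sectors: (kB/s / IOPS) * 2 sectors per kB.
avg_sectors=$(awk -v k="$kb_per_s" -v n="$iops" 'BEGIN { printf "%.2f", k / n * 2 }')

# Estimated queue length by Little's law: request rate * time per request.
est_queue=$(awk -v n="$iops" -v a="$await_ms" 'BEGIN { printf "%.2f", n * a / 1000 }')

echo "avgrq-sz ~ $avg_sectors sectors, avgqu-sz ~ $est_queue"
```

The results, 8.00 sectors and 19.88, line up with the reported avgrq-sz of 8.00 and avgqu-sz of 19.87.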

If you want to run an I/O stress test on the disk, you can use the fio command. If fio is not installed on the server, install it as follows:

[root@Mike-VM-Node-172_31_225_214 ~]# yum install -y fio

The following command runs multiple threads concurrently, each writing its own file of up to 1G in the specified directory (30 files in total; the run is capped by -runtime):

[root@Mike-VM-Node-172_31_225_214 /tmp]# fio -directory=/tmp/ -name=readtest -direct=1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=4k -size=1G -numjobs=30 -runtime=3 -group_reporting

-numjobs=30 runs 30 concurrent jobs.

-rw sets the I/O pattern: read (sequential read only), write (sequential write only), rw (mixed sequential read and write), randrw (mixed random read and write), randread (random read only), randwrite (random write only).

-runtime sets the maximum duration of the test, in seconds.
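
Because the test can write up to numjobs × size of data, it is worth checking free space in the target directory before running fio. A minimal sketch follows; the helper names required_kb and free_kb are hypothetical, and the figures mirror the command above (30 jobs, 1G each, /tmp):

```shell
#!/bin/sh
# required_kb JOBS SIZE_GB: upper bound, in kB, on what the fio run may write.
required_kb() {
  echo $(( $1 * $2 * 1024 * 1024 ))
}

# free_kb DIR: space available to DIR in kB (POSIX df output, 4th column).
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

target=/tmp
need=$(required_kb 30 1)   # -numjobs=30, -size=1G
have=$(free_kb "$target")

if [ "$have" -ge "$need" ]; then
  echo "ok: $target has room for the test files"
else
  echo "warning: $target has ${have} kB free, test may need up to ${need} kB"
fi
```

This is a conservative upper bound: with a short -runtime the jobs may stop long before each file reaches its full size.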

Origin blog.csdn.net/weixin_50829653/article/details/111598436