Test learning 103: Fio tool advanced performance test and stability test

 

Preface

   The previous section covered installing the Fio tool, its dependent libraries and the gcc environment, along with basic usage. This section goes further: it covers practical experience with performance testing and stability testing, plus some precautions. Each server manufacturer has its own procedures for disk performance testing, and vendors such as Huawei and H3C test the performance of their servers. There are many disk performance testing tools; the mainstream ones at present are fio and iozone:

Fio (used in this article)

iozone (also relatively good)

Installing fio and the basic workflow are covered in the previous section: https://blog.csdn.net/u013521274/article/details/107949362


1. Fio performance test

The performance test mainly measures the read and write performance of the disk. It generally covers 4 modes: sequential read, sequential write, random read and random write.

Test performance indicators:  bw     average I/O bandwidth

                          IOPS   the number of read and write operations per second (this indicator is displayed in the test results)
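The two indicators are linked: bandwidth is roughly IOPS multiplied by the block size. A minimal sketch of that arithmetic follows; the function name bw_kib_s and the sample numbers are made up for illustration and are not fio output.

```shell
# bw_kib_s IOPS BLOCK_KIB -> approximate bandwidth in KiB/s (integer arithmetic)
bw_kib_s() {
    echo $(( $1 * $2 ))
}

# Hypothetical example: 1000 IOPS at a 4 KiB block size
bw_kib_s 1000 4      # prints 4000 (KiB/s), roughly 3.9 MiB/s
```

This is why small-block tests are usually judged by IOPS and large-block tests by bandwidth: at 1M blocks even a modest IOPS figure means a large bw.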

There are many parameters to be set when writing a script in Fio, as shown below:

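Parameter description:
filename=/dev/sdb1  Name of the file or device to test; usually a file in the data directory of the disk under test.
direct=1            Use direct I/O, bypassing the OS buffer cache so that the results reflect the disk itself. Normally the Linux kernel keeps a cache: writes go to the cache first and are flushed to the SSD in the background, and reads are served from the cache when possible. This is faster, but cached data is lost on power failure. Direct I/O skips the cache and reads and writes the SSD directly.
rw=randwrite        Test random-write I/O.
rw=randrw           Test mixed random read and write I/O.
bs=16k              Block size of a single I/O is 16k.
bsrange=512-2048    As above, but specifies a range of block sizes.
size=5G             Each thread reads or writes 5 GB of data.
numjobs=1           Number of threads per job (1 here). Every job specified with -name starts this many threads, so total threads = number of jobs (how many name=jobx) * numjobs.
name=job1           Name of a job; duplicate names are allowed. With fio -name=job1 -name=job2, two jobs are created. They share the parameters given before -name=job1; parameters that follow a -name apply only to that job.
thread              Create jobs as threads with pthread_create instead of as processes with fork (the default). Processes cost more than threads, so thread is generally used for testing.
runtime=1000        Run for at most 1000 seconds. If omitted, fio runs until the whole size (5 GB here) has been transferred in bs-sized I/Os.
ioengine=libaio     Use the libaio I/O engine. libaio: Linux native asynchronous I/O. Note that Linux may only support queued behavior with non-buffered I/O (set direct=1 or buffered=0). rbd: direct access to Ceph RADOS Block Devices (RBD) through librbd, without the kernel RBD driver (rbd.ko). Many other engines are available, such as libhdfs and rdma.
iodepth=16          Queue depth is 16. In asynchronous mode the CPU cannot keep issuing commands to the SSD without limit. If the SSD stalls while the system keeps submitting commands, thousands or even tens of thousands may pile up: the SSD cannot cope, and the queued commands take so much memory that the system may hang. The queue depth caps the number of outstanding I/Os.
rwmixwrite=30       In mixed read/write mode, writes make up 30%.
group_reporting     Controls how results are displayed: report one aggregated result instead of one per process.
In addition:
lockmem=1g          Pin 1g of memory with mlock, which can be used to simulate a machine with less memory available.
zero_buffers        Initialize the I/O buffers with zeros (the default is random data).
nrfiles=8           Number of files used by each process.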
Common disk read/write test points:
1. Read=100%  Random=100%     rw=randread   (100% random read)
2. Read=100%  Sequential=100% rw=read       (100% sequential read)
3. Write=100% Sequential=100% rw=write      (100% sequential write)
4. Write=100% Random=100%     rw=randwrite  (100% random write)
5. Read=70%   Sequential=100% rw=rw, rwmixread=70, rwmixwrite=30      (70% sequential read, 30% sequential write)
6. Read=70%   Random=100%     rw=randrw, rwmixread=70, rwmixwrite=30  (70% random read, 30% random write)

The Fio parameters are described in detail above. For anything not covered here, consult the fio documentation.
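The six test points differ only in the rw and rwmix options, so they can be looked up by name. The sketch below is one possible way to do that. The labels such as seqmix70 are hypothetical names chosen here, and only rwmixread is emitted because the write share is the remainder.

```shell
# rw_options NAME -> fio options for one of the six common test points.
# The NAME labels are invented for this sketch; the fio options are the
# ones listed in the test points.
rw_options() {
    case "$1" in
        randread)  echo "-rw=randread" ;;
        seqread)   echo "-rw=read" ;;
        seqwrite)  echo "-rw=write" ;;
        randwrite) echo "-rw=randwrite" ;;
        seqmix70)  echo "-rw=rw -rwmixread=70" ;;
        randmix70) echo "-rw=randrw -rwmixread=70" ;;
        *) echo "unknown test point: $1" >&2; return 1 ;;
    esac
}

rw_options randmix70    # prints -rw=randrw -rwmixread=70
```

The output can be spliced into a command line with command substitution, for example fio $(rw_options randmix70) followed by the remaining options.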

2. The actual code

2.1 Test code (sequential read and write)

#!/bin/sh

export test=fio

echo "fio_write test"
echo $(date +%F%n%T)
fio -directory=/fiorwtest -rw=write -bs=1M -direct=1 -iodepth 2 -ioengine=libaio -size 1G  -thread -numjobs=2  -group_reporting -name=write1M_1Gjob
echo $(date +%F%n%T)
sync
# Drop the page cache so the next test does not hit cached data (requires root)
echo 3 > /proc/sys/vm/drop_caches
#ansible all -m shell -a "/tmp/qingli.sh"
echo "write finished"
echo "fio_read test"
echo $(date +%F%n%T)
# Same job name as the write test, so the read test reads the files it created
fio -directory=/fiorwtest -rw=read -bs=1M -direct=1 -iodepth 2 -ioengine=libaio -size 1G  -thread -numjobs=2  -group_reporting -name=write1M_1Gjob
echo $(date +%F%n%T)
sync
echo 3 > /proc/sys/vm/drop_caches
#ansible all -m shell -a "/tmp/qingli.sh"    (host 196 runs the script on host 197)
rm -f /fiorwtest/write1M_1Gjob*
echo "read finished"

As shown above, the script first runs the write test, then the read test, and finally deletes the generated files. The execution result is as follows.

As shown in the figure above, the values in the red boxes are our result indicators: one for the write operation and one for the read operation.

2.2 Fio code (random read and write)

#!/bin/sh

export test=fio

# Random write
echo "randwrite4k4job"
echo $(date +%F%n%T)
fio -directory=/hlstor/cluster/fubenjuan1 -rw=randwrite -bs=4k -direct=1 -iodepth 8 -ioengine=libaio -size 35G  -thread -numjobs=4  -group_reporting -name=randwrite4k_4job
echo $(date +%F%n%T)
sync
echo 3 > /proc/sys/vm/drop_caches
ansible all -m shell -a "/tmp/qingli.sh"
# Random read
echo "randread4k4job"
echo $(date +%F%n%T)
fio -directory=/hlstor/cluster/fubenjuan1 -rw=randread -bs=4k -direct=1 -iodepth 8 -ioengine=libaio -size 35G  -thread -numjobs=4  -group_reporting -name=randwrite4k_4job
echo $(date +%F%n%T)
sync
echo 3 > /proc/sys/vm/drop_caches
ansible all -m shell -a "/tmp/qingli.sh"
rm -f /hlstor/cluster/fubenjuan1/randwrite4k_4job*

For random read and write tests, set -rw=randwrite or -rw=randread; the rest of the command has the same structure as in the sequential script.

Only this one parameter differs.

2.3 Loops in the Fio script

There are several ways to write a loop in a shell script, much like loops in Python or Java. Here is a brief example with two variants of the for loop.

The general form of a for loop is as follows:

#!/bin/bash
# Arrays are a bash feature, so the script is run with bash rather than plain sh

export test=fio

rwmode=("a" "b" "c" "d")
bssize=("1" "2" "3" "4")

for i in ${rwmode[@]}
do
    echo $i
    echo $(date +%F%n%T)

    for j in ${bssize[@]}
    do
        echo $j
    done
done
--------------------------- Variant 2 ----------------------------------
#!/bin/bash

export test=fio

rwmode=("a b c d")
bssize=("1 2 3 4")     # the difference is here: how the array is written

for i in ${rwmode[@]}
do
    echo $i
    echo $(date +%F%n%T)

    for j in ${bssize[@]}
    do
        echo $j
    done
done

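As shown above, the only difference is how the array rwmode is written. In the second variant the array holds a single element, "a b c d", but because ${rwmode[@]} is unquoted in the for loop, the shell splits it into four words, so both scripts print the same result.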
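The two array forms only behave alike while the expansion is left unquoted. The short bash sketch below, with a hypothetical helper count_args, shows where they diverge; it is an illustration of bash word splitting under the default IFS, not part of the fio scripts.

```shell
# Requires bash (arrays are not part of POSIX sh)
four=("a" "b" "c" "d")   # four separate elements
one=("a b c d")          # a single element that contains spaces

echo "${#four[@]}"       # number of elements: prints 4
echo "${#one[@]}"        # number of elements: prints 1

# count_args prints how many arguments it received
count_args() { echo "$#"; }

count_args ${one[@]}     # unquoted: split on spaces, prints 4
count_args "${one[@]}"   # quoted: kept as one word, prints 1
```

Writing each value as its own element and quoting the expansion is the safer habit, because it keeps working when a value contains spaces.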

Shell online simulator https://c.runoob.com/compile/18

2.4 Fio loop in practice

Run 4k and 1024k block-size operations in write mode in a loop, then 4k and 1024k block-size operations in read mode.

Note that a variable must be prefixed with the $ sign wherever its value is used.

Benefit of loops: less code.

Disadvantage of loops: poorer readability (choose according to your needs).

Note: the fio command here uses a new parameter, -runtime=900.

           The unit of the runtime parameter is seconds.

     1. runtime limits how long fio runs. On its own it only cuts off a job that is still running; a job that finishes its size earlier ends earlier. If time_based is also set, fio runs for the full runtime even when the file has been completely read or written, by looping over the same workload.
     2. ramp_time sets how long to run the workload before any performance information is recorded. Results are logged only after performance has stabilized, which reduces the running time needed to produce stable results.

     3. Long runtimes are used in load tests, to measure failure-free running time at about 60% CPU load.
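In fio these behaviours correspond to the runtime, time_based and ramp_time options. A small helper can assemble them; the function name timed_options and the 60-second ramp below are assumptions made for this sketch, not values from the tests above.

```shell
# timed_options RUNTIME_S [RAMP_S] -> fio options for a fixed-duration run.
# -time_based keeps the job looping until RUNTIME_S has elapsed;
# -ramp_time (optional) delays result logging by RAMP_S seconds.
timed_options() {
    if [ -n "${2:-}" ]; then
        echo "-runtime=$1 -time_based -ramp_time=$2"
    else
        echo "-runtime=$1 -time_based"
    fi
}

timed_options 900       # prints -runtime=900 -time_based
timed_options 900 60    # prints -runtime=900 -time_based -ramp_time=60
```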

#!/bin/bash

export test=fio

rwmode=( "write" "read" )
bssize=( "4k" "1024k" )

for j in "${rwmode[@]}"
do
    for i in "${bssize[@]}"
    do
        # Braces are required: $j_ would be read as a variable named "j_"
        echo "fuse_fenbujuan_${j}_${i}_1job"
        echo $(date +%F%n%T)
        fio -directory=/hlstor/fenbujuan1 -rw=$j -bs=$i -direct=1 -iodepth 8 -ioengine=libaio -size 50G  -thread -numjobs=1 -runtime=900 -group_reporting -name=${j}_${i}_1job
        echo $(date +%F%n%T)
        sleep 60
        echo 3 > /proc/sys/vm/drop_caches
    done
done
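The job name in the loop joins two variables with underscores, which is an easy place to go wrong in shell. Underscores are valid in variable names, so the braces decide where each name ends. The helper job_name below is a hypothetical illustration of the safe form.

```shell
# "$j_$i_1job" is parsed as the variables "j_" and "i_1job", which are not
# set in the loop, so it expands to an empty string.
# "${j}_${i}_1job" marks where each variable name ends.
job_name() {
    j=$1
    i=$2
    echo "${j}_${i}_1job"
}

job_name write 4k      # prints write_4k_1job
job_name read 1024k    # prints read_1024k_1job
```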

3. Performance monitoring

3.1 iostat command

iostat command and results

iostat -m 1 
iostat -x 1
iostat -h 1  
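The trailing 1 refreshes the output every second; -m shows throughput in megabytes per second, -x shows extended statistics, and -h prints human-readable units.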

3.2 Real-time memory

#!/bin/bash
echo `date` > /root/ansible_test.txt

Save the time to a file

The real-time status of the memory is output to the log file 

#!/bin/sh
while :
do
    free -m >> nei.log
    sleep 1
done

-m reports values in MiB; -h picks human-readable units and is easier to read.

while :
do
    free -h >> /lyfiolog/Memory.log
    sleep 1
done

 The results of the two commands are as follows:
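The loops above append samples without saying when each one was taken, which makes it hard to line the memory log up with the fio log. A possible variant prefixes every sample with a timestamp. The function name sample_cmd is an assumption of this sketch, and it takes a sample count so that it ends on its own.

```shell
# sample_cmd COUNT INTERVAL_S CMD... -> run CMD COUNT times, INTERVAL_S apart,
# printing a timestamp line before each sample.
sample_cmd() {
    count=$1
    interval=$2
    shift 2
    n=0
    while [ "$n" -lt "$count" ]; do
        echo "=== $(date '+%F %T') ==="
        "$@"
        n=$((n + 1))
        # No need to wait after the last sample
        if [ "$n" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Usage with the log path from the text (one sample per second for an hour):
#   sample_cmd 3600 1 free -h >> /lyfiolog/Memory.log
```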

3.3 Log output of the sh script, that is, the fio log output

      The following command writes the log to a file, but it runs in the foreground: the terminal is occupied until the script finishes and cannot be used for anything else.

 sh fio-liyang-w2.sh >> fioly.log

     With nohup and a trailing &, the script runs in the background and keeps running after you log out, so the terminal stays free for other work.

nohup sh fio.sh  >> fio.log &

4. Stability test

        This can be called a stability test or a reliability test; there is little distinction between the two. The main purpose is to measure how long the disk runs without failure while CPU load is between 55% and 80%. It can be used to judge the performance of a disk, and it is a good measure of the robustness of a NAS storage system.

        A stability test is not very different from the performance test above. The key is to keep the machine busy: as soon as one run finishes, start the next. A test generally lasts about 2-3 days. The final results (bw, iops and so on) are a good measure of stability.

        The following example is only a small part of the code; real projects contain more cases than this.

#!/bin/sh

export test=fio

echo "nfs_fuben"


echo "write4k8job"
echo $(date +%F%n%T)
fio  -directory=/data/test1fuben  -rw=write  -bs=4k  -direct=1  -iodepth 8  -ioengine=libaio  -size 20G   -thread  -numjobs=8  -group_reporting  -name=write4k_8job
echo $(date +%F%n%T)

echo "read4k8job"
echo $(date +%F%n%T)
fio  -directory=/data/test1fuben  -rw=read  -bs=4k  -direct=1  -iodepth 8  -ioengine=libaio  -size 20G   -thread  -numjobs=8  -group_reporting  -name=write4k_8job
echo $(date +%F%n%T)



echo "write1M4job"
echo $(date +%F%n%T)
fio  -directory=/data/test1fuben  -rw=write  -bs=1M  -direct=1  -iodepth 8  -ioengine=libaio  -size 20G   -thread  -numjobs=4  -group_reporting  -name=write1m_4job
echo $(date +%F%n%T)

echo "read1M4job"
echo $(date +%F%n%T)
fio  -directory=/data/test1fuben  -rw=read  -bs=1M  -direct=1  -iodepth 8  -ioengine=libaio  -size 20G   -thread  -numjobs=4  -group_reporting  -name=write1m_4job
echo $(date +%F%n%T)
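The script above makes a single pass. To keep it running for days, one option is a wrapper that repeats a command until a time budget is used up and stops at the first failure. This is only a sketch: the function name run_for_seconds and the file name stability.sh are hypothetical.

```shell
# run_for_seconds DURATION_S CMD... -> repeat CMD until DURATION_S seconds
# have elapsed, stop at the first failure, and print the completed rounds.
# CMD always runs at least once, and a round in progress is never cut short.
run_for_seconds() {
    duration=$1
    shift
    end=$(( $(date +%s) + duration ))
    rounds=0
    while :; do
        "$@" || { echo "failed in round $((rounds + 1))" >&2; return 1; }
        rounds=$((rounds + 1))
        if [ "$(date +%s)" -ge "$end" ]; then
            break
        fi
    done
    echo "rounds completed: $rounds"
}

# Usage, assuming the script above is saved as stability.sh, for 3 days:
#   run_for_seconds $((3 * 24 * 3600)) sh stability.sh
```

Stopping at the first failure keeps the failure time visible in the log, which is the figure a fault-free running time test is after.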

5. Summarize the test report yourself

Test (cluster layer)     bw         iops
4k sequential write      36M/s      18.9k
4k sequential read       65.2M/s    33.4k
1M sequential write      2384       2384
1M sequential read       3061       3061
4k random write          120        30.8
4k random read           131        33.5
1M random write          1952       1952
1M random read           2598       2599

     The table above summarizes this test on the cluster layer. The same test can be run on different volume types (distributed volumes, replica volumes, error-correction volumes) or on the Fuse and NFS layers. I ran sequential and random reads and writes with -bs set to 4k and 1M. Some of the results are high because my machine uses an SSD. Collecting the results in one table makes them easier to compare.

5.1 Notes

     In real projects, the test server often needs two operating systems installed, Linux and Windows Server. Fio behaves a little differently on Windows. In the fio command, -numjobs is generally set to 4, 8, 10 and so on. With -group_reporting, Linux reports one combined result for the 4 jobs, as shown in the first figure of this article.

     On Windows, however, numjobs=4 produced 4 separate results, one per file. The bw and iops of these 4 results must then be added up to obtain the final test result.
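Adding up the per-job figures by hand is error-prone, so a short helper can do it. The sketch below assumes the values have already been copied out of the four results in the same unit; sum_values is a hypothetical name and the numbers are invented, not measured.

```shell
# sum_values V1 V2 ... -> sum of per-job values (all in the same unit).
# awk is used because the shell's own arithmetic is integer-only.
sum_values() {
    printf '%s\n' "$@" | awk '{ total += $1 } END { print total }'
}

# Invented per-job bandwidth values for numjobs=4
sum_values 10.5 11 9.5 10    # prints 41
```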

Summary:

This article builds on the fio performance test from the previous section. If you have any questions, leave a message and I will reply when I see it.

This article refers to the blog: https://blog.csdn.net/qq_14935437/article/details/93749444

Origin blog.csdn.net/u013521274/article/details/108253194