Preface
In the previous section, we covered installing the fio tool, its dependent libraries and the gcc environment, and basic usage. This section is a more advanced walkthrough, focused on practical experience with performance testing and stability testing, plus some precautions. Every server manufacturer has its own procedure for disk performance testing, and vendors such as Huawei and H3C test the performance of their own servers. There are many disk performance testing tools; the mainstream ones at present are fio and iozone:
Fio (used in this article)
iozone (also relatively good)
The installation and process of fio are in the previous section: https://blog.csdn.net/u013521274/article/details/107949362
1. Fio performance test
The performance test mainly measures the read and write performance of the disk, generally in 4 modes: sequential read, sequential write, random read, and random write.
Performance indicators shown in the test results:
bw: the average I/O bandwidth
iops: the number of read and write operations per second
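The two indicators are linked: bandwidth is roughly IOPS multiplied by the block size. A minimal sketch of that arithmetic follows. The function name bw_mib_per_s is hypothetical, and the integer math rounds the result down.

```shell
#!/bin/sh
# Estimate bandwidth in MiB/s from IOPS and the block size in KiB.
# bw_mib_per_s is a hypothetical helper, not part of fio.
bw_mib_per_s() {
    iops=$1
    bs_kib=$2
    echo $(( iops * bs_kib / 1024 ))
}

# 30800 IOPS at 4 KiB blocks
bw_mib_per_s 30800 4    # prints 120
```

This is a quick sanity check when reading a result: if bw and iops do not roughly agree for the block size used, the units or the test parameters are worth a second look.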
There are many parameters to be set when writing a script in Fio, as shown below:
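Parameter description:
filename=/dev/sdb1 Name of the test file; usually a path under the data directory of the disk to be tested. Note that a write test against a raw device such as /dev/sdb1 overwrites the data on it.
direct=1 Whether to use direct I/O. The test bypasses the OS buffer cache, so the results reflect the disk itself. When Linux reads and writes, the kernel maintains a cache: data is written to the cache first and flushed to the SSD in the background, and reads are served from the cache first. This is faster, but data still in the cache is lost on power failure. Direct I/O skips the cache and reads and writes the SSD directly.
rw=randwrite Test random-write I/O.
rw=randrw Test mixed random read and write I/O.
bs=16k The block size of a single I/O is 16k.
bsrange=512-2048 Same as above, but specifies a range of block sizes.
size=5G Each thread reads or writes 5 GB of data.
numjobs=1 Each job starts 1 thread. Whatever value is used here, every job specified with -name starts that many threads, so the total number of threads = number of jobs (the number of -name=jobX options) * numjobs.
name=job1 The name of a job; duplicates are allowed. With fio -name=job1 -name=job2, two jobs are created that share the parameters placed before -name=job1. Parameters placed after -name=job2 belong to job2 only.
thread Create the jobs as threads with pthread_create; the alternative is to create processes with fork. Processes cost more than threads, so thread is generally used for testing.
runtime=1000 Run the test for 1000 seconds. If omitted, fio keeps going until the whole 5 GB file has been written in 4k blocks.
ioengine=libaio Use libaio as the I/O engine. libaio is Linux native asynchronous I/O. Note that Linux may only support queued behavior with non-buffered I/O (set direct=1 or buffered=0). Another engine, rbd, accesses Ceph RADOS Block Devices (RBD) directly through librbd, without the kernel RBD driver (rbd.ko). Many other engines are available, such as libhdfs and rdma.
iodepth=16 The queue depth is 16. In asynchronous mode, the CPU cannot keep sending commands to the SSD without limit. If the SSD stalls while reading or writing, the system might keep issuing commands, thousands or even tens of thousands of them. The SSD cannot cope with that, and so many outstanding commands also use a lot of memory and can bring the system down. The queue depth parameter limits the number of outstanding commands.
rwmixwrite=30 In mixed read/write mode, writes account for 30%.
group_reporting Controls how results are displayed: the statistics of all processes are summarized into one report.
In addition:
lockmem=1g Pin 1 GB of memory with mlock, which can be used to simulate a machine with less memory available.
zero_buffers Initialize the I/O buffers with zeros.
nrfiles=8 The number of files generated by each process.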
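The same parameters can also be kept in a fio job file instead of a long command line. The sketch below prints such a file using the values from the list above. The function name make_jobfile and the target path /tmp/fio_testfile are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Print a fio job file built from the parameters described above.
# $1 = target file; the caller picks the path (hypothetical here).
# make_jobfile is a hypothetical helper, not part of fio.
make_jobfile() {
    cat <<EOF
[global]
ioengine=libaio
direct=1
thread
bs=16k
size=5G
iodepth=16
numjobs=1
runtime=1000
group_reporting

[job1]
filename=$1
rw=randrw
rwmixwrite=30
EOF
}

make_jobfile /tmp/fio_testfile
```

Redirecting the output to a file, for example randrw.fio, and running fio randrw.fio should start the same test as the equivalent command-line options. Options in [global] are shared by every job section that follows.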
Common test points for disk read and write:
1. Read=100% Random=100% rw=randread (100% random read)
2. Read=100% Sequence=100% rw=read (100% sequential read)
3. Write=100% Sequence=100% rw=write (100% sequential write)
4. Write=100% Random=100% rw=randwrite (100% random write)
5. Read=70% Sequence=100% rw=rw, rwmixread=70, rwmixwrite=30 (70% sequential read, 30% sequential write)
6. Read=70% Random=100% rw=randrw, rwmixread=70, rwmixwrite=30 (70% random read, 30% random write)
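The six test points can be turned into fio options with a small lookup. This is only a sketch: rw_flags is a hypothetical helper name, and rwmixwrite is left out because it is simply the remainder once rwmixread=70 is set.

```shell
#!/bin/sh
# Map a test point number (1-6, in the order listed above) to fio options.
# rw_flags is a hypothetical helper name.
rw_flags() {
    case "$1" in
        1) echo "-rw=randread" ;;
        2) echo "-rw=read" ;;
        3) echo "-rw=write" ;;
        4) echo "-rw=randwrite" ;;
        5) echo "-rw=rw -rwmixread=70" ;;
        6) echo "-rw=randrw -rwmixread=70" ;;
        *) echo "unknown test point: $1" >&2; return 1 ;;
    esac
}

rw_flags 6    # prints -rw=randrw -rwmixread=70
```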
The Fio parameters are described in detail above. For anything not covered here, you can look up the details online.
2. The actual code
2.1 Test code (sequential reading and writing)
#!/bin/sh
export test=fio
echo "fio_write test"
echo $(date +%F%n%T)
# Sequential write: 2 threads, 1 GB each, 1M blocks, direct I/O
fio -directory=/fiorwtest -rw=write -bs=1M -direct=1 -iodepth 2 -ioengine=libaio -size 1G -thread -numjobs=2 -group_reporting -name=write1M_1Gjob
echo $(date +%F%n%T)
# Flush dirty pages, then drop the page cache (requires root)
sync
echo 3 > /proc/sys/vm/drop_caches
#ansible all -m shell -a "/tmp/qingli.sh"
echo "write finished"
echo "fio_read test"
echo $(date +%F%n%T)
# Sequential read; the job name stays the same so fio reads back the files written above
fio -directory=/fiorwtest -rw=read -bs=1M -direct=1 -iodepth 2 -ioengine=libaio -size 1G -thread -numjobs=2 -group_reporting -name=write1M_1Gjob
echo $(date +%F%n%T)
sync
echo 3 > /proc/sys/vm/drop_caches
#ansible all -m shell -a "/tmp/qingli.sh" (196 runs the script on 197)
# Delete the generated test files
rm -f /fiorwtest/write1M_1Gjob*
echo "read finished"
As shown above, the script runs the write test first, then the read test, and finally deletes the generated files. The execution result is as follows.
As shown in the figure above, the values in the red boxes are the result indicators: one for the write operation and one for the read operation.
2.2 Fio code (random read and write)
#!/bin/sh
export test=fio
# Random write
echo "randwrite4k4job"
echo $(date +%F%n%T)
fio -directory=/hlstor/cluster/fubenjuan1 -rw=randwrite -bs=4k -direct=1 -iodepth 8 -ioengine=libaio -size 35G -thread -numjobs=4 -group_reporting -name=randwrite4k_4job
echo $(date +%F%n%T)
sync
echo 3 > /proc/sys/vm/drop_caches
ansible all -m shell -a "/tmp/qingli.sh"
# Random read
echo "randread4k4job"
echo $(date +%F%n%T)
fio -directory=/hlstor/cluster/fubenjuan1 -rw=randread -bs=4k -direct=1 -iodepth 8 -ioengine=libaio -size 35G -thread -numjobs=4 -group_reporting -name=randwrite4k_4job
echo $(date +%F%n%T)
sync
echo 3 > /proc/sys/vm/drop_caches
ansible all -m shell -a "/tmp/qingli.sh"
rm -f /hlstor/cluster/fubenjuan1/randwrite4k_4job*
For the random read and write tests, the only parameter that differs is -rw: set it to randwrite or randread. Everything else is the same as in the sequential test.
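Since only -rw changes between the two runs, the command can be produced from one template. The sketch below prints the command instead of running it. fio_cmd and its argument order are hypothetical, and the fixed options are copied from the script above.

```shell
#!/bin/sh
# Print the fio command for one test case. Only -rw varies between the
# random-write and random-read runs; the job name is passed in so that
# the read run can reuse the files of the write run.
# fio_cmd and its argument order are hypothetical.
fio_cmd() {
    dir=$1 rw=$2 name=$3
    echo "fio -directory=$dir -rw=$rw -bs=4k -direct=1 -iodepth=8" \
         "-ioengine=libaio -size=35G -thread -numjobs=4" \
         "-group_reporting -name=$name"
}

fio_cmd /hlstor/cluster/fubenjuan1 randwrite randwrite4k_4job
fio_cmd /hlstor/cluster/fubenjuan1 randread randwrite4k_4job
```

Printing the commands first makes it easy to review them before a long run.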
2.3 Fio code loop
There are several ways to write the loop, much like loops in Python or Java. Two variants are shown here with a simple example.
The first variant is the usual form of the for loop:
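#!/bin/bash
# Arrays are a bash feature, so this script needs bash rather than sh
export test=fio
rwmode=("a" "b" "c" "d")
bssize=("1" "2" "3" "4")
# Outer loop over the modes, inner loop over the sizes
for i in "${rwmode[@]}"
do
    echo "$i"
    echo $(date +%F%n%T)
    for j in "${bssize[@]}"
    do
        echo "$j"
    done
done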
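---------------------------Variant 2----------------------------------
#!/bin/bash
export test=fio
rwmode=("a b c d")
bssize=("1 2 3 4")   # the difference is here: each array holds one space-separated string
# Left unquoted on purpose: word splitting breaks the string into separate items
for i in ${rwmode[@]}
do
    echo $i
    echo $(date +%F%n%T)
    for j in ${bssize[@]}
    do
        echo $j
    done
done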
As shown above, the only difference is how the arrays rwmode and bssize are written. Both scripts print the same result, because the shell splits the unquoted ${rwmode[@]} expansion on spaces.
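The word splitting behind this can be seen in a minimal sketch. It uses plain function arguments in place of arrays so that it runs in any POSIX shell. count_args and list are hypothetical names.

```shell
#!/bin/sh
# count_args prints how many arguments it received.
count_args() { echo "$#"; }

list="a b c d"
count_args "a" "b" "c" "d"   # four separate words: prints 4
count_args $list             # unquoted: split on spaces, prints 4
count_args "$list"           # quoted: stays one word, prints 1
```

The second array form only works because the expansion in the for loop is left unquoted. If the items themselves could contain spaces, the first form with quoted expansion is the safer choice.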
Shell online simulator https://c.runoob.com/compile/18
2.4 Fio loop in practice
Run 4k and 1024k block-size operations in write mode in a loop, then 4k and 1024k block-size operations in read mode.
Note that the parameters that change must be referenced with the $ sign.
Advantage of loops: less code.
Disadvantage of loops: poorer readability (use them according to your needs).
Note: the fio command here has a new parameter, -runtime=900. The unit of runtime is seconds.
1. runtime limits how long fio runs: the job stops when this time is reached, even if the file has not been fully read or written. If the time_based option is also set, fio runs for the full runtime even after the file has been completely read or written, by repeating the same workload in a loop.
2. The related ramp_time option sets how long to run the workload before any performance information is recorded. It is used to log results only after performance has stabilized, which reduces the running time needed to produce stable results.
3. In load tests, it is used to set the failure-free running time under 60% CPU load.
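Because -runtime caps each fio run, the worst-case duration of a looped test can be estimated before starting it. The sketch below does that arithmetic for the script in this section (2 modes, 2 block sizes, -runtime=900, and a 60-second pause after each run). estimate_seconds is a hypothetical helper, and the result is an upper bound, since a run can finish earlier when the file is fully read or written first.

```shell
#!/bin/sh
# Worst-case duration in seconds of a looped test:
# modes * block sizes * (runtime + pause after each run).
# estimate_seconds is a hypothetical helper.
estimate_seconds() {
    modes=$1 sizes=$2 runtime=$3 pause=$4
    echo $(( modes * sizes * (runtime + pause) ))
}

estimate_seconds 2 2 900 60    # prints 3840, about 64 minutes
```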
#!/bin/bash
# Arrays are a bash feature, so this script needs bash rather than sh
export test=fio
rwmode=( "write" "read" )
bssize=( "4k" "1024k" )
for j in "${rwmode[@]}"
do
    for i in "${bssize[@]}"
    do
        # Braces are required: $j_ would be read as a variable named j_
        echo "fuse_fenbujuan_${j}_${i}_1job"
        echo $(date +%F%n%T)
        fio -directory=/hlstor/fenbujuan1 -rw=$j -bs=$i -direct=1 -iodepth 8 -ioengine=libaio -size 50G -thread -numjobs=1 -runtime=900 -group_reporting -name=${j}_${i}_1job
        echo $(date +%F%n%T)
        # Pause between runs, then drop the page cache (requires root)
        sleep 60
        echo 3 > /proc/sys/vm/drop_caches
    done
done
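When a variable is followed directly by an underscore, as in the job names in this loop, the shell treats the underscore and any following letters or digits as part of the variable name unless braces mark where the name ends. A minimal sketch, with the prefix fuse_ chosen only for illustration:

```shell
#!/bin/sh
j=write
i=4k
# Without braces the shell looks up variables named j_ and i_1job,
# which are unset, so both expand to nothing.
echo "fuse_$j_$i_1job"        # prints fuse_
# With braces each name ends where intended.
echo "fuse_${j}_${i}_1job"    # prints fuse_write_4k_1job
```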
3. Performance monitoring
3.1 iostat command
iostat command and results
iostat -m 1
iostat -x 1
iostat -h 1
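The trailing 1 refreshes the output every second. -m reports throughput in megabytes per second, -x shows extended statistics, and -h prints sizes in a human-readable form.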
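To follow a single device during a test, the %util column of the extended output can be picked out with awk. The sketch below finds the column by its header name, because the column order differs between iostat versions. util_of is a hypothetical helper name, and the header is assumed to start with Device or Device:.

```shell
#!/bin/sh
# Print the %util value(s) for one device from `iostat -x` output on stdin.
# The column is located by its header name rather than by position.
# util_of is a hypothetical helper name.
util_of() {
    awk -v dev="$1" '
        $1 == "Device" || $1 == "Device:" {
            for (n = 1; n <= NF; n++) if ($n == "%util") col = n
            next
        }
        $1 == dev && col { print $col }
    '
}

# Example (the first report shows averages since boot, the second
# one covers the last second):
# iostat -x 1 2 | util_of sdb
```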
3.2 Real-time memory
#!/bin/bash
echo `date` > /root/ansible_test.txt
The command above saves the current time to a file.
The script below writes the real-time memory status to a log file once per second:
#!/bin/sh
while :
do
free -m >> nei.log
sleep 1
done
-m reports the values in MiB; -h is human-readable and easier to read:
while :
do
free -h >> /lyfiolog/Memory.log
sleep 1
done
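The loops above run until they are interrupted, and the samples carry no timestamps. A bounded variant with a timestamp before each sample might look like the sketch below. sample_loop and its arguments are hypothetical names, and the command to sample is passed in so the same loop works for free, iostat, or anything else.

```shell
#!/bin/sh
# Append the output of a command to a log file a fixed number of times,
# writing a timestamp line before each sample.
# sample_loop and its arguments are hypothetical names.
sample_loop() {
    logfile=$1 count=$2 interval=$3
    shift 3
    n=0
    while [ "$n" -lt "$count" ]; do
        date '+%F %T' >> "$logfile"
        "$@" >> "$logfile"
        n=$((n + 1))
        sleep "$interval"
    done
}

# Example: 3 samples of memory usage, one per second.
# sample_loop Memory.log 3 1 free -h
```

With timestamps in the log, memory samples can be lined up against the start and end times that the fio scripts print.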
The results of the two commands are as follows:
3.3 Log output of the sh script, that is, the fio log output
This command writes the log to a file, but it runs in the foreground: it must not be interrupted while it runs, and the terminal cannot be used for anything else until it finishes.
sh fio-liyang-w2.sh >> fioly.log
With nohup added and a trailing &, the script runs in the background and does not block other work.
nohup sh fio.sh >> fio.log &
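A background job is easier to manage when its process ID is saved at start. The sketch below wraps the nohup call, also captures error output in the log, and stores the PID in a file. start_bg and the file names fio.log and fio.pid are hypothetical.

```shell
#!/bin/sh
# Start a command in the background with nohup, log stdout and stderr,
# and save its PID so it can be checked or stopped later.
# start_bg and the file names used below are hypothetical.
start_bg() {
    logfile=$1 pidfile=$2
    shift 2
    nohup "$@" >> "$logfile" 2>&1 &
    echo $! > "$pidfile"
}

# Example:
#   start_bg fio.log fio.pid sh fio.sh
#   kill -0 "$(cat fio.pid)"   # succeeds while the script is still running
#   kill "$(cat fio.pid)"      # stops the script; a fio run that is already
#                              # in progress may need to be stopped separately
```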
4. Stability test
Stability testing and reliability testing are not strictly distinguished here. The main purpose is to measure the failure-free running time of the disk while the CPU load is between 55% and 80%. The result can be used to judge the performance of the disk, and it is a good measure of the robustness of a NAS disk storage system.
A stability test differs little from the performance tests above. The key is to keep the machine running: run the tests continuously, and start them again as soon as they finish. A run generally takes about 2-3 days. The final results (bw, iops, and so on) are a good measure of stability.
The following is an example. It is only a small part of the code; real projects contain much more.
#!/bin/sh
export test=fio
echo "nfs_fuben"
echo "write4k8job"
echo $(date +%F%n%T)
fio -directory=/data/test1fuben -rw=write -bs=4k -direct=1 -iodepth 8 -ioengine=libaio -size 20G -thread -numjobs=8 -group_reporting -name=write4k_8job
echo $(date +%F%n%T)
echo "read4k8job"
echo $(date +%F%n%T)
fio -directory=/data/test1fuben -rw=read -bs=4k -direct=1 -iodepth 8 -ioengine=libaio -size 20G -thread -numjobs=8 -group_reporting -name=write4k_8job
echo $(date +%F%n%T)
echo "write1M4job"
echo $(date +%F%n%T)
fio -directory=/data/test1fuben -rw=write -bs=1M -direct=1 -iodepth 8 -ioengine=libaio -size 20G -thread -numjobs=4 -group_reporting -name=write1m_4job
echo $(date +%F%n%T)
echo "read1M4job"
echo $(date +%F%n%T)
fio -directory=/data/test1fuben -rw=read -bs=1M -direct=1 -iodepth 8 -ioengine=libaio -size 20G -thread -numjobs=4 -group_reporting -name=write1m_4job
echo $(date +%F%n%T)
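To keep a script like this running for days, it can be wrapped in a loop with a time budget. A minimal sketch follows, assuming the tests are saved in a script such as fio.sh. repeat_for is a hypothetical helper. It starts a new round only while the budget has not been used up, so the last round may finish after the deadline.

```shell
#!/bin/sh
# Re-run a command until a time budget (in seconds) is used up,
# then print how many rounds were run.
# repeat_for is a hypothetical helper.
repeat_for() {
    budget=$1
    shift
    end=$(( $(date +%s) + budget ))
    rounds=0
    while [ "$(date +%s)" -lt "$end" ]; do
        "$@"
        rounds=$((rounds + 1))
    done
    echo "rounds=$rounds"
}

# Example: repeat the test script for 2 days (172800 seconds).
# repeat_for 172800 sh fio.sh
```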
5. Summarizing the test report
cluster layer | 4k sequential write | 4k sequential read | 1M sequential write | 1M sequential read | 4k random write | 4k random read | 1M random write | 1M random read |
bw | 36M/s | 65.2M/s | 2384 | 3061 | 120 | 131 | 1952 | 2598 |
iops | 18.9k | 33.4k | 2384 | 3061 | 30.8 | 33.5 | 1952 | 2599 |
As shown in the table above, the cluster layer under test can be one of several volume types (distributed volumes, replica volumes, erasure-coded volumes) or the Fuse and nfs layers. I ran the tests with different -bs block sizes (4k and 1M) for sequential read and write and for random read and write. Some of the higher results are because my machine uses an SSD. Summarizing the results in a table like this makes them easier to compare.
5.2 Notes
In real projects, the test server often has two operating systems installed, Linux and Windows Server. Fio behaves a little differently on Windows. In the fio command, -numjobs is generally set to 4, 8, 10, and so on. With numjobs=4, Linux reports a single combined result for the 4 files, as shown in the first figure of this article.
On Windows, however, numjobs=4 produces 4 separate results, one per file. The bw and iops values of these 4 results must be added together to get the final test result.
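Adding up the per-job values is a one-line job for awk. The sketch below assumes the values have already been copied out of the results, one per line and all in the same unit. sum_values is a hypothetical helper, and the numbers in the example are placeholders, not measurements.

```shell
#!/bin/sh
# Sum one numeric value per line from standard input.
# sum_values is a hypothetical helper; all values must share one unit.
sum_values() {
    awk '{ total += $1 } END { print total + 0 }'
}

# Placeholder numbers standing in for the iops of 4 jobs.
printf '%s\n' 10 20 30 40 | sum_values    # prints 100
```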
Summary:
This article builds on the fio performance test from the previous section and takes it a step further. If you have any questions, leave a comment and I will reply when I see it.
This article refers to the blog: https://blog.csdn.net/qq_14935437/article/details/93749444