Using blktrace together with btt to analyze I/O performance

Introduction to blktrace

       blktrace is a tracing tool for block device I/O in the Linux kernel, developed by the maintainer of the kernel's block device layer. With it, users can obtain detailed information about the I/O request queue, including the process name, process ID, timing, the sector being read or written, and the size of each request.

How blktrace works

       1: When blktrace runs, it starts one thread per logical CPU on the machine and binds each thread to its CPU to collect data.

          For example, the 9630 phone has cpu0, cpu1, cpu2 and cpu3, so 4 threads are started.

       2: blktrace generates a file for each thread (with a corresponding file descriptor) under the debugfs mount point (default path: /sys/kernel/debug) and then calls the ioctl function. This system call passes the tracing parameters to the kernel, which invokes the corresponding handler and then writes data to that file descriptor through the debugfs file system.

       3: blktrace must be used together with blkparse, which parses the binary, specially formatted data that blktrace produces.
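
Because blktrace depends on debugfs being mounted, it can help to confirm the mount point before tracing. The following Python sketch looks for a debugfs entry in text laid out like /proc/mounts. The helper name find_debugfs_mount and the sample text are illustrative assumptions, not part of blktrace.

```python
def find_debugfs_mount(mounts_text):
    """Return the first debugfs mount point in /proc/mounts-style text, or None."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each line reads: device mount_point fs_type options dump pass
        if len(fields) >= 3 and fields[2] == "debugfs":
            return fields[1]
    return None


# Illustrative sample; on a real device, read the text from /proc/mounts instead.
sample = (
    "rootfs / rootfs ro 0 0\n"
    "debugfs /sys/kernel/debug debugfs rw,relatime 0 0\n"
)
print(find_debugfs_mount(sample))
```

If the function returns None, debugfs is probably not mounted and blktrace would have nowhere to read its per-CPU data from.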

Use of blktrace on mobile phones

Step 1: Enable block I/O tracing in the kernel.

          Steps: 1. source build/envsetup.sh

                 2. lunch, then select the project

                 3. Run the kuconfig command to configure the kernel

                 4. Select Kernel hacking ---> Tracers ---> Support for tracing block IO actions

                 5. make systemimage -j4

                 6. Flash the newly generated system.img and boot.img to the phone

              

Step 2: Push the blktrace and blkparse executables to the phone.

          Steps: 1. adb root (obtain root permissions)

                 2. adb remount (remount the system partition as writable)

                 3. adb push blktrace /system/bin/

                 4. adb push blkparse /system/bin/

                 5. adb shell

                 6. cd /system/bin

                 7. Change the permissions of blktrace and blkparse: chmod 0777 blktrace blkparse

Blktrace preparation

       1: The phone has 4 CPUs (cpu0, cpu1, cpu2 and cpu3) and blktrace starts one thread per CPU, 4 threads in total. Therefore, make sure all 4 CPUs are awake before running the blktrace monitoring command.

       2: How to check:

              adb root

              adb shell

              cd /sys/devices/system/cpu/

              cat online

          If the output is 0-3, all 4 CPUs are awake. If the output is 0, only cpu0 is awake.

       3: How to wake up the CPUs:

              echo 1 >/sys/devices/system/cpu/cpufreq/sprdemand/cpu_hotplug_disable

       4: The four CPUs only need to be awake when the blktrace monitoring command is started. Once it is running, cpu1/cpu2/cpu3 going online or offline does not affect blktrace.
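
The check in step 2 can be automated. The sketch below, in Python, expands the contents of the online file (for example 0-3 or 0) into a list of CPU ids and compares the count with the expected number of CPUs. The function names are hypothetical, and the text is passed in as a string so the example runs anywhere; on the phone it would come from /sys/devices/system/cpu/online.

```python
def parse_cpu_list(text):
    """Expand a CPU list such as "0-3" or "0,2-3" into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def all_cpus_online(online_text, expected_count):
    """Return True when the number of online CPUs equals expected_count."""
    return len(parse_cpu_list(online_text)) == expected_count


print(parse_cpu_list("0-3"))     # the case where all four CPUs are awake
print(all_cpus_online("0", 4))   # the case where only cpu0 is awake
```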

Use Cases

1. mount -t vfat /dev/block/mmcblk0p1 /data/temp

2. blktrace /dev/block/mmcblk0p1 -o /data/trace

   This command makes blktrace monitor the T card and write the results to the /data directory. The generated files are named trace.blktrace.0, trace.blktrace.1, trace.blktrace.2 and trace.blktrace.3; their number is determined by the number of CPUs.

3. Open another terminal and run: dd if=/dev/zero of=/data/temp/11 bs=512 count=1024

4. blktrace now records the writes. Press Ctrl+C to stop monitoring.

5. The trace.blktrace.x files contain binary data, so blkparse is needed to parse them.

6. Parse the files: blkparse -i trace

   This command prints the parsed results to the screen. Run it in the directory that contains the trace.blktrace.x files.

The terminal then shows output like this:

179,1    2        0     0.000361250     0  m   N cfq110A fifo=  (null)
179,1    2        0     0.000364094     0  m   N cfq110A dispatch_insert
179,1    2        0     0.000378047     0  m   N cfq110A dispatched a request
179,0    1       25     0.000381250  2152  A   W 2993285 + 1 <- (179,1) 2991237
179,1    2        0     0.000382875     0  m   N cfq110A activate rq, drv=1
179,1    1       26     0.000386313  2152  Q   W 2993285 + 1 [kworker/u8:1]
179,1    2        1     0.000387250    88  D   W 2444 + 2 [mmcqd/0]
179,1    1       27     0.000405024  2152  G   W 2993285 + 1 [kworker/u8:1]
179,1    1       28     0.000409250  2152  P   N [kworker/u8:1]
179,0    1       29     0.001225735  2152  A   W 2993286 + 1024 <- (179,1) 29912

Detailed explanation of blktrace command

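1. blktrace /dev/block/mmcblk0p1 -o /data/trace

   Explanation: monitors the mmcblk0p1 block device and stores the generated files in the /data directory. Four files are generated, all with names starting with trace: trace.blktrace.0, trace.blktrace.1, trace.blktrace.2 and trace.blktrace.3, corresponding to cpu0, cpu1, cpu2 and cpu3 respectively.

2. blktrace /dev/block/mmcblk0p1 -D /data/trace

   Explanation: monitors the mmcblk0p1 block device and creates a directory named trace under /data. The trace directory holds the files mmcblk0p1.blktrace.0, mmcblk0p1.blktrace.1, mmcblk0p1.blktrace.2 and mmcblk0p1.blktrace.3, corresponding to cpu0, cpu1, cpu2 and cpu3 respectively.

3. blktrace /dev/block/mmcblk0p1 -o /data/trace -w 10

   Explanation: the -w option sets how long to monitor before stopping, in seconds; -w 10 stops monitoring after 10 seconds.

4. blktrace /dev/block/mmcblk0p1 -o /data/trace -a WRITE

   Explanation: -a WRITE monitors write operations only.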

The -a action option selects which actions to monitor. The available actions are:

READ (reads)
WRITE (writes)
BARRIER
SYNC
QUEUE
REQUEUE
ISSUE
COMPLETE
FS
PC

For details, see: http://www.cse.unsw.edu.au/~aaronc/iosched/doc/blktrace.html (blktrace user guide)
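
When traces are collected from a script, the options described above can be assembled programmatically. The Python sketch below only builds the argument list and does not run blktrace. The function name build_blktrace_command and its parameters are hypothetical; the flags -d, -o, -D, -w and -a are the ones used in this article.

```python
def build_blktrace_command(device, output=None, output_dir=None,
                           seconds=None, actions=()):
    """Build a blktrace argument list from the options described above."""
    cmd = ["blktrace", "-d", device]
    if output is not None:
        cmd += ["-o", output]        # base name of the per-CPU output files
    if output_dir is not None:
        cmd += ["-D", output_dir]    # directory that receives the output files
    if seconds is not None:
        cmd += ["-w", str(seconds)]  # stop tracing after this many seconds
    for action in actions:
        cmd += ["-a", action]        # one -a per action to monitor
    return cmd


cmd = build_blktrace_command("/dev/block/mmcblk0p1", output="/data/trace",
                             seconds=10, actions=["WRITE"])
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run on a machine where blktrace is installed; building the list separately keeps the option handling easy to check.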

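The blkparse tool

1. Real-time parsing. Real-time data is blktrace's "terminal output". The command for real-time parsing is: blktrace -d /dev/block/mmcblk0p1 -o - | blkparse -i -

2. File parsing, which can be done in two ways:

(1) Generate the parsed file on the phone

    Method: go to the directory containing trace.blktrace.0, trace.blktrace.1, trace.blktrace.2 and trace.blktrace.3 and run: blkparse -i trace -o /data/trace.txt

(2) Parse on the PC

    Method: copy the trace.blktrace.0, trace.blktrace.1, trace.blktrace.2 and trace.blktrace.3 files generated on the phone to the PC and run: ./blkparse -i trace -o trace.txt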

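Format of the parsed blktrace output

The default output format is: "%D %2c %8s %5T.%9t %5p %2a %3d"

For example:

8,16   0       35     1.157274569  2544  G WBS 121897312 + 8 [jbd2/sdb-8]

Where:

%D        major,minor device number: 8,16

%2c       CPU id: 0 (the file being parsed here is trace.blktrace.0, which holds the data collected on cpu0, so the CPU id is 0)

%8s       I/O sequence number, usually starting from 1: 35

%5T.%9t   timestamp of the I/O event, in seconds.nanoseconds: 1.157274569

%5p       process ID: 2544

%2a       I/O action, explained below: G

%3d       RWBS data. R means read, W means write, D means the block is discarded, B means barrier operation and S means synchronous I/O. The WBS above therefore denotes a synchronous write with a barrier.

121897312 is the starting sector relative to device 8,16, and + 8 means the 8 consecutive sectors that follow (a sector is 512 bytes by default, so 8 sectors are 4 KB). The trailing [jbd2/sdb-8] is the name of the process.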
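
To make the field layout concrete, here is a minimal Python sketch that splits one line in the default layout into named fields and converts the sector count to bytes. The regular expression, the function name parse_line and the dictionary keys are assumptions made for illustration; the sketch only handles lines that carry a "sector + count" payload, and returns None for anything else (plug and unplug lines, scheduler messages).

```python
import re

SECTOR_SIZE = 512  # bytes per sector, the default stated above

LINE_RE = re.compile(
    r"^\s*(?P<major>\d+),(?P<minor>\d+)\s+"      # device major,minor
    r"(?P<cpu>\d+)\s+"                           # CPU id
    r"(?P<seq>\d+)\s+"                           # sequence number
    r"(?P<time>\d+\.\d{9})\s+"                   # seconds.nanoseconds
    r"(?P<pid>\d+)\s+"                           # process ID
    r"(?P<action>[A-Za-z])\s+"                   # I/O action
    r"(?P<rwbs>[A-Z]+)\s+"                       # RWBS data
    r"(?P<sector>\d+)\s+\+\s+(?P<count>\d+)"     # start sector + sector count
    r"(?:\s+\[(?P<process>[^\]]*)\])?"           # optional [process name]
)


def parse_line(line):
    """Parse one blkparse text line; return a dict, or None if it does not fit."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    f = match.groupdict()
    seconds, nanos = f["time"].split(".")
    return {
        "device": (int(f["major"]), int(f["minor"])),
        "cpu": int(f["cpu"]),
        "seq": int(f["seq"]),
        "time_ns": int(seconds) * 1_000_000_000 + int(nanos),
        "pid": int(f["pid"]),
        "action": f["action"],
        "rwbs": f["rwbs"],
        "sector": int(f["sector"]),
        "sectors": int(f["count"]),
        "bytes": int(f["count"]) * SECTOR_SIZE,
        "process": f["process"],
    }


event = parse_line(
    "8,16   0       35     1.157274569  2544  G WBS 121897312 + 8 [jbd2/sdb-8]"
)
print(event)
```

Keeping the timestamp as an integer number of nanoseconds avoids floating-point rounding when two events are subtracted later.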

List of I/O actions

       C -- complete  A previously issued request has been completed. The output will detail the sector and size of that request, as well as the success or failure of it.

       D -- issued  A request that previously resided on the block layer queue or in the i/o scheduler has been sent to the driver.

       I -- inserted  A request is being sent to the i/o scheduler for addition to the internal queue and later service by the driver. The request is fully formed at this time.

       Q -- queued  This notes intent to queue i/o at the given location. No real requests exists yet.

       B -- bounced  The data pages attached to this bio are not reachable by the hardware and must be bounced to a lower memory location. This causes a big slowdown in i/o performance, since the data must be copied to/from kernel buffers. Usually this can be fixed with using better hardware -- either a better i/o controller, or a platform with an IOMMU.

       M -- back merge  A previously inserted request exists that ends on the boundary of where this i/o begins, so the i/o scheduler can merge them together.

       F -- front merge  Same as the back merge, except this i/o ends where a previously inserted requests starts.

       M -- front or back merge  One of the above.

       G -- get request  To send any type of request to a block device, a struct request container must be allocated first.

       S -- sleep  No available request structures were available, so the issuer has to wait for one to be freed.

       P -- plug  When i/o is queued to a previously empty block device queue, Linux will plug the queue in anticipation of future ios being added before this data is needed.

       U -- unplug  Some request data already queued in the device, start sending requests to the driver. This may happen automatically if a timeout period has passed (see next entry) or if a number of requests have been added to the queue.

       T -- unplug due to timer  If nobody requests the i/o that was queued after plugging the queue, Linux will automatically unplug it after a defined period has passed.

       X -- split  On raid or device mapper setups, an incoming i/o may straddle a device or internal zone and needs to be chopped up into smaller pieces for service. This may indicate a performance problem due to a bad setup of that raid/dm device, but may also just be part of normal boundary conditions. dm is notably bad at this and will clone lots of i/o.

       A -- remap  For stacked devices, incoming i/o is remapped to device below it in the i/o stack. The remap action details what exactly is being remapped to what.

For details, see: http://www.cse.unsw.edu.au/~aaronc/iosched/doc/blktrace.html (blktrace user guide)
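
When scanning long traces, a small lookup table saves looking the letters up by hand. The Python sketch below maps the action codes listed above and the RWBS letters explained earlier to readable names. The table and the function name describe are illustrative; letters not covered in this article (for example the N that appears on plug lines) are simply reported as unknown.

```python
# Action codes as listed above. M is used for back merge; the list also
# uses M for "front or back merge".
ACTIONS = {
    "C": "complete", "D": "issued", "I": "inserted", "Q": "queued",
    "B": "bounced", "M": "back merge", "F": "front merge",
    "G": "get request", "S": "sleep", "P": "plug", "U": "unplug",
    "T": "unplug due to timer", "X": "split", "A": "remap",
}

# RWBS letters as explained in the output format section.
RWBS_FLAGS = {
    "R": "read", "W": "write", "D": "discard",
    "B": "barrier", "S": "synchronous",
}


def describe(action, rwbs):
    """Return a readable description of an action code and an RWBS string."""
    name = ACTIONS.get(action, "unknown")
    flags = [RWBS_FLAGS.get(letter, "unknown") for letter in rwbs]
    return name + ": " + " ".join(flags)


print(describe("G", "WBS"))
print(describe("C", "W"))
```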

Example analysis

8,16   0        8     0.018543948  8191  Q   W 12989792 + 24 [postgres]
8,16   0        9     0.018547191  8191  G   W 12989792 + 24 [postgres]
8,16   0       10     0.018548571  8191  P   N [postgres]
8,16   0       11     0.018550601  8191  I   W 12989792 + 24 [postgres]
8,16   0       12     0.018551421  8191  U   N [postgres] 1
8,16   0       13     0.018552618  8191  D   W 12989792 + 24 [postgres]
8,16   0       14     0.018638488  8191  C   W 12989792 + 24 [0]

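The above is the life cycle of a single I/O request. Judging by the actions, it goes through Q, G, P, I, U, D and C in turn:

Q: an intent to queue I/O at this location is recorded; no real request exists yet.

G: a request structure is allocated for the I/O.

P (plugging): the queue is plugged, that is, it waits for more I/O requests that are about to arrive, so that the system can optimize the I/O and reduce the time spent executing requests.

I: the request is handed to the I/O scheduler; by this step it is fully formed.

U (unplugging): the queue is unplugged. The block layer stops waiting for further I/O requests and starts passing the queued requests to the device driver. Between P and U, I/O is collected and then scheduled. Some optimization happens here, but only a little, because the wait is very short (about 3 microseconds between P and U in this trace).

D: the request is issued to the device driver.

C: the I/O request completes and a status is returned, either success or failure. The [0] shown in place of the process name is that status; 0 means success and any other value means failure.

This completes the life cycle of one I/O.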
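
The life cycle can also be recovered mechanically from the text output. The Python sketch below takes the seven postgres lines from this example, orders them by timestamp and joins the action letters, then reads the status from the completion line. The function names are hypothetical, and the sketch assumes whitespace-separated fields in the default layout with a 9-digit nanosecond part.

```python
SAMPLE = """\
8,16   0        8     0.018543948  8191  Q   W 12989792 + 24 [postgres]
8,16   0        9     0.018547191  8191  G   W 12989792 + 24 [postgres]
8,16   0       10     0.018548571  8191  P   N [postgres]
8,16   0       11     0.018550601  8191  I   W 12989792 + 24 [postgres]
8,16   0       12     0.018551421  8191  U   N [postgres] 1
8,16   0       13     0.018552618  8191  D   W 12989792 + 24 [postgres]
8,16   0       14     0.018638488  8191  C   W 12989792 + 24 [0]
"""


def to_ns(timestamp):
    """Convert "seconds.nanoseconds" (9-digit fraction) to integer nanoseconds."""
    seconds, nanos = timestamp.split(".")
    return int(seconds) * 1_000_000_000 + int(nanos)


def action_sequence(text):
    """Return the action letters of all events, ordered by timestamp."""
    events = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # not an event line
        events.append((to_ns(fields[3]), fields[5]))
    events.sort()
    return "".join(action for _, action in events)


def completion_status(line):
    """Return the integer status shown in brackets at the end of a C line."""
    fields = line.split()
    if fields[5] != "C":
        raise ValueError("not a completion line")
    return int(fields[-1].strip("[]"))


print(action_sequence(SAMPLE))
print(completion_status(SAMPLE.splitlines()[-1]))
```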


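Analyzing blktrace data with btt


blkparse only converts blktrace data into a human-readable format. Because the volume of data is usually large, analyzing it by hand is not easy. btt is a tool that analyzes blktrace data automatically.

btt cannot analyze real-time data; it can only analyze data files saved by blktrace. Usage:
Merge the files that were saved separately per CPU into one file, here named sdb.blktrace.bin:
$ blkparse -i sdb -d sdb.blktrace.bin
Run btt on sdb.blktrace.bin:
$ btt -i sdb.blktrace.bin

Below is a btt example:

In this example, 69.6173% of the time is spent in D2C, that is, in the hardware layer. This is normal: D2C is the indicator of hardware performance. Here a single I/O takes 0.396594 ms on average, which is already quite fast, and the slowest single I/O takes 10.70692 ms, which is not bad. Q2G and G2I are both very small and completely normal. I2D is slightly large, which is probably caused by the scheduling policy of the cfq scheduler. You can try other schedulers, such as deadline, compare the two, and then choose the one that best suits your application.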


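Q2G – time taken to create the I/O request, including remap and split time;
G2I – time taken for the I/O request to enter the I/O scheduler, including merge time;
I2D – time the I/O request waits in the I/O scheduler;
D2C – time the I/O request spends in the driver and the hardware;
Q2C – total time of the whole I/O request (Q2I + I2D + D2C = Q2C, where Q2I = Q2G + G2I), comparable to await in iostat.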
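
These stage definitions can be tried by hand on the single postgres request from the example analysis section. The Python sketch below subtracts the timestamps of the Q, G, I, D and C events of that request and prints each stage with its share of Q2C. It covers one request only, whereas btt reports statistics over a whole trace; the names STAMPS_NS and stage_latencies_ns are made up for this example.

```python
# Timestamps in nanoseconds of one request, copied from the postgres trace.
STAMPS_NS = {
    "Q": 18_543_948,
    "G": 18_547_191,
    "I": 18_550_601,
    "D": 18_552_618,
    "C": 18_638_488,
}

# Each stage is the time between two events of the same request.
STAGES = [
    ("Q2G", "Q", "G"),
    ("G2I", "G", "I"),
    ("I2D", "I", "D"),
    ("D2C", "D", "C"),
    ("Q2C", "Q", "C"),
]


def stage_latencies_ns(stamps):
    """Return the duration of every stage in nanoseconds."""
    return {name: stamps[end] - stamps[start] for name, start, end in STAGES}


latencies = stage_latencies_ns(STAMPS_NS)
for name, value in latencies.items():
    share = 100.0 * value / latencies["Q2C"]
    print(f"{name}: {value / 1000:.3f} us ({share:.1f}% of Q2C)")
```

For this request the four stages add up exactly to Q2C, and D2C accounts for most of the total, which matches the pattern described for the btt example.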

Origin blog.csdn.net/u014645605/article/details/75044952