Disk IO and performance indicators

I. The concept of disk I/O

Literally, I/O means input and output. From the top of the operating system to the bottom, I/O occurs between every layer: the CPU has I/O, memory has I/O, the VMM (virtual memory manager) has I/O, and at the bottom there is disk I/O. This is I/O in the broad sense. In general, one upper-level I/O may turn into several disk I/Os; that is, upper-level I/O is sparse and lower-level I/O is dense.

Disk I/O, as the name suggests, is the input and output of a disk. Input means writing data to the disk, and output means reading data from the disk. The most common disk types are ATA, SATA, FC, SCSI and SAS, as shown in Figure 1. Among these, servers commonly use SAS and FC disks, and some high-end storage uses SSDs. Each disk type performs differently.

Figure 1. Common physical disk architectures and disk types

II. Performance evaluation

SAN (Storage Area Network) and NAS (Network Attached Storage) storage is generally evaluated with two metrics, IOPS and bandwidth (throughput), which are independent of each other yet related. The main indicator of storage system performance is IOPS. The meaning of these two parameters is explained next.

IOPS (Input/Output Operations Per Second), the number of read/write operations per second, is one of the main measures of disk performance. It is the number of I/O requests the system can handle per unit of time, where an I/O request is typically a request to read or write data. For applications with frequent random reads and writes, such as OLTP (Online Transaction Processing), IOPS is the key metric. The other important metric is throughput, the amount of data that can be successfully transferred per unit of time. Applications that perform large amounts of sequential reads and writes, such as VOD (Video On Demand), care more about throughput.

In short:

 

Disk IOPS is the number of read and write I/Os the disk performs in one second.

Disk throughput is the disk's I/O data rate per second, i.e., the size of the data written plus the size of the data read.

Relationship between IOPS and throughput

Throughput per second = IOPS × average I/O size. The formula shows that the larger the I/O size and the higher the IOPS, the higher the throughput per second. We would therefore like both IOPS and throughput to be as high as possible. In practice, however, each disk has a maximum value for both parameters, and the two are related to each other.
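The formula is easy to check by hand, but a minimal Python sketch makes the units explicit. The function names and the choice of KB and KB/s as units are illustrative assumptions, not something the article defines.

```python
def throughput_kb_s(iops: float, avg_io_size_kb: float) -> float:
    """Throughput per second = IOPS x average I/O size (result in KB/s)."""
    return iops * avg_io_size_kb


def avg_io_size_kb(kb_per_s: float, iops: float) -> float:
    """Rearranged form: average I/O size = throughput / IOPS (result in KB)."""
    if iops <= 0:
        raise ValueError("IOPS must be positive")
    return kb_per_s / iops


# Many small I/Os move little data; fewer large I/Os can move far more.
print(throughput_kb_s(140, 4))   # 560
print(throughput_kb_s(625, 64))  # 40000
```

The same relationship run backwards (throughput divided by IOPS) is a quick way to estimate the average I/O size from monitoring output.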

IOPS can be broken down into the following metrics:

  1. Total IOPS: IOPS under a mixed load of random and sequential reads and writes. This most closely matches real-world I/O and is the metric most applications care about.
  2. Random Read IOPS: IOPS under a 100% random-read load.
  3. Random Write IOPS: IOPS under a 100% random-write load.
  4. Sequential Read IOPS: IOPS under a 100% sequential-read load.
  5. Sequential Write IOPS: IOPS under a 100% sequential-write load.

The figure below shows typical NFS test results:

Common IOPS benchmark tools include Iometer, IoZone and FIO; they can measure disk IOPS under a variety of conditions. For a given application, first determine the characteristics of its workload, then choose the appropriate IOPS metrics to measure and compare, and use the results to select suitable storage media and software.

 

IOPS formula

A complete disk I/O operation proceeds as follows. When the controller issues an I/O command to the disk, the actuator arm moves the read/write head away from the landing zone (the innermost region, which holds no data) to the track where the starting data block of the operation resides. This process is called seeking, and the time it takes is the seek time. Once the head is over the right track, it still cannot read immediately: it must wait until the platter rotates the sector holding the starting data block under the head. The time spent waiting for this rotation is called the rotational latency (rotational delay). Then, as the platter keeps rotating, the head reads or writes the corresponding data blocks until all the data required by the I/O has been handled. This is the data transfer, and the time it takes is the transfer time. When these three steps are complete, one I/O operation is finished.

Hard drive manufacturers' datasheets usually list three parameters: average seek time, rotational speed and maximum transfer rate. These three parameters let us calculate the time taken by each of the three steps.

First, the seek time. The data to be read or written may lie on any track of the disk: it may be on the innermost track (shortest seek time) or on the outermost track (longest seek time). So in the calculation we only consider the average seek time, i.e., the average seek time given in the disk's specifications. Here we use 5 ms, typical of today's mainstream 10k rpm disks.

Second, the rotational latency. As with seeking, when the head arrives at the track it may happen to be right above the target sector, in which case there is no extra delay and the data can be read or written immediately. In the worst case, however, the platter must make a full revolution before the head reaches the data. So we use the average rotational latency, which is half a revolution: for a 10k rpm disk this is (60 s / 10,000) × 1/2 = 3 ms, and for a 15k rpm disk it is (60 s / 15,000) × 1/2 = 2 ms. The calculations below use the 15k rpm value of 2 ms.

Third, the transfer time. The disk specification gives the maximum transfer rate. This rate is hard to reach in practice, but it represents the raw read/write speed of the disk itself. So for a given I/O size, the time the disk spends transferring data is IO Chunk Size / Max Transfer Rate.

From these three parts we get a formula for the time of a single I/O:

  IO Time = Seek Time + 60 sec/Rotational Speed/2 + IO Chunk Size/Transfer Rate

From this we can calculate IOPS:

  IOPS = 1/IO Time = 1/(Seek Time + 60 sec/Rotational Speed/2 + IO Chunk Size/Transfer Rate)

For different I/O sizes we obtain the following series of results (5 ms seek, 15,000 RPM, 40 MB/s transfer rate; times in ms):

  4K (1/7.1 ms = 140 IOPS)
  5ms + (60sec/15000RPM/2) + 4K/40MB = 5 + 2 + 0.1 = 7.1
  8K (1/7.2 ms = 139 IOPS)
  5ms + (60sec/15000RPM/2) + 8K/40MB = 5 + 2 + 0.2 = 7.2
  16K (1/7.4 ms = 135 IOPS)
  5ms + (60sec/15000RPM/2) + 16K/40MB = 5 + 2 + 0.4 = 7.4
  32K (1/7.8 ms = 128 IOPS)
  5ms + (60sec/15000RPM/2) + 32K/40MB = 5 + 2 + 0.8 = 7.8
  64K (1/8.6 ms = 116 IOPS)
  5ms + (60sec/15000RPM/2) + 64K/40MB = 5 + 2 + 1.6 = 8.6
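The hand calculation above can be reproduced with a minimal Python sketch of the IO Time and IOPS formulas. The defaults (5 ms seek, 15,000 RPM, 40 MB/s) mirror the example; treating 1 MB as 1000 KB is an assumption made so that 4K/40MB comes out to exactly 0.1 ms, as in the hand calculation. The function names are hypothetical.

```python
def io_time_ms(io_size_kb, seek_ms=5.0, rpm=15000, transfer_mb_s=40.0):
    """Single I/O time = seek + average rotational latency + transfer time, in ms."""
    rotational_ms = 60_000 / rpm / 2          # half a revolution, in ms
    transfer_ms = io_size_kb / transfer_mb_s  # KB / (MB/s) gives ms when 1 MB = 1000 KB
    return seek_ms + rotational_ms + transfer_ms


def iops(io_size_kb, **disk):
    """IOPS = 1 / IO Time, converting the time from ms to seconds."""
    return 1000 / io_time_ms(io_size_kb, **disk)


for size in (4, 8, 16, 32, 64):
    print(f"{size}K: {io_time_ms(size):.1f} ms -> {iops(size):.1f} IOPS")
```

The printed values agree with the hand-calculated figures to within rounding (about 140.8 IOPS for 4K and 116.3 for 64K). Calling io_time_ms(4, rpm=10000) shows the effect of a slower spindle: the average rotational latency grows from 2 ms to 3 ms per I/O.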

As the data above shows, the smaller a single I/O is, the less time it takes and the higher the corresponding IOPS.

All of the data above was derived under a fairly idealized assumption, namely that every I/O incurs exactly the average seek time and the average rotational latency. This assumption actually fits random reads and writes well: for random I/O, the seek time and rotational latency of every operation cannot be ignored, and these two delays limit IOPS. Now consider the opposite extreme, sequential reads and writes, such as reading a large file stored contiguously on disk. Because the file is stored contiguously, after the head finishes one I/O it needs neither a new seek nor any rotational latency, and IOPS becomes much larger:

  4K (1/0.1 ms = 10000 IOPS)
  0ms + 0ms + 4K/40MB = 0.1
  8K (1/0.2 ms = 5000 IOPS)
  0ms + 0ms + 8K/40MB = 0.2
  16K (1/0.4 ms = 2500 IOPS)
  0ms + 0ms + 16K/40MB = 0.4
  32K (1/0.8 ms = 1250 IOPS)
  0ms + 0ms + 32K/40MB = 0.8
  64K (1/1.6 ms = 625 IOPS)
  0ms + 0ms + 64K/40MB = 1.6
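The sequential case is the same formula with the seek time and rotational latency set to zero. The sketch below puts the two cases side by side under the same assumptions as the hand calculation (5 ms seek, 2 ms average rotational latency for 15,000 RPM, 40 MB/s, 1 MB treated as 1000 KB); the helper name is hypothetical.

```python
def iops(io_size_kb, seek_ms, rotational_ms, transfer_mb_s=40.0):
    """IOPS = 1000 ms / (seek + rotational latency + transfer time in ms)."""
    return 1000 / (seek_ms + rotational_ms + io_size_kb / transfer_mb_s)


for size in (4, 8, 16, 32, 64):
    random_iops = iops(size, seek_ms=5.0, rotational_ms=2.0)      # average seek + half a turn
    sequential_iops = iops(size, seek_ms=0.0, rotational_ms=0.0)  # head already in place
    ratio = sequential_iops / random_iops
    print(f"{size}K: random {random_iops:.1f}, sequential {sequential_iops:.1f}, ratio {ratio:.1f}x")
```

The gap shrinks as the I/O size grows: sequential I/O is about 71 times faster at 4K but only about 5.4 times faster at 64K, because the transfer time starts to dominate the fixed seek and rotation costs.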

Compared with the first set of data, the gap is very large. So when we use IOPS to measure the performance of an I/O system, we must state under what conditions the IOPS was obtained, that is, the read/write pattern and the size of a single I/O. In practice, especially for OLTP systems, the IOPS of small random writes is the most convincing figure.

Furthermore, for the same disk (or LUN), IOPS is not fixed: it varies with the amount of data each I/O reads or writes. For example, when each I/O reads or writes large contiguous blocks, IOPS will be relatively low; when track changes are infrequent and each I/O reads or writes small blocks, IOPS will be relatively high. In other words, IOPS also depends on the I/O block size, and IOPS values measured with different block sizes differ. When looking at a specific IOPS figure, you need to know the I/O block size used in the test. IOPS also has an upper limit; Table 1 lists the IOPS limits of various disk types.

Table 1. Common disk types and their IOPS
 

 

III. I/O read and write types

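Broadly speaking, I/O can be classified as read/write I/O, large/small-block I/O, sequential/random I/O, and serial/concurrent I/O. Of these, we mainly discuss large/small-block I/O, sequential/random I/O, and serial/concurrent I/O.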

Large/small-block I/O

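This refers to the number of contiguous sectors to be read as given in the controller command. If the number is large, such as 64 or 128, we consider it large-block I/O; if it is small, such as 4 or 8, we consider it small-block I/O. In fact, there is no clear boundary between large-block and small-block I/O.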

Sequential/random I/O

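Sequential I/O means that the starting sector address of the current I/O is exactly contiguous with, or very close to, the ending sector address of the previous I/O. Conversely, if the two are far apart, the I/O counts as random I/O.

Sequential I/O is more efficient than random I/O because during sequential I/O the head hardly needs to change tracks, or track changes take very little time. With random I/O, a large number of such I/Os keeps the head constantly changing tracks, which greatly reduces efficiency.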
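The definition above can be sketched as a small classifier over a trace of I/O requests. The IORequest type and the max_gap_sectors threshold are hypothetical: the text only says "contiguous or not far apart", so the cut-off of 8 sectors is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass
class IORequest:
    start_sector: int
    sector_count: int

    @property
    def end_sector(self) -> int:
        return self.start_sector + self.sector_count  # first sector after this I/O


def classify(requests, max_gap_sectors=8):
    """Label each I/O by its distance from the end of the previous I/O."""
    labels = []
    prev = None
    for req in requests:
        if prev is None:
            labels.append("unknown")  # the first request has no predecessor
        elif abs(req.start_sector - prev.end_sector) <= max_gap_sectors:
            labels.append("sequential")
        else:
            labels.append("random")
        prev = req
    return labels


trace = [IORequest(0, 8), IORequest(8, 8), IORequest(20, 8), IORequest(900_000, 8)]
print(classify(trace))  # ['unknown', 'sequential', 'sequential', 'random']
```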

Serial/concurrent I/O

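Conceptually, concurrent I/O means that after an I/O command is issued to one disk, there is no need to wait for its response before issuing an I/O command to another disk. I/O to a striped RAID (LUN), such as RAID 0+1 (1+0) or RAID 5, is concurrent. Otherwise, the I/O is serial.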
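To see why I/O to a striped LUN can be concurrent, the sketch below maps a request onto the member disks of a plain RAID 0 stripe set. RAID 5 parity placement and RAID 1+0 mirroring are deliberately left out, and the layout (stripe unit k on disk k mod N), the stripe size and the function name are illustrative assumptions.

```python
def disks_touched(start_lba, block_count, stripe_unit_blocks, num_disks):
    """Return the member disks a request hits on a simple RAID 0 stripe set.

    Assumes stripe unit k is stored on disk k % num_disks.
    """
    first_unit = start_lba // stripe_unit_blocks
    last_unit = (start_lba + block_count - 1) // stripe_unit_blocks
    return sorted({unit % num_disks for unit in range(first_unit, last_unit + 1)})


# 4 disks, stripe unit of 128 blocks
print(disks_touched(0, 8, 128, 4))    # [0]: a small I/O stays on one disk
print(disks_touched(0, 512, 128, 4))  # [0, 1, 2, 3]: a large I/O can go to all disks at once
```

A request that spans several stripe units is split into per-disk pieces that the controller can issue without waiting for each other, which is what makes the I/O concurrent.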

 

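IV. Monitoring disk I/O performance

To monitor disk I/O performance, we can use AIX system commands such as sar -d, iostat, topas and nmon. Below, nmon and topas are used as examples to show how to observe disk I/O performance on a system.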

topas

Log in to the AIX operating system, run topas, and press D; the following screen appears:

 

In the screen above, TPS is the disk's IOPS and KBPS is the disk's throughput per second. Because the server is idle, both IOPS and KBPS are very low.

We use the dd command, reading from disk hdisk2 via its if= input, to issue read I/O with a block size of 1 MB:

 

Monitoring with topas:

 

At this point, hdisk2's throughput is 163.9 MB/s and its IOPS is 655.

We then start a second dd reader, which drives hdisk2's busy value to 100%:


As the figure above shows, when the disk is 100% busy, its throughput is 304.1 MB/s and its IOPS is 1200.
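Using the relationship throughput = IOPS × average I/O size, the two topas readings can be turned into an average I/O size. This sketch assumes the "M" in the topas figures means 1024 KB; the helper name is hypothetical.

```python
def avg_io_size_kb(throughput_mb_s, iops):
    """Average I/O size = throughput / IOPS, taking 1 MB as 1024 KB."""
    return throughput_mb_s * 1024 / iops


readings = (("one dd", 163.9, 655), ("two dd, 100% busy", 304.1, 1200))
for label, mb_s, tps in readings:
    print(f"{label}: {avg_io_size_kb(mb_s, tps):.1f} KB per I/O")
```

Both readings work out to roughly 256 KB per I/O, well below the 1 MB block size given to dd, which suggests that topas TPS counts device-level transfers rather than dd read calls.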

hdisk2 is a local integrated SAS disk; we can check that the bandwidth of the local integrated SAS channel is 3 Gb/s:

For a 3 Gb/s SAS channel, a disk throughput of 304.1 MB/s is already close to the channel's peak I/O bandwidth.

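Note that using dd to measure disk bandwidth is feasible, but using it to determine the IOPS and throughput of business I/O is not sound. dd only issues sequential reads and writes, which are uncommon in OLTP workloads, where small random I/Os dominate. Therefore, to measure the disk I/O performance of a workload, monitor the disks while the workload is running.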

nmon

Run nmon and press d to get the following screen:


This shows that hdisk2's throughput at this point is 318 MB/s.

By collecting nmon data over a period of time and analyzing the resulting nmon file with nmon analyzer, we can obtain more direct charts, such as the following report:

Figure 2. nmon charts showing disk performance

V. Disk I/O performance tuning

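Confirming that disk I/O has a performance problem

For random workloads, we generally consider there to be an I/O performance problem when any of the following occurs:

1. The average read time is greater than 15 ms

2. With a write cache present, the average write time is greater than 2.5 ms

For sequential workloads, we generally consider there to be an I/O performance problem when any of the following occurs:

1. Two sequential I/O streams run on one disk

2. Throughput is insufficient (i.e., far below the disk's I/O bandwidth)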
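The rules of thumb above can be sketched as two simple checks. The 15 ms and 2.5 ms thresholds come from the text; the function names and the shortfall factor of 0.5 used for "far below the bandwidth" are hypothetical, since the text gives no number for that case.

```python
def random_io_problems(avg_read_ms, avg_write_ms, has_write_cache):
    """Rule-of-thumb checks for a random workload; returns a list of findings."""
    findings = []
    if avg_read_ms > 15:
        findings.append(f"average read time {avg_read_ms} ms exceeds 15 ms")
    if has_write_cache and avg_write_ms > 2.5:
        findings.append(f"average write time {avg_write_ms} ms exceeds 2.5 ms despite write cache")
    return findings


def sequential_io_problems(streams_on_disk, throughput_mb_s, bandwidth_mb_s, shortfall=0.5):
    """Rule-of-thumb checks for a sequential workload.

    shortfall is a hypothetical cut-off for "far below bandwidth".
    """
    findings = []
    if streams_on_disk >= 2:
        findings.append(f"{streams_on_disk} sequential streams share one disk")
    if throughput_mb_s < shortfall * bandwidth_mb_s:
        findings.append("throughput is far below the disk's I/O bandwidth")
    return findings


print(random_io_problems(avg_read_ms=18.0, avg_write_ms=1.2, has_write_cache=True))
print(sequential_io_problems(streams_on_disk=1, throughput_mb_s=280, bandwidth_mb_s=300))
```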

For a disk, I/O service time rises as IOPS increases, and there is a saturation point: once IOPS passes a certain level, any further increase in IOPS causes a significant increase in I/O service time.

Figure 3. Relationship between disk IOPS and I/O service time
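The saturation behavior can be illustrated with a simple M/M/1 queueing model. This model is our own swapped-in approximation, not something the original uses: with a per-I/O service time S and utilization U = IOPS × S, the mean response time is S / (1 - U), so it grows slowly at first and then explodes as U approaches 1. The 7.1 ms service time is an assumption borrowed from a random 4K I/O on a 15k rpm disk.

```python
def response_time_ms(iops, service_ms):
    """Mean response time of an M/M/1 queue: S / (1 - U), with U = IOPS * S."""
    utilization = iops * service_ms / 1000
    if utilization >= 1:
        raise ValueError("offered IOPS exceeds what the disk can service")
    return service_ms / (1 - utilization)


service = 7.1  # ms per random 4K I/O (assumed)
for load in (20, 70, 110, 130, 138):
    print(f"{load} IOPS -> {response_time_ms(load, service):.1f} ms")
```

In this model the response time roughly doubles by about 70 IOPS (half of the ~140 IOPS ceiling) and is dozens of times the service time near 138 IOPS, which is the knee the figure describes.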

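In our testing experience, we mainly watch three values: IOPS, throughput and disk busy%. If IOPS and throughput are both low and disk busy% is also low, we conclude that the disk is under too little load, which is why throughput and IOPS are low. Only when IOPS and throughput are both low while disk busy% is high (close to 100%) do we analyze the problem from the disk I/O side.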
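The decision rule above can be written as a small function. The busy_high and busy_low cut-offs are hypothetical: the text only says "close to 100%" and "low", so anything in between is reported as inconclusive.

```python
def diagnose(iops_low, throughput_low, busy_pct, busy_high=90.0, busy_low=30.0):
    """Apply the testing rule of thumb; busy_high and busy_low are assumed cut-offs."""
    if not (iops_low and throughput_low):
        return "no conclusion: IOPS and throughput are not both low"
    if busy_pct >= busy_high:
        return "disk-bound: analyze performance from the disk I/O side"
    if busy_pct <= busy_low:
        return "under-loaded: the disk is simply not being given enough work"
    return "inconclusive: collect more data"


print(diagnose(iops_low=True, throughput_low=True, busy_pct=98))
print(diagnose(iops_low=True, throughput_low=True, busy_pct=5))
```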

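Disk performance metrics

There are five common metrics: utilization, saturation, IOPS, throughput and response time. These five are the basic measures of disk performance.
• Utilization is the percentage of time the disk spends processing I/O. Excessively high utilization (for example, above 80%) usually indicates a disk I/O performance bottleneck.
• Saturation is how busy the disk is with I/O processing. Excessively high saturation means the disk has a serious performance bottleneck. When saturation reaches 100%, the disk cannot accept new I/O requests.
• IOPS (Input/Output Per Second) is the number of I/O requests per second.
• Throughput is the amount of data requested by I/O per second.
• Response time is the interval between issuing an I/O request and receiving its response.

When looking at these metrics, do not compare any one of them in isolation; analyze them together with the read/write ratio, the I/O type (random or sequential) and the I/O size.
For example, in scenarios with many random reads and writes, such as databases and large numbers of small files, IOPS better reflects overall system performance; in scenarios with many sequential reads and writes, such as multimedia, throughput better reflects overall system performance.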

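Observing disk I/O performance metrics: iostat
iostat is the most commonly used tool for observing disk I/O performance. It provides common metrics such as per-disk utilization, IOPS and throughput; this data actually comes from /proc/diskstats.
[root@host1 ~]# iostat -d -x 1 # -d -x shows all disk I/O metrics; 1 prints a new set of data every second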
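Among these metrics, pay particular attention to:
• %util is the disk I/O utilization mentioned earlier;
• r/s + w/s is IOPS;
• rkB/s + wkB/s is throughput;
• r_await and w_await are the response times (average wait of read and write requests, respectively).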
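Since iostat reads /proc/diskstats, a few of its -x columns can be approximated from two snapshots of that file. This is a minimal sketch, not iostat's actual algorithm: it assumes the standard field layout (reads completed, sectors read and ms reading in fields 4, 6 and 7; writes completed, sectors written and ms writing in fields 8, 10 and 11; ms spent doing I/O in field 13) with 512-byte sectors, and ignores counter wraparound. The function names are hypothetical.

```python
import time
from pathlib import Path

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text):
    """Return {device: counters} from the contents of /proc/diskstats."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue  # skip malformed or truncated lines
        stats[f[2]] = {
            "reads": int(f[3]), "sectors_read": int(f[5]), "read_ms": int(f[6]),
            "writes": int(f[7]), "sectors_written": int(f[9]), "write_ms": int(f[10]),
            "io_ticks_ms": int(f[12]),
        }
    return stats


def iostat_like(before, after, interval_s):
    """Approximate iostat -x columns for one device from two snapshots."""
    d = {k: after[k] - before[k] for k in after}
    return {
        "r/s": d["reads"] / interval_s,
        "w/s": d["writes"] / interval_s,
        "rkB/s": d["sectors_read"] * SECTOR_BYTES / 1024 / interval_s,
        "wkB/s": d["sectors_written"] * SECTOR_BYTES / 1024 / interval_s,
        "r_await": d["read_ms"] / d["reads"] if d["reads"] else 0.0,
        "w_await": d["write_ms"] / d["writes"] if d["writes"] else 0.0,
        "%util": min(100.0, d["io_ticks_ms"] / (interval_s * 1000) * 100),
    }


if __name__ == "__main__":
    path = Path("/proc/diskstats")
    if path.exists():
        first = parse_diskstats(path.read_text())
        time.sleep(1)
        second = parse_diskstats(path.read_text())
        for dev in second:
            if dev in first:
                metrics = iostat_like(first[dev], second[dev], 1.0)
                print(dev, {k: round(v, 1) for k, v in metrics.items()})
```

IOPS is then r/s + w/s and throughput is rkB/s + wkB/s, as in the list above.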

For more display options, see the iostat man page.

Observing per-process I/O performance metrics: pidstat
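The meaning of each item in the output:
• kB_rd/s: the amount of data read per second, in KB;
• kB_wr/s: the amount of data in write requests issued per second, in KB;
• kB_ccwr/s: the amount of data in write requests cancelled per second, in KB.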
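These three columns correspond to the read_bytes, write_bytes and cancelled_write_bytes counters that Linux exposes in /proc/<pid>/io. The sketch below computes per-second rates from two snapshots of that file for the current process; the function names are hypothetical, and reading another user's process usually requires root.

```python
import os
import time
from pathlib import Path


def parse_proc_io(text):
    """Parse /proc/<pid>/io ("key: value" lines) into a dict of ints."""
    out = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if value.strip():
            out[key.strip()] = int(value)
    return out


def pidstat_like(before, after, interval_s):
    """Per-second KB rates comparable to the pidstat -d columns."""
    def rate(key):
        return (after[key] - before[key]) / 1024 / interval_s

    return {
        "kB_rd/s": rate("read_bytes"),
        "kB_wr/s": rate("write_bytes"),
        "kB_ccwr/s": rate("cancelled_write_bytes"),
    }


if __name__ == "__main__":
    path = Path(f"/proc/{os.getpid()}/io")
    if path.exists():
        first = parse_proc_io(path.read_text())
        time.sleep(1)
        second = parse_proc_io(path.read_text())
        print(pidstat_like(first, second, 1.0))
```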

Sorting processes by I/O size: iotop
[root@host1 ~]# iotop
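The first two lines show, respectively, the total disk read/write size of processes and the actual total disk read/write size. Because of caches, buffers, I/O merging and other factors, the two may not be equal.
The rest shows each process's I/O from several angles, including thread ID, I/O priority, disk read size per second, disk write size per second, and the percentage of time spent swapping in and waiting on I/O.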
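The core idea of sorting processes by I/O can be sketched by snapshotting every process's /proc/<pid>/io twice and ranking the differences. This is only an approximation of what iotop shows (no per-thread view, priorities or swap-in percentages); the function names are hypothetical, and without root only your own processes are readable.

```python
import time
from pathlib import Path


def snapshot():
    """Map pid -> read_bytes + write_bytes for every readable /proc/<pid>/io."""
    totals = {}
    for io_file in Path("/proc").glob("[0-9]*/io"):
        try:
            fields = dict(line.split(":", 1) for line in io_file.read_text().splitlines())
            totals[int(io_file.parent.name)] = int(fields["read_bytes"]) + int(fields["write_bytes"])
        except (OSError, KeyError, ValueError):
            continue  # process exited, or no permission to read its io file
    return totals


def top_by_io(before, after, n=10):
    """Rank pids present in both snapshots by disk bytes moved in between."""
    deltas = {pid: after[pid] - before[pid] for pid in after if pid in before}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    first = snapshot()
    time.sleep(1)
    second = snapshot()
    for pid, nbytes in top_by_io(first, second):
        print(f"pid {pid}: {nbytes / 1024:.1f} KB/s")
```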

 



Origin www.cnblogs.com/wx170119/p/11427837.html