InfluxDB disk I/O performance

1. Background

Before the monitoring service InfluxDB goes online, check the disk I/O performance so that slow disk I/O does not increase the write latency of monitoring data.
The following uses the InfluxDB service as an example; its data is stored in the /zpaasssd directory.

2. Performance check


2.1. Confirm the disk partition that holds the InfluxDB data directory
In this example, InfluxDB stores its data in the /zpaasssd/ directory by default.

[zoms@172 supervisor]$ df -h /zpaasssd/
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/vgssd-lvzpaasssd  160G   36G  125G  23% /zpaasssd

The output confirms that the directory is on the partition /dev/mapper/vgssd-lvzpaasssd.
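The same lookup can be scripted. The sketch below is a minimal example, assuming the /proc/mounts line layout (device, mount point, filesystem type, options, dump, pass); the helper find_mount and the sample mount table are hypothetical. It picks the longest mount point that contains the path, approximating what df reports, but it does not resolve symlinks, so treat df as the authoritative check.

```python
import os

# Illustrative /proc/mounts lines: only the vgssd entry mirrors the df output
# above; the root entry and the xfs filesystem type are assumptions.
SAMPLE_MOUNTS = """\
/dev/mapper/centos-root / xfs rw,relatime 0 0
/dev/mapper/vgssd-lvzpaasssd /zpaasssd xfs rw,relatime 0 0
"""


def find_mount(path, mounts_text):
    """Return (device, mount_point) for the mount that contains `path`.

    Hypothetical helper. `mounts_text` uses the /proc/mounts layout:
    device mount_point fstype options dump pass. Symlinks are not resolved
    and escaped characters (such as spaces) are not decoded.
    """
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        device, mount_point = fields[0], fields[1]
        contains = (
            mount_point == "/"
            or path == mount_point
            or path.startswith(mount_point.rstrip("/") + "/")
        )
        # The longest matching mount point is the most specific one; on a tie
        # the later entry wins, because a later mount hides an earlier one.
        if contains and (best is None or len(mount_point) >= len(best[1])):
            best = (device, mount_point)
    return best


print(find_mount("/zpaasssd/", SAMPLE_MOUNTS))
# ('/dev/mapper/vgssd-lvzpaasssd', '/zpaasssd')
```

On a live host, pass the contents of the real /proc/mounts file instead of the sample string.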

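2.2. Check the I/O performance of the partition

Sample block-device activity with sar: -d reports per-device statistics, -p prints readable device names instead of devM-m identifiers, -b adds overall I/O transfer-rate statistics, and "2 4" takes four samples at 2-second intervals. grep on the keyword zpaasssd narrows the output to the partition found in 2.1.

[zoms@172 supervisor]$ sar -bdp 2 4 | grep zpaasssd  # keyword: zpaasssd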
Linux 4.4.65-1.el7.elrepo.x86_64 (172.16.24.70)         10/24/2022      _x86_64_        (16 CPU)
11:19:48 AM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
11:21:47 AM vgssd-lvzpaasssd      1.00      0.00      5.00      5.00      0.00      0.00      0.00      0.00
11:21:49 AM vgssd-lvzpaasssd     48.00      0.00  30483.00    635.06      0.08      1.66      0.32      1.55
11:21:51 AM vgssd-lvzpaasssd     22.50      0.00   2310.00    102.67      0.00      0.18      0.13      0.30
11:21:53 AM vgssd-lvzpaasssd     52.50      0.00  37290.00    710.29      0.08      1.61      0.22      1.15
Average:    vgssd-lvzpaasssd     31.00      0.00  17522.00    565.23      0.04      1.35      0.24      0.75
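To post-process the sar lines, the sketch below parses one data or Average row, assuming the older sysstat column layout shown above (newer sysstat releases report kilobytes, such as rkB/s and wkB/s, instead of sector counts, so the column list would need changing). Per the sar manual, a sector here is 512 bytes, so the average write rate of 17522 sectors/s is about 8.56 MiB/s, and avgrq-sz can be recomputed as (rd_sec/s + wr_sec/s) / tps. The helper names are hypothetical.

```python
SECTOR_BYTES = 512  # sar counts rd_sec/s, wr_sec/s and avgrq-sz in 512-byte sectors

# Column order of the older sysstat `sar -d` layout shown above.
COLUMNS = ["dev", "tps", "rd_sec/s", "wr_sec/s", "avgrq-sz",
           "avgqu-sz", "await", "svctm", "%util"]


def parse_sar_device_line(line):
    """Parse one `sar -dp` data or Average line into a dict (hypothetical helper)."""
    fields = line.split()
    if fields[0] == "Average:":
        fields = fields[1:]
    elif len(fields) > 1 and fields[1] in ("AM", "PM"):
        fields = fields[2:]  # 12-hour timestamp, e.g. "11:21:49 AM"
    else:
        fields = fields[1:]  # 24-hour timestamp
    row = dict(zip(COLUMNS, fields))
    for key in COLUMNS[1:]:
        row[key] = float(row[key])
    return row


def sectors_to_mib(sectors_per_sec):
    """Convert sectors per second to MiB per second."""
    return sectors_per_sec * SECTOR_BYTES / (1024 * 1024)


def avg_request_sectors(row):
    """Recompute avgrq-sz: sectors transferred per I/O request."""
    if row["tps"] == 0:
        return 0.0
    return (row["rd_sec/s"] + row["wr_sec/s"]) / row["tps"]


avg = parse_sar_device_line(
    "Average:    vgssd-lvzpaasssd     31.00      0.00  17522.00"
    "    565.23      0.04      1.35      0.24      0.75"
)
print(avg["dev"], round(sectors_to_mib(avg["wr_sec/s"]), 2))  # vgssd-lvzpaasssd 8.56
print(round(avg_request_sectors(avg), 2))  # 565.23
```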

3. The relationship between disk I/O, CPU and memory

[zoms@172 supervisor]$ iostat -c
Linux 4.4.65-1.el7.elrepo.x86_64 (172.16.24.70)         10/24/2022      _x86_64_        (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          14.11    0.00    1.97    0.11    0.00   83.80

We generally focus on %iowait and %idle. %iowait is the percentage of time the CPU was idle while the system had outstanding disk I/O requests, and %idle is the percentage of time the CPU was idle with no outstanding disk I/O requests.
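If you want to collect these two fields in a script, a minimal parser for the avg-cpu block could look like this (parse_avg_cpu is a hypothetical helper, fed here with the sample output above). Note that the first report iostat prints covers the period since boot; for current values, run it with an interval (for example iostat -c 2 3) and pass the text of a later report to the parser.

```python
# Output of `iostat -c` as shown above.
IOSTAT_C = """\
Linux 4.4.65-1.el7.elrepo.x86_64 (172.16.24.70)         10/24/2022      _x86_64_        (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          14.11    0.00    1.97    0.11    0.00   83.80
"""


def parse_avg_cpu(text):
    """Return the first avg-cpu report of `iostat -c` output as {field: value}."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("avg-cpu:"):
            names = line.split()[1:]  # ['%user', '%nice', ..., '%idle']
            values = [float(v) for v in lines[i + 1].split()]
            return dict(zip(names, values))
    raise ValueError("no avg-cpu report found")


cpu = parse_avg_cpu(IOSTAT_C)
print(cpu["%iowait"], cpu["%idle"])  # 0.11 83.8
```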

If %iowait is high, the disk has an I/O bottleneck; if %idle is high, the CPU is relatively idle.

If both values are relatively high, the CPU may be waiting for memory allocation, so the bottleneck is likely memory. In this case, add memory.

If %idle is low, the bottleneck is the CPU, and CPU resources should be increased.
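The rules above can be sketched as a small decision function. The text only says "high" and "low", so the numeric thresholds below are hypothetical placeholders to be tuned for your environment, and checking the "both high" case first is an assumption that follows the order of the paragraphs.

```python
# Hypothetical thresholds: the text does not give numbers for "high" or "low".
IOWAIT_HIGH = 20.0
IDLE_HIGH = 50.0
IDLE_LOW = 20.0


def diagnose(iowait, idle):
    """Map %iowait and %idle to the bottleneck suggested by the rules above."""
    if iowait >= IOWAIT_HIGH and idle >= IDLE_HIGH:
        return "memory"  # both high: possibly waiting on memory allocation
    if iowait >= IOWAIT_HIGH:
        return "disk I/O"
    if idle <= IDLE_LOW:
        return "CPU"
    return "no obvious bottleneck"


print(diagnose(0.11, 83.80))  # no obvious bottleneck
```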

4. Judging disk I/O performance

The following criteria are generally used to judge disk I/O performance:

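await: think of it as the time you spend in line waiting to see a doctor at a hospital. It depends on how fast the doctor works (svctm) and how many people are ahead of you in the queue (avgqu-sz). In general, the system's I/O response time should be below 5 ms; above 10 ms is considered high. The value of await depends on svctm, the I/O queue length, and the I/O request pattern. If svctm is close to await, there is almost no I/O waiting and disk performance is good. If await is much higher than svctm, requests wait in the I/O queue too long and applications running on the system slow down; this can be addressed by switching to a faster disk or upgrading the CPU.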
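The relationship between await, svctm and avgqu-sz can be checked against the sar sample in 2.2. By Little's law, the average number of requests in flight is roughly the request rate multiplied by the time each request spends in the system, so avgqu-sz ≈ tps × await / 1000 (await in milliseconds); for the 11:21:49 row, 48.00 × 1.66 / 1000 ≈ 0.08, matching the reported value. The sketch below applies the 5 ms and 10 ms thresholds from the text (the "borderline" label for the range in between is an assumption) and treats await minus svctm as queueing time. Be aware that recent sysstat manuals warn that svctm should no longer be trusted, so weigh await and avgqu-sz more heavily. All function names are hypothetical.

```python
# (tps, await ms, svctm ms, avgqu-sz) copied from the sar sample in 2.2.
SAMPLES = [
    (48.00, 1.66, 0.32, 0.08),
    (22.50, 0.18, 0.13, 0.00),
    (52.50, 1.61, 0.22, 0.08),
    (31.00, 1.35, 0.24, 0.04),  # Average row
]


def await_rating(await_ms):
    """Thresholds from the text: below 5 ms is good, above 10 ms is high."""
    if await_ms < 5:
        return "good"
    if await_ms <= 10:
        return "borderline"  # assumption: the text does not name this range
    return "high"


def queue_wait_ms(await_ms, svctm_ms):
    """Time spent waiting in the queue rather than being serviced."""
    return max(await_ms - svctm_ms, 0.0)


def expected_queue_len(tps, await_ms):
    """Little's law: requests in flight = arrival rate x time in system."""
    return tps * await_ms / 1000.0


for tps, await_ms, svctm_ms, reported_qu in SAMPLES:
    print(await_rating(await_ms),
          round(queue_wait_ms(await_ms, svctm_ms), 2),
          round(expected_queue_len(tps, await_ms), 2),
          reported_qu)
```

For every sample row, the estimated queue length agrees with the reported avgqu-sz to two decimal places, and await stays well below the 5 ms guideline.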

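%util: %util is another important disk I/O metric; it measures how busy the device is. The larger the value, the more I/O requests are being issued and the heavier the I/O load. Interpret it together with %idle: if %idle < 70%, I/O is fairly busy.

If %util is close to 100%, the disk is receiving too many I/O requests and the I/O subsystem is running at full load, so the disk may be a bottleneck. Over time this will inevitably hurt system performance; it can be addressed by optimizing the application or switching to a higher-performance, faster disk. (With multiple disks, however, even a %util of 100% does not necessarily mean the disks are saturated, because they can serve requests concurrently.)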
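The iostat manual describes %util as the percentage of elapsed time during which I/O requests were issued to the device, and notes that for devices serving requests in parallel (RAID arrays, modern SSDs) it does not reflect their performance limits; this is the multi-disk caveat above. In the sar sample in 2.2, %util also agrees with tps × svctm / 10 (for example 48.00 × 0.32 / 10 ≈ 1.54 against a reported 1.55), which suggests svctm is simply busy time divided by tps. The sketch below encodes this reasoning; the 95% "close to 100%" threshold and the helper names are hypothetical, while the %idle < 70% rule comes from the text.

```python
UTIL_SATURATED = 95.0  # hypothetical cut-off for "close to 100%"
IDLE_BUSY = 70.0       # from the text: %idle < 70% suggests busy I/O


def util_from_svctm(tps, svctm_ms):
    """Busy milliseconds per second as a percentage: tps * svctm / 1000 * 100."""
    return tps * svctm_ms / 10.0


def util_verdict(util, parallel_device=False):
    """Interpret %util, allowing for devices that serve requests in parallel."""
    if util < UTIL_SATURATED:
        return "not saturated"
    if parallel_device:
        # RAID arrays, SSDs, multi-disk volumes: 100% busy time is not proof
        # of saturation, so look at await and avgqu-sz as well.
        return "busy; check await and avgqu-sz before concluding"
    return "likely bottleneck"


def io_busy_by_idle(idle):
    """Rule from the text: %idle below 70% means I/O is fairly busy."""
    return idle < IDLE_BUSY


print(round(util_from_svctm(48.00, 0.32), 2), util_verdict(1.55))
```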


Source: blog.csdn.net/WXF_Sir/article/details/130561071