How to analyze and optimize various storage performance bottlenecks?

[Abstract] Drawing on hands-on practice, this article analyzes the architecture and operating principles of storage systems, examines various storage performance bottleneck scenarios in depth, and proposes corresponding performance optimization methods, in the hope of providing some reference value for peers.

[Author] Chen Pingchun currently works in the insurance industry and has many years of experience in systems, storage, and data backup operations and maintenance.

Preface

Reliability, security, and performance are the three most important evaluation dimensions of IT systems. Reliability and security are the foundation: the destructive effects of system failures or data leaks are obvious. Performance is the core capability, representing the service level of an IT system; performance bottlenecks restrict the development of enterprise business and seriously affect user experience.

Storage systems are an important part of enterprise IT infrastructure, providing data storage services for numerous IT systems within the enterprise. As digital transformation deepens, the construction of enterprise IT systems has accelerated further. On the one hand, this has brought a sharp increase in data volume; on the other, it has increased the frequency of data access, further amplifying the impact of storage performance bottlenecks. This article combines personal operation and maintenance practice to analyze the architecture and operating principles of storage systems, examine various storage performance bottleneck scenarios in depth, and propose corresponding performance optimization methods. I hope it will be of some reference value to peers.

1. Storage system overview

Understanding the architecture and operating principles of a storage system is the entry point for performance analysis and optimization, allowing you to analyze and solve storage performance problems holistically. After years of technological evolution and architectural change, storage systems can be roughly divided into three categories: SAN storage, NAS storage, and distributed storage. They share similarities but also have their own characteristics. The following analyzes these three storage architectures and their operating principles in detail.

1.1 SAN storage

A SAN (Storage Area Network) is itself a storage network responsible for data storage tasks, isolated from the business LAN. SAN storage is a block-based storage system that generally uses communication protocols such as FC, iSCSI, and NVMe.

From an architectural perspective, SAN storage generally mounts a disk array behind the storage controllers, and data is ultimately stored in the disk array. The disk array comprises multiple RAID groups: N disks form a RAID group, on top of which logical storage units (LUNs) are carved out as logical disks of the shared storage pool. These LUNs are presented over the SAN network to the server's HBA card; the server operating system recognizes each LUN as a disk, which is then partitioned and formatted for use. The architecture is shown in Figure 1.



Figure 1. SAN storage architecture diagram

From the perspective of storage data IO flow, taking the commonly used FC-SAN storage as an example: the server operating system generally manages files with a file system, which is built on top of the storage LUN, so file reads and writes correspond to storage IO operations. A file is divided into multiple blocks of fixed size, usually 4KB-16KB. The data blocks are sent to the server's HBA card, which converts them into FC protocol data frames and transmits them through the SAN network to the storage system's front-end port. The front-end port repackages these data frames into data blocks (generally 4KB) and passes them to the storage controller. The storage controller has a cache, divided into read cache and write cache; according to the cache algorithm, IO that hits the cache returns an acknowledgment immediately, while IO that misses the cache must continue on to the disk array. Since multiple disks form a RAID group, one data IO stream actually corresponds to concurrent reads and writes across multiple disks. The whole process is shown in Figure 2:
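The first step of this flow — the file system carving a write into fixed-size blocks before handing them to the HBA — can be sketched as follows. This is an illustrative sketch only, not real driver code; the 4KB block size is one assumed value from the range above.

```python
# Illustrative sketch: splitting a file write into fixed-size blocks,
# as the file system does before the data reaches the HBA card.
BLOCK_SIZE = 4 * 1024  # assumed 4 KB block size

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split a byte buffer into fixed-size blocks (the last block may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

if __name__ == "__main__":
    payload = b"x" * (10 * 1024)           # a 10 KB file write
    blocks = split_into_blocks(payload)
    print(len(blocks))                     # 3
    print([len(b) for b in blocks])        # [4096, 4096, 2048]
```

Each resulting block then travels independently down the IO path, which is why a single large file write becomes many storage IO operations.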



Figure 2. Data IO flow diagram of FC-SAN storage

1.2 NAS storage

NAS (Network Attached Storage) can generally be considered network file storage. Most user data exists in the form of files. NAS uses the NFS/CIFS protocols over Ethernet, providing broadly compatible and easy-to-use file-sharing capabilities. Compared with SAN storage, NAS does not provide storage services in the form of disks: no partitioning or formatting is needed, and it directly provides a network file system that can be mounted as-is.

From an architectural perspective, NAS storage is generally implemented on top of a disk array (there are also implementations based on cluster file systems or distributed storage). A NAS head sits on the disk array to create and manage the file system. The NAS head is the core logical component of NAS storage and follows a typical C/S architecture, acting as the server side that provides network file services. After obtaining authorization, clients can mount file systems, map network drives, or use HTTP, FTP, and similar protocols to share and access files on the NAS file system. The architecture is shown in Figure 3:



Figure 3. NAS storage architecture diagram

From the perspective of storage data IO flow, taking NFS as an example, NAS storage has characteristics clearly different from SAN storage, such as client-side caching and server statelessness. First, the client does not access the NAS file system directly but goes through the client cache: the server's file system directory tree is mapped to the client, and file reads and writes actually loop over fixed-size pages, such as 64KB. Server statelessness means the server does not need to maintain per-client protocol state: the client operates on the server's file system data through RPC calls but cannot observe the server's state, and when the connection is interrupted it can simply reconnect and retry. As shown in Figure 4, NAS storage data IO based on a TCP application-layer protocol is more flexible and adaptable, but the data IO path is longer, data consistency is weaker, there are security concerns such as data leakage, and data transmission efficiency is not high.



Figure 4. NAS storage data IO flow diagram under the NFS protocol

1.3 Distributed storage

A distributed storage system adopts a scalable cluster architecture and uses data-replica algorithms to disperse data across multiple independent devices. Distributed clusters are generally connected through a general-purpose TCP/IP network. Compared with traditional centralized storage arrays, a distributed storage system shares the storage load across multiple storage servers, meeting the needs of large-scale storage applications. Common forms of distributed storage include distributed file systems (such as HDFS) and object storage (such as Ceph).

From an architectural perspective, compared with centralized storage systems, the deployment architecture of distributed storage is relatively simple — generally commodity servers interconnected over the network — but its logical architecture is more complex. The core design idea of distributed storage is decentralization, and the difficulty lies mainly in decentralizing the master control node. An architecture with a master node, such as HDFS, follows the map-reduce idea: break large into small, divide and conquer, then merge the results. This architecture needs a master node for coordination, but the load is distributed to the data nodes, where data replicas are stored, each replica placed on three different data nodes, as shown in Figure 5. The biggest advantage of full decentralization is eliminating the master node as a bottleneck; its design idea is balance. Such an architecture has only data nodes, but it must abstract more logical functional components and distribute them evenly across nodes. Taking Ceph block storage as an example: besides cluster management and monitoring components such as Mon, the OSD component manages physical disks; placement groups (PGs) are built on OSDs; data objects are stored on PGs and correspond to the Ceph block device, which can be partitioned and formatted for use by applications. The architecture is shown in Figure 6.


Figure 5. Distributed storage architecture with master node


Figure 6. Ceph storage architecture without master node

From the perspective of storage IO data flow, unlike centralized storage with its few data channels, distributed storage offers more and wider data entrances, but there is also more data flowing within the cluster. Taking Ceph block storage as an example, the file system accessed by the client application corresponds to a Ceph block device; block data reaches the Ceph cluster's RBD service over the network, ultimately corresponding to disk reads and writes on the three-replica OSDs. The process is shown in Figure 7. For a three-replica distributed storage system, to guarantee strong data consistency, a write IO is generally acknowledged as complete only after the primary replica and the two secondary replicas have all been written.
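The write-acknowledgment rule above has a direct latency consequence: the write completes only when the slowest replica finishes. A minimal sketch (illustrative only, not Ceph code; the per-replica latencies are assumed example values in milliseconds):

```python
# Illustrative model: with three-way replication and strong consistency,
# a write is acknowledged only after ALL replicas have persisted it,
# so write latency is governed by the slowest replica.
def replicated_write_latency(replica_latencies_ms):
    """Write completes only when every replica has acknowledged."""
    return max(replica_latencies_ms)

def read_latency(replica_latencies_ms):
    """A read can be served by the primary replica alone."""
    return replica_latencies_ms[0]

lat = [0.8, 1.1, 2.5]  # assumed: primary, replica 2, replica 3 (ms)
print(replicated_write_latency(lat))  # 2.5 — dominated by the slow replica
print(read_latency(lat))              # 0.8
```

This is why a single slow-responding replica causes cluster-wide performance jitter, a point revisited in the fault-scenario analysis below.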


Figure 7. Ceph storage IO data flow diagram

2. Storage performance analysis

Storage performance analysis is the basis for performance optimization. Although there are many types of storage systems with various designs, performance analysis methods have a certain universality. They can be divided into qualitative and quantitative methods. In the early stages of familiarization and technology selection, the conditions for quantitative analysis may not be available, so qualitative methods are mainly used to evaluate storage performance; once POC testing, system operation, and maintenance begin, quantitative analysis should be the main focus, determining storage performance bottlenecks from actual performance indicator data.

2.1 Qualitative analysis

Qualitative analysis draws on personal operation and maintenance experience to judge whether storage performance can meet the needs of the application systems and whether the storage system has performance bottlenecks. Both depend on familiarity with application data types and with the storage system itself.

2.1.1 Application data IO analysis

Understanding the types of application data IO is the basis of storage performance analysis. Different application data differ in IO access patterns, mainly in IO size, sequential or random reads and writes, and read/write ratio, as shown in Table 1.

| App type | IO size | Read/write ratio | Random or sequential |
|---|---|---|---|
| General documents | Small | Read-heavy | Mainly random |
| Log files | Small | Write-heavy | Sequential |
| Video streaming | Large | Read-heavy | Mainly sequential |
| Operating system | Small | Read-heavy | Mostly sequential |
| Data backup | Large | Write-heavy | Sequential |
| OLTP database | Small | About 70% read / 30% write | Mainly random |
| OLAP database | Large | Read-heavy | Mainly sequential |

Table 1. Data IO types of applications

IO size

Differences in application data types produce data files of different sizes, which correspond to different data IO sizes. Assuming the storage system's IO processing rate is fixed, large IOs obviously move more data per unit time, so merging small IOs is more efficient. Conversely, if the storage system has an upper limit on the IO size it can process at once, large IOs must be split before processing, and IO efficiency drops. For example, SAN storage has high IO processing capability but handles relatively small IOs at a time, making it better suited to small-IO applications with high performance requirements; when processing large-IO application data, its efficiency decreases.
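The effect of IO size on bandwidth can be seen with a back-of-the-envelope calculation. Assuming (purely for illustration) an array that can process a fixed 50,000 IOs per second regardless of size, throughput scales linearly with IO size, which is why merging small IOs raises effective bandwidth:

```python
# Back-of-the-envelope: at a fixed IO processing rate, throughput
# is proportional to IO size, so merged (larger) IOs move more data.
def throughput_mb_s(iops, io_size_kb):
    """Throughput in MB/s given IOPS and IO size in KB."""
    return iops * io_size_kb / 1024

IOPS_CAPABILITY = 50_000  # assumed fixed IO processing rate
print(throughput_mb_s(IOPS_CAPABILITY, 4))    # 195.3125  (4 KB IOs)
print(throughput_mb_s(IOPS_CAPABILITY, 64))   # 3125.0    (64 KB IOs)
```

The same formula reappears later in the quantitative analysis as throughput = IOPS × IO size.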

Read/write ratio

The read/write ratio is one of the important characteristics of application data, and IO read and write operations differ greatly. Generally speaking, write operations consume more storage resources, demand more write IO processing capability, incur higher latency, and place heavier demands on cache. For distributed storage, the multi-replica mechanism benefits read operations but works against writes: the write-acknowledgment path is long, so the data transmission path must be optimized and more write cache configured. Multi-replica storage is therefore better suited to read-heavy application systems.

Sequential or random read and write

The difference between sequential and random reads and writes shows up mainly in disk media characteristics, the read-ahead mechanism, and cache hit rate. For mechanical hard disks, sequential IO reduces disk seek time, while random IO lengthens response time; improving the cache hit rate lets data accumulated in the cache be flushed to disk as sequential IO. SSDs have no mechanical seek, and their random read/write capability is far better than that of mechanical disks.
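A simple service-time model shows why random IO hurts mechanical disks so much: every random access pays a seek plus rotational delay that sequential access mostly avoids. The figures below are assumed, typical-order values for a 7200 rpm disk, not measurements:

```python
# Simple HDD service-time model (assumed, typical-order figures):
# random IO = seek + half-rotation + transfer; sequential IO ~ transfer only.
AVG_SEEK_MS = 8.0                   # average seek time (assumed)
ROTATION_MS = 60_000 / 7200 / 2     # half-rotation latency ~ 4.17 ms at 7200 rpm
TRANSFER_MS = 0.05                  # transfer time of one small block (assumed)

random_io_ms = AVG_SEEK_MS + ROTATION_MS + TRANSFER_MS
sequential_io_ms = TRANSFER_MS

print(round(random_io_ms, 2))       # 12.22 ms per random IO
print(round(1000 / random_io_ms))   # ~82 random IOPS per disk
```

The result — roughly 100 random IOPS per mechanical disk — is consistent with the hardware-upgrade figures cited in the optimization section later.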

2.1.2 Performance bottleneck analysis

The key to storage performance analysis is analyzing performance bottlenecks, which has two aspects: the factors that trigger a bottleneck, and the location of the bottleneck, i.e., where storage IO congests.

1) Factors that trigger performance bottlenecks

Storage hotspots: hotspots are planning and design defects. Typical scenarios include data IO load concentrated on a particular storage node, port, or disk; storage resource contention; lock contention; and software or hardware parameter limits.

Performance spikes: common when data IO concurrency surges and performance demand is released in a short period. A spike fully exposes existing hotspot problems and triggers storage performance bottlenecks. Typical scenarios include virtual desktop boot storms and flash-sale services.

Service capability degradation: common in fault scenarios, where degraded storage service capability combined with a busy data IO period triggers performance bottlenecks. Typical failure scenarios include a single SAN storage controller failing and disk rebuilds. Distributed storage is more prone to performance jitter, mainly from a node or disk going offline, a data replica being rebuilt, or a replica responding slowly; insufficient CPU and memory resources on the client server can also contribute.

2) Positioning of performance bottlenecks

The location of storage performance bottlenecks needs to be analyzed in conjunction with the architecture of the storage system. According to the composition of the storage system, it can be roughly divided into the following categories of performance bottleneck locations:

Data transmission network: Storage external and internal data transmission network bandwidth, port rate, transmission protocol, and load balancing of the transmission path

Storage controller: CPU processing power of the controller

Cache: Mainly divided into client cache and storage cache, including cache size, cache hit rate, and allocation ratio of read and write caches

Disk: Mainly divided into mechanical hard disk, flash disk and other disk media, including disk speed, single disk read and write IOPS, disk capacity, number of disks, disk redundancy (RAID, copy or erasure coding) algorithm

Client: Reflected in the client's CPU, memory and other resource usage, other applications' occupation of storage resources and other external environment impacts

2.2 Quantitative analysis

Quantitative analysis approaches problems from the perspective of data indicators. It can measure the service capability of the storage system from the storage side, and also measure the storage IO experience from the user/application side. Generally, quantitative analysis on the storage side excludes the influence of the storage network and clients, so its performance data indicates whether the storage system itself has bottlenecks and can be used for storage performance monitoring; quantitative analysis on the user/application side is mainly used in performance-testing scenarios, where benchmarking tools establish a performance baseline for the current system environment.

2.2.1 Three major performance indicators

Whether quantitative analysis is done on the storage side or the user/application side, it is inseparable from the three major storage performance indicators: IOPS, throughput, and latency. It is therefore necessary to clarify the meaning of and relationships among these three indicators.

IOPS: the number of IO operations the storage processes per second. In performance analysis we not only watch overall IOPS but sometimes also the IOPS of a single controller, a single LUN, or a single disk, and may even need to distinguish read IOPS from write IOPS.

Throughput: represents the amount of IO data processed by storage per second, which is the bandwidth occupied by storage data transmission. Similar to IOPS, it can also be broken down into reading or writing, and can be analyzed by individual components.

Latency: represents the time required for the storage system to process IO operations. Usually, it is the most important storage performance indicator. Similar to IOPS, it can also be broken down into reading or writing, and can be analyzed by individual components.

Among the three indicators, it is more scientific to evaluate large-IO applications by throughput, while small-IO applications such as databases must be evaluated by both IOPS and latency: only when high IOPS and low latency are achieved simultaneously can high-concurrency, fast database access be supported.

2.2.2 Performance test analysis

Storage performance testing gives a better picture of a storage system's performance indicators. Take one storage performance test as an example, using the stress-testing tool vdbench (which can stress both raw disks and file access). The test setup: the storage array presents 5 LUNs to a host, and the host runs random read/write tests against these five raw disks with an 80% read / 20% write mix, stepping up the IO size. The three major performance indicators are as follows:

| IO size | IOPS | Throughput (MB/s) | Latency (ms) |
|---|---|---|---|
| 4KB | 89288 | 348.78 | 0.411 |
| 16KB | 75209 | 1175.15 | 0.488 |
| 32KB | 59415 | 1856.72 | 0.617 |
| 64KB | 36612 | 2288.30 | 1.005 |
| 128KB | 20686 | 2585.82 | 1.833 |

Table 2. Storage performance test data

The conclusions of this storage performance test are as follows:

1) The storage controller CPU usage peaked at 20%-45%, indicating that the storage controller can withstand an even higher IO load, as shown in Figure 8.



Figure 8. Storage controller CPU usage

2) The test did not hit a system performance bottleneck on the host, whose CPU usage stayed below 20%, as shown in Figure 9. Ruling out the client side in this way is also important in storage performance analysis.



Figure 9. Host system CPU usage

3) Storage performance baseline: the test data in Table 2 form the performance baseline of the 5 LUNs used by this host under different IO loads. In actual operation, accounting for IO from other applications and unevenness in read/write IO sizes, the usable IOPS peak is generally taken as 50% of the baseline value.

4) Throughput and IOPS: throughput = IOPS × IO size. Within the same business scenario, IO size generally does not change much, so throughput under a limit test is directly proportional to IOPS; however, throughput is capped by network bandwidth, and IOPS is capped by the processing capability of the storage LUNs.
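The formula throughput = IOPS × IO size can be sanity-checked directly against the Table 2 measurements — predicted and measured throughput agree to within rounding:

```python
# Sanity check: throughput (MB/s) should equal IOPS * IO size (KB) / 1024.
rows = [  # (IO size KB, IOPS, measured throughput MB/s) from Table 2
    (4, 89288, 348.78),
    (16, 75209, 1175.15),
    (32, 59415, 1856.72),
    (64, 36612, 2288.30),
    (128, 20686, 2585.82),
]
for io_kb, iops, measured in rows:
    predicted = iops * io_kb / 1024
    print(f"{io_kb:>3} KB: predicted {predicted:8.2f} MB/s, measured {measured:8.2f} MB/s")
```

Every row matches to within about 0.1 MB/s, confirming the identity holds for this data set.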

5) Latency and IOPS: the test data show an inverse relationship between latency and IOPS — the lower the IOPS, the higher the latency — because the storage load differs across the IO-size test scenarios: with large IOs, the storage load grows, IOPS falls, and latency rises. Figure 10 shows the relationship between IOPS and latency under normal storage operation: in most cases, as the load on the storage grows, IOPS rises and latency begins to climb; once latency is too high, business system performance suffers. This is why, in most cases, latency is the most important storage performance indicator. Generally, for business systems with higher performance requirements, storage latency needs to stay below 5ms.



Figure 10. Storage IOPS and latency under normal operating conditions
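One hedged way to connect the IOPS and latency columns of Table 2 is Little's law: with a roughly fixed number of outstanding IOs (queue depth), IOPS ≈ depth / latency. The queue depth below is not reported by the test — it is inferred from the data as an assumption:

```python
# Little's law sketch: IOPS = outstanding IOs / latency (in seconds).
def iops_from_littles_law(queue_depth, latency_ms):
    return queue_depth / (latency_ms / 1000)

# Infer the queue depth from Table 2's 4 KB row: 89288 IOPS at 0.411 ms
# implies roughly 89288 * 0.000411 ~ 37 outstanding IOs.
depth = 89288 * 0.411 / 1000
print(round(depth))                               # 37

# At that depth, the 128 KB row's 1.833 ms latency predicts an IOPS figure
# close to the measured 20686.
print(iops_from_littles_law(round(depth), 1.833))
```

The prediction lands within a few percent of the measured value, supporting the article's point that the inverse IOPS-latency trend in the table reflects a roughly constant offered load.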

3. Storage performance optimization

Storage performance analysis and optimization is long-term, complex, and important work. It requires clear optimization goals, detailed performance analysis, and phased optimization and verification plans to keep the work progressing continuously.

3.1 Optimization strategy

Storage performance optimization is strategic work; only scientific optimization strategies can guide the formulation of more reasonable optimization plans.

1) Comprehensive consideration: Storage performance is a global issue. It is necessary to comprehensively consider the performance bottlenecks on the IO path and analyze possible chain reactions in the performance optimization solution to improve the correctness of performance optimization decisions.

2) Cost-effectiveness of optimization: Set reasonable performance optimization goals. When selecting multiple performance optimization solutions, you must comprehensively consider the solution cost, implementation complexity, benefits, etc.

3) Planning matters more: compared with the cost of optimization and rework after the fact, reasonable planning in advance is more important. Examples include storage selection that accounts for business performance requirements; pre-launch storage performance testing to establish baseline data and performance-capacity management; attention to performance-capacity indicators during storage expansion (evaluating whether storage IOPS/GB changes significantly after expansion); and balanced distribution of storage performance load.

4) Improve performance monitoring: end-to-end storage performance monitoring is also very important. Monitor the entire data IO path and compare actual runtime performance data against the storage performance baseline, so that storage performance bottlenecks are discovered promptly and optimization results can be verified.
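The IOPS/GB performance-capacity check from point 3 can be sketched numerically. All figures here are assumed for illustration: expanding capacity without adding IO capability dilutes the IOPS available per GB, which is exactly what the pre-expansion evaluation should catch:

```python
# Illustrative IOPS/GB check: capacity-only expansion dilutes performance density.
def iops_per_gb(total_iops, capacity_gb):
    return total_iops / capacity_gb

# Assumed array: 100,000 IOPS capability, 50 TB usable, expanded to 150 TB
# without adding controllers or faster media.
before = iops_per_gb(100_000, 50 * 1024)
after = iops_per_gb(100_000, 150 * 1024)
print(round(before, 2), round(after, 2))  # 1.95 0.65
```

A threefold drop in IOPS/GB like this would signal that the expansion plan should also scale IO processing capability, not just capacity.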

3.2 Optimization plan

Storage performance optimization solutions can be roughly divided into the following categories:

1) Hardware upgrade

A single mechanical hard disk delivers around 100 IOPS with latency above 5ms, while a single SSD delivers more than 10,000 IOPS with latency below 1ms, so replacing mechanical disks with all-flash storage greatly improves performance. Technologies such as NVMe and RDMA optimize the underlying communication stack, greatly improving data transmission efficiency and reducing storage latency. Scaling storage controller nodes out or up effectively increases storage load capacity, and client hardware upgrades can eliminate performance bottlenecks caused by the client's CPU, memory, or network.

Hardware upgrade is a very effective means of optimizing storage performance, but in many cases it requires relatively high hardware costs, and the input-output ratio needs to be carefully evaluated.

2) Upper-layer application optimization

Upper-layer application optimization methods are also rich. The main goal is to reduce the IO load that upper-layer applications place on storage, for example: enabling deduplication or data compression before data transmission; optimizing IO concurrency and aggregating many small IOs into large IOs; and database index and SQL statement optimization.

3) Adjust performance load

Adjusting the performance load mainly targets storage hotspot problems. Options include optimizing disk distribution to rebalance disk load, rebalancing storage network port load, rebalancing storage controller load, and adding storage and shifting part of the load to the new array.

4) Data cache optimization

The data cache is a critical performance module of a storage system. Caches generally use faster media such as memory or flash, far faster than ordinary disks; many storage performance problems originate in the cache and are resolved by cache optimization. Data caching is divided into client-side local cache and storage-side cache. The client local cache matters greatly for some distributed file systems, and increasing its size can effectively improve the cache hit rate; the storage cache is equally important, and multi-level caching technology can keep hot data on faster media, reducing storage latency.
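Why the hit rate matters so much can be shown with a weighted-average latency calculation. The cache and disk latencies below are assumed example figures, not measurements:

```python
# Illustrative: average IO latency is a weighted mix of cache and disk latency,
# so raising the hit rate cuts average latency sharply.
def avg_latency_ms(hit_rate, cache_ms=0.1, disk_ms=5.0):
    """Average latency given a cache hit rate and assumed media latencies."""
    return hit_rate * cache_ms + (1 - hit_rate) * disk_ms

print(round(avg_latency_ms(0.50), 2))  # 2.55 ms at a 50% hit rate
print(round(avg_latency_ms(0.90), 2))  # 0.59 ms at a 90% hit rate
```

Moving the hit rate from 50% to 90% cuts average latency by more than four times in this model, which is why cache sizing and multi-level caching are such effective optimization levers.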

Origin blog.csdn.net/iamonlyme/article/details/133012061