What happens when a database system meets a "storage technology leap"?

  • At the beginning of last month, I read an article on Percona's blog about computational storage performance testing (see the link at the end of this article). Some of the features it mentioned caught my interest, so I dug deeper into computational-storage technology and realized that, for database systems, computational storage may be able to relieve a number of bottlenecks and pain points, and even significantly reduce TCO without hurting performance. What kind of features have such magical power?

  • Let's set that article aside for a moment; we will come back to it later. First, let's look at the bottlenecks and pain points that may arise over the life cycle of a database system. Then I will explain how computational storage can resolve them systematically.

  • PS: The following only represents my personal views. In addition, since I am most familiar with MySQL, the typical pain points below are illustrated with the MySQL InnoDB engine as the example.

1. What are the typical bottlenecks and pain points in database systems?

  • Two key indicators of database performance are transaction response latency and transactions per second (TPS). They complement each other and are inversely related: the lower a transaction's latency, the higher the achievable TPS; the higher the latency, the lower the TPS. Higher TPS means better performance, and vice versa. A database is very sensitive to IO response latency, which directly affects transaction latency, and transaction latency in turn largely determines TPS. Therefore, when MySQL runs on a server with reasonable hardware specifications and its indexes are reasonably designed, the first bottleneck we usually see is the IO subsystem.
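
As a quick way to check whether IO really is the first bottleneck (my own sketch, assuming performance_schema is enabled with its default file IO instruments), the InnoDB file IO wait statistics can be inspected directly:

```sql
-- Where does InnoDB spend its file IO time? (performance_schema timers are in picoseconds)
SELECT EVENT_NAME,
       COUNT_STAR               AS io_requests,
       SUM_TIMER_WAIT / 1e12    AS total_wait_seconds,
       AVG_TIMER_WAIT / 1e9     AS avg_wait_ms
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE EVENT_NAME LIKE 'wait/io/file/innodb/%'
ORDER BY SUM_TIMER_WAIT DESC;
```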

  • Focusing on these two key indicators, I have listed four typical scenarios where bottlenecks and pain points may occur, as follows:

1.1. The storage capacity of a single database server is insufficient

  • Insufficient storage capacity

  • Traditional solutions:

        * When time is tight, you can temporarily free up space by frequently deleting unneeded files or archiving old table data (see the query sketch at the end of this subsection for locating the largest tables)

        * When the budget is sufficient, you can replace the storage device with a larger one and perform a full data migration

        * When budget and time are both sufficient, you can add more servers and split the data

  • Storage load is too high (throughput saturated)

  • Traditional solutions:

        * When time is tight, the problem can be temporarily relieved by killing the sessions that consume the most storage throughput

        * When the budget is sufficient, replace the storage device with one offering higher throughput and perform a full data migration

        * When budget and time are both sufficient, more servers can be added and the data split

  • Disadvantages:

  • Temporary fixes require constantly watching the storage load, and it is easy to fix one problem only to create another

  • Replacing hardware costs extra money. Data splitting increases business complexity and maintenance costs, and also introduces new problems (see the disadvantages described in section 1.4, "Excessive number of concurrent queries results in excessive load on the database instance")
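
A small sketch for the "free up space quickly" route mentioned above (not from the original article): locate the largest tables first, then decide what to archive or delete:

```sql
-- Rank tables by their logical footprint (InnoDB statistics, so the numbers are approximate).
SELECT table_schema,
       table_name,
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb
FROM information_schema.tables
WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
ORDER BY (data_length + index_length) DESC
LIMIT 20;
```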

1.2. The database server has insufficient memory

  • Traditional solutions:

  • Temporarily clean up unneeded table data, or reduce the values of MySQL's various buffer/cache allocation parameters, to free up memory so that MySQL Server can handle more work (a resizing sketch follows at the end of this subsection)

  • Add physical memory and increase the values of MySQL's various buffer allocation parameters

  • Disadvantages:

  • Temporary fixes require continuously watching memory usage and frequent manual intervention; moreover, they only rob Peter to pay Paul

  • Adding physical memory not only increases cost, it also has some impact on the business (the server has to be taken offline)
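
As an illustration of the parameter-tuning route above, here is a minimal sketch, assuming MySQL 5.7 or later (where the InnoDB buffer pool can be resized online); the 4 GB figure is only an example:

```sql
-- Current buffer pool size in bytes.
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';

-- Shrink (or grow) it online; the change is applied in chunks of innodb_buffer_pool_chunk_size.
SET GLOBAL innodb_buffer_pool_size = 4 * 1024 * 1024 * 1024;

-- Watch the resize progress.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_resize_status';
```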

1.3. A single transaction is too large, resulting in poor query performance

  • Traditional solution: split large transactions into small transactions

  • For large transactions that cannot be split, with the same hardware specifications, read and write transactions can be optimized separately. For example: for writes, the binlog format can be changed to STATEMENT at the session level before execution, to reduce the amount of binlog transferred between the master and slave instances; read transactions can be routed to read-only slave libraries to reduce the load on the master (a small sketch follows at the end of this subsection)

  • Disadvantages:

  • Splitting a large transaction into small transactions does not reduce the total amount of work the original large transaction has to do, but it does reduce the impact on other concurrent transactions (for example, a large transaction may hold locks, binary log file handles, and other resources for a long time, blocking other concurrent transactions for a long time and even causing them to fail)
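
Below is a minimal sketch of the two mitigations above. The table and column names are invented, session-level binlog_format changes require the appropriate privileges and cannot be made inside an open transaction, and STATEMENT format should only be used for deterministic statements:

```sql
-- 1) Break one huge transaction into bounded batches
--    (hypothetical table `orders_history`; repeat until 0 rows are affected).
DELETE FROM orders_history
 WHERE created_at < '2019-01-01'
 LIMIT 10000;

-- 2) For a bulk write that must remain a single statement, STATEMENT format can shrink
--    the binlog volume shipped to the slaves (note: setting binlog_format is deprecated
--    in recent MySQL 8.0 releases).
SET SESSION binlog_format = 'STATEMENT';
UPDATE orders_history SET archived = 1 WHERE created_at < '2019-01-01';
SET SESSION binlog_format = 'ROW';
```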

1.4. Excessive number of concurrent queries results in excessive load on the database instance

  • Traditional solutions:

  • Kill high-load query sessions and subsequently optimize slow queries

  • Read/write separation: add read-only slave libraries to expand read capacity

  • Data splitting: distribute data across multiple database instances to expand both read and write capacity

        * To split a large table, first do vertical splitting (split by business: move the columns belonging to different businesses into different tables, different databases, or even different instances), then do horizontal splitting (for tables whose columns can no longer be split, if the data volume is still large enough to affect performance, the table may need to be split further, usually with a rule of thumb of no more than 10 million rows per shard; this is what we usually call data sharding — see the sketch at the end of this subsection)

  • Disadvantages: both vertical and horizontal splitting require corresponding changes to the application. Moreover, after the data is split, new pain points appear, such as the following (they can be solved through further engineering, but the cost is high, a long run-in period is needed before things become stable, and the work may have to go deep into the business, so different customers may need different changes): cross-shard access forces you to enable distributed transactions to guarantee data consistency across shards, distributed transactions themselves take considerable engineering effort to implement, and the application also has to be modified

  • If shards span different instances, a globally consistent backup of the data cannot be obtained directly. Achieving a globally consistent backup of multiple data shards across instances requires some modification of both the middleware and the database.

  • DDL statements are not under transaction control, so distributed transactions cannot guarantee their global consistency either; additional mechanisms are needed to ensure global data consistency for DDL.

  • If the sharded data is skewed, or the access load is skewed, shard data may also have to be migrated frequently (moving shards with large data volumes, or shards on heavily loaded instances, to relatively idle instances)
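
As a very rough illustration of the horizontal split described above (all names and the modulo-4 routing rule are invented; real deployments usually rely on sharding middleware for routing, rebalancing, and cross-shard queries):

```sql
-- Four physical shards of one logical `orders` table, keyed by user_id.
CREATE TABLE orders_0 LIKE orders;   -- rows where user_id % 4 = 0
CREATE TABLE orders_1 LIKE orders;   -- rows where user_id % 4 = 1
CREATE TABLE orders_2 LIKE orders;   -- rows where user_id % 4 = 2
CREATE TABLE orders_3 LIKE orders;   -- rows where user_id % 4 = 3

-- The application (or middleware) computes the target shard before each query,
-- e.g. shard = user_id % 4, and then runs:
SELECT * FROM orders_2 WHERE user_id = 1006;
```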

2. How does computational storage solve database bottlenecks and pain points?

  • For computational storage, I will list three features that I consider the most important. I will briefly introduce how each works (for detailed explanations of the underlying principles, please refer to the links at the end of the article), and then discuss how these features address the database bottlenecks and pain points described above.

  • The first important feature: storage supports hardware-level atomic writes

  • Why does the database need atomic writes?

        * InnoDB's default data page size is 16 KB, while the file system's default block size is 4 KB; that is, the minimum IO unit of an InnoDB data file is 16 KB, and the minimum IO unit of the file system is 4 KB. When a 16 KB page is handed to the file system, the file system splits it into four 4 KB blocks and then writes them to the storage device. Since most file systems do not support atomic writes, if an accident (such as a power failure) occurs while the file system is writing to the storage device, an InnoDB page may end up only partially written (torn/corrupted), which can prevent MySQL Server from starting normally

        * To avoid this problem, InnoDB introduced the doublewrite feature. What is doublewrite for? When dirty pages need to be written to the data files (i.e., flushed), they are first written to the doublewrite area (before MySQL 8.0.20, doublewrite lived in the shared tablespace ibdata1; starting with 8.0.20 it uses separate files and supports multiple files, at most twice the number of buffer pool instances, i.e., two doublewrite files per buffer pool instance), written sequentially 1 MB at a time; only after the doublewrite write succeeds are the pages written to their data files. This way, if an accident damages a data page, crash recovery will try to find the intact copy of that page in doublewrite, overwrite and repair the damaged page, and MySQL Server can then start normally. (Note: although redo can support data recovery, it records incremental modifications to a page rather than the complete page, whereas the page in doublewrite is complete; so the complete page in doublewrite is used to restore the damaged page first, and then redo can be applied normally.)

        * Doublewrite consists of two parts: a 2 MB doublewrite buffer in memory, and two contiguous 1 MB doublewrite areas in the disk file. A simplified diagram of the write path with doublewrite is shown below. From the figure we can see that, for a dirty page to reach the data file, it must be written to disk twice (once to doublewrite and once to the data file).
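
On a running instance, the doublewrite configuration and the page traffic flowing through it can be observed with standard variables and status counters (a small sketch):

```sql
-- Is doublewrite enabled, and how is it configured (file-based settings exist from 8.0.20)?
SHOW GLOBAL VARIABLES LIKE 'innodb_doublewrite%';

-- How many pages have gone through doublewrite, and in how many batches?
SHOW GLOBAL STATUS LIKE 'Innodb_dblwr%';
-- Innodb_dblwr_pages_written / Innodb_dblwr_writes ≈ average pages per doublewrite batch.
```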

  • Computational storage supports atomic writes. What are the benefits for the database?

        * Since the storage supports hardware-level atomic writes, the doublewrite feature at the database level can be turned off. Once it is off, dirty pages written to the data files only need to be written to disk once, saving half of the flush traffic. In other words, while still guaranteeing that pages are never partially written, it directly relieves the pressure of insufficient storage throughput!
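
A minimal configuration sketch for this scenario, assuming the device (together with its driver and filesystem setup) really does guarantee that a 16 KB page write is atomic; on ordinary storage, doublewrite should stay enabled:

```sql
-- my.cnf sketch (a restart is required); only when atomic 16 KB page writes are guaranteed:
--   [mysqld]
--   innodb_doublewrite = 0
--   innodb_flush_method = O_DIRECT   -- commonly combined with direct IO; verify for your setup

-- Verify after the restart:
SHOW GLOBAL VARIABLES LIKE 'innodb_doublewrite';
SHOW GLOBAL STATUS LIKE 'Innodb_dblwr%';   -- these counters should stop growing once it is off
```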

  • The second important feature: transparent data compression/decompression

  • Why does the database need compression/decompression?

        * Simply put, once the data volume reaches a certain scale, compression greatly reduces the storage space required and lowers the storage TCO

  • What is transparent data compression/decompression? We can start by looking at the current mainstream compression/decompression approaches

        * Software compression/decompression (i.e., CPU compression): as shown in the figure below, compression and decompression are performed by the host CPU; there is a large amount of data copying over a long copy path, and the compression/decompression logic has to be implemented and controlled by the application.


        * Hardware compression/decompression (compression card): as shown in the figure below, compression and decompression are performed by a dedicated compression card occupying a PCIe slot. Although this frees host CPU resources, a large amount of data still has to be copied between host memory and the compression card, consuming a lot of host bandwidth

        * Transparent compression/decompression: as shown in the figure below, the compression/decompression computation is performed directly by the computing unit integrated on the storage card, completely transparently to the application. Compression and decompression happen entirely inside the drive, freeing host CPU resources and host bandwidth at the same time, with no need to copy large amounts of data between host memory and a compression card (zero copy). Moreover, when storage cards are added, the compression/decompression computing units scale with them, enabling parallel compression/decompression

 

  • Computational storage supports transparent compression/decompression. What are the benefits for the database?

        * It greatly reduces storage costs while remaining transparent to the application and consuming none of the host's resources

        * In the drive's storage units, data is stored compressed, so the amount of data physically written is greatly reduced. For solid-state storage, this means write amplification drops significantly, which lets the flash deliver its full performance advantage and lowers IO response latency. Therefore, for a database, data compression can be achieved without hurting performance, and performance can even improve (for MySQL in particular, once the data volume reaches a certain size, as the compression ratio grows, transparent compression on computational storage combined with disabling doublewrite can even improve performance substantially in some scenarios; a sketch for estimating the achieved compression ratio appears at the end of this list)

        * Because compression reduces the physical space occupied by binlogs, it also reduces how often binlogs must be purged because of insufficient storage space. And because compression/decompression is performed inside the drive (when writing, data is first compressed by the in-drive computing unit and then stored in the storage units; when reading, it is fetched from the storage units and then decompressed by the in-drive computing unit), it further reduces the bandwidth consumed on the storage device

        * InnoDB's buffer pool exists mainly to reduce IO. Lower read and write IO latency means less dependence on host memory; in other words, InnoDB's buffer pool can be set smaller, which in turn frees more host memory to be used for handling user connection requests
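
Since the compression is transparent, MySQL itself only ever sees logical bytes. A rough way (my own sketch) to estimate the achieved ratio is to compare the logical size below with the physical usage reported by the drive vendor's management tool:

```sql
-- Logical data volume as seen by MySQL; divide by the physical usage reported by the
-- drive's own utility (vendor-specific, not shown here) to estimate the compression ratio.
SELECT ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 2) AS logical_gb
FROM information_schema.tables
WHERE engine = 'InnoDB';
```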

  • The third important feature: pushing computation down to storage (of course, the computation logic to be pushed down may differ from business to business, so pushing down uncommon logic may require joint development)

  • Why does the database need to push calculations down to storage? 

        * Among the query types seen in real production environments, non-equality queries (such as non-unique index queries and join queries) often account for a high proportion, and for these queries (especially when the conditions involve multiple columns), without a feature like MySQL's ICP (Index Condition Pushdown), the amount of data read from the storage engine often exceeds what is actually needed (for example, only 10 rows may satisfy all of the query conditions, yet 100 rows are read from the storage engine). This is because when MySQL executes such a query, it picks one condition column to retrieve data in the storage engine, returns the retrieved rows to MySQL Server, and then uses the remaining condition columns to filter them, finally returning the rows that satisfy all conditions to the client. The filtered-out rows are wasted work. With a feature like MySQL ICP, all of the condition columns can be pushed down to the storage engine layer, which returns only the rows that satisfy all of them, so rows that do not match never need to be read.

        * Although MySQL's ICP feature avoids reading unnecessary data from the storage engine, the filtering at the storage engine layer still consumes host CPU. Can the computation be pushed down even further, to the storage device? It can!

  • What is computation pushdown to storage? The following three figures briefly illustrate how it is implemented

        * Assume a query with multiple condition columns (note: assume all of them are index columns; this is not repeated below). Without ICP support, the query roughly executes as follows (see the red annotations in the figure; details are omitted below). Assuming the query can use a multi-column index, the first column in the index order is used to retrieve data (the retrieving column), rows are fetched from the storage engine, and the remaining condition columns (filtering columns) are then used at the MySQL Server layer to filter out the rows that satisfy all conditions

        * If the same query is supported by an ICP-like feature, it can avoid reading rows that do not satisfy all conditions from the storage engine. As shown in the figure below, all condition columns (which must be index columns) are pushed down to the storage engine layer, only rows matching all of them are read, and no filtering is needed at the MySQL Server layer (a small EXPLAIN sketch appears after this list)

        * Pushing computation down to the storage device goes one step further than an ICP-like feature: the filtering logic itself is pushed down into the storage device, further freeing host CPU and host bandwidth resources, as shown in the figure below
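
The ICP behavior described in the second item above can be observed on any stock MySQL instance. A small sketch follows; the table, index, and query are invented, and the Extra column values are the typical output rather than a guarantee:

```sql
-- Hypothetical table with a composite index on the two condition columns.
CREATE TABLE t_people (
  id       BIGINT PRIMARY KEY,
  zipcode  VARCHAR(8),
  lastname VARCHAR(64),
  address  VARCHAR(128),
  KEY idx_zip_lastname (zipcode, lastname)
);

-- With ICP on (the default), the `lastname LIKE '%son%'` filter is checked inside the
-- storage engine using the index entry; EXPLAIN typically shows "Using index condition".
EXPLAIN SELECT * FROM t_people
 WHERE zipcode = '95054' AND lastname LIKE '%son%';

-- Turn ICP off to see the pre-pushdown behavior: filtering moves back to the server layer,
-- and EXPLAIN typically shows "Using where" instead.
SET optimizer_switch = 'index_condition_pushdown=off';
EXPLAIN SELECT * FROM t_people
 WHERE zipcode = '95054' AND lastname LIKE '%son%';
```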

  • Computational storage supports pushing computation down to the storage device. What are the benefits for the database?

        * Given the introduction above, I don't think the benefits of pushing ICP-like computation down to the storage device need much more explanation! And if more computation logic can be pushed down to the storage device, it will further free the host's CPU, bandwidth, and even memory, so that these resources can be devoted to accepting and processing user business requests, further improving database performance!

3. Future prospects for computational storage

  • The many excellent features of computational storage make it possible to systematically relieve and solve multiple database bottlenecks and pain points at once, instead of the traditional approaches, which are time-consuming, laborious, and costly, and often fix one problem only to create another.

  • Of course, all roads lead to Rome, and there is no technical problem that cannot be solved somehow. Without computational storage there are certainly other solutions, but we should also look at how the problem gets solved: if there is a shorter road, why take the long way around?

  • Personally, I think computational storage is a forward-looking direction for the database field. Of course, that does not mean adopting it solves everything once and for all; but at least, as long as your data volume has not grown beyond what computational storage can support, it lets you avoid or postpone many of the bottlenecks and pain points mentioned above. There is another important point: the TCO savings on a single server may not seem like much, but if your fleet is large, the cost savings are not to be underestimated!

  • As for what computational storage will grow into in the future, who knows; but I believe that using technical breakthroughs made at the bottom layer is more cost-effective than making hard-to-maintain technical changes at the application layer. So I believe that as long as the demand is strong, there will always be someone brave enough to keep making breakthroughs!

  • PS: The above content is compiled from several published articles (see the links at the end of the article). Readers who want to learn more can follow the reference links, which include more detailed explanations and complete performance test data. I hope this article is of some help on your database journey!

4. Reference links

  • The Percona blog post introducing ScaleFlux CSD 2000 (a high-performance computational storage product from ScaleFlux): https://www.percona.com/blog/2020/08/06/how-can-scaleflux-handle-mysql-workload/

  • Technical white paper (with more CSD 2000 test conclusions and data): https://learn.percona.com/hubfs/Collateral/Whitepapers/Testing-the-Value-of-ScaleFlux.pdf

  • "Translation|MySQL based on ScaleFlux SSD performance test" in WeChat public account "yangyidba": https://mp.weixin.qq.com/s/MNBNKlxiBBXGSOyzm5HGdQ

  • "Computable storage: data compression and database calculation pushdown" in the WeChat public account "Laoye Teahouse": https://mp.weixin.qq.com/s/iAg64XNrrZxRCLdlRJjFCQ

  • "Computable storage: transparent compression, database IO model and SSD lifetime" in WeChat public account "ScaleFlux": https://mp.weixin.qq.com/s/jh4JzyXSGhxldT01paCPvw

  • "Too powerful! NVMe SSD turned into memory" in the WeChat public account "SSDFans": https://mp.weixin.qq.com/s/niZmq170l4HDnfyw0rmRFg

  • Introduction to atomic write support in MariaDB: https://mariadb.com/kb/en/atomic-write-support/

    • https://mariadb.com/kb/en/mariadb-1055-changelog/

About the author:

Luo Xiaobo@ScaleFlux, one of the authors of "A Thousand Golden Recipes-MySQL Performance Optimization Pyramid".

Familiar with MySQL architecture, good at overall database tuning, fond of digging into open source technologies and keen on promoting them; has given many public database talks online and offline, and has published nearly 100 database-related research articles.

The full text is over.
