GaussDB Technical Interpretation Series: Advanced Compression

This article was written by Feng Ke, Chief Architect of Huawei Cloud Database GaussDB.

Background

Combining data compression with relational databases is not a new topic; we have seen a wide range of database compression products and solutions. For GaussDB, the question we have been pondering for some time is what introducing data compression today can offer, and what distinct value it can bring to customers.

To answer this question, we first ran extensive tests on common compression algorithms, from LZ4 and Snappy, which offer the best performance, to Zstd and Zlib, which balance performance and compression ratio, to LZMA and BZip, which emphasize compression ratio. We found that even the best general-purpose algorithms cannot compress data without significantly affecting the performance of an online database. We also investigated various encoding methods from the database field, including prediction-based and linear-fitting-based encodings published by the academic community in recent years. Judging from published results and our own measurements, database encodings are designed for specific value distributions; compared with the maturity of general compression algorithms, there is currently no general database encoding method that delivers a stable compression ratio across most real-world data sets.

This is our basic technical judgment on database compression, and past product practice confirms it. Many commercial and open-source databases provide compression support, and most of the time the choice left to customers is whether to enable compression on a specific table. Turning compression on saves space but degrades performance. This seemingly simple choice is precisely the one customers find hardest to make, and it is the fundamental reason why, despite so many database compression products, we rarely see compression widely used in online database workloads.

This gave us an important insight. We believe that truly applicable database compression technology, one that balances compression ratio and business impact, must be selective: we determine the temperature of data by technical means and, based on that determination, selectively compress the relatively cold data in a workload while leaving the relatively hot data untouched.

Such a technical choice means we cannot satisfy every business scenario: it requires that the data temperature distribution of the business follow the 80/20 rule. That is, we compress the cold data that accounts for 80% of storage but only 20% of compute, and leave untouched the hot data that accounts for only 20% of storage but 80% of compute. Fortunately, we found that most businesses that need capacity control have exactly this characteristic.

Scenario and target selection

By analyzing a large number of business scenarios, we found that demand for database compression is diverse: storage compression for online transaction processing (OLTP), storage compression for analytical processing (OLAP), storage compression for historical data, and transmission compression for disaster recovery. Across these scenarios, the requirements differ completely along the three dimensions of compression performance, compression ratio, and decompression performance, as well as in how much intrusion into the business can be tolerated.

This means that a full-scenario GaussDB advanced compression capability must be a combination of multiple technologies: different compression algorithms, different hot/cold determination models and methods, different data storage organizations, and so on, combined and applied in different ways to meet the needs of different scenarios.

It also means that we must prioritize among the different compression scenarios. Our answer is to support OLTP storage compression first. We believe this is the area where database compression technology is most valuable, and of course it is also the area with the greatest technical challenges.

After determining the scenario, the next step is to determine the technical goals. What core competitiveness we want to build for this scenario depends on our analysis of typical customer scenarios. We identified two:

Scenario A: The customer's business runs on an IBM minicomputer, with a single database holding 50TB. After migrating to an open platform, it faces the problems of excessive capacity and long operation and maintenance windows. Splitting the database would mean a distributed transformation; for a business-critical system that has run stably for years, that risk is too high. Compression can significantly reduce the capacity risk, but the original design did not separate hot and cold data (for example, by partitioning on a time dimension). The business therefore needs compression that is zero-intrusive and whose impact on performance is low enough.

Scenario B: The customer's business is deployed on distributed clusters. The capacity of a single cluster already exceeds 1PB and is still growing rapidly, requiring regular expansion. Compression can reduce the frequency of expansion, significantly cut hardware and software costs, and reduce the risk of change. However, the data distribution was designed for scalability (for example, partitioning by user), not for hot/cold separation. This business, too, needs zero-intrusive compression with a sufficiently low performance impact.

By sorting out the requirements of these typical customer scenarios, we determined the basic design goals of GaussDB OLTP storage compression:

  1. Hot/cold determination must be zero-intrusive to the business and must not depend in any way on the business's existing data distribution or logical model;

  2. The impact on the business must be low enough; we set the target at below 10%, with a stretch goal of 5%;

  3. The compression ratio must be reasonable; we set the target at no lower than 2:1.

Defining these basic goals turned the subsequent technology selection for each specific problem into a deterministic exercise.

Hot and cold determination

With the design goals set, we began implementation. Three problems had to be solved: 1) how to determine which data is hot and which is cold; 2) how to organize the storage of compressed data; 3) how to implement a competitive compression algorithm.

For hot/cold determination, the first thing to decide is its granularity. It can be implemented at different granularities, such as row level, block level, or table/partition level. The coarser the granularity, the simpler the implementation, but the greater the intrusion into the business. Given our design goals, we naturally chose row-level determination, the solution that depends least on the business's data distribution. What remains is controlling the cost of introducing it.

We solved this problem neatly by reusing an existing mechanism of the GaussDB storage engine. Specifically, the storage engine records, in the metadata (Meta) of each row, the transaction ID (XID) of the row's most recent modification. This information supports transaction visibility checks, which in turn implement multi-version concurrency control (MVCC). For a given row, once its XID is "old" enough to be visible to all currently active transactions, the exact value of the XID no longer matters. We can record this state with a dedicated flag (FLG) and reuse the space that held the XID to store a physical time, which represents an upper bound on the last modification time (LMT, Last Modified Time) of the row. The LMT can then obviously be used for hot/cold determination (see Figure 1):

Figure 1: Row-level hot and cold determination
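The sketch below is a minimal illustration of this idea, not GaussDB source code; names such as RowMeta, FLAG_FROZEN and oldest_active_xid are hypothetical. It shows how, once a row's XID is older than every active transaction, the XID slot can be repurposed to hold an upper bound of the last modification time.

```python
# Illustrative sketch only: reusing the XID slot to record an LMT upper bound.
import time
from dataclasses import dataclass

FLAG_FROZEN = 0x01  # assumed flag: the XID slot now stores a physical time (LMT)

@dataclass
class RowMeta:
    xid_or_lmt: int   # transaction ID, or last-modified time once frozen
    flags: int = 0

def try_freeze(meta: RowMeta, oldest_active_xid: int) -> None:
    """If the row's XID is older than every active transaction, the row is
    visible to all of them, so the exact XID no longer matters; replace it
    with the current time, an upper bound on the last modification time."""
    if not (meta.flags & FLAG_FROZEN) and meta.xid_or_lmt < oldest_active_xid:
        meta.flags |= FLAG_FROZEN
        meta.xid_or_lmt = int(time.time())  # LMT upper bound, seconds
```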

The advantage of this solution is that introducing the LMT adds no extra overhead and does not depend on the business's logical model. Most of the time, if the requirements are not particularly strict, the business can define hot/cold determination with a simple rule such as:

AFTER 3 MONTHS OF NO MODIFICATION

The system then scans the target table and compresses every row whose LMT is more than 3 months before the current time.
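As a rough sketch of how such a scan might select rows under the default rule (the row representation and the 90-day cutoff are illustrative assumptions, not the actual implementation):

```python
# Sketch of the default rule "AFTER 3 MONTHS OF NO MODIFICATION":
# a background task selects rows whose LMT is older than the threshold.
import time

THREE_MONTHS = 90 * 24 * 3600  # seconds, an approximation of "3 months"

def select_cold_rows(rows, now=None):
    """rows: iterable of (row_id, lmt_epoch_seconds) pairs for frozen rows."""
    now = time.time() if now is None else now
    return [row_id for row_id, lmt in rows if now - lmt > THREE_MONTHS]
```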

Note that this scheme identifies only write hotspots, not read hotspots: we know that the qualifying rows have not been updated in 3 months, but we cannot confirm whether they have been read frequently during that period. There is currently no low-cost way to track row-level read hotspots. For ledger-style workloads such as order details, the scheme works well, because reads and writes share the same temperature characteristics: access frequency keeps decaying as the time since the last modification grows. But for collection-style workloads such as mobile photo albums, tracking writes alone may not be enough, because a collection relationship established long ago may still be accessed frequently.

This means that even with hot/cold determination in place, we still need to optimize for the case where the business does access compressed data. We leave that problem to the storage organization and the compression algorithm; for the compression algorithm in particular, we care most about its decompression performance.

Another issue is that in some scenarios the default hot/cold rule may not be sufficient. For certain types of transactions, for example, the generated order details may indeed go unmodified for 3 months, yet still be updated once some trigger condition is met (such as releasing the escrow of a secured transaction). This is not common in practice, but for businesses that really care about performance we also allow custom rules in addition to the default one, for example:

AFTER 3 MONTHS OF NO MODIFICATION ON (order_status = "finished")

The system will then compress only rows that have not been modified for 3 months and whose order status is "finished".

Currently, a custom rule can be any legal row expression. The business can write arbitrarily complex expressions to express its hot/cold criteria, as long as every field referenced in the expression is a legal field of the target table. This combination of default and custom rules gives businesses both a low barrier to entry and good flexibility.
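Conceptually, a custom rule simply adds a row predicate on top of the age condition. The sketch below illustrates that composition; the function and field names are hypothetical.

```python
# Sketch: combining the default age rule with a user-defined row expression
# such as order_status = "finished".
import time

THREE_MONTHS = 90 * 24 * 3600

def eligible(row: dict, lmt: float, predicate=None, now=None) -> bool:
    """True if the row is old enough AND satisfies the optional custom rule."""
    now = time.time() if now is None else now
    return (now - lmt > THREE_MONTHS) and (predicate is None or predicate(row))

def is_finished(row: dict) -> bool:
    return row.get("order_status") == "finished"
```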

Storage organization

Once the rows that satisfy the hot/cold criteria are to be compressed, we must decide how to store the compressed data. Based on the design goals, we chose the storage organization with the least business intrusion: intra-block compression.

Relational databases organize storage in fixed-length blocks; in GaussDB the typical block size is 8KB. Choosing a larger block would clearly help compression, but it would also increase the impact on business performance. Intra-block compression means: 1) all rows in a single block that satisfy the hot/cold criteria are compressed as a whole; 2) the compressed data is stored inside the same block, in an area called the BCA (Block Compressed Area), usually located at the end of the block.

Intra-block compression means that decompressing any row depends only on the current block; no other block needs to be read. From the standpoint of compression ratio this design is not the friendliest, but it is very effective at bounding the business impact. Recall from the earlier discussion that even with hot/cold rules defined, there is still some probability that compressed data will be accessed; we want that access cost to have a deterministic upper bound.

Figure 2 shows the process of intra-block compression in detail. First, when compression is triggered, the system scans all rows in the block and, according to the specified hot/cold criteria, identifies R1 and R3 as cold (Figure 2(a)). Next, it compresses R1 and R3 as a whole and stores the compressed data in the block's BCA (Figure 2(b)). If the business later needs to update R1, the system generates a new copy R4 from the latest data and marks the R1 entry in the BCA as deleted (Figure 2(c)). Finally, when the system needs more space in the block, it can reclaim the space that belonged to R1 in the BCA (Figure 2(d)).

Figure 2: Intra-block compression

Two points deserve attention in this design: 1) we compress only the user data (Data), not the corresponding metadata (Meta), which is usually needed to support transaction visibility; 2) we support turning cold data back into hot data, eliminating the effect of hot/cold misjudgment. Again, from the compression-ratio standpoint this design is not the friendliest, but it greatly reduces intrusion into the business. Simply put, accessing compressed data is exactly the same as accessing ordinary data: there are no functional restrictions and no difference in transaction semantics. This is an essential principle: our OLTP storage compression is completely transparent to the business, and it is the baseline that this feature and all subsequent GaussDB advanced compression features will follow.
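To make the lifecycle in Figure 2 concrete, here is a minimal in-memory sketch under simplifying assumptions: the Block class, its fields, and the use of zlib as a stand-in compressor are all illustrative, not the GaussDB on-disk format.

```python
# Sketch of intra-block compression: cold rows are compressed as a whole into
# a BCA kept inside the same block; updating a compressed row writes a fresh
# hot copy and only marks the BCA entry as deleted, to be reclaimed later.
import zlib  # stand-in for the custom LZ77-based algorithm described below

class Block:
    def __init__(self):
        self.rows = {}          # row_id -> bytes, hot (uncompressed) rows
        self.bca = None         # compressed payload of cold rows
        self.bca_index = {}     # row_id -> (offset, length, deleted)
        self.next_id = 0

    def insert(self, payload: bytes) -> int:
        rid, self.next_id = self.next_id, self.next_id + 1
        self.rows[rid] = payload
        return rid

    def compress_cold(self, cold_ids) -> None:
        """Move the given rows into the BCA; later decompression needs only this block."""
        buf, offset = bytearray(), 0
        for rid in cold_ids:
            data = self.rows.pop(rid)
            self.bca_index[rid] = (offset, len(data), False)
            buf += data
            offset += len(data)
        self.bca = zlib.compress(bytes(buf))

    def read(self, rid) -> bytes:
        if rid in self.rows:
            return self.rows[rid]
        off, length, deleted = self.bca_index[rid]
        assert not deleted
        return zlib.decompress(self.bca)[off:off + length]

    def update(self, rid, payload: bytes) -> int:
        """Like R1 -> R4 in Figure 2: new hot copy, old BCA entry marked deleted."""
        if rid in self.bca_index:
            off, length, _ = self.bca_index[rid]
            self.bca_index[rid] = (off, length, True)
        else:
            self.rows.pop(rid, None)
        return self.insert(payload)
```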

Compression algorithm

Given the design goals, and looking at the three dimensions of compression ratio, compression performance, and decompression performance, what we actually need is an algorithm that offers a reasonable compression ratio, reasonable compression performance, and extreme decompression performance. This is the basis of our compression algorithm design.

We first tested compressing directly with LZ4, the open-source library generally regarded as having the best compression and decompression performance. The measured compression ratio, however, was relatively low. We analyzed its principle carefully: LZ4 is an implementation of the LZ77 algorithm, whose idea is very simple. The data to be compressed is treated as a byte stream; starting from the current position, the algorithm searches the preceding data for a string matching the bytes at the current position, then represents the matched string by its length and its offset from the current position, thereby achieving compression. By design, LZ77 compresses long text well, but for the many short text and numeric fields in structured data its effect is limited. Our tests confirmed this.
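For illustration only, the toy quadratic matcher below makes the LZ77 principle concrete; real LZ4 uses hash tables for speed and, as discussed later, cannot represent matches shorter than 4 bytes.

```python
# Toy LZ77 matcher: at each position, find the longest earlier occurrence of
# the upcoming bytes and emit a (distance, length) token, else emit a literal.
MIN_MATCH = 4  # LZ4 cannot represent matches shorter than this

def lz77_tokens(data: bytes, window: int = 65535):
    """Return a list of ('lit', byte) and ('match', distance, length) tokens."""
    tokens, i = [], 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= MIN_MATCH:
            tokens.append(("match", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens
```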

Next, we split the compression into two layers. In the first layer, we encode certain numeric columns with a simple difference encoding, lightweight enough that a specific field can be decoded without relying on the values of other fields. In the second layer, we call LZ4 to compress the encoded data. Note that in the first layer we encode by column but still store by row, which differs from the common industry practice of encoding and storing by column. Column storage is friendlier to the compression ratio, but it scatters the data of one row across different areas of the BCA, and that traditional design cannot support the partial decompression we hope to implement later. We will return to this in the conclusion.
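The exact encoding is not spelled out here; as one plausible reading of "simple difference encoding", the sketch below uses a per-block base per column, so a single field can be decoded from the base plus that row's delta alone. It is an assumption for illustration, not GaussDB code.

```python
# Sketch of a frame-of-reference style difference encoding for a numeric column.
from typing import List, Tuple

def delta_encode(values: List[int]) -> Tuple[int, List[int]]:
    base = min(values)
    return base, [v - base for v in values]   # small deltas compress better downstream

def delta_decode_one(base: int, delta: int) -> int:
    return base + delta                        # needs only the base and this row's delta

base, deltas = delta_encode([100003, 100007, 100010])
assert delta_decode_one(base, deltas[1]) == 100007
```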

Measurements showed that this column-encoding-plus-general-compression scheme effectively improved the compression ratio while keeping the increase in business impact under control, but the loose coupling of the two layers introduced considerable extra overhead. After careful weighing, we decided to abandon LZ4 and implement our own tightly coupled compression algorithm based entirely on LZ77.

At the time this seemed a very risky move; as far as we know, no database kernel team had previously chosen to implement a general compression algorithm itself. Judging by the final results, however, we opened an entirely new door. Once the boundary between column encoding and the LZ77 algorithm was removed, we were able to introduce a series of optimizations and innovations. Space does not allow us to show all the technical details, so we present just two small ones here.

The first optimization is built-in row boundaries. With the two-layer algorithm, we had to store the encoded length of each row separately, because after LZ77 decompression we need to find each row's boundary, and that is not a small overhead. To eliminate it, we embed a row-boundary marker in the LZ77 encoding format. The marker occupies only 1 bit, so its overhead is far lower than the previous scheme. Reserving this bit does halve the maximum window length available to the LZ77 match search, but in our scenario that does not matter, because a typical page is only 8KB.
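The actual token layout is not published; purely as an illustrative assumption, the sketch below reserves the top bit of a 16-bit distance field as the row-boundary marker, which is one way the maximum distance would drop from 64KB to 32KB, harmless when the whole block is only 8KB.

```python
# Hypothetical token layout: 1 bit of a 16-bit distance word marks "end of row".
ROW_BOUNDARY_BIT = 0x8000
MAX_DISTANCE = 0x7FFF  # 32KB instead of 64KB once the bit is reserved

def pack_distance(distance: int, ends_row: bool) -> int:
    assert 0 < distance <= MAX_DISTANCE
    return distance | (ROW_BOUNDARY_BIT if ends_row else 0)

def unpack_distance(word: int):
    return word & MAX_DISTANCE, bool(word & ROW_BOUNDARY_BIT)
```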

The second optimization is a 2-byte short encoding. The original LZ4 implementation, to maximize compression performance, describes a match with a 3-byte encoding, which means the shortest match it can recognize is 4 bytes. In structured data, however, 3-byte matches are very common. Consider the following example:

A = 1 … B = 2

Here A and B are two integer fields in the same row, with values 1 and 2 respectively. With little-endian byte order, the row is laid out in memory as follows:

01 00 00 00 … 02 00 00 00

Note the trailing zero bytes of each field: there is a 3-byte match ("00 00 00"), but LZ4 cannot recognize it.

We solve this by adding a 2-byte short encoding to our LZ77 implementation. The short encoding can represent a match as short as 3 bytes, which improves the compression ratio over LZ4.
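The snippet below reproduces the byte layout above (with the two fields shown adjacent for simplicity) and points out the 3-byte match that a 4-byte minimum match length would miss; it is illustrative only.

```python
# Why a 3-byte minimum match matters for structured data: two adjacent
# 4-byte little-endian integer fields with values 1 and 2.
import struct

row = struct.pack("<ii", 1, 2)   # b'\x01\x00\x00\x00\x02\x00\x00\x00'
print(row.hex(" "))              # 01 00 00 00 02 00 00 00

# The "00 00 00" at offsets 1-3 recurs at offsets 5-7 (distance 4, length 3).
# With LZ4's 4-byte minimum it is emitted as literals; a 2-byte short token
# with a 3-byte minimum match can encode it.
```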

Of course, the short encoding adds overhead of its own. First, compression performance drops somewhat, because two independent hash tables must be maintained; fortunately, extreme compression performance is not a goal we pursue in this scenario. Second, the 2-byte encoding narrows the bit width available to express the distance between the match and the matched string, meaning a 3-byte match must be closer to be recognized; in our scenario this is not a problem either, because a typical data row is small relative to that limit.

Effect evaluation

We used the standard TPCC benchmark to evaluate the business impact of enabling OLTP storage compression. The TPCC model contains nine tables, three of which are flow tables whose space grows dynamically. Among these three, the order details table (Orderline) grows an order of magnitude faster than the others, so we enabled compression on it. By TPCC's business semantics, once an order is delivered, its status becomes finished; a finished order is never modified again, but it may still be queried with some probability. Based on this, our hot/cold criterion was to compress only finished orders.

We measured system performance with compression disabled and enabled; the results are shown in Figure 3:

Figure 3: Business Impact Assessment

The results show that in the TPCC scenario, enabling compression reduces system performance by about 1.5% compared with leaving it off. This is a very good result: it means compression can be enabled even at peak workloads exceeding one million tpmC. We are not aware of any other database product in the industry having reached this level before.

We also measured the compression ratio of the Orderline table. For a richer data set, we additionally selected four tables from the TPCH model (Lineitem, Orders, Customer, and Part). For comparison, on each data set we measured the compression ratio of LZ4, ZLIB, and our algorithm; ZLIB is an algorithm that balances compression and decompression performance against compression ratio, with compression and decompression 5 to 10 times slower than LZ4. The results are shown in Figure 4:

Figure 4: Compression Ratio Evaluation

The results match our expectations: when numeric fields dominate, our algorithm's compression ratio exceeds that of all the general-purpose algorithms; when text fields dominate, it falls between the LZ-class algorithms and the LZ+Huffman-class algorithms.

Operation and maintenance tips

Note that our compression scheme is offline: when data is first generated it is necessarily hot, it will not trigger compression, and business access to it is not affected in any way. As time passes, its temperature gradually decreases until an independent compression task identifies it as cold and compresses it.

Running these compression tasks during off-peak hours and controlling their resource consumption are things the operations side needs to watch. We provide rich controls here, including specifying the maintenance window, the parallelism of compression tasks, and the amount of data each task compresses. For most businesses, the amount of newly added data per unit time is fairly limited, so a business can also choose to complete compression within a specific window, for example compressing the cold data added 3 months earlier between 2:00 a.m. and 4:00 a.m. on the first day of each month.

Before deciding to enable compression, a business may want to know what the benefit would be and decide accordingly. For this we provide a compression ratio estimation tool: it samples the data of the target table, compresses the sample with the same algorithm used in actual compression, and reports the ratio, but it never creates a BCA and never modifies any data.
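As a rough illustration of the idea (not the actual tool), the sketch below samples rows and estimates the ratio with zlib standing in for the real compressor; the 1% sampling fraction is an arbitrary choice.

```python
# Sampling-based compression-ratio estimate: compress a sample, write nothing back.
import random
import zlib

def estimate_ratio(rows, sample_frac=0.01, seed=0):
    """rows: list of bytes payloads; returns estimated original/compressed ratio."""
    rng = random.Random(seed)
    sample = [r for r in rows if rng.random() < sample_frac] or rows[:1]
    raw = b"".join(sample)
    return len(raw) / max(1, len(zlib.compress(raw)))
```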

If a business migrates compressed data to another table, the data may go from compressed back to uncompressed, causing space expansion. This is not something our solution introduces; every compression solution must deal with it. If the hot/cold rules are clear-cut, the business can run the compression task manually so that compression takes effect immediately; for migrations of large compressed tables that take a long time, the business can still rely on the periodically started automatic compression tasks.

Finally, we provide the finest-grained control over enabling and disabling compression: whether for an ordinary table, a single partition of an ordinary partitioned table, or any single partition or sub-partition of a level-2 partitioned table, the business can turn compression on or off independently. So even when a business has already separated hot and cold data itself (for example, through time-based partitioning), it can still work well with our compression feature.

Conclusion

In the OLTP table compression feature, we introduced a series of technical innovations, including a brand-new compression algorithm, fine-grained automatic hot/cold determination, and intra-block compression, which together greatly reduce the impact on the business while providing a reasonable compression ratio. We hope this feature will play an important role in controlling the capacity of critical online services.

Next, we will continue to innovate and iterate on reducing the business impact of compression, on partial decompression, and on OLTP index compression. We hope to make groundbreaking technical breakthroughs on these problems and create greater value for the business.
