Exploring HBase data compression and encoding

Abstract: This article introduces HBase's support for data compression and encoding, and the improvements in compression ratio and access speed that Alibaba Cloud HBase has made on top of the community version.

Foreword

Have you ever had a workload where a cold-data cache serving only a few hundred QPS still ties up dozens of servers because of the storage water level? Or a table of several hundred gigabytes that must be served entirely from cache for performance to meet business requirements? Or a small table of only a few dozen MB that, because its QPS is so high, constantly needs splitting and balancing across multiple servers to absorb hotspots?
Facing these diverse scenarios, the Ali-HBase team has been committed to giving businesses more choices at lower cost. This article introduces the two main ways HBase improves its compression ratio today: compression and DataBlockEncoding.

Lossless compression: smaller, faster and less resource intensive

General-purpose compression is an important tool for databases to control storage costs. A database typically organizes data into blocks and compresses and decompresses each block independently. Larger blocks give a higher compression ratio and higher scan throughput; smaller blocks reduce random-read IO pressure and read latency. As a tradeoff, online HBase usually uses a 64 KB block size, keeps blocks uncompressed in the block cache, and only compresses and decompresses when flushing to and reading from disk.
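For concreteness, the block size and on-disk compression described above are per-column-family settings. Below is a minimal sketch against the HBase 2.x Java client API; the table and family names are made up for illustration, and the codec is just an example.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.util.Bytes;

public class CreateCompressedTable {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // 64 KB blocks: the usual online tradeoff between compression ratio /
            // scan throughput (larger blocks) and random-read latency (smaller blocks).
            ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
                    .newBuilder(Bytes.toBytes("cf"))
                    .setBlocksize(64 * 1024)
                    // Compression applies to blocks on disk; the block cache
                    // normally holds decompressed blocks.
                    .setCompressionType(Compression.Algorithm.SNAPPY)
                    .build();
            admin.createTable(TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("demo_table")) // hypothetical table name
                    .setColumnFamily(cf)
                    .build());
        }
    }
}
```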

Open-source HBase usually uses LZO or Snappy compression. Both prioritize compression and decompression speed while achieving a reasonable compression ratio. However, with rapid business growth, more and more clusters had to be expanded because of storage water levels. We responded with measures such as replica-count optimization and hardware upgrades based on cross-cluster partition-recovery technology, but these still could not keep up with the rapid growth of storage demand. We therefore kept looking for a compression method with a higher ratio.

New compression (ZSTD, LZ4) goes online

Zstandard (abbreviated Zstd) is a new lossless compression algorithm designed to offer fast compression while achieving a high compression ratio. It neither chases the highest possible compression ratio like LZMA and ZPAQ, nor the extreme compression speed of LZ4. Its compression speed exceeds 200 MB/s and its decompression speed exceeds 400 MB/s (lab figures), which basically meets current HBase throughput requirements. Our verification shows that ZSTD improves the data compression ratio by roughly 25%-30% over LZO. For storage-bound businesses, this means roughly a quarter to a third less cost.

In other cases, some tables are small in storage but serve high QPS with extremely strict RT requirements. For this scenario we introduced LZ4, whose decompression speed can be more than twice that of LZO in some workloads. When a read has to go to disk and decompress a block, the RT and CPU overhead of LZ4 decompression are significantly lower than with LZO.
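A sketch of how this split might be applied in practice, assuming an HBase build that ships the ZSTD and LZ4 codecs; the table names are hypothetical, and in a real cluster you would start from the existing column family descriptor so that other settings are preserved.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.util.Bytes;

public class SwitchCompression {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // Storage-heavy table: favor compression ratio with ZSTD.
            admin.modifyColumnFamily(TableName.valueOf("monitor_log"),
                    ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("cf"))
                            .setCompressionType(Compression.Algorithm.ZSTD)
                            .build());
            // Small, hot, latency-sensitive table: favor decompression speed with LZ4.
            admin.modifyColumnFamily(TableName.valueOf("hot_dimension"),
                    ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("cf"))
                            .setCompressionType(Compression.Algorithm.LZ4)
                            .build());
        }
    }
}
```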

First, a visual comparison of the performance of the various compression algorithms:

[Figure: compress1 – compression algorithm comparison]

[Figure: compress2 – compression algorithm comparison]

The table below takes several typical online data scenarios and shows the actual compression ratio and single-core decompression speed of each codec (all figures come from production workloads):

| Business type | Uncompressed table size | LZO (ratio / decompression MB/s) | ZSTD (ratio / decompression MB/s) | LZ4 (ratio / decompression MB/s) |
| --- | --- | --- | --- | --- |
| Monitoring | 419.75 T | 5.82 / 372 | 13.09 / 256 | 5.19 / 463.8 |
| Logging | 77.26 T | 4.11 / 333 | 6.0 / 287 | 4.16 / 496.1 |
| Risk control | 147.83 T | 4.29 / 297.7 | 5.93 / 270 | 4.19 / 441.38 |
| Consumer | 108.04 T | 5.93 / 316.8 | 10.51 / 288.3 | 5.55 / 520.3 |

As of Double 11 (Singles' Day) 2017, ZSTD has been fully rolled out online, optimizing storage at the PB scale in total. LZ4 has also gone live for several businesses with more demanding read performance requirements.
The figure below shows the drop in overall cluster storage after applying ZSTD to a monitoring workload: the data volume fell from 100+ TB to 75 TB.

[Figure: cluster storage usage before and after enabling ZSTD]

Encoding technology: instant decompression for structured data

As a schema-free database, HBase is more flexible than a traditional relational database: users can write data with different schemas into the same table without designing a table structure up front. The downside is that, lacking schema information, HBase needs many extra fields to record length information and cannot apply type-specific compression to different data types. To address this, HBase provides an encoding feature to reduce storage overhead. Because encoding has low CPU overhead and works well, it is usually also enabled for blocks held in the cache.
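For reference, DataBlockEncoding is likewise chosen per column family. A minimal sketch with a hypothetical family name, again against the HBase 2.x client API:

```java
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.util.Bytes;

public class DiffEncodedFamily {
    // DIFF encoding stays in effect both on disk and for blocks in the block
    // cache, which is why it is usually left enabled for cached data as well.
    static ColumnFamilyDescriptor diffEncoded(String family) {
        return ColumnFamilyDescriptorBuilder
                .newBuilder(Bytes.toBytes(family))
                .setDataBlockEncoding(DataBlockEncoding.DIFF)
                .build();
    }

    public static void main(String[] args) {
        System.out.println(diffEncoded("cf"));
    }
}
```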

Introduction to Old DIFF Encoding

HBase has long supported DataBlockEncoding, which compresses data by removing the repeated parts between HBase KeyValues. Taking DIFF, the most common algorithm online, as an example, a KV is encoded as follows:

  • A 1-byte flag (its purpose is explained below)
  • If the key length differs from the previous KV, write the key length in 1~5 bytes
  • If the value length differs from the previous KV, write the value length in 1~5 bytes
  • Write the length of the key prefix shared with the previous KV, in 1~5 bytes
  • The non-prefix part of the row key
  • If this is the first KV, write the column family name
  • The non-prefix part of the column qualifier
  • Write the timestamp in 1~8 bytes, either as the original value or as the difference from the previous KV's timestamp (whichever takes fewer bytes)
  • If the type differs from the previous KV, write the 1-byte type (Put, Delete)
  • The value content

When decoding, how do we know whether the key length is the same as the previous KV's, whether the value length is the same, and whether the stored timestamp is an original value or a difference? All of this is carried by the 1-byte flag written at the very beginning. The 8 bits of this byte mean:

  • Bit 0: if 1, the key length equals that of the previous KV
  • Bit 1: if 1, the value length equals that of the previous KV
  • Bit 2: if 1, the type is the same as that of the previous KV
  • Bit 3: if 1, the stored timestamp is a difference; otherwise it is the original value
  • Bits 4~6: the combined value of these 3 bits (0~7) gives the length of the stored timestamp
  • Bit 7: if 1, the stored timestamp difference is negative, and its absolute value was written
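A small sketch of how a decoder could interpret this flag byte, mirroring the bit layout listed above (the constant names are illustrative, not the exact ones used in the HBase source):

```java
public class DiffFlagByte {
    // Bit layout as described above; names are illustrative.
    static final int SAME_KEY_LENGTH       = 1;          // bit 0
    static final int SAME_VALUE_LENGTH     = 1 << 1;     // bit 1
    static final int SAME_TYPE             = 1 << 2;     // bit 2
    static final int TIMESTAMP_IS_DIFF     = 1 << 3;     // bit 3
    static final int TIMESTAMP_LENGTH_MASK = 0b111 << 4; // bits 4~6
    static final int TIMESTAMP_SIGN        = 1 << 7;     // bit 7

    public static void main(String[] args) {
        int flag = 0b1011_1001; // example flag value

        boolean sameKeyLen   = (flag & SAME_KEY_LENGTH) != 0;
        boolean sameValueLen = (flag & SAME_VALUE_LENGTH) != 0;
        boolean sameType     = (flag & SAME_TYPE) != 0;
        boolean tsIsDiff     = (flag & TIMESTAMP_IS_DIFF) != 0;
        int tsLength         = (flag & TIMESTAMP_LENGTH_MASK) >>> 4; // combined value of bits 4~6
        boolean tsNegative   = (flag & TIMESTAMP_SIGN) != 0;

        System.out.printf("sameKeyLen=%b sameValueLen=%b sameType=%b "
                + "tsIsDiff=%b tsLength=%d tsNegative=%b%n",
                sameKeyLen, sameValueLen, sameType, tsIsDiff, tsLength, tsNegative);
    }
}
```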

[Figure: DIFF encoding layout]

With DIFF encoding, a seek in an HFile consists of the following two steps:

  1. Locate the target data block through the block index (index key)
  2. Starting from the first complete KV in the block, scan sequentially, decoding each subsequent KV until the target KV is found

DIFF encoding works best for small-KV workloads, where it can reduce the data volume by a factor of 2-5.

New Indexable Delta Encoding goes online

From a performance point of view, HBase needs to keep meta information in the block cache. If blocks are small, the meta information grows too large to fit fully in the cache and performance degrades. If blocks are large, the sequential scan within a block required by DIFF encoding becomes the bottleneck for random reads. To address this, we developed Indexable Delta Encoding, which supports fast lookups inside a block and greatly improves seek performance. Its principle is shown in the figure:

[Figure: Indexable Delta Encoding structure]

After locating the data block through the block index, we read the offsets of the complete KVs stored at the end of the block, use binary search over them to quickly find the complete KV that satisfies the query condition, and then sequentially decode the following DIFF KVs until the target KV is reached.
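A simplified sketch of this two-stage lookup, modeling the block as an already-decoded list of row keys so the byte-level parsing can be left out; the structure and names are illustrative, not the actual Ali-HBase implementation.

```java
import java.util.Arrays;
import java.util.List;

public class IndexableDeltaSeekSketch {
    // Simplified model of an encoded block: rowKeys[i] is the row key of the i-th
    // KV in the block, and completeKvIndexes marks which of those KVs are stored
    // in complete (non-diff) form, with their positions recorded at the block end.
    static int seek(List<String> rowKeys, int[] completeKvIndexes, String target) {
        // 1) Binary search over the complete KVs for the last one <= target.
        int lo = 0, hi = completeKvIndexes.length - 1, start = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (rowKeys.get(completeKvIndexes[mid]).compareTo(target) <= 0) {
                start = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        // 2) Sequentially "decode" forward from that complete KV to the target.
        int pos = completeKvIndexes[start];
        while (pos < rowKeys.size() && rowKeys.get(pos).compareTo(target) < 0) {
            pos++; // in the real encoding this step decodes the next DIFF KV
        }
        return pos; // index of the first KV >= target
    }

    public static void main(String[] args) {
        List<String> rowKeys = Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h");
        int[] completeKvIndexes = {0, 4}; // e.g. one complete KV every 4 entries
        System.out.println(seek(rowKeys, completeKvIndexes, "f")); // -> 5
    }
}
```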

With Indexable Delta Encoding, random seek performance on HFiles doubles. Taking 64 KB blocks as an example, in a random Get scenario with full cache hits, RT drops by 50% compared to DIFF encoding, while storage overhead grows by only 3-5%. Indexable Delta Encoding has been deployed in multiple online scenarios and withstood the test of Double 11, reducing overall average read RT by 10%-15%.

 

Using HBase on the cloud

Alibaba HBase is now available as a commercial service on Alibaba Cloud, and any user who needs it can use this deeply improved, one-stop HBase service. Compared with self-built HBase, the cloud HBase version offers many improvements in operations and maintenance, reliability, performance, stability, security, cost, and more. For details, see https://www.aliyun.com/product/hbase

Reprinted from: https://yq.aliyun.com/articles/277084

 


Community

If you are interested in HBase and want to use it to solve real-world problems, you are welcome to join the HBase technical community groups:

WeChat HBase technical community group: if you cannot join the WeChat group directly, add the group assistant on WeChat (SH_425) and you will be invited.

DingTalk HBase technical community group
