[Translation] Apache HBase New Features - MOB Support (2)

Continued from the previous article: https://my.oschina.net/u/234661/blog/1553005

MOB file read and write

Each MOB-enabled column family has a threshold: if the value length of a cell is larger than this threshold, the cell is regarded as a MOB cell.

When MOB cells are written to a region, they go to the WAL and memstore just like normal cells. During a flush, the MOBs are written to MOB files, while the metadata and paths of those MOB files are written to store files. Because MOBs follow the normal write path, data consistency and HBase replication work natively with this design.
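
The write and flush behavior described above can be modeled in a few lines of Python. This is a toy sketch, not HBase code. `Cell`, `is_mob` and `flush` are hypothetical names, and a MOB file is modeled as a dict from row key to value. The reference entry is simplified to carry only the MOB file name; the real reference value may include additional metadata.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Default threshold quoted in this article: 100 KB.
MOB_THRESHOLD = 100 * 1024


@dataclass
class Cell:
    row: bytes
    value: bytes


def is_mob(cell: Cell, threshold: int = MOB_THRESHOLD) -> bool:
    # A cell counts as a MOB cell only if its value is larger than the threshold.
    return len(cell.value) > threshold


def flush(memstore: List[Cell], mob_file_name: str, threshold: int = MOB_THRESHOLD):
    """Split a memstore snapshot into one MOB file and one store file (toy model)."""
    mob_file: Dict[bytes, bytes] = {}                 # row -> large value
    store_file: List[Tuple[bytes, bytes, bool]] = []  # (row, value or path, is_reference)
    for cell in sorted(memstore, key=lambda c: c.row):
        if is_mob(cell, threshold):
            mob_file[cell.row] = cell.value
            # The store file keeps only a small reference pointing at the MOB file.
            store_file.append((cell.row, mob_file_name.encode(), True))
        else:
            store_file.append((cell.row, cell.value, False))
    return mob_file, store_file
```

Because the store file holds only small references, ordinary store file compactions do not have to rewrite the large values.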

MOB edits are larger than usual, so the I/O for each WAL sync is larger too, which can slow down WAL sync operations. If other regions share the same WAL, their write latency can be affected as well. However, if data consistency and durability are needed, the WAL is a must.

By changing the threshold, cells can be moved between store files and MOB files during compactions. The default threshold is 100 KB.

As illustrated below, the cells that contain the paths of MOB files are called reference cells. The original tags are retained in these reference cells, so we can continue to rely on the HBase security mechanism.

Reference cells carry reference tags that differentiate them from normal cells. A reference tag indicates that the actual value is a MOB cell stored in a MOB file, so further resolution is needed when reading.
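
The following is a minimal sketch of how reference tags and a changed threshold might interact during a compaction. It assumes a simplified cell model in which tags are plain strings and MOB files are dicts keyed by row. `REF_TAG`, `make_reference` and `recompact` are hypothetical names, not HBase APIs. Existing tags are carried over so that tag-based security checks still see them.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

REF_TAG = "mob-ref"  # hypothetical tag name used only in this sketch


@dataclass(frozen=True)
class Cell:
    row: bytes
    value: bytes
    tags: Tuple[str, ...] = ()

    @property
    def is_reference(self) -> bool:
        return REF_TAG in self.tags


def make_reference(cell: Cell, mob_file: str) -> Cell:
    # Keep the cell's existing tags (e.g. security tags) and add the reference tag.
    return Cell(cell.row, mob_file.encode(), cell.tags + (REF_TAG,))


def recompact(cells: List[Cell], mob_files: Dict[str, Dict[bytes, bytes]],
              threshold: int, new_mob_file: str):
    """Re-apply a (possibly changed) threshold to every cell during a compaction."""
    out: List[Cell] = []
    new_mob: Dict[bytes, bytes] = {}
    for c in cells:
        # Resolve reference cells to their real value first.
        value = mob_files[c.value.decode()][c.row] if c.is_reference else c.value
        real = Cell(c.row, value, tuple(t for t in c.tags if t != REF_TAG))
        if len(value) > threshold:
            new_mob[c.row] = value          # value goes to (or stays in) a MOB file
            out.append(make_reference(real, new_mob_file))
        else:
            out.append(real)                # value moves back into the store file
    return out, new_mob
```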

When reading, the store scanner opens scanners on the memstore and store files. If it meets a reference cell, the scanner reads the MOB file path from the cell value and seeks the same row key in that file. The block cache can be enabled for MOB files during scans, which can accelerate seeking.

It is not necessary to open readers for all MOB files; a reader is opened only for the file that a reference cell points to, when it is needed. This random read is therefore not affected by the number of MOB files, so once MOB files are large enough, we don’t need to compact them over and over again.
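
The read path above can be sketched as follows. This is again a toy model rather than the real HBase scanner classes. `LazyMobReaders` and `scan` are hypothetical, and `open_file` stands in for opening a MOB file reader (with or without the block cache). The point is that a reader is opened only when a reference cell actually names that file, and it is reused afterwards.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# A "reader" here is just a dict row -> value standing in for an opened MOB file.
Reader = Dict[bytes, bytes]


class LazyMobReaders:
    """Open a MOB file reader only when a reference cell points at that file."""

    def __init__(self, open_file: Callable[[str], Reader]):
        self._open_file = open_file
        self._readers: Dict[str, Reader] = {}

    def get(self, path: str, row: bytes) -> bytes:
        if path not in self._readers:
            self._readers[path] = self._open_file(path)  # opened on first use only
        return self._readers[path][row]


def scan(store_cells: List[Tuple[bytes, bytes, bool]],
         readers: LazyMobReaders) -> Iterator[Tuple[bytes, bytes]]:
    for row, value, is_ref in store_cells:
        if is_ref:
            # The reference cell's value is the MOB file path; seek the same row key there.
            yield row, readers.get(value.decode(), row)
        else:
            yield row, value
```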

The MOB filename is human-readable and consists of three parts: the MD5 of the start key, the latest date of the cells in this MOB file, and a UUID. The start key is that of the region from which the MOB file was flushed. MOBs usually have a user-defined TTL, so expired MOB files can be found and deleted by comparing the date part with the TTL.
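
This sketch assumes the three parts are simply concatenated: a 32-character MD5 hex digest, an 8-character `yyyymmdd` date, and a 32-character UUID without dashes. With that layout, a filename can be built and parsed as below. The exact layout is an assumption of this sketch and may differ between HBase versions. `make_mob_file_name` and `parse_mob_file_name` are hypothetical helpers.

```python
import hashlib
import uuid
from datetime import date, datetime
from typing import Optional, Tuple


def make_mob_file_name(start_key: bytes, latest: date, uid: Optional[str] = None) -> str:
    md5 = hashlib.md5(start_key).hexdigest()  # 32 hex chars
    day = latest.strftime("%Y%m%d")           # 8 chars, e.g. 20171020
    uid = uid or uuid.uuid4().hex             # 32 hex chars, no dashes
    return md5 + day + uid


def parse_mob_file_name(name: str) -> Tuple[str, date, str]:
    # Split the fixed-width parts back out of the assumed layout.
    md5, day, uid = name[:32], name[32:40], name[40:]
    return md5, datetime.strptime(day, "%Y%m%d").date(), uid
```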

Snapshot

To be snapshot-friendly, MOB files are stored in a special dummy region, so that snapshots, table export/clone, and archiving work as expected.

When snapshotting a table, HBase creates the MOB region in the snapshot and adds the existing MOB files to the manifest. When restoring the snapshot, file links are created in the MOB region.

Cleaning and Compaction

There are two situations in which MOB files should be deleted: when a MOB file has expired, and when a MOB file is too small and should be merged into a bigger one to improve HDFS efficiency.

HBase MOB has a chore in the master: it scans the MOB files, finds the expired ones based on the date in the filename, and deletes them. Disk space is thus reclaimed periodically by aging off expired MOB files.
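
Because the date in the filename is the latest date of any cell in the file, the chore does not need to open a file to decide whether it has expired. Below is a hedged sketch of that check. It assumes the filename layout `md5(32) + yyyymmdd(8) + uuid(32)` and day-level granularity. `expired_mob_files` is a hypothetical name, not a real HBase API.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List


def expired_mob_files(names: Iterable[str], ttl: timedelta, today: date) -> List[str]:
    """Return the MOB files whose newest cell is already older than the TTL."""
    cutoff = today - ttl
    expired = []
    for name in names:
        # Characters 32..39 hold the latest cell date (yyyymmdd) in the assumed layout.
        latest = datetime.strptime(name[32:40], "%Y%m%d").date()
        # Every cell in the file is no newer than `latest`, so the whole file can go.
        if latest < cutoff:
            expired.append(name)
    return expired
```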

MOB files may be small relative to an HDFS block if you write rows in which only a few entries qualify as MOBs; there may also be deleted cells. The deleted cells need to be dropped and the small files merged into bigger ones to improve HDFS utilization. MOB compactions only compact the small files and leave the large files untouched, which avoids repeatedly compacting large files.

Some other things to keep in mind:

· Know which cells are deleted: in every HBase major compaction, the delete markers are written to a del file before they are dropped.

· In the first step of a MOB compaction, these del files are merged into bigger ones.

· All the small MOB files are selected. If the number of small files equals the number of existing MOB files, the compaction is regarded as a major one and is called an ALL_FILES compaction.

· The selected files are partitioned by the start key and date in the filename. The small files in each partition are compacted together with the del files so that deleted cells are dropped. Meanwhile, a new HFile with new reference cells is generated; the compactor commits the new MOB file and then bulk loads this HFile into HBase.

· After compaction in all partitions is finished, if this was an ALL_FILES compaction, the del files are archived.
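
The selection and partitioning steps above can be sketched as follows. This is an illustration under simplifying assumptions, not the HBase MOB compactor. `select_and_partition` and `compact_partition` are hypothetical. "Small" means below a caller-supplied size, and delete markers are reduced to a set of deleted row keys. Writing the new reference cells and bulk loading them are left out.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def select_and_partition(mob_files: Dict[str, int], small_size: int):
    """mob_files maps MOB file name -> size in bytes."""
    small = [name for name, size in mob_files.items() if size < small_size]
    # Every existing MOB file is small: treat this as an ALL_FILES compaction.
    all_files = bool(small) and len(small) == len(mob_files)
    partitions: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for name in small:
        # Partition key: (MD5 of start key, date), both read from the filename.
        partitions[(name[:32], name[32:40])].append(name)
    return dict(partitions), all_files


def compact_partition(files: Dict[str, Dict[bytes, bytes]],
                      deleted_rows: Set[bytes]) -> Dict[bytes, bytes]:
    """Merge one partition's small files, dropping cells covered by del files."""
    merged: Dict[bytes, bytes] = {}
    for cells in files.values():
        for row, value in cells.items():
            if row not in deleted_rows:
                merged[row] = value
    return merged
```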

The life cycle of MOB files is illustrated below. Basically, they are created when the memstore is flushed, and they are deleted from the filesystem by HFileCleaner once they are no longer referenced by a snapshot or have expired in the archive.

Summary

In summary, the new HBase MOB design moves MOBs out of the main I/O path of HBase while retaining most of its security, compaction, and snapshotting features. It caters to the characteristics of MOB operations, makes the write amplification of MOBs more predictable, and keeps latencies low for both reads and writes.

Thanks to the original authors:

Jincheng Du is a Software Engineer at Intel and an HBase contributor.

Jon Hsieh is a Software Engineer at Cloudera and an HBase committer/PMC member. He is also the founder of Apache Flume, and a committer on Apache Sqoop.