Accessing image and document data in HBase (HBase MOB)

Introduction to HBase MOB

HBase usually performs well when accessing values smaller than 10 KB. For medium-sized objects, roughly 100 KB to 10 MB, performance degrades because of the write amplification caused by compaction, and regions can become unavailable.

To solve this problem, HBase introduced support for medium-sized objects: the HBase MOB (Moderate Object Storage) feature. See HBASE-11339 for details.

For an introduction to HBase MOB, see the articles listed under References at the end.

This feature was only merged into Apache HBase in version 2.0.0. Because that release is still in beta, using it is not recommended.

If you want to use this feature, the following vendor distributions are recommended:

  1. Cloudera - CDH 5.4.x and later
  2. Hortonworks - HDP 2.5 and later
  3. Huawei - FusionInsight_HBase (not open source, usually used in the telecom industry)

Applicable scenarios

This feature is suitable for storing pictures, documents, PDFs, and small videos in HBase.

Typical scenarios:

  • A bank stores and retrieves customer signatures or scanned documents.
  • A transportation department stores and retrieves pictures of vehicles.

MOB configuration method

  1. Enable HFile Version 3

Add the following property to hbase-site.xml:

<property>
  <name>hfile.format.version</name>
  <value>3</value>
</property>

  2. Mark the column family as MOB

  • Set IS_MOB to true to store the column family's values as MOBs.
  • MOB_THRESHOLD sets the threshold in bytes. Values larger than the threshold are treated as MOBs. The default threshold is 100 KB.

HBase Shell Statement:

hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}
hbase> alter 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}

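With the Java API, the code is as follows:

import org.apache.hadoop.hbase.HColumnDescriptor;

HColumnDescriptor hcd = new HColumnDescriptor("f");
hcd.setMobEnabled(true);       // store this column family's values as MOBs
hcd.setMobThreshold(102400L);  // threshold in bytes (100 KB)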

HBase MOB cache settings

Property                              Default   Description
hbase.mob.file.cache.size             1000      Number of MOB files kept in the cache
hbase.mob.cache.evict.period          3600      Interval between cache evictions, in seconds
hbase.mob.cache.evict.remain.ratio    0.5f      Ratio of files retained after an eviction (float between 0 and 1)
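
As a sketch, assuming these cache properties are set in hbase-site.xml in the same way as the HFile version setting earlier, the entries could look like the following. The values are simply the defaults listed above, not tuning recommendations.

```xml
<!-- MOB file cache settings; the values shown are the defaults -->
<property>
  <name>hbase.mob.file.cache.size</name>
  <value>1000</value>
</property>
<property>
  <name>hbase.mob.cache.evict.period</name>
  <value>3600</value>
</property>
<property>
  <name>hbase.mob.cache.evict.remain.ratio</name>
  <value>0.5f</value>
</property>
```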

MOB test

$ sudo -u hbase hbase org.apache.hadoop.hbase.IntegrationTestIngestMOB \
            -threshold 102400 \
            -minMobDataSize 512 \
            -maxMobDataSize 5120

Manually compacting MOB files

Use the compact_mob and major_compact_mob shell commands.

The first parameter is the table name. If only the table name is passed, all MOB column families in the table are compacted.

The second parameter is the column family name. If it is passed, only the specified column family is compacted.

hbase> compact_mob 't1'
hbase> compact_mob 't1', 'f1'
hbase> major_compact_mob 't1'
hbase> major_compact_mob 't1', 'f1'

Setting the MOB compaction partition policy (compact into one file per week/month)

By default, the MOB files for each day are compacted into one file.

Apache HDFS limits the number of files in a single directory. Once the number of MOB files exceeds this limit, the MOB table is no longer writable. By default, a single HDFS directory can hold at most about 1 million files.

With one file per region per day and 1000 regions, 365 × 1000 = 365,000 files are generated in one year, so the limit is exceeded within 3 years. The more regions there are, the sooner the limit is reached.
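
The arithmetic above can be sketched in a few lines of Java. This is only an illustration: the class and method names are made up for this example, it assumes exactly one MOB file per region per partition, it uses the 1 million limit quoted above, and it approximates a year as 52 weeks or 12 months.

```java
public class MobFileCountEstimate {
    /** Files produced per year, assuming one MOB file per region per partition. */
    public static long filesPerYear(long regions, long partitionsPerYear) {
        return regions * partitionsPerYear;
    }

    /** The year (counting from 1) in which the per-directory limit is first exceeded. */
    public static long yearLimitExceeded(long regions, long partitionsPerYear, long directoryLimit) {
        return directoryLimit / filesPerYear(regions, partitionsPerYear) + 1;
    }

    public static void main(String[] args) {
        long regions = 1000;             // figure used in the text
        long directoryLimit = 1_000_000; // per-directory limit quoted in the text
        String[] policies = {"daily", "weekly", "monthly"};
        long[] partitionsPerYear = {365, 52, 12}; // approximate partitions per year
        for (int i = 0; i < policies.length; i++) {
            System.out.println(policies[i] + ": "
                + filesPerYear(regions, partitionsPerYear[i]) + " files/year, limit exceeded in year "
                + yearLimitExceeded(regions, partitionsPerYear[i], directoryLimit));
        }
    }
}
```

With the figures from the text, the daily policy yields 365,000 files per year and crosses the limit in the third year, while coarser partitions push that point out by many years.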

To reduce the number of files, MOB compaction can instead be configured to merge each week's or each month's data into one file.

By default, the MOB compaction partition policy is daily. To apply a weekly or monthly policy, use the MOB_COMPACT_PARTITION_POLICY attribute, which was added to MOB column families. It can be set when creating a table in the HBase shell.

create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 1000000, MOB_COMPACT_PARTITION_POLICY => 'weekly'}

Users can also change MOB_COMPACT_PARTITION_POLICY for existing tables from the HBase shell.

alter 't1', {NAME => 'f1', MOB_COMPACT_PARTITION_POLICY => 'monthly'}

If the policy is changed from daily to weekly or monthly, or from weekly to monthly, the next MOB compaction will recompact the MOB files that were compacted under the previous policy. If the policy is changed from monthly or weekly to daily, or from monthly to weekly, MOB files already compacted under the previous policy will not be recompacted under the new policy.
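
The rule above amounts to: existing files are recompacted only when the policy becomes coarser. A minimal Java sketch of that rule follows. The enum and method here are hypothetical helpers written for this illustration and are not HBase's own classes.

```java
public class MobPolicyChange {
    /** Partition policies, ordered from finest to coarsest granularity. */
    public enum Policy { DAILY, WEEKLY, MONTHLY }

    /**
     * Returns true if MOB files compacted under oldPolicy would be
     * recompacted by the next MOB compaction after switching to newPolicy,
     * that is, only when the new policy is coarser than the old one.
     */
    public static boolean recompactsExistingFiles(Policy oldPolicy, Policy newPolicy) {
        return newPolicy.ordinal() > oldPolicy.ordinal();
    }

    public static void main(String[] args) {
        // Print the outcome for every old/new policy combination.
        for (Policy from : Policy.values()) {
            for (Policy to : Policy.values()) {
                System.out.println(from + " -> " + to + ": recompact existing files = "
                    + recompactsExistingFiles(from, to));
            }
        }
    }
}
```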

 

References:

https://issues.apache.org/jira/browse/HBASE-11339

https://issues.apache.org/jira/browse/HBASE-16981

http://blog.cloudera.com/blog/2015/06/inside-apache-hbases-new-support-for-mobs/

https://blog.cloudera.com/blog/2017/06/introducing-apache-hbase-medium-object-storage-mob-compaction-partition-policies/

https://blog.cloudera.com/blog/2009/02/the-small-files-problem/

http://developer.huawei.com/cn/ict/Products/BigData/FusionInsightHD/HBase/SDK#section-1

https://www.cloudera.com/documentation/enterprise/5-4-x/topics/admin_hbase_mob.html

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_data-access/content/ch_MOB-support.html

 
