HBase Data Storage and Read/Write Principles Explained

1. HBase data storage principle


  • An HRegionServer manages multiple regions.
  • A region contains multiple stores.
    • Each column family corresponds to one store.
    • If a table has only one column family, each region contains only one store.
    • If a table has N column families, each region contains N stores (see the sketch after this list).
  • A store contains exactly one memstore.
    • The memstore is an in-memory area: written data is buffered in the memstore first and later flushed to disk.
  • A store contains multiple StoreFiles; the data is ultimately stored on HDFS as HFile files.
    • A StoreFile is the abstraction over an HFile; wherever StoreFile is mentioned, read it as HFile.
    • Every time the memstore flushes its data to disk, a new HFile is generated.
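To make the column-family-to-store mapping concrete, here is a minimal sketch using the HBase 2.x Java Admin API; the ZooKeeper hosts, the table name user, and the family names info and extra are hypothetical placeholders. Creating a table with two column families means every region of that table will contain two stores.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class CreateTableExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3"); // hypothetical ZK hosts

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // Two column families -> every region of this table holds two stores.
            TableDescriptor desc = TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("user"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("info"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("extra"))
                    .build();
            admin.createTable(desc);
        }
    }
}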

2. HBase data read flow

Note: an HBase cluster has only one meta table (hbase:meta); this table has only one region, and that region is hosted on a single HRegionServer.

  • 1. The client first connects to ZooKeeper and finds the location of the meta table's region from ZK, i.e. which HRegionServer holds the meta table. The client then connects to that HRegionServer and reads the meta table, which stores the region information of every user table; you can run scan 'hbase:meta' to view the meta table.
  • 2. Using the namespace, table name, and rowkey, the client finds the region that holds the requested data.
  • 3. The client finds the RegionServer hosting this region and sends it the read request.
  • 4. The RegionServer locates the corresponding region.
  • 5. The region first searches its memstore for the data; if it is not there, it reads from the BlockCache.
    • A RegionServer divides its memory into two parts:
    • one part is the memstore, used mainly for writing data;
    • the other part is the BlockCache, used mainly for reading data.
  • 6. If the data is not in the BlockCache either, it is read from the StoreFiles.
    • After reading data from a StoreFile, the RegionServer does not return it to the client directly; it first writes the data into the BlockCache, in order to speed up subsequent queries, and then returns the result to the client (see the client-side sketch after this list).
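From the application's point of view this whole lookup chain is hidden behind the client API. Below is a minimal read sketch using the HBase 2.x Java client; the ZooKeeper hosts, the table name user, the family info, and the qualifier name are hypothetical placeholders. The client only needs the ZooKeeper quorum: steps 1-4 above (meta lookup and region location) happen inside the library.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // The client asks ZK for the location of hbase:meta, then locates
        // the user region from the meta table; we only supply the quorum.
        conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3"); // hypothetical hosts

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("user"))) { // hypothetical table
            Get get = new Get(Bytes.toBytes("rowkey-001"));
            get.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"));
            Result result = table.get(get);
            // The RegionServer answers from memstore, then BlockCache, then StoreFiles.
            byte[] value = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println(value == null ? "not found" : Bytes.toString(value));
        }
    }
}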

3. HBase write data flow


  • 1. The client first finds the location of the meta table's region from ZK, then reads the meta table; the meta table stores the region information of every user table.

  • 2. Using the namespace, table name, and rowkey, the client finds the region the data should be written to.

  • 3. The client finds the RegionServer hosting this region and sends it the write request.

  • 4. The data is written both to the HLog (write-ahead log) and to the corresponding memstore.

  • 5. When the memstore reaches its threshold, its data is flushed to disk, generating a StoreFile.

  • 6. The corresponding historical data in the HLog is then deleted.
Supplement:
HLog (write-ahead log):
    Also called the WAL (Write Ahead Log), and similar to the binlog in MySQL, it is used for disaster recovery. The HLog records every change to the data, so once data is modified it can be recovered from the log (a client-side write sketch follows).
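As with reads, the client side of this flow is just a few calls. Here is a minimal write sketch using the HBase 2.x Java client; the ZooKeeper hosts, table name, family, qualifier, and value are hypothetical placeholders. By default each cell is appended to the WAL before being placed in the memstore (step 4 above).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3"); // hypothetical hosts

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("user"))) { // hypothetical table
            Put put = new Put(Bytes.toBytes("rowkey-001"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("alice"));
            // By default the cell is appended to the HLog (WAL) before it is
            // placed in the memstore, so a RegionServer crash cannot lose it.
            table.put(put);
        }
    }
}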

4. HBase flush mechanism

4.1 Flush trigger conditions

4.1.1 Memstore-level limit

  • When any single memstore in a region reaches the size limit (hbase.hregion.memstore.flush.size, default 128 MB), a memstore flush is triggered.
<property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>134217728</value>
</property>

4.1.2 Region-level limit

  • When the total size of all memstores in a region reaches the limit (hbase.hregion.memstore.block.multiplier * hbase.hregion.memstore.flush.size, default 2 * 128 MB = 256 MB), a memstore flush is triggered.
<property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>134217728</value>
</property>
<property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <value>2</value>
</property>   

4.1.3 RegionServer-level limit

  • When the total size of all memstores on a RegionServer exceeds the low watermark (hbase.regionserver.global.memstore.size.lower.limit * hbase.regionserver.global.memstore.size, the former defaulting to 0.95), the RegionServer starts forced flushing;
  • it flushes the region with the largest total memstore size first, then the next largest, and so on;
  • if writes come in faster than flushes, so that the total memstore size exceeds the high watermark hbase.regionserver.global.memstore.size (default 40% of JVM memory), the RegionServer blocks updates and keeps force-flushing until the total memstore size drops below the low watermark (a worked example follows the configuration below).
<property>
    <name>hbase.regionserver.global.memstore.size.lower.limit</name>
    <value>0.95</value>
</property>
<property>
    <name>hbase.regionserver.global.memstore.size</name>
    <value>0.4</value>
</property>
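A quick worked example with the defaults above, assuming (hypothetically) a RegionServer with a 10 GB JVM heap: the high watermark is 0.4 × 10 GB = 4 GB, and the low watermark is 0.95 × 4 GB = 3.8 GB. Forced flushing starts once the memstores together exceed 3.8 GB; if they still climb past 4 GB, updates are blocked until flushing brings the total back under 3.8 GB.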

4.1.4 Maximum number of HLogs

  • When the number of HLog files on a RegionServer reaches the upper limit (configurable via the parameter hbase.regionserver.maxlogs), the system selects the region or regions corresponding to the earliest HLog and flushes them first.

4.1.5 Periodic memstore flush

  • The default period is one hour, which guarantees that a memstore is not left unpersisted for too long. To avoid the problem of all memstores flushing at the same time, the periodic flush adds a random delay of around 20,000 ms.

4.1.6 Manual flush

  • Users can flush a single table or a single region from the shell with flush 'tablename' or flush 'regionname' (a Java equivalent is sketched below).
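For completeness, the same manual flush can be issued from the HBase 2.x Java Admin API; a minimal sketch, again assuming the hypothetical table user and ZooKeeper hosts:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ManualFlushExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3"); // hypothetical ZK hosts

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // Flushes every region of the table, same as `flush 'user'` in the shell.
            admin.flush(TableName.valueOf("user"));
        }
    }
}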

4.2 Flush process

  • To reduce the impact of flushing on reads, the whole flush process is divided into three stages:

    • prepare stage: iterate over all memstores in the region and take a snapshot of each memstore's current data set (a CellSkipListSet); then create a new, empty CellSkipListSet, into which all subsequently written data goes. The prepare stage must hold an updateLock that blocks write requests, and the lock is released when the stage ends. Because this stage performs no time-consuming operations, the lock is held only very briefly.

    • flush stage: iterate over all memstores and persist the snapshots produced in the prepare stage as temporary files, placed under a unified .tmp directory. Because this stage involves disk I/O, it is relatively time-consuming.
    • commit stage: iterate over all memstores, move the temporary files generated in the flush stage to the column family's directory, create the corresponding storefile and Reader for each HFile, add the storefile to the HStore's storefiles list, and finally clear the snapshot produced in the prepare stage.

5. Compaction (merge) mechanism

  • To keep HBase from accumulating too many small files and to ensure query efficiency, HBase merges small store files into larger ones when necessary; this process is called compaction.

  • There are mainly two kinds of compaction in HBase:
    • minor compaction (small merge)
    • major compaction (big merge)

5.1 Minor compaction (small merge)

  • Multiple HFiles in a store are merged into one larger HFile.

    In this process, some small, adjacent StoreFiles are selected and merged into a larger StoreFile. Data that has exceeded its TTL, updated data, and deleted data are only marked, not physically removed. The result of a minor compaction is fewer but larger StoreFiles. This kind of merge is triggered fairly frequently.

  • The trigger conditions for minor compaction are determined by the following parameters:
<!-- Minor compaction starts only when at least three store files meet the selection conditions -->
<property>
    <name>hbase.hstore.compactionThreshold</name>
    <value>3</value>
</property>

<!-- At most 10 store files are selected in one minor compaction -->
<property>
    <name>hbase.hstore.compaction.max</name>
    <value>10</value>
</property>

<!-- Default 128 MB:
store files smaller than this value are always included in the minor compaction
-->
<property>
    <name>hbase.hstore.compaction.min.size</name>
    <value>134217728</value>
</property>

<!-- Default LONG.MAX_VALUE:
store files larger than this value are always excluded from minor compaction -->
<property>
    <name>hbase.hstore.compaction.max.size</name>
    <value>9223372036854775807</value>
</property>

5.2 Major compaction (big merge)

  • All HFiles in a store are merged into one HFile.

    All StoreFiles are merged into a single StoreFile. This process cleans up three kinds of meaningless data: deleted data, data whose TTL has expired, and data whose version number exceeds the configured maximum number of versions. The merge frequency is relatively low, by default once every seven days, and it consumes a great deal of performance, so in production it is recommended to disable the automatic schedule (set it to 0) and trigger it manually when the application is idle. Controlling the merge manually keeps it from running during peak periods.

  • Time-based trigger

    <!-- By default a major compaction runs once every 7 days -->
    <property>
    <name>hbase.hregion.majorcompaction</name>
    <value>604800000</value>
    </property>
  • Manual trigger

    ## use the major_compact command
    major_compact tableName

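The same manual trigger is also available from the HBase 2.x Java Admin API; a minimal sketch, again assuming the hypothetical table user and ZooKeeper hosts:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class MajorCompactExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3"); // hypothetical ZK hosts

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // Asynchronously requests a major compaction of every region of the
            // table, same as `major_compact 'user'` in the shell.
            admin.majorCompact(TableName.valueOf("user"));
        }
    }
}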

Origin: blog.51cto.com/10312890/2471004