HBase Advanced


RegionServer architecture

The detailed architecture of a RegionServer consists of the following components:

  1. StoreFile
    The physical files in which the actual data is stored; StoreFiles are kept on HDFS in the HFile format. Each Store has one or more StoreFiles (HFiles), and the data inside each StoreFile is ordered.
  2. MemStore
    Write cache. Because the data in an HFile must be ordered, data is first written to the MemStore and sorted there; when the flush condition is met, it is flushed to an HFile, and each flush produces a new HFile.
  3. WAL
    Data must be sorted in the MemStore before it can be flushed to an HFile, but keeping data only in memory carries a high risk of loss. To solve this problem, every operation is first written to a file called the Write-Ahead Log (WAL) and only then written to the MemStore, so that after a failure the data can be reconstructed from this log file (a small client-side fragment after this list shows how the WAL surfaces in the API).
    Every hbase.regionserver.optionallogflushinterval (default 1s), HBase writes the buffered operations from memory to the WAL.
    All Regions on a RegionServer share a single WAL instance.
    The WAL check interval is defined by hbase.regionserver.logroll.period, with a default value of 1 hour. The check compares the operations in the current WAL with the operations actually persisted on HDFS; WAL files whose operations have all been persisted are moved to the .oldlogs folder (this folder is also on HDFS). A WAL instance consists of multiple WAL files, and the maximum number of WAL files is defined by the hbase.regionserver.maxlogs parameter (default 32).
  4. BlockCache
    Read cache. The data returned by each query is cached in the BlockCache to speed up subsequent queries.
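
One place where the WAL described in item 3 becomes visible to applications is the per-mutation Durability setting of the Java client. A minimal fragment; the rowkey and column names are placeholders made up for illustration:

import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

Put put = new Put(Bytes.toBytes("row1"));
put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
// SYNC_WAL (the usual effective default) syncs the WAL entry before the write is acked.
put.setDurability(Durability.SYNC_WAL);
// SKIP_WAL trades safety for speed: if the RegionServer crashes before the
// MemStore flush, writes made with SKIP_WAL are lost.
// put.setDurability(Durability.SKIP_WAL);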

HBase write process

Write process:

  1. The client first accesses ZooKeeper to find out which Region Server holds the hbase:meta table.
  2. Access that Region Server, read the hbase:meta table, and use the namespace:table/rowkey of the request to locate which Region, on which Region Server, holds the target data. The Region information of the table and the location of the meta table are cached in the client's meta cache to speed up later accesses.
  3. Communicate with the target Region Server;
  4. Sequentially write (append) the data to the WAL;
  5. Write the data to the corresponding MemStore, where it is sorted;
  6. Send an ack to the client;
  7. Once the MemStore flush condition is reached, flush the data to an HFile.
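
From the client's point of view, steps 1 through 6 all happen inside a single put call. Below is a minimal sketch with the standard HBase Java client; the table name "student", column family "info" and values are assumptions made up for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteExample {
  public static void main(String[] args) throws Exception {
    // Reads hbase-site.xml; the zookeeper quorum found there is what drives step 1.
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("student"))) {
      Put put = new Put(Bytes.toBytes("1001")); // rowkey
      put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("zhangsan"));
      // Steps 2-6 (meta lookup, WAL append, MemStore write, ack) happen inside
      // this call; the flush to HFile (step 7) happens later on the server.
      table.put(put);
    }
  }
}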

HBase read process

Read process:

  1. The client first accesses ZooKeeper to find out which Region Server holds the hbase:meta table.
  2. Access that Region Server, read the hbase:meta table, and use the namespace:table/rowkey of the read request to locate which Region, on which Region Server, holds the target data. The Region information of the table and the location of the meta table are cached in the client's meta cache to speed up later accesses.
  3. Communicate with the target Region Server;
  4. Look up the target data in the Block Cache (read cache), the MemStore and the StoreFiles (HFiles), and merge everything found. "Everything" here means all versions (timestamps) and all types (Put/Delete) of the same cell.
  5. Cache the data blocks just read from the HFiles (a Block is the HFile data storage unit, 64KB by default) in the Block Cache.
  6. Return the merged result to the client.
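
A matching read sketch; asking for several versions makes the merge in step 4 visible, since cells for the same column can come back from the MemStore and from several HFiles. The table and column names are the same assumptions as in the write sketch:

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("student"))) {
      Get get = new Get(Bytes.toBytes("1001"));
      get.setMaxVersions(3); // up to 3 versions (get.readVersions(3) in HBase 2.x)
      Result result = table.get(get);
      for (Cell cell : result.rawCells()) {
        // Each cell carries the timestamp that orders versions during the merge.
        System.out.println(Bytes.toString(CellUtil.cloneQualifier(cell)) + " @ "
            + cell.getTimestamp() + " = " + Bytes.toString(CellUtil.cloneValue(cell)));
      }
    }
  }
}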

MemStore Flush

The purpose of the MemStore is to organize the data in order before it is written to HDFS, so that every HFile on HDFS is sorted.

MemStore flush triggers:

  1. When the size of a single memstore reaches hbase.hregion.memstore.flush.size (default value 128M), all memstores in that region are flushed. When the size of a memstore reaches

    hbase.hregion.memstore.flush.size (default value 128M) * hbase.hregion.memstore.block.multiplier (default value 4)

    further writes to that memstore are blocked.

  2. When the total size of all memstores on a region server reaches

    java_heapsize * hbase.regionserver.global.memstore.size (default value 0.4) * hbase.regionserver.global.memstore.size.lower.limit (default value 0.95)

    regions are flushed in descending order of their total memstore size, until the total size of all memstores on the region server drops below the value above (a worked example follows this list).

  3. When the total size of all memstores on a region server reaches

    java_heapsize * hbase.regionserver.global.memstore.size (default value 0.4)

    writes to all memstores are blocked.

  4. When the automatic flush interval is reached, a memstore flush is also triggered. The interval is configured by hbase.regionserver.optionalcacheflushinterval (default 1 hour).

  5. When the number of WAL files exceeds hbase.regionserver.maxlogs, regions are flushed in chronological order until the number of WAL files drops below hbase.regionserver.maxlogs (this property is now obsolete and no longer needs to be set manually; its maximum value is 32).
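
To make the thresholds above concrete, here is the arithmetic for a hypothetical 4GB RegionServer heap with all defaults; the heap size is an assumption, everything else comes from the parameter defaults listed above:

public class FlushThresholds {
  public static void main(String[] args) {
    long heap = 4L * 1024 * 1024 * 1024;  // assumed java_heapsize: 4GB
    double globalSize = 0.4;              // hbase.regionserver.global.memstore.size
    double lowerLimit = 0.95;             // hbase.regionserver.global.memstore.size.lower.limit
    long flushSize = 128L * 1024 * 1024;  // hbase.hregion.memstore.flush.size
    int multiplier = 4;                   // hbase.hregion.memstore.block.multiplier

    long mb = 1024 * 1024;
    System.out.println("region blocks writes at: " + flushSize * multiplier / mb + " MB");                    // 512 MB
    System.out.println("forced flushes start at: " + (long) (heap * globalSize * lowerLimit) / mb + " MB");   // ~1556 MB
    System.out.println("all writes blocked at:   " + (long) (heap * globalSize) / mb + " MB");                // ~1638 MB
  }
}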


StoreFile Compaction


  • Since HBase relies on HDFS for storage, and HDFS only supports appending writes, HBase never modifies data in place. When a new cell is added, HBase appends a new piece of data on HDFS. When a cell is modified, HBase again appends a piece of data, only with a version number larger than the previous one (or a custom one). When a cell is deleted, HBase still appends a new piece of data: one that carries no value and whose type is DELETE, known as a tombstone marker (Tombstone).
  • At regular intervals HBase performs a compaction, and the objects being merged are HFile files. Compaction comes in two kinds: minor compaction and major compaction.
  • When HBase performs a major compaction, it merges multiple HFiles into one. During this process, any record covered by a tombstone marker is ignored, so the newly generated HFile no longer contains that record, which is therefore truly deleted.
  • Since each memstore flush produces a new HFile, and different versions (timestamps) and different types (Put/Delete) of the same cell may be spread across different HFiles, a query has to examine all HFiles. To reduce the number of HFiles and to clean up expired and deleted data, StoreFile Compaction is performed.
  • Of the two kinds, a Minor Compaction merges several adjacent, smaller HFiles into one larger HFile but does not clean up expired or deleted data, while a Major Compaction merges all HFiles of a Store into one large HFile and does clean up expired and deleted data.
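
Compactions are triggered automatically, but they can also be requested explicitly through the Admin API. A minimal sketch, again with the assumed table name "student"; both calls only request the compaction, which then runs asynchronously on the RegionServers:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CompactExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      admin.compact(TableName.valueOf("student"));      // request a minor compaction
      admin.majorCompact(TableName.valueOf("student")); // request a major compaction:
                                                        // rewrites each Store into one HFile,
                                                        // dropping expired and deleted cells
    }
  }
}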

Region Split

By default, each Table starts with a single Region. As data is continuously written, the Region splits automatically. When it splits, both daughter Regions stay on the current Region Server, but for load-balancing reasons the HMaster may later move a Region to another Region Server.

Timing of Region Split:

  1. Before version 0.94: when the total size of all StoreFiles of one Store in a region exceeds hbase.hregion.max.filesize, that region splits.
  2. From version 0.94 on, the default split strategy is IncreasingToUpperBoundRegionSplitPolicy; its getSizeToCheck() computes the size a region is checked against to decide whether the split condition is met.
protected long getSizeToCheck(final int tableRegionsCount) {
  // safety check for 100 to avoid numerical overflow in extreme cases
  return tableRegionsCount == 0 || tableRegionsCount > 100
             ? getDesiredMaxFileSize()
             : Math.min(getDesiredMaxFileSize(),
                        initialSize * tableRegionsCount * tableRegionsCount * tableRegionsCount);
}

tableRegionsCount: the number of Regions of this table on the current Region Server.
getDesiredMaxFileSize(): returns the value of the hbase.hregion.max.filesize parameter, 10GB by default.
The initialization of initialSize is more involved and is determined by several parameters:

@Override
protected void configureForRegion(HRegion region) {
  super.configureForRegion(region);
  Configuration conf = getConf();
  // By default hbase.increasing.policy.initial.size is not set in the configuration file
  initialSize = conf.getLong("hbase.increasing.policy.initial.size", -1);
  if (initialSize > 0) {
    return;
  }
  // Otherwise use twice the memstoreFlushSize customized on the user table (default also 128M)
  HTableDescriptor desc = region.getTableDesc();
  if (desc != null) {
    initialSize = 2 * desc.getMemStoreFlushSize();
  }
  // If the table-level memstoreFlushSize is not valid, fall back to
  // hbase.hregion.memstore.flush.size, which defaults to 128M
  if (initialSize <= 0) {
    initialSize = 2 * conf.getLong(HConstants.HREGION_MEMSTORE_FLUSH_SIZE,
                                   HTableDescriptor.DEFAULT_MEMSTORE_FLUSH_SIZE);
  }
}

The concrete strategy: while tableRegionsCount is between 1 and 100, the split threshold is
initialSize (default 2 * 128M = 256M) * tableRegionsCount^3, for example:
first split: 1^3 * 256 = 256MB
second split: 2^3 * 256 = 2048MB
third split: 3^3 * 256 = 6912MB
fourth split: 4^3 * 256 = 16384MB > 10GB, so the smaller value 10GB is taken,
and from then on every split happens at 10GB.
If tableRegionsCount exceeds 100, the region splits whenever it exceeds 10GB.
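
The progression above is just getSizeToCheck() evaluated for a growing region count. A small sketch reproducing the arithmetic (plain Java, no HBase dependency; initialSize and the 10GB cap use the defaults discussed above):

public class SplitSizes {
  public static void main(String[] args) {
    long initialSize = 2L * 128 * 1024 * 1024;  // 2 * memstore flush size = 256MB
    long desiredMax = 10L * 1024 * 1024 * 1024; // hbase.hregion.max.filesize default: 10GB
    for (long n = 1; n <= 5; n++) {
      long threshold = Math.min(desiredMax, initialSize * n * n * n);
      System.out.println(n + " region(s) -> split at " + threshold / (1024 * 1024) + " MB");
    }
    // prints: 256, 2048, 6912, 10240, 10240 MB
  }
}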

The split policy in effect is configured through the hbase.regionserver.region.split.policy parameter, either cluster-wide in hbase-site.xml or per table.
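
For example, a table could be pinned to the fixed-threshold policy (the pre-0.94 behavior described earlier) when it is created. A sketch using the older HTableDescriptor API; the table name and column family are, as before, made-up examples:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy;

public class SplitPolicyExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("student2"));
      desc.addFamily(new HColumnDescriptor("info"));
      // Split only when a Store exceeds hbase.hregion.max.filesize, regardless of region count.
      desc.setValue(HTableDescriptor.SPLIT_POLICY, ConstantSizeRegionSplitPolicy.class.getName());
      admin.createTable(desc);
    }
  }
}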
