Analysis of the HBase MemStore and Compaction Principles

1 Overview

To read or write data, the client first obtains metadata through the HBase client, such as Region address information, in order to locate the target RegionServer. When performing a write, HBase writes to the MemStore first. Why write to the MemStore? This article analyzes HBase MemStore and Compaction in detail for readers.


2. Content

HBase's internal communication and data exchange are implemented through RPC (a later post will cover HBase's RPC mechanism). A client application sends write, delete, and read requests to the HBase server through RPC; the HBase Master assigns the corresponding RegionServers to handle them, the Region addresses within each RegionServer are located, the data is written into HFile files, and finally the data is persisted.

Before diving into the HBase MemStore, let's first look at the architecture of a RegionServer. Its structure diagram is as follows:

[Figure: RegionServer architecture]

In HBase storage, although a Region is the smallest unit of distributed storage, it is not the smallest unit of storage. As the figure shows, a Region is actually composed of one or more Stores, and each Store holds one Column Family. Each Store consists of one MemStore and zero or more StoreFiles, and StoreFiles are ultimately stored in HDFS in HFile format.
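
To make this hierarchy easier to picture, here is a minimal, purely illustrative Java sketch of the containment relationship; the class and field names are simplified stand-ins, not the actual HBase internal classes:

// Illustrative only: a simplified containment model of a RegionServer.
// The real HBase classes (HRegionServer, HRegion, HStore, MemStore, HStoreFile)
// are far richer than this sketch.
import java.util.List;
import java.util.Map;

class RegionServerModel {
    List<RegionModel> regions;          // one RegionServer hosts many Regions
}

class RegionModel {
    Map<String, StoreModel> stores;     // one Store per Column Family
}

class StoreModel {
    MemStoreModel memStore;             // exactly one in-memory write buffer
    List<StoreFileModel> storeFiles;    // zero or more HFiles flushed to HDFS
}

class MemStoreModel { /* sorted in-memory cells */ }
class StoreFileModel { /* an HFile persisted on HDFS */ }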


2.1 Writing process

To ensure random read performance, HBase stores RowKeys in HFiles in sorted order. After a client request arrives at the RegionServer, HBase does not write the data to an HFile immediately, because the RowKey ordering must be preserved; instead, it keeps the data of each write operation in memory, in the MemStore. The MemStore absorbs random writes efficiently and keeps all of the data it holds sorted in memory. When the MemStore reaches its threshold, HBase triggers the Flush mechanism to flush the MemStore data into an HFile, which takes full advantage of HDFS's strength at writing large files and improves write performance. The complete writing process is shown in the figure:

[Figure: HBase write path]
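
As a concrete example of the client side of this write path, the following minimal sketch uses the standard HBase Java client API to issue a Put; the table name, column family, and row key are placeholders chosen for illustration:

// A minimal write using the HBase client API (table/column names are examples).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("test_table"))) {
            Put put = new Put(Bytes.toBytes("rowkey-0001"));
            // The cell first lands in the Region's MemStore and is flushed to an
            // HFile later, once the flush threshold is reached.
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
            table.put(put);
        }
    }
}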

Since the MemStore lives in memory, if the RegionServer fails or its process goes down, the data in memory would be lost. To guarantee data integrity, HBase adds a WAL (Write-Ahead Log) mechanism to the storage design. Whenever HBase applies an update that writes data into the MemStore, the update is also written to the WAL. WAL files are written sequentially, by appending. Each RegionServer has only one WAL, and all Regions on the same RegionServer write into the same WAL file. This way, even if a RegionServer goes down, all data can be restored into memory in order by replaying the WAL file.
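
Conceptually, the update is logged to the WAL before it is applied to the MemStore. The following is a simplified, illustrative sketch of that sequence; the interfaces here are stand-ins, not the real HBase internals:

// Illustrative write ordering inside a RegionServer (stand-in types, not real HBase classes).
import java.io.IOException;

interface SimpleWal {
    void append(byte[] cell) throws IOException; // sequential, append-only write to the shared WAL file
    void sync() throws IOException;              // force the log entry to durable storage
}

interface SimpleMemStore {
    void add(byte[] cell);                       // insert into the sorted in-memory buffer
}

class WritePathSketch {
    // 1. append to WAL, 2. sync, 3. insert into MemStore.
    // A crash after step 2 can be recovered by replaying the WAL.
    static void applyUpdate(SimpleWal wal, SimpleMemStore memStore, byte[] cell) throws IOException {
        wal.append(cell);
        wal.sync();
        memStore.add(cell);
    }
}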


2.2 Reading process

HBase queries data by RowKey: the client application locates the Region that holds a given RowKey by its address. The Region's address information is obtained from HBase's metadata table, that is, from the Region that hosts the hbase:meta table. By reading hbase:meta, the client can find the StartKey and EndKey of each Region and the RegionServer to which it belongs. Because RowKeys are distributed across Regions in sorted order, the Region holding the RowKey currently being operated on is determined by each Region's StartKey and EndKey.

Since scanning the hbase:meta table is time-consuming, the client caches the table's Region address information. When the requested Region's lease expires, the table's Region address information is reloaded.
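
On the client side, the hbase:meta lookup and the caching of Region locations are handled transparently by the HBase client library. Here is a minimal read sketch using the standard Java client API; the table, column family, and row key are placeholders:

// A minimal read using the HBase client API; region location lookup against
// hbase:meta and the client-side location cache happen inside the library.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class GetExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("test_table"))) {
            Get get = new Get(Bytes.toBytes("rowkey-0001"));
            Result result = table.get(get);
            byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"));
            System.out.println(Bytes.toString(value));
        }
    }
}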


2.3 Flush mechanism

The RegionServer does not write data to HFiles synchronously; writing to disk is triggered when MemStore memory usage reaches a threshold. Once the combined memory usage of the MemStores of all Regions in the RegionServer reaches its upper limit, the data in the MemStores is written out to HFiles. At the same time, the sequence ID of the flushed data is recorded, which allows the WAL cleanup mechanism to periodically delete WAL entries that are no longer needed.
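
These thresholds are configurable. As an illustrative sketch (the property names below are the HBase 1.x-era names and the values shown are the common defaults; check your version's documentation), they could be set programmatically like this:

// Illustrative flush-related settings (values shown are the common defaults).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class FlushConfigSketch {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Per-Region MemStore flush threshold (default 128MB).
        conf.setLong("hbase.hregion.memstore.flush.size", 128L * 1024 * 1024);
        // Upper bound on all MemStores in a RegionServer, as a fraction of heap
        // (default 0.4 in HBase 1.x); crossing it forces flushes across Regions.
        conf.setFloat("hbase.regionserver.global.memstore.size", 0.4f);
        System.out.println(conf.get("hbase.hregion.memstore.flush.size"));
    }
}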

When the MemStore size reaches the threshold, it is flushed to disk. The key parameter is configured by the hbase.hregion.memstore.flush.size property and defaults to 128MB. The flush does not go to disk immediately; there is a checking process first, implemented by the MemStoreFlusher class. The relevant source code is as follows:

private boolean flushRegion(final FlushRegionEntry fqe) {
    HRegion region = fqe.region;
    if (!region.getRegionInfo().isMetaRegion() &&
        isTooManyStoreFiles(region)) {
      if (fqe.isMaximumWait(this.blockingWaitTime)) {
        LOG.info("Waited " + (EnvironmentEdgeManager.currentTime() - fqe.createTime) +
          "ms on a compaction to clean up 'too many store files'; waited " +
          "long enough... proceeding with flush of " +
          region.getRegionNameAsString());
      } else {
        // If this is first time we've been put off, then emit a log message.
        if (fqe.getRequeueCount() <= 0) {
          // Note: We don't impose blockingStoreFiles constraint on meta regions
          LOG.warn("Region " + region.getRegionNameAsString() + " has too many " +
            "store files; delaying flush up to " + this.blockingWaitTime + "ms");
          if (!this.server.compactSplitThread.requestSplit(region)) {
            try {
              this.server.compactSplitThread.requestSystemCompaction(
                  region, Thread.currentThread().getName());
            } catch (IOException e) {
              LOG.error(
                "Cache flush failed for region " + Bytes.toStringBinary(region.getRegionName()),
                RemoteExceptionHandler.checkIOException(e));
            }
          }
        }

        // Put back on the queue.  Have it come back out of the queue
        // after a delay of this.blockingWaitTime / 100 ms.
        this.flushQueue.add(fqe.requeue(this.blockingWaitTime / 100));
        // Tell a lie, it's not flushed but it's ok
        return true;
      }
    }
    return flushRegion(region, false, fqe.isForceFlushAllStores());
  }


From this implementation we can see that if the Region is a Meta Region, the flush is performed immediately, because Meta Regions have high priority. Otherwise, the method checks whether there are too many StoreFiles. A StoreFile is generated by every MemStore flush, so a Store accumulates multiple StoreFiles, that is, HFiles, over time.

In addition, HRegion performs its own check on the write path, implemented in the checkResources() method, which requests a flush and rejects updates when the MemStore grows too large. The relevant source code is as follows:

private void checkResources() throws RegionTooBusyException {
    // If catalog region, do not impose resource constraints or block updates.
    if (this.getRegionInfo().isMetaRegion()) return;

    if (this.memstoreSize.get() > this.blockingMemStoreSize) {
      blockedRequestsCount.increment();
      requestFlush();
      throw new RegionTooBusyException("Above memstore limit, " +
          "regionName=" + (this.getRegionInfo() == null ? "unknown" :
          this.getRegionInfo().getRegionNameAsString()) +
          ", server=" + (this.getRegionServerServices() == null ? "unknown" :
          this.getRegionServerServices().getServerName()) +
          ", memstoreSize=" + memstoreSize.get() +
          ", blockingMemStoreSize=" + blockingMemStoreSize);
    }
  }


The memstoreSize in the code represents the total size of all MemStores in a Region, and the formula for calculating the blocking threshold is:

BlockingMemStoreSize = hbase.hregion.memstore.flush.size * hbase.hregion.memstore.block.multiplier

Here, hbase.hregion.memstore.flush.size defaults to 128MB and hbase.hregion.memstore.block.multiplier defaults to 4, which means that when the total size of all MemStores in a Region exceeds 128MB * 4 = 512MB, the Flush mechanism is started. This prevents too much data from accumulating in memory.
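
A quick sketch of that arithmetic, using the two properties named above with their default values (actual values come from your cluster configuration):

// Illustrative calculation of the per-Region blocking threshold.
public class BlockingThresholdSketch {
    public static void main(String[] args) {
        long flushSize = 128L * 1024 * 1024;   // hbase.hregion.memstore.flush.size, default 128MB
        int blockMultiplier = 4;               // hbase.hregion.memstore.block.multiplier, default 4
        long blockingMemStoreSize = flushSize * blockMultiplier;
        // 128MB * 4 = 512MB: above this, the Region blocks updates and requests a flush.
        System.out.println("blockingMemStoreSize = " + (blockingMemStoreSize >> 20) + "MB");
    }
}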


3. Compaction

As the number of HFiles keeps growing, a single HBase query may require more and more IO operations, so its latency inevitably increases. HBase therefore designed the Compaction mechanism: by running Compactions, the number of files stays roughly stable, which keeps the number of read IOs stable, so latency does not grow with the amount of data but remains within a steady range.

However, a Compaction affects the performance of the HBase cluster while it runs, for example by consuming network IO and disk IO. In other words, a Compaction trades a short burst of machine resources such as network IO and disk IO for better HBase read and write performance afterwards.

For this reason, we can run Compactions during the HBase cluster's idle periods. Even though we know when the cluster's resources are idle, we cannot guarantee when an automatic Compaction will be triggered. Therefore, we should not rely on automatic Compaction; instead, disable it and trigger Compaction manually on a schedule during idle periods.

Compaction can be triggered in the following ways:

  • Automatic trigger: configured through the hbase.hregion.majorcompaction parameter, in milliseconds.

  • Manual scheduled trigger: set the hbase.hregion.majorcompaction parameter to 0, then run a scheduled script: echo "major_compact tbl_name" | hbase shell (a Java Admin API equivalent is sketched below).

  • File-count trigger: when the number of files selected for compaction is greater than or equal to the number of files in the Store, a Compaction is triggered. This is governed by the hbase.hstore.compaction.ratio property.

As for Region splitting, it is controlled by the hbase.hregion.max.filesize property, which defaults to 10GB; in production HBase environments it is commonly set to 30GB.
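
For the manual, scheduled trigger mentioned in the list above, the same major compaction can also be requested from Java through the Admin API instead of the shell. A minimal sketch, assuming a placeholder table name tbl_name:

// Requesting a major compaction for one table via the Admin API
// (equivalent to `major_compact 'tbl_name'` in the HBase shell).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class MajorCompactExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // The request is asynchronous: it queues the compaction on the RegionServers.
            admin.majorCompact(TableName.valueOf("tbl_name"));
        }
    }
}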


4. Summary

When scheduling Compaction, if the volume of written data is large, the scheduled Compaction can be run more frequently: for example, run a Compaction on all HBase tables during the idle period in the early morning every day, so that heavy writes during the busy daytime hours do not trigger Compactions that consume the HBase cluster's network IO, disk IO, and other machine resources.



