Some HBase optimization methods

1. Table design
  • Pre-creating regions (pre-splitting)
    • By default, HBase creates a single region when a table is created. While importing data, all clients write into that one region until it grows large enough to be split. One way to speed up bulk writes is to pre-create a number of empty regions, so that when data is written to HBase it is distributed across regions according to the split scheme, load-balancing the data across the cluster.
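As a sketch of the pre-splitting idea above, the snippet below computes evenly spaced split keys over a 2-byte hex keyspace, similar in spirit to what HBase's `RegionSplitter` utility does. This is a hypothetical helper in plain Java, not the actual HBase API; the split keys it produces would be passed to `createTable(desc, splitKeys)`.

```java
import java.util.ArrayList;
import java.util.List;

public class PreSplit {
    // Returns numRegions - 1 boundary keys dividing the 0x0000..0xFFFF
    // hex-prefix keyspace evenly, one boundary between each pair of regions.
    static List<String> splitKeys(int numRegions) {
        List<String> keys = new ArrayList<>();
        int range = 0x10000;
        for (int i = 1; i < numRegions; i++) {
            keys.add(String.format("%04x", (int) ((long) range * i / numRegions)));
        }
        return keys;
    }

    public static void main(String[] args) {
        // Boundaries for 4 pre-created regions: [4000, 8000, c000]
        System.out.println(splitKeys(4));
    }
}
```

This only helps if rowkeys are actually spread over the chosen keyspace, e.g. by hashing a prefix, so each pre-created region receives a share of the writes.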
  • rowkey: in HBase, the rowkey is used to retrieve records from a table; three access modes are supported
    • get by a single rowkey: access one row through its exact key
    • Scan over a rowkey range: scan all rows between a startRow and an endRow
    • Full table scan: directly scan every row in the table
    • A rowkey in HBase can be any string up to 64 KB long; in practice 10–100 bytes, usually designed with a fixed length
    • rowkey design rules
      • The shorter the better
      • Design the rowkey around the actual business access pattern
      • Scatter hot keys:
        • Reversal
        • Hashing
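The two scattering techniques above can be sketched in plain Java. The bucket count and key formats here are illustrative assumptions, not HBase APIs: salting prefixes a hash-derived bucket so sequential keys spread across regions, and reversal flips a monotonically increasing key so the "hot" tail varies first.

```java
public class RowkeyDesign {
    // Salting: prefix the key with a hash-derived bucket (bucket count is an
    // assumption here, e.g. one bucket per pre-created region).
    static String salt(String rowkey, int buckets) {
        int bucket = Math.abs(rowkey.hashCode() % buckets);
        return String.format("%02d-%s", bucket, rowkey);
    }

    // Reversal: reverse a monotonically increasing key (e.g. a phone number
    // or timestamp string) so consecutive writes do not hit one region.
    static String reverse(String rowkey) {
        return new StringBuilder(rowkey).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(salt("user1234", 16));
        System.out.println(reverse("13812345678")); // → "87654321831"
    }
}
```

The trade-off: salted or reversed keys spread writes well, but range scans over the original key order are no longer possible without scanning every bucket.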
  • Column families
    • Do not define too many column families in one table. HBase currently does not handle tables with more than 2–3 column families well, because when one column family flushes, its neighboring column families are flushed along with it, which ultimately makes the system generate more I/O.
  • In memory: when creating a table, HColumnDescriptor.setInMemory(true) places the table in the RegionServer's cache, to ensure reads hit the cache
  • Max version: when creating a table, HColumnDescriptor.setMaxVersions(int maxVersions) sets the maximum number of versions kept for data in the table; if only the latest version needs to be kept, use setMaxVersions(1)
  • Time to live: when creating a table, HColumnDescriptor.setTimeToLive(int timeToLive) sets the storage lifetime of data in the table; expired data is deleted automatically. For example, if only the last two days of data are needed, set setTimeToLive(2 * 24 * 60 * 60)
  • Compact & split:
    • In HBase, written or updated data first goes to the WAL log (HLog) and to memory (MemStore). Data in the MemStore is kept sorted; when a MemStore accumulates up to a certain threshold, a new MemStore is created, the old one is added to a flush queue, and a separate thread flushes it to disk, where it becomes a StoreFile. At the same time, the system records a redo point in ZooKeeper, indicating that changes before this point have been persisted (minor compact).
    • A StoreFile is read-only: once created, it can no longer be modified. So updates in HBase are in fact continual append operations. When the StoreFiles in a store reach a certain number, a merge is performed (major compact), and modifications to the same key are merged together to form one large StoreFile. When the size of a StoreFile reaches a certain threshold, the StoreFile is split, dividing it into two StoreFiles.
    • Since the table is updated by continual appends, a read request needs to access all the StoreFiles and the MemStore of a store and merge them by rowkey. Because StoreFiles and the MemStore are both sorted, and StoreFiles carry in-memory indexes, the merge process is usually fairly fast.
    • In practice, a major compact can be triggered manually when necessary, merging modifications to the same rowkey to form a larger StoreFile. The StoreFile split threshold can also be set larger, to reduce how often splits occur.
    • To keep small files (MemStore flushes) from piling up, and to preserve query efficiency, HBase merges these small StoreFiles into relatively large StoreFiles when necessary; this process is called compaction. HBase has two main kinds of compaction: minor compaction and major compaction
      • minor compaction: smaller in scope, merging only a few files
      • major compaction: merges all StoreFiles into one. Conditions that can trigger a major compaction: the major_compact command, the majorCompact() API, or automatic operation by the RS (parameters: hbase.hregion.majorcompaction, default 24 hours, and hbase.hregion.majorcompaction.jitter, default 0.2, which prevents all RSs from running major compactions at the same time)
      • Effect of hbase.hregion.majorcompaction.jitter: it floats the value set by the hbase.hregion.majorcompaction parameter. If the two parameters keep their defaults of 24 and 0.2, then the final major compact interval falls in the range 19.2 - 28.8 hours
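The 19.2 - 28.8 hour range above is just the base period plus or minus the jitter fraction; a one-line check:

```java
public class CompactionJitter {
    // Given the base major-compaction period and the jitter ratio, the
    // trigger interval floats in [base * (1 - jitter), base * (1 + jitter)].
    static double[] window(double baseHours, double jitter) {
        return new double[] { baseHours * (1 - jitter), baseHours * (1 + jitter) };
    }

    public static void main(String[] args) {
        double[] w = window(24, 0.2);
        System.out.printf("%.1f - %.1f hours%n", w[0], w[1]); // 19.2 - 28.8 hours
    }
}
```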
    • Turn off automatic major compaction
    • Trigger major compaction manually or programmatically instead
    • The minor compaction selection mechanism is more complicated; it is determined by the following parameters:
    • hbase.hstore.compaction.min: default 3, meaning at least 3 store files must qualify before a minor compaction will start
    • hbase.hstore.compaction.max: default 10, meaning a minor compaction selects at most 10 store files
    • hbase.hstore.compaction.min.size: a store file smaller than this size is always added to the minor compaction's candidate set
    • hbase.hstore.compaction.max.size: a store file larger than this size is excluded from minor compaction
    • hbase.hstore.compaction.ratio: store files are sorted by age (older to younger), and minor compaction always starts selecting from the older store files
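A much-simplified sketch of how these parameters interact (plain Java, not the HBase source; it ignores the ratio-based size comparison real HBase also applies): files are walked oldest-first, files above the max size are excluded, selection is capped at the max count, and nothing runs unless the min count is met.

```java
import java.util.List;
import java.util.ArrayList;

public class MinorCompactSelect {
    // Simplified minor-compaction candidate selection. sizesOldestFirst is
    // assumed sorted by age, oldest first. Returns the selected file sizes,
    // or an empty list if too few files qualify to start a compaction.
    static List<Long> select(List<Long> sizesOldestFirst,
                             int minFiles, int maxFiles, long maxSize) {
        List<Long> picked = new ArrayList<>();
        for (long size : sizesOldestFirst) {
            if (size > maxSize) continue;        // over compaction.max.size: excluded
            if (picked.size() == maxFiles) break; // compaction.max cap reached
            picked.add(size);
        }
        return picked.size() >= minFiles ? picked : List.of();
    }

    public static void main(String[] args) {
        // Four store files, oldest first; the 500-unit file is over the cap,
        // so the three small ones are merged.
        System.out.println(select(List.of(10L, 500L, 20L, 30L), 3, 10, 100L));
    }
}
```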
2. Table write operations
  • Write concurrently through multiple HTable instances
  • HTable parameter settings
    • Auto flush: HTable.setAutoFlush(false) turns off the client's auto-flush, so data can be written to HBase in batches instead of issuing one RPC per put; only when the client's write buffer fills up is the write actually sent to the HBase server. Auto flush is on by default
    • write buffer: sets the client's buffer size; if a newly set buffer size is smaller than the data currently in the buffer, the buffer is flushed to the server
    • WAL Flag
      • Note: think carefully before turning off the WAL log, because if the RS goes down, put/delete data cannot be recovered from the WAL log
    • Batch writing
    • Multi-threaded writing
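The write-buffer behaviour behind setAutoFlush(false) can be sketched without a cluster. This is a stand-in model, not the HBase client: puts accumulate locally and are sent in one batch once the buffer is full, and `flushes` just counts what would be RPCs.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedPutBuffer {
    // Model of a client-side write buffer: puts accumulate until `capacity`
    // is reached, then the whole batch is "sent" in one flush.
    private final List<String> buffer = new ArrayList<>();
    private final int capacity;
    int flushes = 0;

    BatchedPutBuffer(int capacity) { this.capacity = capacity; }

    void put(String row) {
        buffer.add(row);
        if (buffer.size() >= capacity) flush();
    }

    void flush() {
        if (!buffer.isEmpty()) { flushes++; buffer.clear(); }
    }

    public static void main(String[] args) {
        BatchedPutBuffer w = new BatchedPutBuffer(100);
        for (int i = 0; i < 250; i++) w.put("row" + i);
        w.flush(); // flush the remainder, as one must before closing an HTable
        System.out.println(w.flushes + " batched RPCs instead of 250"); // 3
    }
}
```

Note the explicit final flush: with auto-flush off, anything still sitting in the buffer when the client exits is lost unless flushed.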
3. Table read operations
  • scan caching
    • Configured in HBase's conf configuration file
    • Configured by calling HTable.setScannerCaching()
    • Configured by calling scan.setCaching()
    • The priority of the three rises in that order
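The three-level precedence above can be expressed as a tiny resolver. The `null`-means-unset convention is an assumption of this sketch, not the HBase API: a value set on the Scan wins over one set on the HTable, which wins over the config-file default.

```java
public class ScanCachingConfig {
    // Resolve the effective scanner-caching value with the precedence
    // Scan > HTable > configuration-file default.
    static int effectiveCaching(Integer scanLevel, Integer tableLevel, int siteDefault) {
        if (scanLevel != null) return scanLevel;
        if (tableLevel != null) return tableLevel;
        return siteDefault;
    }

    public static void main(String[] args) {
        System.out.println(effectiveCaching(500, 200, 1));   // 500: Scan-level wins
        System.out.println(effectiveCaching(null, 200, 1));  // 200: table-level
        System.out.println(effectiveCaching(null, null, 1)); // 1: config default
    }
}
```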
  • Batch Reading
  • Multi-threaded reading 
  • Cache query results
  • block cache
    • On an HBase RS, memory is divided into two parts: one part is the MemStore, used mainly for writing, and the other is the BlockCache, used mainly for reading
    • A write request first goes into the MemStore; the RS gives each region a MemStore, and when a MemStore reaches 64 MB it is flushed to disk. When the total size of all MemStores exceeds the limit (heapsize * hbase.regionserver.global.memstore.upperLimit * 0.9), a flush is forced, starting from the largest MemStore, until the total drops below the limit
    • A read request first searches the MemStore for the data; if not found there, the BlockCache is checked; if still not found, the data is read from disk and the result is put into the BlockCache. Since the BlockCache uses an LRU strategy, when it reaches its upper limit (heapsize * hfile.block.cache.size * 0.85), the eviction mechanism starts and the oldest batch of data is evicted
    • An RS has one BlockCache and N MemStores, and their combined size must not be greater than or equal to heapsize * 0.8, or HBase will not start. The default is 0.2 for the BlockCache and 0.4 for the MemStores. For systems focused on read response time, the BlockCache can be set larger, e.g. BlockCache = 0.4, memstore = 0.39, to increase the cache hit rate
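The LRU eviction described above can be illustrated with a minimal cache built on `LinkedHashMap` in access order. This is a sketch of the idea, not HBase's BlockCache, whose real eviction also uses priority tiers (single-access, multi-access, in-memory):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruBlockCacheSketch {
    // Minimal LRU cache: when the cache exceeds `capacity`, the
    // least-recently-accessed entry is evicted first.
    static <K, V> Map<K, V> lru(int capacity) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, String> cache = lru(2);
        cache.put("block1", "a");
        cache.put("block2", "b");
        cache.get("block1");       // touch block1, so block2 is now eldest
        cache.put("block3", "c");  // over capacity: block2 is evicted
        System.out.println(cache.keySet()); // [block1, block3]
    }
}
```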


Origin www.cnblogs.com/liufei-yes/p/11520801.html