HBase Optimization Strategies and RowKey Design

HBase optimization strategies

  • Resolving hotspots

Cause of data hotspots in HBase:
a large number of client read and write requests hit one or a few RegionServers in the cluster. The load on those RegionServers surges, their performance degrades, and in severe cases the service hangs.

  1. Pre-split the table when you create it. By default a new table has only one Region; creating it with several Regions spreads the load and avoids hotspots.
  2. Choose the split points according to the RowKey design.
  3. Syntax (HBase shell):
    3.1 create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
    3.2 create 't1', 'f1', SPLITS_FILE => 'splits.txt'
    where splits.txt lists one split point per line:
    10
    20
    30
    40
    3.3 create 't2', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
  4. Resolving hotspots rests on two fundamentals:
    1. Pre-splitting
    2. RowKey design
    In short, combine the two to address hotspots.
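To see how split points such as '10', '20', '30', '40' divide the key space, here is a minimal Java sketch. It does not call HBase; the class and method names are hypothetical, and it assumes ASCII RowKeys so that Java string comparison matches HBase's byte-wise ordering.

```java
import java.util.List;

public class PreSplitSketch {
    /**
     * Returns the index of the region that would hold rowKey.
     * Region 0 holds keys below the first split point; region i holds
     * keys in [splits[i-1], splits[i]); the last region is open-ended.
     */
    public static int regionFor(String rowKey, List<String> splits) {
        int region = 0;
        for (String split : splits) {
            if (rowKey.compareTo(split) >= 0) {
                region++;   // key sorts at or after this split point
            } else {
                break;      // splits are sorted, so we can stop here
            }
        }
        return region;
    }

    public static void main(String[] args) {
        // Same split points as in the create 't1' example: 4 splits give 5 regions.
        List<String> splits = List.of("10", "20", "30", "40");
        for (String key : new String[] {"05_a", "10_b", "25_c", "40_d", "99_e"}) {
            System.out.println(key + " -> region " + regionFor(key, splits));
        }
    }
}
```

If every RowKey started with the same two characters, all writes would land in one of the five regions, which is why the split points and the RowKey design have to be chosen together.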
  • Improve retrieval efficiency
  1. Design the RowKey so that rows read together are stored contiguously; sequential scan queries are then efficient.
  2. Set the MemStore and Block Cache sizes in hbase-site.xml:
    hbase.hregion.memstore.flush.size 128M: a MemStore is flushed once it reaches 128 MB
    hbase.regionserver.global.memstore.size 0.4: all MemStores together may use up to 40% of the heap
    (the JVM heap of the RegionServer)
    1. Keep as much data as possible in memory to improve retrieval efficiency.
    2. Avoid MemStore flushes that block clients:
      hbase.regionserver.global.memstore.size.lower.limit 0.95: flushes are forced once total MemStore usage reaches 95% of the global limit, before the limit itself is hit and writes are blocked
      hfile.block.cache.size 0.4: the Block Cache gets 40% of the heap
  3. HBase also speeds up lookups with its internal data block index and Bloom filters.
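The fraction-style settings above are easier to judge as absolute sizes. The following Java sketch is plain arithmetic, assuming the 8 GB heap used in this article's JVM options; the class and method names are hypothetical. Note that HBase expects the MemStore and Block Cache fractions together to stay at or below 0.8 of the heap, so 0.4 + 0.4 sits right at that ceiling.

```java
public class HeapBudgetSketch {
    /** Share of the heap, in GB, reserved by a fraction-style setting. */
    public static double shareGb(double heapGb, double fraction) {
        return heapGb * fraction;
    }

    public static void main(String[] args) {
        double heapGb = 8.0;              // assumed heap size (-Xmx8g)
        double memstoreFraction = 0.4;    // hbase.regionserver.global.memstore.size
        double lowerLimit = 0.95;         // hbase.regionserver.global.memstore.size.lower.limit
        double blockCacheFraction = 0.4;  // hfile.block.cache.size

        double memstoreGb = shareGb(heapGb, memstoreFraction);
        double blockCacheGb = shareGb(heapGb, blockCacheFraction);

        System.out.printf("MemStore limit:      %.2f GB%n", memstoreGb);
        System.out.printf("Forced flushes from: %.2f GB%n", memstoreGb * lowerLimit);
        System.out.printf("Block Cache:         %.2f GB%n", blockCacheGb);
        System.out.printf("Left for the rest:   %.2f GB%n", heapGb - memstoreGb - blockCacheGb);
    }
}
```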
  • JVM parameters
  1. HBase runs as Java processes inside the JVM.

  2. JVM heap layout for HBase: by default the young generation takes 1/3 of the heap and the old generation 2/3, plus the permanent generation (static data, constants). The young generation is divided into Eden, Survivor (from) and Survivor (to) in an 8:1:1 ratio. Use ParNewGC for the young generation and ConcMarkSweepGC for the old generation.

    Set the options in hbase-env.sh:
    export HBASE_REGIONSERVER_OPTS="-Xmx8g -Xms8g -Xmn128m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log"

  • MSLAB (MemStore-Local Allocation Buffer)
  1. MSLAB prevents heap fragmentation. Too much fragmentation (or a memory leak) eventually triggers a full GC with a stop-the-world (STW) pause.
    hbase.hregion.memstore.mslab.enabled true: whether to enable MSLAB; defaults to true
    hbase.hregion.memstore.mslab.chunksize 2M: chunk size, 2 MB by default; it can be raised to 4 MB, 5 MB or 6 MB
  • Replace automatic operations with manual ones

Use scheduled shell scripts that call the HBase tools to run compact and split manually.

RowKey Design

Design principles: uniqueness, ordering, length, hashing

Uniqueness principle

The RowKey uniquely identifies a row in HBase, so it must be unique, with no duplicates.

Ordering principle

RowKeys are automatically sorted in lexicographic (byte) order. For example, the RowKey for live-stream bullet comments could be designed as liveRoomId:timestamp.
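The effect of that ordering can be sketched in plain Java. A TreeSet stands in for HBase's sorted storage; the class name, the room IDs and the choice to zero-pad the timestamp to 13 digits are assumptions made for this illustration, not part of the original design.

```java
import java.util.TreeSet;

public class OrderedRowKeySketch {
    /**
     * Builds "roomId:timestamp". The timestamp is zero-padded so that
     * lexicographic order matches chronological order.
     */
    public static String rowKey(String roomId, long timestampMillis) {
        return roomId + ":" + String.format("%013d", timestampMillis);
    }

    public static void main(String[] args) {
        TreeSet<String> sorted = new TreeSet<>();  // sorted like HBase RowKeys (ASCII keys)
        sorted.add(rowKey("room42", 1700000002000L));
        sorted.add(rowKey("room07", 1700000001000L));
        sorted.add(rowKey("room42", 1700000000000L));
        sorted.add(rowKey("room42", 999L));

        // All comments of one room sit next to each other, oldest first.
        // ';' is the character right after ':', so it bounds the prefix range.
        for (String key : sorted.subSet("room42:", "room42;")) {
            System.out.println(key);
        }
    }
}
```

Without the padding, a key ending in 999 would sort after one ending in 1700000000000, because strings are compared character by character.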

Length principle

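HBase itself allows much longer RowKeys (the limit is commonly cited as 64 KB), but in practice keep them well under 64 bytes; 16 bytes or fewer is recommended. The RowKey is stored with every cell, so its length adds up quickly:

50 bytes * 100 million records = 5 GB (about 4.7 GiB) of RowKey data alone

  • This wastes memory.
  • It reduces the effective storage space of the MemStore.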
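The size estimate above can be reproduced with a few lines of Java. This is only arithmetic on the RowKey bytes; the class and method names are hypothetical, and the cells-per-row parameter reflects the fact that HBase repeats the RowKey in every stored cell.

```java
public class RowKeyFootprint {
    /** Bytes spent on RowKeys alone: each cell of each row carries the full key. */
    public static long rowKeyBytes(int rowKeyLength, long rows, int cellsPerRow) {
        return (long) rowKeyLength * rows * cellsPerRow;
    }

    /** Converts bytes to GiB (2^30 bytes). */
    public static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long rows = 100_000_000L;  // 100 million records
        for (int length : new int[] {50, 16}) {
            long bytes = rowKeyBytes(length, rows, 1);
            System.out.printf("%d-byte RowKey: %d bytes (%.2f GiB)%n", length, bytes, toGiB(bytes));
        }
    }
}
```

With several columns per row the figure multiplies accordingly, which is the main argument for the 16-byte recommendation.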

Hash principle

Spread the data across multiple RegionServers to prevent data hotspots.

  • Large-range queries: keep the RowKeys contiguous (ordering principle)
  • Small-range queries: hash the RowKey, for example with an MD5 digest or a UUID

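// Requires org.apache.commons.codec.digest.DigestUtils and java.util.UUID
String rowKey = "yxx_male_151";
// Option 1: hash the key (MD5 is a digest, not encryption)
String hashedRowKey = DigestUtils.md5Hex(rowKey);
// Option 2: append the first few characters of a random UUID
String uuid = UUID.randomUUID().toString();
String newRowKey = rowKey + uuid.substring(0, 8);  // the length 8 is arbitrary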
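A variant worth considering is to put a short hash in front of the key rather than random characters behind it, since the leading bytes decide which Region a row lands in. The sketch below is a self-contained illustration using only the JDK (MessageDigest instead of DigestUtils); the class name, the salted helper and the "_" separator are hypothetical choices, not part of the original snippet.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashedRowKeySketch {
    /** Hex-encoded MD5 digest of the input, computed with the JDK only. */
    public static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 must be available on every Java platform implementation
            throw new IllegalStateException(e);
        }
    }

    /**
     * Prefixes the key with the first prefixLength hex characters of its hash.
     * The prefix is deterministic, so a reader can rebuild the full key
     * from the original one.
     */
    public static String salted(String rowKey, int prefixLength) {
        return md5Hex(rowKey).substring(0, prefixLength) + "_" + rowKey;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"yxx_male_151", "yxx_male_152", "yxx_male_153"}) {
            System.out.println(salted(key, 4));
        }
    }
}
```

The trade-off is the one named in the hash principle: neighbouring keys get scattered, so point reads and writes spread out, while range scans over the original key order are no longer contiguous.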

Origin blog.csdn.net/Mr_YXX/article/details/105025205