HBase optimization strategy
- Resolving the hotspot effect
Cause of hotspots in HBase:
A large number of client read and write requests hit one or a few RegionServers in the cluster. The load on those RegionServers surges, their performance degrades, and in severe cases the service hangs.
- Pre-split the table at creation time. By default a new table has only one Region; create it with multiple Regions as needed to avoid the hotspot effect.
- Choose the split points based on the Rowkey.
- Syntax:
3.1 create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
3.2 create 't1', 'f1', SPLITS_FILE => 'splits.txt'
splits.txt
10
20
30
40
3.3 create 't2', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
- Resolving hotspot issues comes down to two fundamentals:
- Pre-splitting
- Rowkey design
In short, combine the two to resolve hotspot issues.
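
To make the pre-splitting idea concrete, here is a minimal Java sketch of how evenly spaced hexadecimal split points could be computed, in the spirit of the HexStringSplit option shown above. The class and method names are hypothetical, the 8-digit hex key space is an assumption, and this is not HBase's own implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class HexSplitSketch {
    // Returns numRegions - 1 boundaries that divide the assumed key space
    // 00000000..ffffffff into numRegions equal ranges.
    public static List<String> splitKeys(int numRegions) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions");
        }
        long range = 1L << 32; // size of an 8-hex-digit key space (assumption)
        List<String> keys = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            long boundary = range * i / numRegions;
            keys.add(String.format("%08x", boundary)); // zero-padded, lowercase hex
        }
        return keys;
    }

    public static void main(String[] args) {
        // 15 regions, as in the NUMREGIONS example, need 14 split points.
        for (String key : splitKeys(15)) {
            System.out.println(key);
        }
    }
}
```

Evenly spaced boundaries only balance the load if the rowkeys themselves are spread evenly over the key space, which is why pre-splitting and rowkey design go together.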
- Improve retrieval efficiency
- When the rowkeys being retrieved are relatively contiguous, retrieval is efficient (sequential scan queries).
- Set the Memstore size and Block Cache size in hbase-site.xml:
hbase.hregion.memstore.flush.size 128M (each memstore is flushed once it reaches 128M)
hbase.regionserver.global.memstore.size 0.4 (40% of the heap space, i.e. the JVM heap of the RegionServer)
- Keep as much data in memory as possible to improve retrieval efficiency.
- Avoid memstore flushes that block the client:
hbase.regionserver.global.memstore.size.lower.limit 0.95 (flushes are forced once the memstores reach 95% of the global memstore capacity, before writes are blocked at the full limit)
hfile.block.cache.size 0.4
- The block cache holds HBase data blocks along with their indexes and Bloom filters.
- JVM parameters
The RegionServer runs as a JVM (Java) process.
JVM heap space used by HBase: young generation 1/3, old generation 2/3, plus the permanent generation (statics, constants).
Young generation: Eden : Survivor (from) : Survivor (to) = 8 : 1 : 1
Collectors: ParNewGC, ConcMarkSweepGC
Set in hbase-env.sh:
export HBASE_REGIONSERVER_OPTS="-Xmx8g -Xms8g -Xmn128m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-${hostname}-hbase.log"
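
As a rough illustration of what the fractions mean on the 8 GB heap configured here, the following Java sketch works through the arithmetic. The class and method names are hypothetical; the values 0.4, 0.95, and 0.4 are the ones quoted in these notes, and real usage depends on the workload.

```java
public class HeapBudgetSketch {
    // Share of the heap given to a component configured as a fraction.
    public static double share(double heapGb, double fraction) {
        return heapGb * fraction;
    }

    // Global memstore usage at which forced flushes start.
    public static double forcedFlushThreshold(double heapGb, double globalFraction, double lowerLimit) {
        return share(heapGb, globalFraction) * lowerLimit;
    }

    public static void main(String[] args) {
        double heapGb = 8.0;                                     // from -Xmx8g
        double memstore = share(heapGb, 0.4);                    // hbase.regionserver.global.memstore.size
        double flushAt = forcedFlushThreshold(heapGb, 0.4, 0.95); // ...memstore.size.lower.limit
        double blockCache = share(heapGb, 0.4);                  // hfile.block.cache.size
        double rest = heapGb - memstore - blockCache;            // left for everything else

        System.out.printf("global memstore limit: %.2f GB%n", memstore);
        System.out.printf("forced flushes from:   %.2f GB%n", flushAt);
        System.out.printf("block cache:           %.2f GB%n", blockCache);
        System.out.printf("remaining heap:        %.2f GB%n", rest);
    }
}
```

With these settings the memstores and the block cache together claim 80% of the heap, so raising either fraction means lowering the other.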
- MSLAB
- Prevents memory fragmentation. Excessive fragmentation, like a memory leak, eventually triggers a Full GC with a stop-the-world (STW) pause.
hbase.hregion.memstore.mslab.enabled true (whether to enable MSLAB; defaults to true)
hbase.hregion.memstore.mslab.chunksize 2M -> 4M, 5M, or 6M
The chunk size defaults to 2MB.
- Switch automated functions to manual handling
Use scheduled jobs and shell scripts that call the hbase tools to run compact and split manually.
RowKey Design
Design principles: uniqueness, ordering, length, hashing
Uniqueness principle
In HBase the RowKey uniquely identifies a row, so it must be unique with no duplicates;
Ordering principle
RowKeys are automatically sorted in lexicographic (byte) order; for example, the RowKey for live-stream bullet comments could be designed as liveRoomID:timestamp
Length principle
The maximum RowKey length is commonly cited as 64 KB, but keeping it within 16 bytes is recommended;
50 bytes * 100 million records = 5,000,000,000 bytes ≈ 5GB
- An overly long RowKey wastes memory resources
- It also reduces the effective storage space of the MemStore
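
The size estimate above can be checked with a few lines of Java. This is only a sketch with hypothetical names; it counts raw key bytes and ignores every other per-cell overhead.

```java
public class RowKeySizeSketch {
    // Raw bytes spent on rowkeys alone: key length times number of records.
    public static long totalKeyBytes(int keyLengthBytes, long records) {
        return keyLengthBytes * records; // int * long is computed as long
    }

    public static void main(String[] args) {
        long records = 100_000_000L; // 100 million records
        long longKeys = totalKeyBytes(50, records);
        long shortKeys = totalKeyBytes(16, records);

        System.out.println("50-byte keys: " + longKeys + " bytes");
        System.out.println("16-byte keys: " + shortKeys + " bytes");
        System.out.println("difference:   " + (longKeys - shortKeys) + " bytes");
    }
}
```

50-byte keys come to 5,000,000,000 bytes (about 4.7 GiB), against 1,600,000,000 bytes for 16-byte keys.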
Hash principle
Spread the data across multiple RegionServers in HBase to prevent data hotspots;
- For queries that span many regions, keep the rowkeys contiguous (ordering principle)
- For queries that touch few regions, hash the rowkey -> MD5 digest, UUID
String rowKey = "yxx_male_151";
String hashed = DigestUtils.md5Hex(rowKey); // MD5 digest (a hash, not encryption)
String uuid = UUID.randomUUID().toString();
String newRowKey = rowKey + "" + uuid.substring(0, 8); // 8 is illustrative; take as many characters as needed
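
A self-contained variant of the hashing idea, as a sketch only. It uses the JDK's MessageDigest in place of DigestUtils so it runs without extra libraries, and it puts a short hash prefix in front of the original key, since rows are sorted by the leading bytes of the RowKey. The class name, the "_" separator, and the prefix length of 4 are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class RowKeySaltSketch {
    // Hex-encoded MD5 digest of the input, computed with the JDK only.
    public static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Prefixes the key with the first prefixLen hex characters of its own hash.
    // The prefix is deterministic, so the same key can be rebuilt for a Get.
    public static String saltedRowKey(String rowKey, int prefixLen) {
        return md5Hex(rowKey).substring(0, prefixLen) + "_" + rowKey;
    }

    public static void main(String[] args) {
        String rowKey = "yxx_male_151";
        System.out.println(saltedRowKey(rowKey, 4));
    }
}
```

A hash-derived prefix keeps point lookups possible because it can be recomputed from the original key, whereas a random UUID fragment has to be stored or scanned for. The cost is that range scans over the original key order no longer work directly.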