HBase hotspot issues: rowkey hashing and pre-partition design

A hotspot occurs when a large number of clients direct their access (reads, writes, or other operations) at a single node or a very small number of nodes in the cluster. The heavy traffic pushes the regions on that machine beyond its capacity, degrading performance or even making those regions unavailable. It also hurts the other regions hosted on the same RegionServer, because the overloaded host can no longer serve their requests either, so resources are wasted. A well-designed data access pattern keeps the whole cluster fully and evenly utilized.
Data skew: an HBase table can be divided into many Regions, but by default only one Region is created and placed on a single node of the cluster. All data is therefore concentrated in that one Region, and on that one node, at the start. Even after a store reaches the threshold and the Region is split, the data still sits on only a few nodes. This is data skew.

Combining random hashing with pre-partitioning works best. Pre-partitioning builds a set of regions up front, each with its own start and end keys; with a random hash applied to the rowkey, writes hit these pre-built regions evenly. This resolves the drawbacks above and greatly improves performance.
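As a concrete illustration of combining the two, below is a minimal Scala sketch of a salting helper. The salt function, the bucket count of 10, and the sample key are assumptions for illustration only, chosen to match the '0|', '1|', ... split points used later in this article.

import org.apache.hadoop.hbase.util.Bytes

object SaltedRowKey {
  // Prefix the original key with hashCode(key) % numBuckets plus '|', so writes land
  // evenly across regions pre-split on the '0|', '1|', ... boundaries.
  def salt(originalKey: String, numBuckets: Int = 10): Array[Byte] = {
    val bucket = (originalKey.hashCode & Int.MaxValue) % numBuckets
    Bytes.toBytes(s"$bucket|$originalKey")
  }

  def main(args: Array[String]): Unit = {
    // prints something like "7|user_20191120_000123"
    println(Bytes.toString(salt("user_20191120_000123")))
  }
}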

1. Pre-partition

1.1 Overview of HBase pre-partitioning

Default partitioning:

When an HBase table is created, it has only one Region. Once a Region grows past the split threshold (10 GB by default), HBase splits it into two Regions, and so on.
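The split threshold is controlled by the hbase.hregion.max.filesize property (10 GB in a stock installation). A minimal Scala sketch to read the effective value from the client configuration, assuming hbase-site.xml is on the classpath:

import org.apache.hadoop.hbase.HBaseConfiguration

object SplitThreshold {
  def main(args: Array[String]): Unit = {
    // Loads hbase-default.xml and hbase-site.xml from the classpath
    val conf = HBaseConfiguration.create()
    // Falls back to the stock default of 10 GB if the property is not set explicitly
    val maxFileSize = conf.getLong("hbase.hregion.max.filesize", 10L * 1024 * 1024 * 1024)
    println(s"Region split threshold: $maxFileSize bytes")
  }
}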

Disadvantages:

Splitting consumes a lot of resources while it is happening, and frequent splits have a big impact on HBase performance. HBase therefore provides a pre-partitioning (pre-split) feature: users can partition the table according to their own rules at creation time.

1.2 Role of HBase pre-partitioning

It avoids frequent splits and the unnecessary resource consumption they cause, improving HBase performance.

1.3 HBase pre-partitioning methods

  • HBase Shell
create 'user1',{NAME=>'f'},{NAME=>'d'},SPLITS=>['0|','1|','2|','3|','4|']
create 'user1', 'f', SPLITS => ['1|', '2|', '3|', '4|']
  • HBase Shell (by reading the split files)
create 'user2',{NAME=>'f'},{NAME=>'d'},SPLITS_FILE=>'/data/hbaseSplit.txt'
Contents of hbaseSplit.txt:
> cat hbaseSplit.txt
1|
2|
3|
4|
  • HBase Java API
import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.hbase.util.Bytes

object HbaseUtil {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()
    conf.set("hbase.zookeeper.quorum", "192.168.1.11,192.168.1.12,192.168.1.13")
    conf.set("hbase.zookeeper.property.clientPort", "2181")
    conf.set("zookeeper.znode.parent", "/hbase")
    conf.set("hbase.master", "192.168.1.11:16010")

    val connection = ConnectionFactory.createConnection(conf)
    val admin = connection.getAdmin

    val colFamily = List("info", "desc")
    val tableName = "user3"
    // Explicit split keys: regions [-inf,0|), [0|,1|), [1|,2|), [2|,3|), [3|,4|), [4|,+inf)
    val splitKeys = Array(
      Bytes.toBytes("0|"),
      Bytes.toBytes("1|"),
      Bytes.toBytes("2|"),
      Bytes.toBytes("3|"),
      Bytes.toBytes("4|")
    )

    if (admin.tableExists(TableName.valueOf(tableName))) {
      println("Table already exists!")
    } else {
      val descriptor = new HTableDescriptor(TableName.valueOf(tableName))
      colFamily.foreach(x => descriptor.addFamily(new HColumnDescriptor(x)))
      admin.createTable(descriptor, splitKeys)
    }

    admin.close()
    connection.close()
  }
}

2. Random hashing

2.1 Fixed hash values

  • HBase Shell
create 'user1',{NAME=>'f'},{NAME=>'d'},SPLITS=>['0|','1|','2|','3|','4|','5|','6|','7|','8|','9|']

Note: the fixed hash value is followed by "|" because '|' has a larger byte value (0x7C) than the digits and letters that make up the rest of the key, so every rowkey with a given prefix sorts before the split key for that prefix.
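A quick way to see this ordering is to compare keys with the Bytes comparator; the sample keys below are made up for illustration.

import org.apache.hadoop.hbase.util.Bytes

object SplitBoundaryDemo {
  def main(args: Array[String]): Unit = {
    // '|' is 0x7C, larger than every digit (0x30-0x39) and letter (up to 0x7A),
    // so any rowkey starting with "1" plus digits/letters sorts before the split key "1|".
    println(Bytes.compareTo(Bytes.toBytes("1_20191120_0001"), Bytes.toBytes("1|")) < 0) // true
    println(Bytes.compareTo(Bytes.toBytes("1zzzzzz"), Bytes.toBytes("1|")) < 0)         // true
  }
}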

2.2 Hashing

HBase ships with two built-in pre-split algorithms: HexStringSplit and UniformSplit.

  • HexStringSplit 

If the row key uses a hexadecimal string as a prefix, HexStringSplit is a good fit as the pre-split algorithm. For example, we can use HexHash(prefix) as the row-key prefix, where HexHash is a hash function whose result is a hexadecimal string. We can still specify split points manually with SPLITS, or plug in our own split algorithm.

create 'test',{NAME=>'f',COMPRESSION=>'SNAPPY'},{NUMREGIONS => 30, SPLITALGO => 'HexStringSplit'}

 

When putting data, the hex prefix can be generated with (a complete write sketch using this prefix appears at the end of this subsection):

MD5Hash.getMD5AsHex(Bytes.toBytes(str));
  • UniformSplit

If the row key uses raw bytes as a prefix, UniformSplit is the appropriate algorithm. When queries against the table are mostly random point reads, UniformSplit can be used: it divides the raw byte range (0x00 to 0xFF) evenly, right-padding the split keys with 0x00. With a table pre-split this way, the rowkey needs a small transformation on insert: for example, take the hashCode of the original rowkey rawStr, reverse its bits, and prepend the result to the original rowkey. The Bytes utility class can do this.

create 'test', { NAME => 'f', TTL => 5184000, DATA_BLOCK_ENCODING => 'PREFIX' }, {NUMREGIONS => 128, SPLITALGO => 'UniformSplit'}

A table built with SPLITALGO => 'UniformSplit' does not specify a startKey or endKey; the 256 possible byte values are divided evenly into the requested number of pre-split regions (NUMREGIONS). Because the leading byte has only 256 possible values, pre-splitting on the first byte alone gives at most 256 regions, although after data is written those regions can still be split further. The drawback is that a Scan over the whole table has to read every region (for example, scanning all 256 in parallel) and merge the results; the advantage is that rowkeys are spread uniformly across the whole range, which is roughly what the name UniformSplit means.

When putting data, you can use the Bytes utility class:

// prepend the bit-reversed hashCode of i (4 bytes) to the serialized original value
byte[] rowKey = Bytes.add(Bytes.toBytes(Integer.reverse(Integer.valueOf(i).hashCode())), Bytes.toBytes(i));
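Returning to the HexStringSplit example above, here is a minimal end-to-end write sketch in Scala. The table name 'test' and column family 'f' come from that example; the qualifier 'raw' and the business key are made up for illustration.

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.{Bytes, MD5Hash}

object HexPrefixPut {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    // 'test' is the HexStringSplit table created above, with column family 'f'
    val table = connection.getTable(TableName.valueOf("test"))
    val rawKey = "order_20191120_000123" // made-up business key
    // Prefix the raw key with its hex MD5 so writes spread across the HexStringSplit regions
    val rowKey = MD5Hash.getMD5AsHex(Bytes.toBytes(rawKey)) + "_" + rawKey
    val put = new Put(Bytes.toBytes(rowKey))
    put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("raw"), Bytes.toBytes(rawKey))
    table.put(put)
    table.close()
    connection.close()
  }
}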

3. Forced split

HBase also allows the client to force a split. Execute the following command in the HBase shell:

 split 'forced_table', 'b' 

Here, forced_table is the table to split and 'b' is the split point.
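The same forced split can be issued from the Java API. A minimal sketch using Admin.split, assuming the client configuration points at the cluster:

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.hbase.util.Bytes

object ForceSplit {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    val admin = connection.getAdmin
    // Ask the master to split 'forced_table' at the explicit split point 'b',
    // mirroring the shell command above (the request is asynchronous)
    admin.split(TableName.valueOf("forced_table"), Bytes.toBytes("b"))
    admin.close()
    connection.close()
  }
}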

 


Origin www.cnblogs.com/yyy-blog/p/11887271.html