HBase RowKey (RK) design and avoiding hot spots

 

One, HBase RowKey (RK) design

HBase reads and writes data mostly by RowKey (RK). In both MemStore and HFile, rows are stored sorted by RowKey in dictionary (lexicographic) order, so the RowKey design deserves attention.

 

RowKey design principles:

1) Length principle:

Keep the RowKey short; the recommendation here is no more than 16 bytes. The RowKey is stored in every KeyValue (KV), so an overly long RowKey takes up a great deal of storage space in both HFile and MemStore.

2) Uniqueness principle:

Ensure the RowKey is unique. If data is inserted into the same table with a RowKey that already exists, the existing data is overwritten by the new data.

3) Ordering principle:

HBase always keeps rows sorted by RowKey, in dictionary (lexicographic) order of the key bytes.

4) Hash principle:

Design the RowKey so that data is evenly distributed across the HBase nodes. The RegionServers can then balance the load; otherwise all new data tends to pile up on a single RegionServer.
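
To make the ordering principle concrete, here is a minimal Java sketch. The class and method names (`RowKeyOrderDemo`, `compareKeys`, `pad`) are made up for illustration and are not part of the HBase API; the comparison simply mimics an unsigned byte-by-byte ordering. It shows why numeric parts of a key should be zero-padded to a fixed length.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical demo class; it does not use or depend on HBase.
class RowKeyOrderDemo {

    // Compare two keys byte by byte as unsigned values (dictionary order).
    static int compareKeys(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int d = (x[i] & 0xff) - (y[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return x.length - y.length; // a shorter key that is a prefix sorts first
    }

    // Left-pad a non-negative number with zeros so numeric order matches byte order.
    static String pad(long n, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", n);
    }

    public static void main(String[] args) {
        // Without padding, "order10" sorts before "order9" because '1' < '9'.
        System.out.println(compareKeys("order10", "order9") < 0); // true
        // With padding, 9 sorts before 10 as expected.
        System.out.println(compareKeys("order" + pad(9, 4), "order" + pad(10, 4)) < 0); // true
    }
}
```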

 

Two, how to avoid hot spots in HBase

Rows of an HBase table are placed into different Regions according to their RowKey, so an unreasonable RowKey design can lead to hot spots. A hot spot occurs when a large number of clients directly access one node, or a very small number of nodes, in the cluster while the other nodes sit relatively idle, which degrades HBase read and write performance.

1, salt

Add a fixed-length random prefix in front of the RK. This allows data to be spread over different Regions.

Disadvantage: increased read overhead, because the random prefix is not known at read time.
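
A minimal sketch of salting, assuming 8 buckets and a two-digit prefix; both values and the names `SaltedKey`, `salt` and `candidates` are illustrative choices, not anything prescribed by HBase. The `candidates` method shows where the extra read cost comes from: the reader has to try every possible prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;

// Hypothetical helper; BUCKETS and the two-digit prefix format are assumptions.
class SaltedKey {
    static final int BUCKETS = 8;

    // Write path: prepend a random, fixed-length (two-digit) salt to the original RK.
    static String salt(String rk, Random rnd) {
        return String.format(Locale.ROOT, "%02d", rnd.nextInt(BUCKETS)) + rk;
    }

    // Read path: the salt is unknown, so every possible prefix has to be tried.
    static List<String> candidates(String rk) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < BUCKETS; i++) {
            keys.add(String.format(Locale.ROOT, "%02d", i) + rk);
        }
        return keys;
    }

    public static void main(String[] args) {
        String stored = salt("jepson0001", new Random());
        System.out.println(stored);
        System.out.println(candidates("jepson0001").contains(stored)); // true
    }
}
```

With this layout a single-row read becomes up to BUCKETS lookups, so the bucket count is a trade-off between write spread and read cost.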

2, hash

Use the full hash(RK) as the new RowKey, or take only the first 4 characters of the hash value and prepend them to the original RK. The hash can be MD5, SHA-1, SHA-256 or SHA-512; it is not limited to the hash value computed by Java's hashCode.

Disadvantage: likewise unfavorable for reads, because hashing breaks the natural order of the keys.
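
As a rough sketch of the hash-prefix variant, the following uses MD5 from the JDK's `MessageDigest` and keeps the first 4 hex characters. The class and method names (`HashedKey`, `md5Hex`, `hashPrefixed`) are hypothetical, and MD5 is just one of the algorithms listed above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; not part of the HBase API.
class HashedKey {

    // Hex-encode the MD5 digest of the input string.
    static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is a standard JDK algorithm
        }
    }

    // New RowKey = first 4 hex characters of the hash + original RK.
    static String hashPrefixed(String rk) {
        return md5Hex(rk).substring(0, 4) + rk;
    }

    public static void main(String[] args) {
        System.out.println(hashPrefixed("abc")); // 9001abc
    }
}
```

Unlike a random salt, the prefix is deterministic, so a reader who knows the original RK can rebuild the full RowKey and issue a single Get.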

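3, reverse

Reverse the RK (for example a fixed-length or numeric key) so that its most frequently changing part comes first.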
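
The original gives only the heading for this method, so the following is a sketch of the usual interpretation: reverse the characters of a fixed-length key so that the fast-changing tail leads. The phone-number-like keys and the `ReversedKey` class are made-up examples.

```java
// Hypothetical helper; assumes fixed-length, ASCII keys such as phone numbers.
class ReversedKey {

    // Reverse the characters of the key so the fast-changing tail comes first.
    static String reverse(String rk) {
        return new StringBuilder(rk).reverse().toString();
    }

    public static void main(String[] args) {
        // Sequential keys share a long common prefix and would land in the same Region.
        System.out.println(reverse("13800000001")); // 10000000831
        System.out.println(reverse("13800000002")); // 20000000831
    }
}
```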


4, reverse timestamp

Use (Long.MAX_VALUE - timestamp) in the RK so that the newest records sort first.
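
A small sketch of the reverse-timestamp idea, assuming non-negative millisecond timestamps. The `ReverseTimestamp` class, the `jepson` prefix and the two sample timestamps are illustrative only. `Long.MAX_VALUE` has 19 decimal digits, so the value is zero-padded to 19 characters to keep the keys fixed-length and correctly ordered.

```java
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical helper; assumes timestamps are non-negative milliseconds.
class ReverseTimestamp {

    // Zero-pad (Long.MAX_VALUE - timestamp) to 19 digits so string order matches numeric order.
    static String reverseTs(long timestampMillis) {
        return String.format(Locale.ROOT, "%019d", Long.MAX_VALUE - timestampMillis);
    }

    public static void main(String[] args) {
        // A TreeSet stands in for HBase's sorted storage here.
        TreeSet<String> keys = new TreeSet<>();
        keys.add("jepson" + reverseTs(1562472000000L)); // older order
        keys.add("jepson" + reverseTs(1562558400000L)); // newer order
        // The first key in sorted order is the one built from the newer timestamp.
        System.out.println(keys.first());
    }
}
```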


 

Field selection:

This depends mainly on your most important requirements. Based on the concrete query conditions, put the most frequently queried fields into the RK as far as possible. Given the two rows of data and the four requirements below, how should the RowKey be designed?

userid  OrderNo  skuname     skuprice  skunum  skusum  ordercretime
Jepson  0001     watermelon  5         10      50      07.07.2019 12:00:00
Jepson  0002     pumpkin     50        10      500     08/07/2019 12:00:00

Requirements:
1) Query the user's most recent order record
where userid = jepson order by ordercretime desc limit 1

2) Query the user's order records within a time period
where userid=jepson and (ordercretime>='xxx' and ordercretime<='xxxx')

3) Query the order records within a time period
where (ordercretime >= 'xxx' and ordercretime <= 'xxxx')

4) Query the user's order records for buying watermelon
where userid = jepson and skuname = 'watermelon'

Based on the methods and principles summarized above, RowKey = hash(userid).substring(0, 4) + userid + (Long.MAX_VALUE - timestamp). Note that (Long.MAX_VALUE - timestamp) must be left-padded with 0 to a fixed length.

Example:

Final rowkey = hash(UserId).substring(0, 4) + UserId + (Long.MAX_VALUE - timestamp)
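
The formula can be sketched in Java as follows. This is one possible reading, not a definitive implementation: it assumes MD5 as the hash, a 19-digit zero-padded reverse timestamp, and non-negative millisecond timestamps. The names `OrderRowKey`, `rowKey`, `startRow` and `stopRow` are hypothetical. The last two methods show how the time-range query for one user could map to a scan range with an inclusive start row and an exclusive stop row.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper; MD5 and the 19-digit padding are assumptions, not part of HBase.
class OrderRowKey {

    // First 4 hex characters of MD5(userId), used to spread users across Regions.
    static String hashPrefix(String userId) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(userId.getBytes(StandardCharsets.UTF_8));
            return String.format("%02x%02x", d[0] & 0xff, d[1] & 0xff);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // rowkey = hash(userId).substring(0, 4) + userId + zero-padded (Long.MAX_VALUE - timestamp)
    static String rowKey(String userId, long timestampMillis) {
        return hashPrefix(userId) + userId
                + String.format(Locale.ROOT, "%019d", Long.MAX_VALUE - timestampMillis);
    }

    // Newer timestamps sort first, so the scan starts at the upper time bound (inclusive).
    static String startRow(String userId, long toTs) {
        return rowKey(userId, toTs);
    }

    // Exclusive stop row: one millisecond before the lower bound (requires fromTs >= 1).
    static String stopRow(String userId, long fromTs) {
        return rowKey(userId, fromTs - 1);
    }

    public static void main(String[] args) {
        System.out.println(rowKey("jepson", 1562472000000L));
        System.out.println(startRow("jepson", 1562558400000L));
        System.out.println(stopRow("jepson", 1562472000000L));
    }
}
```

Under this layout, the user's latest order is the first row of a prefix scan on hash(userid).substring(0, 4) + userid. The query that filters only by time has no userid to build the prefix from, so it would probably need a full scan or a separate index table.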

 

Tuning (number of regions):
Each region's MemStore carries an extra overhead of hbase.hregion.memstore.mslab.chunksize = 2M. If a table has 20 regions, the overhead is 40M; for one hundred such tables it is 100 * 40M = 4000M, about 4G. Recommended: 1 region for a small table, 5 regions for a medium table, 20 regions for a large table, and 100-200 regions per RegionServer (RS).
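
The arithmetic above can be checked with a few lines of Java. The sketch assumes one MemStore per region and the 2M chunk size quoted in the text; `MslabOverhead` and `overheadMb` are made-up names.

```java
// Hypothetical calculator; assumes one MemStore (one column family) per region.
class MslabOverhead {
    static final long CHUNK_MB = 2; // hbase.hregion.memstore.mslab.chunksize = 2M, as quoted above

    // Extra MemStore overhead in MB for a number of tables with the same region count.
    static long overheadMb(int tables, int regionsPerTable) {
        return (long) tables * regionsPerTable * CHUNK_MB;
    }

    public static void main(String[] args) {
        System.out.println(overheadMb(1, 20));   // 40
        System.out.println(overheadMb(100, 20)); // 4000
    }
}
```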

Origin www.cnblogs.com/huomei/p/12112794.html