RowKey Design of HBase for Big Data Performance Tuning

http://www.open-open.com/lib/view/open1417612091323.html

2.1.1 Rowkey Length Principle

A Rowkey is a binary byte stream. Many developers suggest a length of 10 to 100 bytes, but shorter is better: the recommendation is to keep it at no more than 16 bytes.

The reasons are as follows:

(1) HFile, the data persistence file, stores data as KeyValue entries, and every KeyValue carries the full Rowkey. If the Rowkey is too long, say 100 bytes, then 10 million cells spend 100 * 10 million = 1 billion bytes, nearly 1 GB, on Rowkeys alone, which greatly reduces the storage efficiency of HFile;

(2) MemStore caches part of the data in memory. If the Rowkey is too long, the effective utilization of memory drops, the system can cache less data, and retrieval efficiency falls. Therefore, the shorter the Rowkey, the better;

(3) Current operating systems are generally 64-bit, and memory is aligned on 8-byte boundaries. Keeping the Rowkey at 16 bytes, an integer multiple of 8 bytes, makes the best use of this characteristic of the operating system.
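
The arithmetic in reason (1) and the 16-byte target in reason (3) can be checked with a small sketch. The field layout (an 8-byte user ID followed by an 8-byte timestamp) and the function names are illustrative assumptions, not something the article prescribes.

```python
import struct


def rowkey_overhead_bytes(rowkey_len: int, num_cells: int) -> int:
    """Bytes spent on Rowkey copies alone, since every KeyValue stores the full Rowkey."""
    return rowkey_len * num_cells


def pack_rowkey(user_id: int, timestamp_ms: int) -> bytes:
    """Pack two unsigned 64-bit integers into a fixed 16-byte Rowkey.

    Hypothetical layout. Big-endian keeps byte order consistent with numeric
    order, which matters because HBase sorts rows lexicographically by bytes.
    """
    return struct.pack(">QQ", user_id, timestamp_ms)


long_key_cost = rowkey_overhead_bytes(100, 10_000_000)   # 100-byte Rowkey
short_key_cost = rowkey_overhead_bytes(16, 10_000_000)   # 16-byte Rowkey

key = pack_rowkey(42, 1_417_612_091_323)
print(len(key), long_key_cost, short_key_cost)
```

With these numbers, the 16-byte key spends 160 million bytes on Rowkeys where the 100-byte key spends 1 billion.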

2.1.2 Rowkey Hash Principle

If the Rowkey increases with the timestamp, do not put the time at the front of the binary key. Instead, use the high-order bytes of the Rowkey as a hash field, generated cyclically by the program, and put the time field in the low-order bytes. This raises the probability that data is evenly distributed across the RegionServers, which achieves load balancing. Without a hash field, the first field is the time information itself, and all new data accumulates on a single RegionServer. This hotspot concentrates the load on individual RegionServers during data retrieval and reduces query efficiency.
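
A minimal sketch of such a key, assuming a 1-byte cyclic bucket in the high-order position followed by an 8-byte timestamp. The bucket count of 8 and the names `NUM_BUCKETS` and `salted_rowkey` are hypothetical choices made for illustration.

```python
import itertools
import struct
from collections import Counter

NUM_BUCKETS = 8  # assumed value; in practice chosen to match the number of regions

# Round-robin source for the leading hash field ("generated by the program cyclically").
_bucket_cycle = itertools.cycle(range(NUM_BUCKETS))


def salted_rowkey(timestamp_ms: int) -> bytes:
    """Build a 9-byte key: 1-byte bucket (high-order) + 8-byte big-endian timestamp."""
    bucket = next(_bucket_cycle)
    return struct.pack(">BQ", bucket, timestamp_ms)


# 80 consecutive timestamps no longer share a common key prefix.
keys = [salted_rowkey(1_417_612_091_323 + i) for i in range(80)]
spread = Counter(k[0] for k in keys)  # first byte of each key is its bucket
print(dict(spread))
```

Each of the 8 buckets receives 10 of the 80 keys, so writes for adjacent timestamps land in different key ranges. The trade-off of this layout is that reading a time range takes one scan per bucket, because the timestamp is no longer the leading field.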

2.1.3 Rowkey Uniqueness Principle

The uniqueness of the Rowkey must be guaranteed by design.
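
One way to build uniqueness into the key, sketched here under stated assumptions, is to append a sequence number so that two records with the same bucket and timestamp still get different keys. The 2-byte sequence field and the name `unique_rowkey` are hypothetical, and the guarantee only holds within a single process and for fewer than 65,536 keys per bucket and millisecond.

```python
import itertools
import struct

# Process-local counter; an assumption made for this sketch only.
_sequence = itertools.count()


def unique_rowkey(bucket: int, timestamp_ms: int) -> bytes:
    """Build an 11-byte key: 1-byte bucket + 8-byte timestamp + 2-byte sequence."""
    seq = next(_sequence) & 0xFFFF  # wraps around after 65,536 keys
    return struct.pack(">BQH", bucket, timestamp_ms, seq)


# Two records written in the same millisecond to the same bucket stay distinct.
first = unique_rowkey(0, 1_417_612_091_323)
second = unique_rowkey(0, 1_417_612_091_323)
print(first != second, len(first))
```

Without such a distinguishing field, two records that map to the same Rowkey are treated as the same row, so a later write can shadow an earlier one.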
