Principles of HBase Table Design

http://lxw1234.com/archives/2016/09/719.html

This one is good

Rowkey Design

The rowkey is the basis of HBase's data distribution: HBase splits a table into regions by rowkey range. A basic requirement of a distributed system is that access should have no obvious hot spots at any time, so rowkey design is very important. We generally recommend hashing the beginning of the rowkey (for example, with MD5) so that rowkey prefixes are evenly distributed. Do not use values with obvious sequential or clustered patterns, such as timestamps or user IDs, directly as the rowkey.
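The hashed-prefix idea can be sketched in plain Python without an HBase client. This is a minimal illustration, not the original author's scheme: the 4-character prefix length, the `|` separator, and the `user_id` + timestamp layout are all hypothetical choices.

```python
import hashlib

# Hypothetical parameter: number of hex characters of the MD5 digest placed at the head of the rowkey.
PREFIX_LEN = 4

def make_rowkey(user_id: str, timestamp_ms: int) -> bytes:
    """Build a rowkey of the form <md5 prefix>|<user_id>|<timestamp>.

    The MD5-derived prefix spreads consecutive user IDs across the key space
    (and therefore across regions), while the original user ID stays in the key
    so a reader can recompute the same prefix and locate the row.
    """
    prefix = hashlib.md5(user_id.encode("utf-8")).hexdigest()[:PREFIX_LEN]
    # Zero-pad the timestamp so lexicographic byte order matches numeric order.
    return f"{prefix}|{user_id}|{timestamp_ms:013d}".encode("utf-8")

# Sequential IDs such as user0000, user0001, ... get unrelated leading characters.
for i in range(3):
    print(make_rowkey(f"user{i:04d}", 1700000000000))
```

The trade-off of this design: point lookups stay cheap because the prefix is deterministic, but a range scan over consecutive user IDs is no longer a single contiguous scan.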

Column family design

When designing HBase tables, the right choice depends on the requirements. For tables that serve online queries, avoid defining multiple column families. Column families are stored separately, so a multi-family design causes more files to be read per query, which consumes more I/O.
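A toy model makes the I/O argument concrete. It is a deliberate simplification: each column family is treated as a separate store holding a list of HFiles, and bloom filters and the block cache (which let HBase skip many files in practice) are ignored, so the count is an upper bound. The `Store` class and `max_files_read` function are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class Store:
    """Toy model of one column family's on-disk storage: a list of HFile names."""
    hfiles: list[str] = field(default_factory=list)

def max_files_read(stores: dict[str, Store], families: list[str]) -> int:
    """Upper bound on the HFiles a single-row read may consult.

    Each requested column family lives in its own store, so the read has to
    look at that store's files; the cost grows with the number of families.
    """
    return sum(len(stores[name].hfiles) for name in families)

# Assume every store has accumulated three HFiles.
single = {"cf": Store(["cf-1", "cf-2", "cf-3"])}
multi = {f: Store([f"{f}-1", f"{f}-2", f"{f}-3"]) for f in ("a", "b", "c")}
print(max_files_read(single, ["cf"]))          # one family
print(max_files_read(multi, ["a", "b", "c"]))  # three families
```

Because each family flushes and compacts on its own, files accumulate per family, which is why a query spanning several families tends to touch more files.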

TTL design

Choosing an appropriate data expiration time (TTL) is another important point in table design. HBase lets each column family define a TTL; once data is older than the TTL, it can be cleaned up by major compaction. If a large amount of useless historical data is left in place, it increases region size and hurts query efficiency.
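The expiry rule can be modeled in a few lines. This is a sketch under stated assumptions: the TTL is given in seconds (as in the HBase shell, e.g. `alter 't', {NAME => 'cf', TTL => 2592000}` for 30 days) while cell timestamps are in milliseconds, and a cell counts as expired when its timestamp is earlier than now minus the TTL. `is_expired` is a hypothetical helper, not an HBase API.

```python
# 30 days expressed in seconds, the unit HBase uses for a column family's TTL.
THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60

def is_expired(cell_ts_ms: int, ttl_seconds: int, now_ms: int) -> bool:
    """Return True if a cell is older than the column family's TTL.

    Models the rule "timestamp < now - TTL". Expired cells stop being returned
    by reads and are physically removed when the store files are compacted.
    """
    return cell_ts_ms < now_ms - ttl_seconds * 1000

now = 1_700_000_000_000                # hypothetical "current time" in ms
day_ms = 24 * 60 * 60 * 1000
print(is_expired(now - 31 * day_ms, THIRTY_DAYS_SECONDS, now))  # 31 days old
print(is_expired(now - 29 * day_ms, THIRTY_DAYS_SECONDS, now))  # 29 days old
```

Note the unit mismatch: forgetting to convert the TTL from seconds to milliseconds would make data expire a thousand times sooner than intended.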

