http://lxw1234.com/archives/2016/09/719.html
This one is good
Rowkey Design
The rowkey is the basis of distribution in HBase: HBase splits a table into regions by rowkey range. A basic requirement of a distributed system is that access should never concentrate on obvious hot spots, so rowkey design is very important. In general, it is recommended to start the rowkey with a hash (for example, part of an MD5 digest) so that the leading bytes of the rowkey are evenly distributed. Do not use values that cluster heavily, such as timestamps or user IDs, directly as the rowkey.
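As a rough illustration of the hashed-prefix idea, here is a minimal Python sketch. The function name `salted_rowkey`, the 4-character prefix length, and the `user_id` plus timestamp key layout are all assumptions made for this example; they are not taken from the linked article or from any HBase API.

```python
import hashlib


def salted_rowkey(user_id: str, timestamp_ms: int, prefix_len: int = 4) -> bytes:
    """Build a rowkey whose leading bytes come from an MD5 digest.

    Hypothetical layout: <md5(user_id)[:prefix_len]>_<user_id>_<timestamp_ms>
    """
    # Hash only the user id, so all rows of one user share a prefix
    # (and stay contiguous) while different users scatter across the key space.
    prefix = hashlib.md5(user_id.encode("utf-8")).hexdigest()[:prefix_len]
    natural_key = f"{user_id}_{timestamp_ms}"
    return f"{prefix}_{natural_key}".encode("utf-8")


if __name__ == "__main__":
    for uid in ("user0001", "user0002", "user0003"):
        print(salted_rowkey(uid, 1473000000000))
```

The trade-off in this sketch is that range scans over the natural key order are no longer possible; a reader has to recompute the prefix from the user id to locate that user's rows.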
Column family design
When designing HBase tables, the right choice depends on the requirements. For tables that serve online queries, try not to define multiple column families. Different column families are stored separately, so a multi-column-family design causes more files to be read during a query, which consumes more I/O.
TTL design
Choosing an appropriate data expiration time is another point to pay attention to in table design. HBase lets each column family define a time to live (TTL) for its data. Once data exceeds its expiration time, it can be cleaned up by a major compaction. Without a TTL, a large amount of useless historical data remains, which increases the size of the regions and hurts query efficiency.
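The expiration rule can be pictured with a small toy model in Python. This is only a conceptual sketch, not HBase code: the tuple layout `(rowkey, timestamp_ms, value)` and the function names `is_expired` and `compact` are hypothetical, and the real compaction process does considerably more than this.

```python
from typing import List, Tuple

Cell = Tuple[str, int, str]  # (rowkey, write timestamp in ms, value); hypothetical layout


def is_expired(cell_timestamp_ms: int, ttl_seconds: int, now_ms: int) -> bool:
    """A cell is expired once its age exceeds the configured TTL."""
    return now_ms - cell_timestamp_ms > ttl_seconds * 1000


def compact(cells: List[Cell], ttl_seconds: int, now_ms: int) -> List[Cell]:
    """Toy stand-in for a major compaction: keep only cells still within the TTL."""
    return [c for c in cells if not is_expired(c[1], ttl_seconds, now_ms)]


if __name__ == "__main__":
    now = 10_000_000
    cells = [("a", 1_000_000, "old"), ("b", 9_000_000, "recent")]
    # With a one-hour TTL, only the recent cell survives.
    print(compact(cells, ttl_seconds=3600, now_ms=now))
```

In this model, the list returned by `compact` shrinks as data ages out, which mirrors the point above: expired data that is cleaned up no longer adds to region size.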