rowkey design principles and methods

Rowkey design should follow three principles:

rowkey length principle

A rowkey is a binary stream and can be any sequence of bytes. Its maximum length is 64 KB, but in practice it is usually 10-100 bytes. It is stored as a byte[] and is generally designed with a fixed length.

Generally, the shorter the better, preferably no more than 16 bytes, for the following reasons:

1. Most operating systems are now 64-bit, and memory is 8-byte aligned. Keeping the rowkey at 16 bytes, an integer multiple of 8 bytes, makes the best use of this operating system characteristic.
2. HBase loads part of its data into memory, and the rowkey is stored with every cell. If the rowkey is too long, the effective utilization of memory decreases.
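
As a rough illustration of the length principle, the following Python sketch packs two fields into a fixed 16-byte key. The field names (device_id, ts_millis) and the layout are assumptions made for this example and do not come from any HBase API. An HBase client would receive the resulting bytes as the row.

```python
import struct

ROWKEY_LEN = 16  # two 8-byte fields, an integer multiple of 8


def make_rowkey(device_id: int, ts_millis: int) -> bytes:
    """Pack two unsigned 64-bit integers, big-endian, into a 16-byte rowkey.

    Big-endian packing keeps the byte-wise (lexicographic) order of the keys
    the same as the numeric order of the fields.
    """
    return struct.pack(">QQ", device_id, ts_millis)


key = make_rowkey(42, 1_572_500_000_000)
print(len(key), key.hex())
```

Because every key has the same length, a key can always be split back into its fields at fixed offsets with struct.unpack(">QQ", key).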

rowkey hash principles

If the rowkey increases with the timestamp, do not put the time at the front of the binary key. It is recommended to use a hash field, generated by the program, as the high-order bytes of the rowkey and to put the time field in the low-order bytes. This raises the chance that data is distributed evenly and that load is balanced across the RegionServers.

If there is no hash field and the first field is the time information itself, all data for a given time period is concentrated on one RegionServer. When the data is retrieved, the load falls on individual RegionServers, which causes hotspot problems and reduces query efficiency.
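
A minimal sketch of the hash principle, assuming the hash field is derived from a device identifier with MD5 (the original only says the hash field is generated by the program, so the choice of MD5, the 8-byte prefix length, and the name device_id are assumptions for this example):

```python
import hashlib
import struct
from collections import Counter


def hashed_rowkey(device_id: str, ts_millis: int) -> bytes:
    """High-order 8 bytes: hash of the device id. Low-order 8 bytes: time."""
    prefix = hashlib.md5(device_id.encode("utf-8")).digest()[:8]
    return prefix + struct.pack(">Q", ts_millis)


def bucket(key: bytes) -> int:
    """Pretend the key space is split into 4 regions by the first byte."""
    return key[0] >> 6


# 1000 devices all writing at the same timestamp
ts = 1_572_500_000_000
hashed = Counter(bucket(hashed_rowkey(f"device-{i}", ts)) for i in range(1000))
plain = Counter(bucket(struct.pack(">Q", ts)) for _ in range(1000))

print("hashed prefix:", dict(hashed))  # writes spread over the buckets
print("time first:   ", dict(plain))   # every write lands in one bucket
```

The four buckets only stand in for regions. In a real table the region boundaries depend on how the table is split. The trade-off is that a range scan over time alone is no longer a single contiguous scan, since rows for the same time period are now spread across the key space.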

rowkey uniqueness principle

The design must guarantee that each rowkey is unique. Rows are stored sorted by rowkey in lexicographic order, so the design should take full advantage of this sorting: store data that is often read together in one place, and put data that is likely to be accessed soon in one place. This amount of data must not be too large, however. If it is, it needs to be split across multiple nodes.
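
The sorting behaviour can be sketched as follows. The layout used here (a 4-byte user id, an 8-byte reversed timestamp, and a 4-byte sequence number) is a hypothetical example rather than something the original prescribes. The reversed timestamp makes a user's newest rows sort first, and the sequence number keeps keys unique when two events share a timestamp.

```python
import struct

MAX_INT64 = 2**63 - 1


def event_rowkey(user_id: int, ts_millis: int, seq: int) -> bytes:
    """16-byte key: user id, reversed timestamp, sequence number."""
    reversed_ts = MAX_INT64 - ts_millis  # larger timestamp -> smaller value
    return struct.pack(">IQI", user_id, reversed_ts, seq)


def decode(key: bytes) -> tuple:
    """Recover (user_id, ts_millis, seq) from a key built by event_rowkey."""
    user_id, reversed_ts, seq = struct.unpack(">IQI", key)
    return (user_id, MAX_INT64 - reversed_ts, seq)


events = [(7, 1000, 0), (7, 1000, 1), (7, 2000, 0), (9, 1500, 0)]

# Sorting the raw bytes mimics the lexicographic order in which rows are stored.
keys = sorted(event_rowkey(*e) for e in events)
for k in keys:
    print(decode(k))
```

All rows of user 7 are adjacent and ordered newest first, so reading that user's latest events is a short scan starting at the user's prefix.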

So a good rowkey design should follow these three principles and keep the data dispersed, thereby avoiding hotspots.
This section describes several common rowkey design approaches for students to learn from.

Origin blog.csdn.net/weixin_45775343/article/details/102840014