Points to note when designing HBase tables


1. Add a Bloom filter whenever possible. A Bloom filter is configured per column family and can be built over the row key alone or over the row key plus column. When data is written, a hash of the row key is added to the Bloom filter, so a read can skip files that cannot contain the requested row, which greatly reduces retrieval time.
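The Bloom filter idea in item 1 can be sketched in a few lines of Python. This is a toy model rather than HBase's own implementation: the class `RowBloomFilter`, the file names, and the sizing parameters are all assumptions made for illustration. It only shows why hashing row keys at write time lets a point lookup skip files.

```python
import hashlib


class RowBloomFilter:
    """Toy Bloom filter over row keys (illustrative only, not HBase's code)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, row_key):
        # Derive num_hashes bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + row_key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, row_key):
        for pos in self._positions(row_key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, row_key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(row_key)
        )


# One filter per (hypothetical) store file, filled as rows are written.
store_files = {
    "file_a": [b"user0001", b"user0002"],
    "file_b": [b"user0900"],
}
filters = {}
for name, rows in store_files.items():
    bloom = RowBloomFilter()
    for row in rows:
        bloom.add(row)
    filters[name] = bloom


def files_to_read(row_key):
    """Return the store files a point lookup for row_key still has to open."""
    return [name for name, bloom in filters.items() if bloom.might_contain(row_key)]


print(files_to_read(b"user0001"))
```

A Bloom filter never reports a stored key as absent, so the lookup stays correct; the occasional false positive only costs one unnecessary file read.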

2. VERSIONS controls how many versions of each cell are kept. If old data does not matter, set it to 1; compared with keeping three versions, this can save about two-thirds of the space.

3. Snappy is the recommended compression codec. For cold data, gzip is the better choice because its compression ratio is higher. Before Google released Snappy, LZO was the usual default; since then, Snappy has taken its place.

4. TTL sets an expiry time on a column family. If MIN_VERSIONS is 1, the latest version of each cell is retained even after the TTL expires. If it is 0, all data in the column family is deleted once its TTL has passed.
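How the version limit, the TTL, and the minimum number of retained versions interact is easier to see in code. The function below is a simplified model under stated assumptions, not HBase's compaction logic: `surviving_versions` and its parameters are hypothetical names, and timestamps are plain seconds.

```python
def surviving_versions(timestamps, now, ttl, min_versions=0, max_versions=1):
    """Return the cell timestamps that survive, newest first.

    Simplified model: keep at most max_versions newest cells, then drop
    cells older than ttl seconds unless they are among the min_versions newest.
    """
    newest_first = sorted(timestamps, reverse=True)[:max_versions]
    kept = []
    for rank, ts in enumerate(newest_first):
        expired = now - ts > ttl
        if not expired or rank < min_versions:
            kept.append(ts)
    return kept


cells = [100, 200, 300]

# Every cell is past its TTL, but one minimum version is kept.
print(surviving_versions(cells, now=1000, ttl=500, min_versions=1, max_versions=3))
# Same data with no minimum: everything expires.
print(surviving_versions(cells, now=1000, ttl=500, min_versions=0, max_versions=3))
```

With a minimum of 1, the newest cell outlives the TTL; with a minimum of 0, the column family ends up empty for that row.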

5. Pre-splitting (pre-partitioning). By default, HBase creates a table with a single region. During an import, every client writes to that one region until it grows large enough to split. One way to speed up batch writes is to create empty regions in advance, so that writes are load-balanced across the cluster according to the region boundaries. In practice, tables should generally be pre-split at creation time, typically with two to five regions per server, which reduces the number of splits later. When pre-splitting, the rowkey design is particularly important, because it determines which region each write lands in.
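As a rough sketch of pre-splitting, the Python below computes evenly spaced split keys and shows which region a given rowkey would fall into. It assumes rowkeys start with a two-character lowercase hex prefix (for example, a hash prefix); the helper names and the cluster size are hypothetical, and the split keys would still have to be passed to HBase when the table is created.

```python
import bisect


def hex_split_points(num_regions, width=2):
    """Return num_regions - 1 evenly spaced hex split keys of the given width."""
    space = 16 ** width
    return [
        format(space * i // num_regions, "0{}x".format(width))
        for i in range(1, num_regions)
    ]


def region_index(row_key, split_points):
    """Index of the region whose key range contains row_key (lexicographic order)."""
    # A region starts at its split key (inclusive), hence bisect_right.
    return bisect.bisect_right(split_points, row_key)


# Assumed cluster: 3 region servers, 4 regions each (within the 2-5 guideline).
splits = hex_split_points(3 * 4)
print(len(splits), splits)

for key in ["03-order-1", "7f-order-2", "e9-order-3"]:
    print(key, "-> region", region_index(key, splits))
```

Evenly spaced split points only balance the load if the rowkey prefixes are themselves evenly distributed, which is why the rowkey design and the pre-split plan have to be chosen together.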

6. Keep the column family name as short as possible, ideally a single letter, because the name is stored with every cell.

7. Rowkey design principles. Length: the rowkey should not exceed 16 bytes. HBase is a key-value store and the rowkey is stored with every cell, so a long rowkey inflates the HFiles; the MemStore also holds data in memory, so an oversized rowkey lowers memory utilization. Operating systems are now 64-bit and align memory on 8-byte boundaries, so keeping the rowkey at 16 bytes, an integer multiple of 8, makes the best use of the platform. Uniqueness: the rowkey must be unique. Because rowkeys are stored sorted in lexicographic order, the design should take full advantage of this ordering and place frequently read data together.
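One possible way to meet the length and ordering principles is to pack two 64-bit fields into a fixed 16-byte key. The sketch below is only an illustration: the fields `user_id` and `event_ts_ms` are hypothetical, and the key is unique only if that pair is unique in the application.

```python
import struct


def make_row_key(user_id, event_ts_ms):
    """Pack two unsigned 64-bit integers into a fixed 16-byte, big-endian row key."""
    # Big-endian keeps byte-wise (lexicographic) order equal to numeric order.
    return struct.pack(">QQ", user_id, event_ts_ms)


def parse_row_key(row_key):
    """Inverse of make_row_key: return (user_id, event_ts_ms)."""
    return struct.unpack(">QQ", row_key)


keys = [make_row_key(2, 1000), make_row_key(1, 3000), make_row_key(1, 1000)]

# Sorting the raw bytes mimics the lexicographic order used for storage.
ordered = [parse_row_key(k) for k in sorted(keys)]
print(len(keys[0]), ordered)
```

Because the user id comes first, all rows for one user sort next to each other, so a scan over that user's data reads a contiguous key range.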
 


Origin blog.csdn.net/hzp666/article/details/114974886