HBase database retrieval performance optimization strategy

Originally republished from: https://www.ibm.com/developerworks/cn/java/j-lo-HBase/

HBase data table overview

HBase is a distributed, column-oriented, open-source database, used mainly to store unstructured data. Its design ideas come from Google's proprietary database "BigTable".

HDFS provides the underlying storage for HBase, MapReduce provides its computing power, and ZooKeeper provides coordination services and the failover mechanism. Pig and Hive provide high-level language support for HBase, enabling statistical processing of data (multi-table joins and the like), and Sqoop provides the ability to import data from an RDBMS.

HBase does not support WHERE conditions or ORDER BY in queries; it supports only queries by primary key (Rowkey), either a single key or a key range. Conditional filtering can, however, be achieved through the filters provided by the HBase API.

The Rowkey uniquely identifies a row of data, and row data must be accessed through it, in one of three ways: single-key access, key-range access, or a full table scan. Rows are stored sorted by key, compared byte by byte, so integer-like keys sort lexicographically rather than numerically: 1, 10, 100, 11, 12, 2, 20, ..., 906, ....
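
As a quick pure-Java illustration of this ordering (using String sorting as a stand-in for HBase's byte-by-byte key comparison; the key values here are arbitrary):

```java
import java.util.Arrays;

public class RowkeySortDemo {
    public static void main(String[] args) {
        // Integer keys stored as text sort lexicographically, not numerically,
        // which mirrors how HBase orders row keys byte by byte.
        String[] keys = {"1", "2", "11", "10", "100", "12", "20"};
        Arrays.sort(keys);
        System.out.println(Arrays.toString(keys));
        // prints [1, 10, 100, 11, 12, 2, 20]
    }
}
```

This is why numeric Rowkeys are usually zero-padded to a fixed width when numeric range scans are needed.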

A ColumnFamily ("column family") belongs to the table schema and is defined when the table is created. Every column of a row belongs to a family, and columns are addressed with the family name as a prefix: "ColumnFamily:qualifier". Access control and disk and memory usage statistics are all carried out at the column-family level.

A Cell is the storage unit determined by a row and a column; its value is stored as raw bytes, with no type.

The Timestamp is the index that distinguishes different versions of a Cell; it is a 64-bit integer. The versions of a piece of data are ordered by timestamp in reverse, so the newest version comes first.

In the row direction, HBase splits a table horizontally into N Regions. Each table starts with a single Region; as the amount of data grows, a Region automatically splits into two, which may be distributed to different RegionServers, but one Region is never split across different servers.

A Region is divided by ColumnFamily into Stores. A Store is the minimum storage unit and holds the data of one column family; each Store consists of one in-memory MemStore and the HFiles persisted to disk.

Figure 1 is an example of an HBase data table; the data is distributed over multiple machine nodes.

Figure 1. Example HBase data table


HBase API usage examples

Operating on an HBase database is similar to operating on a relational database through JDBC: the HBase client package provides a number of APIs that help users operate on the database quickly, with interfaces for creating and deleting data tables, adding column families, inserting data, reading data, and so on. Listing 1 shows a utility class that wraps some of them, including operations for managing tables and for reading, writing, and exporting data.

Listing 1. HBase API utility class

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class HBaseUtil {
private Configuration conf = null;
private HBaseAdmin admin = null;

protected HBaseUtil(Configuration conf) throws IOException {
 this.conf = conf;
 this.admin = new HBaseAdmin(conf);
}

public boolean existsTable(String table)
 throws IOException {
 return admin.tableExists(table);
}

public void createTable(String table, byte[][] splitKeys, String... colfams)
 throws IOException {
HTableDescriptor desc = new HTableDescriptor(table);
for (String cf : colfams) {
HColumnDescriptor coldef = new HColumnDescriptor(cf);
desc.addFamily(coldef);
 }
if (splitKeys != null) {
admin.createTable(desc, splitKeys);
} else {
admin.createTable(desc);
 }
}

public void disableTable(String table) throws IOException {
admin.disableTable(table);
}

public void dropTable(String table) throws IOException {
 if (existsTable(table)) {
 disableTable(table);
 admin.deleteTable(table);
 }
}
 
public void fillTable(String table, int startRow, int endRow, int numCols,
 int pad, boolean setTimestamp, boolean random,
 String... colfams) throws IOException {
 // pad and random are kept for API compatibility; the original helper
 // used them to zero-pad keys and to generate random values.
 HTable tbl = new HTable(conf, table);
 for (int row = startRow; row <= endRow; row++) {
  for (int col = 0; col < numCols; col++) {
   // Include the row number in the key so rows do not overwrite each other.
   Put put = new Put(Bytes.toBytes("row-" + row));
   for (String cf : colfams) {
    String colName = "col-" + col;
    String val = "val-" + row + "." + col;
    if (setTimestamp) {
     put.add(Bytes.toBytes(cf), Bytes.toBytes(colName),
      col, Bytes.toBytes(val));
    } else {
     put.add(Bytes.toBytes(cf), Bytes.toBytes(colName),
      Bytes.toBytes(val));
    }
   }
   tbl.put(put);
  }
 }
 tbl.close();
}

public void put(String table, String row, String fam, String qual,
 String val) throws IOException {
 HTable tbl = new HTable(conf, table);
 Put put = new Put(Bytes.toBytes(row));
 put.add(Bytes.toBytes(fam), Bytes.toBytes(qual), Bytes.toBytes(val));
 tbl.put(put);
 tbl.close();
 }

 public void put(String table, String row, String fam, String qual, long ts,
 String val) throws IOException {
 HTable tbl = new HTable(conf, table);
 Put put = new Put(Bytes.toBytes(row));
 put.add(Bytes.toBytes(fam), Bytes.toBytes(qual), ts, Bytes.toBytes(val));
 tbl.put(put);
 tbl.close();
 }

 public void put(String table, String[] rows, String[] fams, String[] quals,
  long[] ts, String[] vals) throws IOException {
  HTable tbl = new HTable(conf, table);
  for (String row : rows) {
   Put put = new Put(Bytes.toBytes(row));
   for (String fam : fams) {
    int v = 0;
    for (String qual : quals) {
     // Reuse the last value/timestamp when fewer are given than qualifiers.
     String val = vals[v < vals.length ? v : vals.length - 1];
     long t = ts[v < ts.length ? v : ts.length - 1];
     put.add(Bytes.toBytes(fam), Bytes.toBytes(qual), t,
      Bytes.toBytes(val));
     v++;
    }
   }
   tbl.put(put);
  }
  tbl.close();
 }

 public void dump(String table, String[] rows, String[] fams, String[] quals)
  throws IOException {
  HTable tbl = new HTable(conf, table);
  List<Get> gets = new ArrayList<Get>();
  for (String row : rows) {
   Get get = new Get(Bytes.toBytes(row));
   get.setMaxVersions();
   if (fams != null) {
    for (String fam : fams) {
     for (String qual : quals) {
      get.addColumn(Bytes.toBytes(fam), Bytes.toBytes(qual));
     }
    }
   }
   gets.add(get);
  }
  Result[] results = tbl.get(gets);
  for (Result result : results) {
   for (KeyValue kv : result.raw()) {
    System.out.println("KV: " + kv +
     ", Value: " + Bytes.toString(kv.getValue()));
   }
  }
  tbl.close();
 }
 
 private void scan(String table, int caching, int batch) throws IOException {
  // The original version left the HTable null, which would throw a
  // NullPointerException; open the table from the configuration instead.
  HTable tbl = new HTable(conf, table);
  int results = 0;
  Scan scan = new Scan();
  scan.setCaching(caching); // number of rows fetched per RPC
  scan.setBatch(batch);     // maximum number of cells per Result
  ResultScanner scanner = tbl.getScanner(scan);
  for (Result result : scanner) {
   results++; // count the number of Results returned
  }
  scanner.close();
  tbl.close();
  // Counting the actual RPCs would require client-side metrics, so only
  // the Result count is reported here.
  System.out.println("Caching: " + caching + ", Batch: " + batch +
   ", Results: " + results);
 }
}

Table-management operations are provided by HBaseAdmin; the Scan operation in particular deserves a closer look.

HBase organizes table data in several levels: HRegion -> HStore -> [MemStore, HFile, HFile, ...].

An HBase table can have multiple Column Families. During a Scan, each Column Family (Store) uses a StoreScanner object to read its data. The data of a Store comes from its in-memory MemStore and the HFiles on disk, so each StoreScanner in turn uses one MemStoreScanner and N StoreFileScanners to do the actual reading.

Reading a row of data therefore requires two steps:

1. Read each Store in order.

2. For each Store, merge the HFiles below it with the associated in-memory MemStore.

Both steps are performed with heaps. A RegionScanner completes the read through its member variable KeyValueHeap storeHeap, which is composed of the StoreScanners below it. Each StoreScanner is itself a heap whose bottom-level elements are the StoreFileScanners and the MemStoreScanner corresponding to its HFiles and MemStore. The advantage of using heaps is that they are efficient to build, their memory can be sized dynamically, and their lifetime does not have to be determined in advance.

seekScanners() is then called to seek each of these StoreFileScanners and the MemStoreScanner. A seek is performed against a KeyValue: its semantics are to position at the specified KeyValue or, if that KeyValue does not exist, at the next KeyValue after it.

Common methods of the Scan class:

scan.addFamily() / scan.addColumn(): specify the desired Family or Column; if neither addFamily nor addColumn is called, all Columns are returned;

scan.setMaxVersions(): specify the maximum number of versions. Called without arguments, it returns all versions; if setMaxVersions is not called at all, only the latest version is returned;

scan.setTimeRange(): specify a maximum and minimum timestamp; only Cells within this range are retrieved;

scan.setTimeStamp(): specify a single timestamp;

scan.setFilter(): specify a Filter to filter out unwanted information;

scan.setStartRow(): specify the starting row; if not called, the scan starts from the first row of the table;

scan.setStopRow(): specify the end row (this row itself is excluded);

scan.setCaching(): the number of rows read from the server at a time (affects the number of RPCs);

scan.setBatch(): specify the maximum number of Cells returned at a time, used to prevent a very wide row from causing an OutOfMemory error; the default is unlimited.
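
As a minimal sketch of how these methods combine, using the same 0.94-era client API as Listing 1 (the table name "access_log" and family "info" are hypothetical, and a running cluster is required to execute this):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "access_log");     // hypothetical table
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("info"));             // hypothetical family
        scan.setMaxVersions(1);                            // latest version only
        scan.setTimeRange(0L, System.currentTimeMillis()); // cells up to now
        scan.setStartRow(Bytes.toBytes("row-100"));        // start row (inclusive)
        scan.setStopRow(Bytes.toBytes("row-200"));         // stop row (exclusive)
        scan.setCaching(500);                              // rows per RPC
        scan.setBatch(100);                                // cells per Result
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result result : scanner) {
                System.out.println(result);
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}
```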

HBase data sheet optimization

HBase is a highly reliable, high-performance, column-oriented, scalable distributed database, but its read and write performance degrades when the data volume or the concurrency is too high. The following techniques can be used to progressively improve HBase retrieval speed.

Pre-partition

By default, an HBase table is created with a single automatically created Region partition, so when data is imported, all HBase clients write to this one Region until it is large enough to be split. One way to speed up bulk writes is to pre-create a number of empty Regions, so that when data is written to HBase it is distributed across the cluster according to the Region partitioning, keeping the load balanced.
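
As a minimal sketch, split points for pre-created Regions could be computed like this (the choice of 4 Regions and one-byte boundaries is an arbitrary assumption; real split keys should follow the actual Rowkey distribution, and the resulting byte[][] would be passed to HBaseAdmin.createTable(desc, splitKeys) as in Listing 1):

```java
public class SplitKeyDemo {
    // Compute n-1 split keys that divide the one-byte key space 0x00..0xFF
    // into n roughly equal ranges.
    public static byte[][] splitKeys(int n) {
        byte[][] splits = new byte[n - 1][];
        for (int i = 1; i < n; i++) {
            splits[i - 1] = new byte[] { (byte) (i * 256 / n) };
        }
        return splits;
    }

    public static void main(String[] args) {
        for (byte[] k : splitKeys(4)) {
            System.out.println(k[0] & 0xFF);
        }
        // prints 64, 128 and 192, one boundary per line
    }
}
```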

Rowkey optimization

HBase stores rows in lexicographic order of the Rowkey. Therefore, when designing a Rowkey, take full advantage of this sorting: store data that is often read together in one place, and put data that is likely to be accessed soon close together.

In addition, if Rowkeys are generated incrementally, it is recommended not to write them directly in ascending order; instead, reverse the Rowkey, so that keys become roughly evenly distributed. The advantage of this design is that it load-balances the RegionServers and avoids the phenomenon of all new data piling up on a single RegionServer; it can also be combined with the pre-partitioned table design above.
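
A minimal pure-Java sketch of the reversal (the sample IDs are made up):

```java
public class ReverseKeyDemo {
    // Reverse a monotonically increasing key so that consecutive keys
    // get different leading bytes and spread across Regions.
    public static String reverseKey(String rowkey) {
        return new StringBuilder(rowkey).reverse().toString();
    }

    public static void main(String[] args) {
        // Two consecutive sequential IDs that would land in the same Region...
        System.out.println(reverseKey("201912260001")); // prints 100062219102
        System.out.println(reverseKey("201912260002")); // prints 200062219102
        // ...now start with different digits and scatter across Regions.
    }
}
```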

Reduce the number of ColumnFamilies

Do not define too many ColumnFamilies in one table. HBase currently does not handle tables with more than two or three ColumnFamilies well, because when one ColumnFamily is flushed, its neighboring ColumnFamilies are also triggered to flush by association, eventually causing the system to generate more I/O.

Caching policy (setInMemory)

When creating a table, HColumnDescriptor.setInMemory(true) can be used to have the RegionServer keep the table in its cache, so that reads can be served from the cache.

Set storage lifetime

When creating a table, HColumnDescriptor.setTimeToLive(int timeToLive) can be used to set the storage lifetime of the data in the table; expired data is deleted automatically.
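
A hedged sketch combining the two settings above, using the same 0.94-era client API as Listing 1 (the table and family names are hypothetical, and a running cluster is required):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateCachedTtlTable {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor desc = new HTableDescriptor("hot_data"); // hypothetical
        HColumnDescriptor cf = new HColumnDescriptor("cf");       // hypothetical
        cf.setInMemory(true);            // keep this family in the block cache
        cf.setTimeToLive(7 * 24 * 3600); // expire cells after 7 days (seconds)
        desc.addFamily(cf);
        admin.createTable(desc);
        admin.close();
    }
}
```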

Hard drive configuration

Each RegionServer manages 10 to 1,000 Regions, each 1-2 GB, so each server holds at least 10 GB and at most 1,000 * 2 GB = 2 TB; counting 3 replicas, that becomes 6 TB. One option is three 2 TB hard drives; another is twelve 500 GB drives. Given sufficient bandwidth, the latter offers greater throughput, finer-grained redundancy, and faster recovery from a single-disk failure.

Allocate appropriate memory to the RegionServer service

Within the limits of not affecting other services, the more the better. For example, add the following at the end of hbase-env.sh in the HBase conf directory: export HBASE_REGIONSERVER_OPTS="-Xmx16000m $HBASE_REGIONSERVER_OPTS", where 16000m is the amount of memory allocated to the RegionServer.

Number of replicas for written data

The replica count is proportional to read performance, inversely proportional to write performance, and affects high availability. There are two ways to configure it. One is to copy hdfs-site.xml into the HBase conf directory and add or modify the dfs.replication value to set the number of replicas; this change takes effect for all HBase user tables. The other is to modify the HBase code so that the replica count can be set per column family when the table is created (the default is 3); such a setting affects only that column family.

WAL (write-ahead log)

HBase exposes a switch indicating whether to write the write-ahead log before writing data. It is on by default; turning it off improves write performance, but data may be lost if the system fails (e.g., the RegionServer responsible for the insert goes down). The WAL is configured per write when calling the Java API, by calling Put.setWriteToWAL(boolean) on the Put instance.
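
A minimal sketch of such a write, with the same 0.94-era API as Listing 1 (table, family, and qualifier names are hypothetical; a running cluster is required):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class NoWalPutExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "data_table"); // hypothetical table
        Put put = new Put(Bytes.toBytes("row-1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        // Skip the WAL: faster, but this row is lost if the RegionServer crashes
        // before the MemStore is flushed.
        put.setWriteToWAL(false);
        table.put(put);
        table.close();
    }
}
```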

Batch writing

HBase supports both single-Put writes and bulk inserts; in general, bulk writes are faster and save the overhead of network round trips. When calling the Java API, first collect the Puts into a list, then call HTable's put(List&lt;Put&gt;) method to write the batch.
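
A minimal batch-write sketch with the 0.94-era API of Listing 1 (names and the batch size of 1000 are arbitrary assumptions; a running cluster is required):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchPutExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "data_table"); // hypothetical table
        List<Put> puts = new ArrayList<Put>();
        for (int i = 0; i < 1000; i++) {
            Put put = new Put(Bytes.toBytes("row-" + i));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"),
                    Bytes.toBytes("val-" + i));
            puts.add(put);
        }
        table.put(puts); // one batched call instead of 1000 single round trips
        table.close();
    }
}
```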

Number of rows a client pulls from the server at a time

Configuring a larger amount of data to be fetched at once reduces the client's data-retrieval time, at the cost of client memory. It can be configured in three places:

1) in the hbase.client.scanner.caching entry of the HBase conf configuration file;

2) by calling HTable.setScannerCaching(int scannerCaching);

3) by calling Scan.setCaching(int caching).

The three have increasing priority, in that order.

Number of RegionServer threads handling IO requests

Fewer IO threads are suitable for "Big Put" scenarios where a single request has high memory consumption (a single large-payload Put, or a Scan configured with a large cache, both count as Big Puts), or for scenarios where RegionServer memory is tight.

More IO threads are suitable for scenarios where each request has low memory consumption and very high TPS (transactions per second) is required. When setting this value, use memory monitoring as the main reference.

The configuration item is hbase.regionserver.handler.count in the hbase-site.xml configuration file.

Region size settings

The configuration item is hbase.hregion.max.filesize in the hbase-site.xml configuration file; the default size is 256 MB.

This is the maximum storage space of a single Region on the current RegionServer; when a Region exceeds this value, it is automatically split into smaller Regions. Small Regions are friendly to split and compaction, because splitting a small Region or compacting its StoreFiles is fast and the memory footprint is low. The disadvantage is that splits and compactions become very frequent; in particular, a large number of small Regions constantly splitting and compacting causes large fluctuations in cluster response time, and too many Regions is not only troublesome to manage but can even trigger HBase bugs. Regions below roughly 512 MB are generally considered small. Large Regions, in turn, are unsuitable for frequent splits and compactions, because a single compact or split of a large Region causes a long pause, with a severe impact on the application's read and write performance.

In addition, a large Region means a large StoreFile, which is also a challenge for memory during compaction. If your application's access volume is low at some point in time, performing compacts and splits at that time both lets them complete successfully and keeps read and write performance smooth most of the time. Compaction cannot be avoided, but splitting can be changed from automatic to manual: by setting this parameter to a value that is hard to reach, such as 100 GB, automatic splitting can be disabled indirectly (Regions will not reach 100 GB and therefore will not split). Combined with the RegionSplitter tool, a manual split can then be performed whenever one is needed. Manual splitting is far more flexible and stable than automatic splitting, the administrative cost is not much higher, and it is recommended for online real-time systems. Memory-wise, small Regions allow more flexibility in setting the memstore size; for large Regions, neither a too-large nor a too-small memstore works, a too-large value causing prolonged IO waits when the application triggers flushes, and a too-small value degrading read performance through an excess of StoreFiles.

HBase configuration

HBase server memory is recommended to be at least 32 GB. Table 1 gives recommended memory allocations for each role, obtained through practical testing.

Table 1. Memory configuration of HBase-related services

Module      Service         Memory requirement
HDFS        NameNode        16 GB
HDFS        DataNode        2 GB
HBase       HMaster         2 GB
HBase       HRegionServer   16 GB
ZooKeeper   ZooKeeper       4 GB

It is recommended to set the single-Region size of HBase larger, e.g. 2 GB; a RegionServer handles a small number of large Regions faster than a large number of small Regions. For important data, put it in a separate column family when creating the table, and set that column family's replica count to 2 (the default is 3). This still guarantees a duplicate copy while saving space and improving write performance; the cost is high availability somewhat lower than with 3 replicas, while read performance is no worse than with the default replica count.

A real-world case

The project required that data stored in the HBase data table be deletable. The Rowkey of the data in HBase (the data is generated by tasks) consists of the task ID plus 16 random digits; task information is maintained in a separate table. Figure 2 is a flowchart of the data deletion.

Figure 2. Data deletion flowchart


The original design was that when a task was deleted by task ID, the corresponding data in HBase would be deleted at the same time. But this could make HBase take a long time to delete larger amounts of data, and the resulting high disk I/O could in turn cause data read and write timeouts.

Examining the logs revealed that while HBase was deleting the data, it was also performing Major Compaction operations. The purpose of a Major Compaction is to merge files and purge deleted, expired, and redundant versions of the data. During a Major Compaction, HBase merges the StoreFiles within a Region; if this operation lasts a long time, the whole Region becomes unreadable, eventually causing all queries based on that Region to time out.

Solving the Major Compaction problem required reading its source code. Inspection of the HBase source showed that when a RegionServer starts, a CompactionChecker thread periodically checks whether a compact is needed. Its source code is shown in Figure 3.

Figure 3. CompactionChecker thread code


isMajorCompaction decides, based on the hbase.hregion.majorcompaction parameter, whether to perform a Major Compact; if hbase.hregion.majorcompaction is 0, it returns false. Setting hbase.hregion.majorcompaction to 0 in the configuration file disables HBase's periodic Major Compaction mechanism, and Major operations are executed instead through a custom schedule (in the small hours, when the HBase workload is light). The schedule can be a script started by Linux cron, or a Java timer schedule; the actual project used Quartz, with the start time given in a configuration file so the Major Compact launch time can be changed conveniently. After this change, we found that Compact operations still occurred after data was deleted: the flow enters the needsCompaction = true branch, whose trigger condition is (storefiles.size() - filesCompacting.size()) > minFilesToCompact. Moreover, when the number of files to compact equals the total number of files in the Store, the Minor Compact is automatically promoted to a Major Compact. But Compact operations cannot be forbidden entirely, since the deleted data would then persist forever and eventually hurt query efficiency.

Based on this analysis, we had to reconsider the data-deletion flow. From the user's perspective, it is enough that deleted tasks are excluded from retrieval. So only the task record itself needs to be deleted immediately; the data associated with the task does not have to be deleted right away. When the system is idle, a scheduled job deletes the task's data from the HBase data table and performs a Major Compact on the Regions to clean up the deleted data. This change to the task-deletion flow meets the project requirement, and it does so without modifying the HBase configuration.
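
The off-peak cleanup step could be sketched as follows, again with the 0.94-era API of Listing 1 (the table name is hypothetical, the Quartz/cron wiring is omitted, and a running cluster is required):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ScheduledMajorCompactJob {
    // Body of a job fired by Quartz (or cron) during off-peak hours.
    public static void run() throws IOException, InterruptedException {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            // Deleting the rows of removed tasks would happen here first
            // (omitted), then a major compaction physically purges the
            // deleted cells.
            admin.majorCompact("data_table"); // hypothetical table; the
                                              // request is asynchronous
        } finally {
            admin.close();
        }
    }
}
```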


Figure 4. Comparison of the data deletion flows

Retrieving, querying, and deleting data in an HBase data table are strongly interrelated; we needed to read the HBase source code to determine the root cause of the data table's retrieval performance bottleneck and arrive at the final solution.

Conclusion

Using and optimizing retrieval in an HBase database differs considerably from a traditional relational database. This article started from the basic definition of the HBase data table, then illustrated optimizations and caveats through the access methods provided by the HBase API, and finally verified the feasibility of the optimizations with a real example. Retrieval performance is the combined product of data table design, process design, and logic design; programmers need a deep understanding of all three to make the right optimizations.

Related topics

  • See the developerWorks China HBase knowledge search page for HBase articles published on IBM's developer forums.
  • Read the article "On HBase", in which the author interprets the fundamentals of HBase data tables.
  • See the book "HBase: The Definitive Guide", an authoritative reference on the HBase database.
  • See the blog "HBase-talk", whose author has considerable practical experience.
  • developerWorks Java technology zone: hundreds of articles about every aspect of Java programming.

Origin: blog.csdn.net/qq_34901049/article/details/103677269