HBase API operation optimization

1. Put optimization

The HBase client API provides a client-side write buffer. The buffer collects Put operations and then sends them to the server in a single RPC call. Auto-flush is on by default, so the buffer is effectively disabled; call table.setAutoFlushTo(false) to activate it:

	@Test
	public  void testWriteBuffer() throws Exception{
		HTable table = (HTable)conn.getTable(TableName.valueOf("t1"));
		//table.setAutoFlushTo(false);	// uncomment to enable the client-side write buffer
		long start = System.currentTimeMillis();
		for(int i=10001; i<20000; i++){
			Put put = new Put(Bytes.toBytes("row"+i));
			put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes("terry"+i));
			put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("job"), Bytes.toBytes("manager"+i));
			table.put(put);
		}
		//table.flushCommits();			// optional: table.close() also flushes the buffer
		table.close();
		System.out.println(System.currentTimeMillis()-start);
	}

Test results:

1. With table.setAutoFlushTo(false); enabled: 864 ms. Note: even without calling table.flushCommits();, the buffered Puts are committed when table.close(); executes.

2. Without table.setAutoFlushTo(false); (auto-flush on): 25443 ms.

 

The default write buffer size is 2 MB (2097152 bytes). If you need to write larger batches of data at a time, consider increasing this value.

Method 1: Temporarily change the buffer size for one table instance:

table.setWriteBufferSize(writeBufferSize);

Method 2: Change the default permanently in hbase-site.xml:

  <property>
    <name>hbase.client.write.buffer</name>
    <value>2097152</value>
  </property>
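To see why the buffer matters, here is a minimal sketch (with hypothetical numbers, not measurements from this article) estimating how many RPC round trips the write buffer saves: each buffered flush sends one RPC carrying many Puts, whereas auto-flush sends one RPC per Put.

```java
// Hypothetical estimate of RPC round trips saved by the client-side write buffer.
// Numbers (put size, put count) are illustrative assumptions, not measured values.
public class WriteBufferEstimate {

	// RPCs needed when Puts are buffered: one RPC per buffer flush.
	static long rpcCount(long totalPuts, long avgPutBytes, long bufferBytes) {
		long putsPerFlush = Math.max(1, bufferBytes / avgPutBytes);
		return (totalPuts + putsPerFlush - 1) / putsPerFlush; // ceiling division
	}

	public static void main(String[] args) {
		// 10,000 Puts of ~200 bytes each, with the default 2 MB (2097152-byte) buffer
		long withBuffer = rpcCount(10_000, 200, 2_097_152);
		long withoutBuffer = 10_000; // auto-flush on: one RPC per Put
		System.out.println("buffered RPCs=" + withBuffer
				+ ", unbuffered RPCs=" + withoutBuffer);
	}
}
```

The orders-of-magnitude gap in RPC count is consistent with the 864 ms vs. 25443 ms results above.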

 

In addition, batching Puts in a List also optimizes writes; the following code ran in 614 ms:

	@Test
	public  void testPubList() throws Exception{
		HTable table = (HTable)conn.getTable(TableName.valueOf("t1"));
		List<Put> publist = new ArrayList<Put>();	
		long start = System.currentTimeMillis();
		for(int i=30001; i<40000; i++){
			Put put = new Put(Bytes.toBytes("row"+i));
			put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes("terry"+i));
			put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("job"), Bytes.toBytes("manager"+i));
			publist.add(put);
		}
		table.put(publist);
		table.close();
		System.out.println(System.currentTimeMillis()-start);
	}

 

 

2. Scan optimization

Setting the scanner cache size can improve scanner performance:

	@Test
	public  void testScanCache() throws Exception{
		HTable table = (HTable)conn.getTable(TableName.valueOf("t1"));
		Scan scan = new Scan(Bytes.toBytes("row0"), Bytes.toBytes("row999"));
		scan.setCaching(100);
		ResultScanner rs= table.getScanner(scan);
		Iterator<Result> it = rs.iterator();
		long start = System.currentTimeMillis();
		while(it.hasNext()){
			Result r = it.next();
			String name = Bytes.toString(r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("name")));
			String job = Bytes.toString(r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("job")));
			System.out.println(String.format("name=%s, job=%s", name, job) );
		}
		table.close();
		System.out.println(System.currentTimeMillis() - start);
	}

scan.setCaching(int value); sets the number of rows fetched per RPC. The default comes from hbase.client.scanner.caching in hbase-site.xml, which is 2147483647 (Integer.MAX_VALUE), i.e. effectively all rows in one fetch. So scan.setCaching(100) in the example above actually reduces performance rather than improving it.
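As a rough illustration of that point (hypothetical numbers): the number of scanner RPC round trips is approximately the row count divided by the caching value, rounded up, so lowering the caching value below the row count multiplies the RPCs.

```java
// Hypothetical illustration: scanner RPC round trips ≈ ceil(rows / caching).
public class ScanCachingEstimate {

	static long scanRpcCount(long rows, long caching) {
		return (rows + caching - 1) / caching; // ceiling division
	}

	public static void main(String[] args) {
		// 1,000 rows with setCaching(100): 10 round trips
		System.out.println(scanRpcCount(1_000, 100));
		// 1,000 rows with the default 2147483647: a single fetch
		System.out.println(scanRpcCount(1_000, 2_147_483_647L));
	}
}
```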

A very high scanner caching value also has drawbacks, such as RPC timeouts or the returned data exceeding the client's heap size.
