1. Put optimization
The HBase API is equipped with a client-side write buffer. The buffer collects Put operations and then sends them to the server in a single RPC call. The write buffer is disabled by default; you can call table.setAutoFlushTo(false) to activate it:
@Test
public void testWriteBuffer() throws Exception {
    HTable table = (HTable) conn.getTable(TableName.valueOf("t1"));
    //table.setAutoFlushTo(false);
    long start = System.currentTimeMillis();
    for (int i = 10001; i < 20000; i++) {
        Put put = new Put(Bytes.toBytes("row" + i));
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes("terry" + i));
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("job"), Bytes.toBytes("manager" + i));
        table.put(put);
    }
    //table.flushCommits();
    table.close();
    System.out.println(System.currentTimeMillis() - start);
}
Test Results:
1. With table.setAutoFlushTo(false): result 864 ms. Note: even without calling table.flushCommits(), the buffered content is committed when table.close() is executed.
2. Without table.setAutoFlushTo(false): result 25443 ms.
The default size of the write buffer is 2 MB (2097152 bytes). If you need to write larger batches at a time, consider increasing this value.
Method 1: Temporarily modify WriteBufferSize
table.setWriteBufferSize(writeBufferSize);
Method 2: Set it globally in hbase-site.xml
<property>
    <name>hbase.client.write.buffer</name>
    <value>2097152</value>
</property>
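In newer client versions the same client-side buffering is expressed explicitly through the BufferedMutator API, which replaces setAutoFlushTo/flushCommits. A minimal sketch (not a definitive implementation), assuming an existing Connection conn and a table "t1" with column family "cf" as in the examples above; the 8 MB buffer size is an illustrative choice, not a recommendation:

```java
// Sketch: explicit client-side buffering via BufferedMutator.
// Assumes an existing Connection "conn" and table "t1" / family "cf".
BufferedMutatorParams params = new BufferedMutatorParams(TableName.valueOf("t1"))
        .writeBufferSize(8 * 1024 * 1024); // e.g. raise the 2 MB default to 8 MB
try (BufferedMutator mutator = conn.getBufferedMutator(params)) {
    Put put = new Put(Bytes.toBytes("row1"));
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes("terry"));
    mutator.mutate(put); // buffered on the client, not yet sent
    mutator.flush();     // explicit flush; close() also flushes any remainder
}
```

As with setAutoFlushTo(false), buffered mutations are sent in batches, so a crash before flush()/close() can lose unflushed writes.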
In addition, batching Puts in a List also optimizes writes; the following code finishes in 614 ms:
@Test
public void testPutList() throws Exception {
    HTable table = (HTable) conn.getTable(TableName.valueOf("t1"));
    List<Put> putList = new ArrayList<Put>();
    long start = System.currentTimeMillis();
    for (int i = 30001; i < 40000; i++) {
        Put put = new Put(Bytes.toBytes("row" + i));
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes("terry" + i));
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("job"), Bytes.toBytes("manager" + i));
        putList.add(put);
    }
    table.put(putList);
    table.close();
    System.out.println(System.currentTimeMillis() - start);
}
2. Scan optimization
Setting the scan cache size (the number of rows fetched per RPC) can affect scanner performance:
@Test
public void testScanCache() throws Exception {
    HTable table = (HTable) conn.getTable(TableName.valueOf("t1"));
    Scan scan = new Scan(Bytes.toBytes("row0"), Bytes.toBytes("row999"));
    scan.setCaching(100);
    ResultScanner rs = table.getScanner(scan);
    Iterator<Result> it = rs.iterator();
    long start = System.currentTimeMillis();
    while (it.hasNext()) {
        Result r = it.next();
        String name = Bytes.toString(r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("name")));
        String job = Bytes.toString(r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("job")));
        System.out.println(String.format("name=%s, job=%s", name, job));
    }
    table.close();
    System.out.println(System.currentTimeMillis() - start);
}
scan.setCaching(int value) sets the number of rows fetched in one RPC. The default comes from hbase.client.scanner.caching in hbase-site.xml, which is 2147483647 (Integer.MAX_VALUE). As a result, the scan.setCaching(100) call in the example above actually reduces performance rather than improving it.
A high scanner caching value also has drawbacks, however, such as RPC timeouts or the data returned in one batch exceeding the client's heap size.
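When rows are wide, the per-RPC payload can be bounded on a second axis with Scan.setBatch, which limits how many columns of a row are returned per Result. A minimal sketch under the same assumptions as the examples above (table "t1", family "cf", an open HTable named table); the specific values 100 and 10 are illustrative:

```java
// Sketch: bound the per-RPC payload on two axes.
// setCaching limits rows per RPC; setBatch limits columns per Result.
Scan scan = new Scan();
scan.setCaching(100); // up to 100 rows fetched per RPC
scan.setBatch(10);    // at most 10 columns of a row per Result object
try (ResultScanner rs = table.getScanner(scan)) {
    for (Result r : rs) {
        // With setBatch, a row wider than 10 columns arrives split
        // across several consecutive Result objects.
    }
}
```

This trades fewer, larger RPCs against client memory and RPC-timeout risk; tune both values against your row width and client heap.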