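Although HBase supports real-time reads and writes over massive data sets, query latency climbs sharply once the data volume becomes very large, so the system needs to be tuned.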

I. Table schema design

1. Use as few column families as possible

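(1) Number of column families (cf). Internally, each column family maps to one Store, and each Store holds multiple StoreFiles. Small StoreFiles are continually compacted into new, larger StoreFiles. As the data grows, the number of StoreFiles and compaction tasks grows with it, which lowers performance, and every additional column family makes this worse.

(2) Backup, migration, and compaction all operate at the column-family level. Fewer column families therefore also means less memory and disk I/O time spent on these operations.

(3) Column families differ in how heavily they are read and written. Suppose column family A holds a lot of data and column family B holds little. When A's MemStore reaches its threshold and is flushed to disk, B's MemStore is flushed to disk along with it. B therefore flushes frequently, which adds unnecessary disk I/O.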

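2. Parameter settings

Set IN_MEMORY to true on the column family (the default is false) so that its data is given higher priority in the block cache.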

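II. RowKey design

The rowKey (row key) acts as the primary index. A rowKey that is designed around the business's access patterns can greatly improve HBase query efficiency.

1. Even distribution

Keep the rowKey as short as possible, spread writes evenly, and pre-split the table sensibly according to the business's needs, so that writes do not concentrate on a hotspot.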
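
One common way to spread writes over pre-split regions is to put a short salt in front of the key. The following is a minimal sketch of that idea, not part of the original design: the class name `SaltedRowKey`, the bucket count of 8, and the `<salt>_<userId>_<suffix>` layout are all assumptions made for illustration. The salt is derived from the user id, so all rows of one user still land in the same bucket and stay contiguous.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch: spread row keys over pre-split regions with a short salt prefix. */
final class SaltedRowKey {
    // Hypothetical bucket count; it would normally match the number of pre-split regions.
    static final int BUCKETS = 8;

    /** Derives a stable two-digit bucket from the user id, so one user's rows stay together. */
    static String saltFor(String userId) {
        return String.format(Locale.ROOT, "%02d", Math.floorMod(userId.hashCode(), BUCKETS));
    }

    /** Builds a key of the (assumed) form <salt>_<userId>_<suffix>. */
    static String rowKey(String userId, String suffix) {
        return saltFor(userId) + "_" + userId + "_" + suffix;
    }

    /** Split points "01", "02", ... that separate the BUCKETS key ranges. */
    static List<String> splitKeys() {
        List<String> keys = new ArrayList<>();
        for (int i = 1; i < BUCKETS; i++) {
            keys.add(String.format(Locale.ROOT, "%02d", i));
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(splitKeys());
        System.out.println(rowKey("100001", "9223370500429018619"));
    }
}
```

The split points would be converted to bytes and passed as split keys when the table is created. A reader then has to recompute the salt from the user id before building a scan range or prefix.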

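2. Storage design

Requirement: query all of a user's orders, or a user's orders for a given month, sorted from newest to oldest, that is, in descending time order.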

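Here we exploit the fact that HBase stores rows sorted lexicographically by rowKey, and design the rowKey to speed up the query. Every query is for one user's orders in descending time order, so the rowKey is built from the user id plus a timestamp. That way a user's orders are stored contiguously and the query becomes efficient. For example, with id=10000001 and timestamp=1536425757188, rowkey=10000001_1536425757188.

This keeps a user's orders together, but in ascending time order, which does not meet the requirement. To fix that, reverse the timestamp by subtracting it from a large number: Long.MAX_VALUE - 1536425757188 = 9223370500429018619, so rowkey=10000001_9223370500429018619. With this scheme the same user's most recent orders sort first.

The main storage code is as follows: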

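import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// connection, tableName, family, qualifier and value are fields defined elsewhere
public static void addOneRecord() throws Exception {
    // try-with-resources closes the Table once the write is done
    try (Table table = connection.getTable(TableName.valueOf(tableName))) {
        // rowKey = user id + "_" + reversed timestamp, so newer orders sort first
        String rowKey = "100001_" + (Long.MAX_VALUE - System.currentTimeMillis());

        Put put = new Put(Bytes.toBytes(rowKey));
        put.addColumn(Bytes.toBytes(family), Bytes.toBytes(qualifier), Bytes.toBytes(value));
        table.put(put);
    }
}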
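
The ordering effect of the reversed timestamp can be checked without a cluster. The sketch below uses only the JDK; the class name `ReversedTimestampKey` and the second timestamp (one day after the example above) are made up for illustration. Sorting plain ASCII strings gives the same order as HBase's byte-wise rowKey comparison, provided the reversed values all have the same number of digits, which holds for present-day millisecond timestamps.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch: row keys of the form <userId>_<Long.MAX_VALUE - timestamp> sort newest first. */
final class ReversedTimestampKey {
    static String rowKey(String userId, long timestampMillis) {
        return userId + "_" + (Long.MAX_VALUE - timestampMillis);
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        keys.add(rowKey("10000001", 1536425757188L)); // the order from the text
        keys.add(rowKey("10000001", 1536512157188L)); // hypothetical order placed one day later
        Collections.sort(keys); // lexicographic, like HBase's rowKey order for ASCII keys
        keys.forEach(System.out::println);
        // prints:
        // 10000001_9223370500342618619   (the later order comes first)
        // 10000001_9223370500429018619
    }
}
```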

The main query code is as follows:

public void queryAll() throws Exception {
    // Query all orders placed by user 100001 in August 2018
    SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmmss");

    // Timestamps are reversed, so the later date (Sept 1) yields the smaller key and becomes the start row
    String startRow = "100001_" + (Long.MAX_VALUE - sdf.parse("20180901000000").getTime());
    String stopRow = "100001_" + (Long.MAX_VALUE - sdf.parse("20180801000000").getTime());

    Scan scan = new Scan();
    // On HBase 2.x, withStartRow/withStopRow are the non-deprecated equivalents
    scan.setStartRow(Bytes.toBytes(startRow)); // inclusive
    scan.setStopRow(Bytes.toBytes(stopRow));   // exclusive

    // hTable is a Table instance obtained elsewhere; the scanner is closed automatically
    try (ResultScanner rss = hTable.getScanner(scan)) {
        for (Result rs : rss) {
            System.out.print(Bytes.toString(CellUtil.cloneValue(rs.getColumnLatestCell(Bytes.toBytes("cf"), Bytes.toBytes("dnum")))));
            System.out.print(" " + Bytes.toString(CellUtil.cloneValue(rs.getColumnLatestCell(Bytes.toBytes("cf"), Bytes.toBytes("date")))));
            System.out.println(" " + Bytes.toString(CellUtil.cloneValue(rs.getColumnLatestCell(Bytes.toBytes("cf"), Bytes.toBytes("type")))));
        }
    }
}
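
Because the start row is inclusive and the stop row is exclusive, the scan bounds for a month need some care once timestamps are reversed. The sketch below works out one way to compute them so that exactly the orders with monthStart <= timestamp < nextMonthStart fall in range. It uses only the JDK. The class `MonthScanRange` and its methods are hypothetical helpers, and the month boundaries are taken in UTC as an assumption, whereas `SimpleDateFormat` uses the JVM's default time zone.

```java
import java.time.YearMonth;
import java.time.ZoneOffset;

/** Sketch: scan bounds for one user's orders in one month, with reversed-timestamp keys. */
final class MonthScanRange {
    static String rowKey(String userId, long timestampMillis) {
        return userId + "_" + (Long.MAX_VALUE - timestampMillis);
    }

    /** First millisecond of the month, in UTC (an assumption of this sketch). */
    static long startOfMonthMillis(YearMonth month) {
        return month.atDay(1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    /** Returns {startRow (inclusive), stopRow (exclusive)}. */
    static String[] scanRange(String userId, YearMonth month) {
        long monthStart = startOfMonthMillis(month);
        long nextMonthStart = startOfMonthMillis(month.plusMonths(1));
        // The last millisecond of the month has the smallest reversed value: inclusive start.
        String startRow = rowKey(userId, nextMonthStart - 1);
        // The stop row is exclusive, so it is the key just past the month's first millisecond.
        String stopRow = rowKey(userId, monthStart - 1);
        return new String[] {startRow, stopRow};
    }

    /** Mirrors the scan semantics: startRow <= key < stopRow. */
    static boolean inRange(String key, String[] range) {
        return key.compareTo(range[0]) >= 0 && key.compareTo(range[1]) < 0;
    }

    public static void main(String[] args) {
        String[] range = scanRange("100001", YearMonth.of(2018, 8));
        System.out.println("startRow = " + range[0]);
        System.out.println("stopRow  = " + range[1]);
    }
}
```

The two strings would then be converted with `Bytes.toBytes` and used as the scan's start and stop rows.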

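3. Make good use of filters

Suppose an order has four statuses: 1 awaiting payment, 2 awaiting shipment, 3 awaiting receipt, and 4 completed. When a user wants to query only the orders awaiting shipment, use filters to cut down the amount of data returned and speed up the query.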

public void query() throws Exception {
    FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);

    // Filter 1: rowKey prefix filter, restricts results to one user
    // (row keys have the form <userId>_<reversedTimestamp>)
    String id = "100001";
    PrefixFilter filter1 = new PrefixFilter(Bytes.toBytes(id + "_"));
    list.addFilter(filter1);

    // Filter 2: keep only orders whose type is 2 (awaiting shipment)
    String type = "2";
    SingleColumnValueFilter filter2 = new SingleColumnValueFilter(Bytes.toBytes("cf"), Bytes.toBytes("type"), CompareOp.EQUAL, Bytes.toBytes(type));
    filter2.setFilterIfMissing(true); // skip rows that have no cf:type column
    list.addFilter(filter2);

    Scan scan = new Scan();
    scan.setFilter(list);

    // hTable is a Table instance obtained elsewhere; the scanner is closed automatically
    try (ResultScanner rss = hTable.getScanner(scan)) {
        for (Result rs : rss) {
            System.out.print(Bytes.toString(CellUtil.cloneValue(rs.getColumnLatestCell(Bytes.toBytes("cf"), Bytes.toBytes("num")))));
            System.out.print(" " + Bytes.toString(CellUtil.cloneValue(rs.getColumnLatestCell(Bytes.toBytes("cf"), Bytes.toBytes("date")))));
            System.out.println(" " + Bytes.toString(CellUtil.cloneValue(rs.getColumnLatestCell(Bytes.toBytes("cf"), Bytes.toBytes("type")))));
        }
    }
}

III. Server-side tuning

1. Parameter tuning: not yet covered, to be completed

2. JVM tuning: not yet covered, to be completed
