HBase Coprocessors in Detail

1. Introduction

When the data in an HBase table reaches millions or even billions of rows or columns, queries that return large amounts of data are constrained by network bandwidth; and even when the network allows it, the client-side processing may not keep up. This is where coprocessors come in. They let you push business computation code into a coprocessor running on the RegionServer, so that only the processed result is returned to the client. This greatly reduces the amount of data transferred and thus improves performance. Coprocessors also let users extend HBase with functionality it does not provide out of the box, such as permission checks, secondary indexes, and integrity constraints.

2. Coprocessor Types

2.1 Observer coprocessor

1. Function

An Observer coprocessor is similar to a trigger in a relational database: when certain events occur, the coprocessor is invoked on the server side. It is typically used to implement the following functionality:

  • Permission checks : before a Get or Put operation executes, the preGet or prePut hook can be used to check permissions;
  • Integrity constraints : HBase does not support the foreign keys of relational databases; when inserting or deleting data, checks on the associated data can be implemented in a trigger-like hook;
  • Secondary indexes : a coprocessor can be used to maintain secondary indexes.

2. Type

There are currently four types of Observer coprocessors:

  • RegionObserver :
    Allows you to observe events on a Region, such as Get and Put operations.
  • RegionServerObserver :
    Allows you to observe events related to RegionServer operations, such as starting, stopping, or performing merges, commits, or rollbacks.
  • MasterObserver :
    Allows you to observe events related to the HBase Master, such as table creation, deletion, or schema modification.
  • WalObserver :
    Allows you to observe events related to the write-ahead log (WAL).

3. Interface

All four types of Observer coprocessors above inherit from the Coprocessor interface. Each of these four interfaces defines all of the available hook methods, which allow specific logic to run before and after the corresponding operation. Normally we do not implement these interfaces directly but instead extend their Base implementation classes. A Base implementation class simply provides empty implementations of the interface methods, so when writing a custom coprocessor we only need to override the methods we actually require rather than implementing all of them.
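
The Base-class pattern described above can be sketched in plain Java. This is an illustrative sketch only, not HBase's real classes (ObserverHooks, BaseObserver, and PermissionObserver are hypothetical names); it shows how an empty base implementation lets a subclass override just the one hook it needs, here a permission check before a Put.

```java
// Sketch of the Base-implementation pattern (hypothetical names, not HBase API).
interface ObserverHooks {
    void prePut(String row);
    void postPut(String row);
    void preGet(String row);
}

// Base class: empty default implementations of every hook
class BaseObserver implements ObserverHooks {
    public void prePut(String row) { }
    public void postPut(String row) { }
    public void preGet(String row) { }
}

public class PermissionObserver extends BaseObserver {
    // override only the hook we need: a permission check before Put
    @Override
    public void prePut(String row) {
        if (row.startsWith("restricted")) {
            throw new SecurityException("no permission to write row: " + row);
        }
    }

    public static void main(String[] args) {
        ObserverHooks observer = new PermissionObserver();
        observer.prePut("public-row");          // passes silently
        try {
            observer.prePut("restricted-row");  // rejected by the hook
        } catch (SecurityException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A real HBase coprocessor follows the same shape: extend a class like BaseRegionObserver (as in the example later in this article) and override only prePut, preGet, and so on.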

(Figure: hbase-coprocessor)

Taking RegionObserver as an example, the interface defines all of the available hook methods. An excerpt of those methods is shown below; most of them come in pairs, with a pre and a post variant:

(Figure: RegionObserver)

4. Implementation Process

(Figure: RegionObservers-works)

  • The client issues a put request.
  • The request is dispatched to the appropriate Region and RegionServer.
  • The CoprocessorHost intercepts the request and invokes prePut() on every RegionObserver registered on the table.
  • If the request is not intercepted by prePut(), it continues on to the Region and is processed there.
  • The result produced by the Region is intercepted again by the CoprocessorHost, which calls postPut().
  • If no postPut() intercepts the response, the final result is returned to the client.
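
The steps above can be sketched in plain Java. This is a conceptual sketch with hypothetical names (MiniCoprocessorHost is not HBase's real CoprocessorHost API): the host runs every observer's prePut(), lets the "region" process the request unless an observer bypasses it, then runs every postPut() before returning the result.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of the CoprocessorHost request flow (not HBase API).
public class MiniCoprocessorHost {

    interface RegionObserver {
        // returning true means "bypass": skip the region's own processing
        boolean prePut(StringBuilder value);
        void postPut(StringBuilder value);
    }

    private final List<RegionObserver> observers = new ArrayList<>();

    void register(RegionObserver o) { observers.add(o); }

    String put(String value) {
        StringBuilder v = new StringBuilder(value);
        boolean bypass = false;
        for (RegionObserver o : observers) {
            bypass |= o.prePut(v);           // step 3: prePut on every observer
        }
        if (!bypass) {
            v.append(" [stored by region]"); // step 4: the region processes the request
        }
        for (RegionObserver o : observers) {
            o.postPut(v);                    // step 5: postPut on every observer
        }
        return v.toString();                 // step 6: result returned to the client
    }

    public static void main(String[] args) {
        MiniCoprocessorHost host = new MiniCoprocessorHost();
        host.register(new RegionObserver() {
            public boolean prePut(StringBuilder v) { v.insert(0, "pre>"); return false; }
            public void postPut(StringBuilder v) { v.append("<post"); }
        });
        System.out.println(host.put("data")); // prints "pre>data [stored by region]<post"
    }
}
```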

If you are familiar with Spring, this mechanism can be compared to how its AOP support works; the official documentation makes the same analogy:

If you are familiar with Aspect Oriented Programming (AOP), you can think of a coprocessor as applying advice by intercepting a request and then running some custom code, before passing the request on to its final destination (or even changing the destination).


2.2 Endpoint coprocessors

An Endpoint coprocessor is similar to a stored procedure in a relational database. The client can invoke an Endpoint coprocessor to process data on the server side and then return the result.

Take aggregation as an example. Without coprocessors, when a user needs to find the maximum value in a table (a max aggregation), a full table scan is required, and the client must iterate over the scan results; this inevitably puts a heavy data-processing burden on the client. With a coprocessor, the max computation can be deployed to the HBase server side, and HBase will run it concurrently on multiple nodes of the underlying cluster: the max code executes within each Region, each Region's maximum is computed on its RegionServer, and only that max value is returned to the client. The client then only needs to compare the per-Region maxima to find the overall maximum.
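
The scatter-gather idea above can be shown with a plain-Java sketch. This is illustrative only (a real Endpoint coprocessor is implemented as a protobuf service; RegionMaxSketch is a hypothetical name): each "region" computes its local max server-side, and the client only combines the per-region results.

```java
import java.util.Arrays;
import java.util.List;

// Conceptual sketch of Endpoint-style max aggregation (not HBase API).
public class RegionMaxSketch {

    // server side: runs inside each region, returning one number instead of all rows
    static long regionMax(List<Long> regionRows) {
        long max = Long.MIN_VALUE;
        for (long v : regionRows) {
            max = Math.max(max, v);
        }
        return max;
    }

    // client side: only compares the per-region maxima
    static long globalMax(List<List<Long>> regions) {
        long max = Long.MIN_VALUE;
        for (List<Long> region : regions) {
            max = Math.max(max, regionMax(region));
        }
        return max;
    }

    public static void main(String[] args) {
        List<List<Long>> regions = Arrays.asList(
                Arrays.asList(3L, 9L, 4L),   // region 1
                Arrays.asList(12L, 7L),      // region 2
                Arrays.asList(5L, 11L));     // region 3
        System.out.println(globalMax(regions)); // prints 12
    }
}
```

Only three numbers cross the network in this sketch, instead of every row; this is the bandwidth saving that motivates Endpoint coprocessors.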

3. Coprocessor Loading Methods

To use a coprocessor you have developed yourself, it must be loaded either statically (via HBase configuration) or dynamically (via the HBase shell or the Java API).

  • A statically loaded coprocessor is called a System Coprocessor; it applies to all tables in the entire HBase instance and requires an HBase restart;
  • A dynamically loaded coprocessor is called a Table Coprocessor; it applies only to the specified table and does not require an HBase restart.

Both loading and unloading procedures are described below.

4. Static Loading and Unloading

4.1 Static loading

Static loading takes the following three steps:

  1. Define the coprocessor to load in hbase-site.xml:
<property>
    <name>hbase.coprocessor.region.classes</name>
    <value>org.myname.hbase.coprocessor.endpoint.SumEndPoint</value>
</property>

The value of the <name> tag must be one of the following:

  • RegionObserver and Endpoint coprocessors: hbase.coprocessor.region.classes
  • WALObserver coprocessors: hbase.coprocessor.wal.classes
  • MasterObserver coprocessors: hbase.coprocessor.master.classes

    The <value> must be the fully qualified class name of the coprocessor implementation class. If multiple classes are specified for loading, the class names must be comma-separated.

  2. Put the JAR (containing the code and all of its dependencies) into the lib directory of the HBase installation;
  3. Restart HBase.

4.2 Static unloading

  1. Remove the coprocessor's <property> element and its sub-elements from hbase-site.xml;
  2. Remove the coprocessor's JAR file from the classpath or from HBase's lib directory (optional);
  3. Restart HBase.

5. Dynamic Loading and Unloading

Dynamically loaded coprocessors do not require an HBase restart, but they are loaded on a per-table basis and apply only to the specified table.
In addition, the table must be taken offline (disabled) before the coprocessor can be loaded dynamically. There are two common ways to load dynamically: the HBase shell and the Java API.

The following examples assume two things:

  1. coprocessor.jar contains the coprocessor implementation and all of its dependencies.
  2. The JAR is stored on HDFS at: hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar

5.1 Dynamic loading with the HBase shell

  1. Disable the table using the HBase shell
hbase > disable 'tableName'
  2. Load the coprocessor with the following command
hbase > alter 'tableName', METHOD => 'table_att', 'Coprocessor'=>'hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar|org.myname.hbase.Coprocessor.RegionObserverExample|1073741823|arg1=1,arg2=2'

The Coprocessor value contains four parameters separated by the pipe (|) character, interpreted in order as follows:

  • JAR path: usually the JAR's path on HDFS. Two points to note about the path:
    • wildcards are allowed, e.g. hdfs://<namenode>:<port>/user/<hadoop-user>/*.jar adds the matching JARs;
    • a directory can be specified, e.g. hdfs://<namenode>:<port>/user/<hadoop-user>/ , which adds every JAR in that directory but does not search subdirectories.
  • Class name: the coprocessor's fully qualified class name.
  • Priority: the coprocessor's priority, following natural numeric order, i.e. the smaller the value, the higher the priority. It may be left empty, in which case a default priority is assigned.
  • Optional arguments: optional arguments passed to the coprocessor.

  3. Enable the table
hbase > enable 'tableName'
  4. Verify that the coprocessor has been loaded
hbase > describe 'tableName'

If the coprocessor appears in the TABLE_ATTRIBUTES attribute, it was loaded successfully.


5.2 Dynamic unloading with the HBase shell

  1. Disable the table
hbase> disable 'tableName'
  2. Remove the table's coprocessor
hbase> alter 'tableName', METHOD => 'table_att_unset', NAME => 'coprocessor$1'
  3. Enable the table
hbase> enable 'tableName'


5.3 Dynamic loading with the Java API

TableName tableName = TableName.valueOf("users");
String path = "hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar";
Configuration conf = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(conf);
Admin admin = connection.getAdmin();
// the table must be disabled before the coprocessor can be attached
admin.disableTable(tableName);
HTableDescriptor hTableDescriptor = new HTableDescriptor(tableName);
HColumnDescriptor columnFamily1 = new HColumnDescriptor("personalDet");
columnFamily1.setMaxVersions(3);
hTableDescriptor.addFamily(columnFamily1);
HColumnDescriptor columnFamily2 = new HColumnDescriptor("salaryDet");
columnFamily2.setMaxVersions(3);
hTableDescriptor.addFamily(columnFamily2);
// attribute value: JAR path | class name | priority, pipe-separated as in the shell
hTableDescriptor.setValue("COPROCESSOR$1", path + "|"
        + RegionObserverExample.class.getCanonicalName() + "|"
        + Coprocessor.PRIORITY_USER);
admin.modifyTable(tableName, hTableDescriptor);
admin.enableTable(tableName);

In HBase 0.96 and later, HTableDescriptor's addCoprocessor() method provides a simpler way to load:

TableName tableName = TableName.valueOf("users");
Path path = new Path("hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar");
Configuration conf = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(conf);
Admin admin = connection.getAdmin();
admin.disableTable(tableName);
HTableDescriptor hTableDescriptor = new HTableDescriptor(tableName);
HColumnDescriptor columnFamily1 = new HColumnDescriptor("personalDet");
columnFamily1.setMaxVersions(3);
hTableDescriptor.addFamily(columnFamily1);
HColumnDescriptor columnFamily2 = new HColumnDescriptor("salaryDet");
columnFamily2.setMaxVersions(3);
hTableDescriptor.addFamily(columnFamily2);
hTableDescriptor.addCoprocessor(RegionObserverExample.class.getCanonicalName(), path,
Coprocessor.PRIORITY_USER, null);
admin.modifyTable(tableName, hTableDescriptor);
admin.enableTable(tableName);

5.4 Dynamic unloading with the Java API

Unloading simply redefines the table without setting any coprocessor. Note that this removes all coprocessors from the table.

TableName tableName = TableName.valueOf("users");
String path = "hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar";
Configuration conf = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(conf);
Admin admin = connection.getAdmin();
admin.disableTable(tableName);
HTableDescriptor hTableDescriptor = new HTableDescriptor(tableName);
HColumnDescriptor columnFamily1 = new HColumnDescriptor("personalDet");
columnFamily1.setMaxVersions(3);
hTableDescriptor.addFamily(columnFamily1);
HColumnDescriptor columnFamily2 = new HColumnDescriptor("salaryDet");
columnFamily2.setMaxVersions(3);
hTableDescriptor.addFamily(columnFamily2);
admin.modifyTable(tableName, hTableDescriptor);
admin.enableTable(tableName);

6. A Coprocessor Example

Here is a simple example: a coprocessor that mimics the Redis append command. When a put is executed on a specified column, HBase's default behavior is to overwrite (update) the value; here we change it to perform an append instead.

# redis append command example
redis>  EXISTS mykey
(integer) 0
redis>  APPEND mykey "Hello"
(integer) 5
redis>  APPEND mykey " World"
(integer) 11
redis>  GET mykey 
"Hello World"

6.1 Create a test table

# create a magazine table with two column families: article and picture
hbase >  create 'magazine','article','picture'

6.2 Coprocessor Programming

The complete code is available in this repository: HBase-Observer-Coprocessor

Create a new Maven project and add the following dependencies:

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-common</artifactId>
    <version>1.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.2.0</version>
</dependency>

Extend BaseRegionObserver to implement our custom RegionObserver: when a put is executed on the article:content column, the newly inserted content is appended to the end of the existing content. The code is as follows:

public class AppendRegionObserver extends BaseRegionObserver {

    private byte[] columnFamily = Bytes.toBytes("article");
    private byte[] qualifier = Bytes.toBytes("content");

    @Override
    public void prePut(ObserverContext<RegionCoprocessorEnvironment> e, Put put, WALEdit edit,
                       Durability durability) throws IOException {
        if (put.has(columnFamily, qualifier)) {
            // iterate over the query result to get the column's existing value
            Result rs = e.getEnvironment().getRegion().get(new Get(put.getRow()));
            String oldValue = "";
            for (Cell cell : rs.rawCells()) {
                if (CellUtil.matchingColumn(cell, columnFamily, qualifier)) {
                    oldValue = Bytes.toString(CellUtil.cloneValue(cell));
                }
            }

            // get the newly inserted value for the column
            List<Cell> cells = put.get(columnFamily, qualifier);
            String newValue = "";
            for (Cell cell : cells) {
                if (CellUtil.matchingColumn(cell, columnFamily, qualifier)) {
                    newValue = Bytes.toString(CellUtil.cloneValue(cell));
                }
            }

            // append: write old value + new value back into the Put
            put.addColumn(columnFamily, qualifier, Bytes.toBytes(oldValue + newValue));
        }
    }
}
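
The append semantics implemented by the prePut() hook above can be simulated outside HBase with a plain-Java sketch (AppendPutSketch is a hypothetical name; a HashMap stands in for the table): a put on article:content appends to the existing value, while every other column keeps the default overwrite behavior, matching the tests in section 6.6.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java simulation of the append-on-put behavior (not HBase API).
public class AppendPutSketch {

    private final Map<String, String> store = new HashMap<>();
    private static final String APPEND_COLUMN = "article:content";

    void put(String column, String newValue) {
        if (APPEND_COLUMN.equals(column)) {
            // same idea as prePut(): read the old value and rewrite the put
            String oldValue = store.getOrDefault(column, "");
            store.put(column, oldValue + newValue);
        } else {
            store.put(column, newValue); // default HBase behavior: overwrite
        }
    }

    String get(String column) { return store.get(column); }

    public static void main(String[] args) {
        AppendPutSketch table = new AppendPutSketch();
        table.put("article:content", "Hello");
        table.put("article:content", " World");
        table.put("article:author", "zhangsan");
        table.put("article:author", "lisi");
        System.out.println(table.get("article:content")); // prints "Hello World"
        System.out.println(table.get("article:author"));  // prints "lisi"
    }
}
```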

6.3 Package the Project

Package the project with Maven; the output file is named hbase-observer-coprocessor-1.0-SNAPSHOT.jar:

# mvn clean package

6.4 Upload the JAR to HDFS

# upload the JAR to the hbase directory on HDFS
hadoop fs -put /usr/app/hbase-observer-coprocessor-1.0-SNAPSHOT.jar /hbase
# verify the upload succeeded
hadoop fs -ls /hbase

(Figure: hbase-cp-hdfs)

6.5 Load the coprocessor

  1. Disable the table before loading the coprocessor
hbase >  disable 'magazine'
  2. Load the coprocessor
hbase >  alter 'magazine', METHOD => 'table_att', 'Coprocessor'=>'hdfs://hadoop001:8020/hbase/hbase-observer-coprocessor-1.0-SNAPSHOT.jar|com.heibaiying.AppendRegionObserver|1001|'
  3. Enable the table
hbase >  enable 'magazine'
  4. Check whether the coprocessor loaded successfully
hbase >  desc 'magazine'

If the coprocessor appears in the TABLE_ATTRIBUTES attribute, it was loaded successfully, as shown below:

(Figure: hbase-cp-load)

6.6 Test the load

Insert a set of test data:

hbase > put 'magazine', 'rowkey1','article:content','Hello'
hbase > get 'magazine','rowkey1','article:content'
hbase > put 'magazine', 'rowkey1','article:content','World'
hbase > get 'magazine','rowkey1','article:content'

You can see that the append operation was performed on the value of the specified column:

(Figure: hbase-cp-helloworld)

Insert a set of control data:

hbase > put 'magazine', 'rowkey1','article:author','zhangsan'
hbase > get 'magazine','rowkey1','article:author'
hbase > put 'magazine', 'rowkey1','article:author','lisi'
hbase > get 'magazine','rowkey1','article:author'

You can see that ordinary columns still get the default update behavior:

(Figure: hbase-cp-lisi)

6.7 Unload the coprocessor

  1. Disable the table before unloading the coprocessor
hbase >  disable 'magazine'
  2. Unload the coprocessor
hbase > alter 'magazine', METHOD => 'table_att_unset', NAME => 'coprocessor$1'
  3. Enable the table
hbase >  enable 'magazine'
  4. Check whether the coprocessor was unloaded successfully
hbase >  desc 'magazine'

(Figure: hbase-co-unload)

6.8 Test the unload

Run the following commands in order to verify that the unload succeeded:

hbase > get 'magazine','rowkey1','article:content'
hbase > put 'magazine', 'rowkey1','article:content','Hello'
hbase > get 'magazine','rowkey1','article:content'

(Figure: hbase-co-unload)

References

  1. Apache HBase Coprocessors
  2. Apache HBase Coprocessor Introduction
  3. HBase higher-order knowledge

More articles in this big data series can be found in the personal GitHub open-source project: Big Data Getting Started


Origin blog.51cto.com/14183932/2412387