HBase and Hive: Comparison and Integration

Comparison of HBase and Hive

1. Hive

(1) Data Warehousing

In essence, Hive maps files that are already stored in HDFS to tables and keeps this one-to-one mapping (the metadata) in MySQL, so that the data can be managed and queried with HQL.

(2) Used for data analysis and cleaning

Hive is suited to offline data analysis and cleaning, and its latency is high.

(3) Built on HDFS and MapReduce

Hive data is still stored on the DataNodes, and HQL statements are ultimately translated into MapReduce jobs for execution.

2. HBase

(1) Database

It is a column-oriented, non-relational database.

(2) Stores structured and unstructured data

It stores non-relational data in single tables and is not suited to relational queries such as JOINs.

(3) Based on HDFS

Data is persisted as HFiles stored on the DataNodes, and it is managed by RegionServers in units of regions.

(4) Low latency, suited to online services

Faced with large volumes of enterprise data, HBase can store massive amounts of data in a single table and scale linearly, while still providing fast data access.

6.4.2 Integrating HBase and Hive

Heads-up: the latest versions of HBase and Hive are not compatible with each other out of the box, so we have no choice but to grit our teeth and recompile hive-hbase-handler-1.2.2.jar ourselves!

Preparing the Environment

Because operations performed through Hive will also affect HBase, Hive needs the JARs for working with HBase. Copy the HBase JARs that Hive depends on into Hive's lib directory, or create symbolic links to them as shown below.

export HBASE_HOME=/opt/module/hbase

export HIVE_HOME=/opt/module/hive

 

ln -s $HBASE_HOME/lib/hbase-common-1.3.1.jar  $HIVE_HOME/lib/hbase-common-1.3.1.jar

ln -s $HBASE_HOME/lib/hbase-server-1.3.1.jar $HIVE_HOME/lib/hbase-server-1.3.1.jar

ln -s $HBASE_HOME/lib/hbase-client-1.3.1.jar $HIVE_HOME/lib/hbase-client-1.3.1.jar

ln -s $HBASE_HOME/lib/hbase-protocol-1.3.1.jar $HIVE_HOME/lib/hbase-protocol-1.3.1.jar

ln -s $HBASE_HOME/lib/hbase-it-1.3.1.jar $HIVE_HOME/lib/hbase-it-1.3.1.jar

ln -s $HBASE_HOME/lib/htrace-core-3.1.0-incubating.jar $HIVE_HOME/lib/htrace-core-3.1.0-incubating.jar

ln -s $HBASE_HOME/lib/hbase-hadoop2-compat-1.3.1.jar $HIVE_HOME/lib/hbase-hadoop2-compat-1.3.1.jar

ln -s $HBASE_HOME/lib/hbase-hadoop-compat-1.3.1.jar $HIVE_HOME/lib/hbase-hadoop-compat-1.3.1.jar
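If you would rather not type eight separate commands, the same links can be created in a loop. This is a minimal sketch, not part of the original setup: `link_hbase_jars` is a hypothetical helper name, and the JAR list assumes the HBase 1.3.1 file names used above. It takes the two directories as arguments, so it can be tried against scratch directories before pointing it at $HBASE_HOME and $HIVE_HOME.

```shell
#!/usr/bin/env bash
# Hypothetical helper: symlink the HBase JARs that Hive's HBase handler needs
# into Hive's lib directory. Usage: link_hbase_jars HBASE_HOME HIVE_HOME
link_hbase_jars() {
  local hbase_home="$1" hive_home="$2"
  local jar
  # JAR names assume HBase 1.3.1, matching the ln -s commands above
  for jar in hbase-common-1.3.1.jar hbase-server-1.3.1.jar \
             hbase-client-1.3.1.jar hbase-protocol-1.3.1.jar \
             hbase-it-1.3.1.jar htrace-core-3.1.0-incubating.jar \
             hbase-hadoop2-compat-1.3.1.jar hbase-hadoop-compat-1.3.1.jar; do
    if [ ! -f "$hbase_home/lib/$jar" ]; then
      # Stop at the first missing JAR rather than creating a dangling link
      echo "missing: $hbase_home/lib/$jar" >&2
      return 1
    fi
    # -f replaces a link left over from an earlier run
    ln -sf "$hbase_home/lib/$jar" "$hive_home/lib/$jar"
  done
}
```

For example: `link_hbase_jars "$HBASE_HOME" "$HIVE_HOME"`. Checking each source JAR first means a differently named JAR in your HBase release is reported instead of silently producing a broken link.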

Also set the ZooKeeper properties in hive-site.xml as follows:

<property>

  <name>hive.zookeeper.quorum</name>

  <value>hadoop102,hadoop103,hadoop104</value>

  <description>The list of ZooKeeper servers to talk to. This is only needed for read/write locks.</description>

</property>

<property>

  <name>hive.zookeeper.client.port</name>

  <value>2181</value>

  <description>The port of ZooKeeper servers to talk to. This is only needed for read/write locks.</description>

</property>
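To double-check the edit, a small helper can read a property value back out of hive-site.xml. This is only a sketch: `get_conf_value` is a hypothetical name, and it is a line-based awk match that assumes each `<name>` and `<value>` tag sits on its own line, as in the snippet above; it is not a general XML parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <value> of a named property from a Hadoop-style
# XML config file such as hive-site.xml. Line-based sketch: it assumes <name>
# and <value> each sit on their own line, as in the snippet above.
# Usage: get_conf_value FILE PROPERTY_NAME
get_conf_value() {
  local file="$1" name="$2"
  awk -v want="<name>$name</name>" '
    index($0, want) { found = 1; next }   # entered the matching <property>
    found && /<value>/ {
      sub(/.*<value>/, ""); sub(/<\/value>.*/, "")
      print; exit                         # print only the first match
    }
    /<\/property>/ { found = 0 }          # property ended without a <value>
  ' "$file"
}
```

With the configuration above, `get_conf_value "$HIVE_HOME/conf/hive-site.xml" hive.zookeeper.quorum` should print `hadoop102,hadoop103,hadoop104`, and an unknown property name prints nothing.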

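1. Case 1

Goal: create a Hive table linked to an HBase table, so that inserting data into the Hive table also writes it to the HBase table.

Steps:

(1) Create a table in Hive that is linked to HBase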

CREATE TABLE hive_hbase_emp_table(

empno int,

ename string,

job string,

mgr int,

hiredate string,

sal double,

comm double,

deptno int)

STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:ename,info:job,info:mgr,info:hiredate,info:sal,info:comm,info:deptno")

TBLPROPERTIES ("hbase.table.name" = "hbase_emp_table");

Tip: once this finishes, you can check in both Hive and HBase that the corresponding tables have been created.
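The hbase.columns.mapping value is positional: the n-th entry describes the n-th Hive column, `:key` marks the column used as the HBase row key, and the other entries are `family:qualifier` pairs, so the mapping must have exactly as many entries as the table has columns. When the first column is the row key and everything else goes into one column family (`info` here), the string can be generated instead of typed by hand. `build_columns_mapping` is a hypothetical helper, a sketch under that single-family assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an hbase.columns.mapping value in which the first
# Hive column is the HBase row key (:key) and every other column is stored as
# FAMILY:column. Usage: build_columns_mapping FAMILY COL1 [COL2 ...]
build_columns_mapping() {
  if [ "$#" -lt 2 ]; then
    echo "usage: build_columns_mapping FAMILY COL1 [COL2 ...]" >&2
    return 1
  fi
  local family="$1"
  shift 2                 # drop FAMILY and the row-key column; :key stands for the latter
  local mapping=":key" col
  for col in "$@"; do
    mapping="$mapping,$family:$col"   # qualifier name = Hive column name
  done
  printf '%s\n' "$mapping"
}
```

Called as `build_columns_mapping info empno ename job mgr hiredate sal comm deptno`, it prints the same mapping string used in the CREATE TABLE statement above.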

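(2) Create a temporary staging table in Hive to load the file data into

Tip: you cannot LOAD data directly into the Hive table that is linked to HBase. LOAD DATA only moves files into a table's storage location, and an HBase-backed table has no such files, so its rows have to be written with INSERT ... SELECT instead.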

CREATE TABLE emp(

empno int,

ename string,

job string,

mgr int,

hiredate string,

sal double,

comm double,

deptno int)

row format delimited fields terminated by '\t';

(3) Load the data into the staging table

hive> load data local inpath '/home/admin/softwares/data/emp.txt' into table emp;
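It can be worth checking emp.txt itself before (or right after) loading it. LOAD DATA does not parse the file, and when the table is read, lines with too few fields are not rejected (missing trailing fields are simply read as NULL), so a malformed line tends to show up later as unexpected NULLs. A sketch, assuming the file is tab-separated with 8 fields per line to match the emp table; `check_field_count` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report lines of a tab-delimited file whose field count
# differs from the expected column count (8 for the emp table above).
# Usage: check_field_count FILE [EXPECTED]; returns non-zero if any line is off.
check_field_count() {
  local file="$1" expected="${2:-8}"
  awk -F '\t' -v n="$expected" '
    BEGIN { bad = 0 }
    NF != n { printf "line %d: %d fields\n", NR, NF; bad = 1 }
    END { exit bad }
  ' "$file"
}
```

An empty field, such as a blank comm value between two tabs, still counts as a field, so such lines pass the check.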

(4) Use INSERT to copy the data from the staging table into the Hive table linked to HBase

hive> insert into table hive_hbase_emp_table select * from emp;
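One consequence of mapping empno to `:key`: rows in emp that share an empno do not become separate HBase rows. They are written to the same row key, where a normal read returns the most recent value of each cell, so hive_hbase_emp_table can end up with fewer rows than emp. A quick check on the source file before the INSERT, assuming as above that empno is the first tab-separated field; `duplicate_row_keys` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list values of the first tab-separated field (the column
# mapped to the HBase row key) that occur more than once in a file.
# Usage: duplicate_row_keys FILE
duplicate_row_keys() {
  # cut uses tab as its default delimiter; uniq -d prints one copy of each repeated key
  cut -f1 "$1" | sort | uniq -d
}
```

Empty output means every empno is unique and each emp row maps to its own HBase row.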

(5) Check that the data was inserted into both the Hive table and the linked HBase table

Hive:

hive> select * from hive_hbase_emp_table;

HBase:

hbase> scan 'hbase_emp_table'

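2. Case 2

Goal: the table hbase_emp_table already exists in HBase. Create an external table in Hive that is linked to hbase_emp_table, so that Hive can be used to analyze the data in that HBase table.

Note: Case 2 builds directly on Case 1, so complete Case 1 before starting it.

Steps:

(1) Create an external table in Hive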

CREATE EXTERNAL TABLE relevance_hbase_emp(

empno int,

ename string,

job string,

mgr int,

hiredate string,

sal double,

comm double,

deptno int)

STORED BY

'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

WITH SERDEPROPERTIES ("hbase.columns.mapping" =

":key,info:ename,info:job,info:mgr,info:hiredate,info:sal,info:comm,info:deptno")

TBLPROPERTIES ("hbase.table.name" = "hbase_emp_table");

(2) Once the tables are linked, Hive functions can be used to analyze the data

hive (default)> select * from relevance_hbase_emp;
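Such analysis queries can also be run non-interactively with `hive -e`, which is handy for scripting. A sketch only: `dept_salary_report` is a hypothetical helper, the query is just an illustrative aggregate over the columns defined above, and the HIVE_BIN variable is an assumption added so the hive command can be swapped out (for example, for a stub when testing).

```shell
#!/usr/bin/env bash
# Hypothetical helper: run an illustrative per-department aggregate over the
# HBase-backed external table. HIVE_BIN (an assumption, defaults to the hive CLI)
# lets the command be replaced, e.g. by a stub.
# Usage: dept_salary_report [TABLE]
dept_salary_report() {
  local table="${1:-relevance_hbase_emp}"
  "${HIVE_BIN:-hive}" -e "select deptno, count(*), avg(sal) from $table group by deptno;"
}
```

Because the query goes through the storage handler, it reads whatever is currently in hbase_emp_table, including rows written directly through HBase.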


Source: www.cnblogs.com/tesla-turing/p/11959954.html