HBase in Practice | Importing Hive Data into Cloud HBase

Network environment

  1. Leased line: configure the VPC network information of the HBase cluster on the leased line, which can then reach the HBase environment directly

  2. Public-cloud VM in a VPC: set up interconnection between the VM's VPC and the HBase VPC

  3. Other environments: enable public-network access for HBase

  4. Note: the import depends on hbase-common, hbase-client, hbase-server, and hbase-protocol, and uses the community packages by default. When connecting over the public network, use the corresponding packages released by cloud HBase instead


Option 1: Map the HBase table from Hive

  1. Applicable scenario: data volume under 4 TB (the data is written through the HBase API)

  2. Obtain the ZooKeeper connection address from the HBase console and start the Hive client as follows

hive  --hiveconf hbase.zookeeper.quorum=xxxx
  3. Case 1: the HBase table does not exist

  • Create the Hive table hive_hbase_table to map the HBase table hbase_table; the HBase table hbase_table is created automatically and is dropped when the Hive table is dropped. You must specify the mapping from the Hive schema to the HBase schema here; for the type mapping, refer to the Hive HBaseIntegration documentation

CREATE TABLE hive_hbase_table(key int, value string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hbase_table", 
"hbase.mapred.output.outputtable" = "hbase_table");
  • Create an original hive table and prepare some data

create table hive_data (mykey int, myval string);
insert into hive_data values(1, "www.ymq.io");
  • Import the data in the original hive table hive_data into the hbase table hbase_table through the hive table hive_hbase_table

insert into table hive_hbase_table select * from hive_data;
  • Check whether there is data in the hbase table hbase_table

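The check shown here can be reproduced from the HBase shell; the table and column names follow the example above:

```shell
# run a scan non-interactively through the HBase shell;
# with the sample row inserted above, rowkey 1 should carry cf1:val = www.ymq.io
echo "scan 'hbase_table'" | hbase shell
```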

  4. Case 2: the HBase table already exists

  • Create a Hive external table to associate with the existing HBase table; again pay attention to the mapping between the Hive schema and the HBase schema. Dropping the external table does not delete the underlying HBase table

CREATE EXTERNAL TABLE hive_hbase_external_table(key int, value string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hbase_table", 
"hbase.mapred.output.outputtable" = "hbase_table");
  • Loading data then works the same way as in case 1, inserting through the external table
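Concretely, the load is the same INSERT as before, just through the external table (this assumes the hive_data table created in case 1):

```sql
-- write rows into the existing HBase table via the external mapping table
insert into table hive_hbase_external_table select * from hive_data;
```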


Option 2: Generate HFiles from the Hive table and import them into HBase via bulkload

  1. Applicable scenario: large data volumes (above 4 TB)

  2. Convert the Hive data to HFiles

  • Start Hive and add the relevant HBase jar packages

add jar /usr/lib/hive-current/lib/hive-hbase-handler-2.3.3.jar;
add jar /usr/lib/hive-current/lib/hbase-common-1.1.1.jar;
add jar /usr/lib/hive-current/lib/hbase-client-1.1.1.jar;
add jar /usr/lib/hive-current/lib/hbase-protocol-1.1.1.jar;
add jar /usr/lib/hive-current/lib/hbase-server-1.1.1.jar;
  • Create a Hive table whose output format is HiveHFileOutputFormat

    Here /tmp/hbase_table_hfile/cf_0 is the HDFS path where the HFiles are saved, and cf_0 is the name of the HBase column family

create table hbase_hfile_table(key int, cf_0_c0 string) 
stored as
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileOutputFormat'
TBLPROPERTIES ('hfile.family.path' = '/tmp/hbase_table_hfile/cf_0');
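One detail worth double-checking: as the note above says, the last path component of hfile.family.path is taken as the HBase column family name (cf_0 here), so it must match the family of the target table. A quick sanity check in the shell:

```shell
# the column family name is the final component of the HFile output path
family=$(basename /tmp/hbase_table_hfile/cf_0)
echo "$family"   # prints: cf_0
```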
  • Save the data from the original table into HFiles through the hbase_hfile_table table

insert into table hbase_hfile_table select * from hive_data;
  • Check whether HFiles were generated under the corresponding HDFS path

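This HDFS check can be done with the hadoop CLI, using the family path configured above:

```shell
# list the generated HFiles under the configured family path
hadoop fs -ls /tmp/hbase_table_hfile/cf_0
```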

  3. Import the data into the HBase table via bulkload

  • Use the Alibaba Cloud HBase client to create an HBase table with the column family used above

hbase(main):012:0> create 'hbase_hfile_load_table','cf_0'
  • Download the cloud HBase client, configure hbase-site.xml, and copy hdfs-site.xml and core-site.xml into the hbase/conf directory

wget http://public-hbase.oss-cn-hangzhou.aliyuncs.com/installpackage/alihbase-1.1.4-bin.tar.gz
tar zxvf alihbase-1.1.4-bin.tar.gz
vi conf/hbase-site.xml
<property>
        <name>hbase.zookeeper.quorum</name>
        <value>xxx</value>
</property>
  • Execute bulkload to import into the hbase table

bin/hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hdfs://master:port/tmp/hbase_table_hfile/ hbase_hfile_load_table
  • Check whether the data was imported into the HBase table hbase_hfile_load_table

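As in option 1, the result can be verified with a scan from the HBase shell (run from the client directory downloaded above):

```shell
# scan the bulkloaded table; the rows written into the HFiles should appear here
echo "scan 'hbase_hfile_load_table'" | bin/hbase shell
```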



Origin blog.51cto.com/15060465/2677284