Mapping an HBase column-oriented table to a Hive external table

During ETL, the raw data may be stored in HBase, a column-oriented store. To clean that data, one option is to map the HBase table to a Hive external table and then clean and process it with HiveQL. The following example walks through the process:

Steps

1. Create the HBase table
2. Map it to a Hive table

Step 1

Description: cf is the column family name; only a few test columns are inserted.
create 'cofeed_info',{NAME => 'cf', REPLICATION_SCOPE => 1} 
put 'cofeed_info', '100001', 'cf:id', '101' 
put 'cofeed_info', '100001', 'cf:title', 'This is test data' 
put 'cofeed_info', '100001', 'cf:insert_time', '45679848161564'

Step 2

Description: many of the columns declared here do not yet exist in the HBase table, which is fine. :key refers to the rowkey.
-- Mapping entries correspond one-to-one, in order, to the Hive columns.
-- The mapping is kept on one line without spaces, because whitespace
-- between entries may be read as part of a column name.
CREATE EXTERNAL TABLE cofeed_info (
  rowkey string,
  id string,
  title string,
  tourl string,
  content string,
  data_provider string,
  b_class string,
  b_cadogory string,
  source string,
  insert_time timestamp,
  dt string
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,cf:id,cf:title,cf:tourl,cf:content,cf:data_provider,cf:b_class,cf:b_cadogory,cf:source,cf:insert_time,cf:dt"
)
TBLPROPERTIES ("hbase.table.name" = "cofeed_info");
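
Because the column mapping is positional, the Hive column list and the hbase.columns.mapping string have to stay in step. As a rough illustration, the Python sketch below generates both from a single list. The helper name build_hive_ddl and its parameters are hypothetical and not part of Hive or HBase; it only assembles text, and it assumes every non-key column lives in one column family with the HBase qualifier equal to the Hive column name.

```python
def build_hive_ddl(table, family, columns, key_column="rowkey"):
    """Return a CREATE EXTERNAL TABLE statement for an HBase-backed Hive table.

    columns is a list of (name, hive_type) pairs, excluding the rowkey.
    Hypothetical helper for illustration; it only builds a string.
    """
    # Hive columns: the rowkey first, then the declared columns in order.
    hive_cols = [(key_column, "string")] + list(columns)
    col_lines = ",\n".join(f"  {name} {hive_type}" for name, hive_type in hive_cols)

    # One mapping entry per Hive column, same order, no whitespace between entries.
    mapping = ",".join([":key"] + [f"{family}:{name}" for name, _ in columns])

    return (
        f"CREATE EXTERNAL TABLE {table} (\n{col_lines}\n)\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f'WITH SERDEPROPERTIES ("hbase.columns.mapping" = "{mapping}")\n'
        f'TBLPROPERTIES ("hbase.table.name" = "{table}");'
    )


if __name__ == "__main__":
    ddl = build_hive_ddl(
        "cofeed_info",
        "cf",
        [("id", "string"), ("title", "string"), ("insert_time", "timestamp")],
    )
    print(ddl)
```

Generating the statement from one list means a column cannot be added to the table definition without also getting a mapping entry, which is the mismatch that is easiest to introduce when editing the DDL by hand.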

Result

hive> desc cofeed_info; 
OK 
rowkey string from deserializer 
id string from deserializer 
title string from deserializer 
tourl string from deserializer 
content string from deserializer 
data_provider string from deserializer 
b_class string from deserializer 
b_cadogory string from deserializer 
source string from deserializer 
insert_time timestamp from deserializer 
dt string from deserializer 
Note: columns that do not exist in HBase come back as NULL.
hive> select * from cofeed_info; 
OK 
100001 101 This is test data NULL NULL NULL NULL NULL NULL NULL NULL
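
In the output above, insert_time is NULL even though a value was written in step 1. A likely cause, assuming the cell holds the plain text '45679848161564', is that Hive cannot parse that text as a timestamp: it reads timestamp strings in the form yyyy-MM-dd HH:mm:ss[.fffffffff]. One option is to declare the column as string in Hive; another is to write an already formatted value into HBase. The Python sketch below illustrates the second option for an epoch value in milliseconds. The function name to_hive_timestamp is hypothetical, and treating the value as UTC milliseconds is an assumption about the data, not something the example above states.

```python
from datetime import datetime, timezone


def to_hive_timestamp(epoch_ms):
    """Format epoch milliseconds (assumed UTC) as 'yyyy-MM-dd HH:mm:ss.SSS'."""
    # Split into whole seconds and leftover milliseconds to avoid float rounding.
    seconds, millis = divmod(int(epoch_ms), 1000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    base = dt.strftime("%Y-%m-%d %H:%M:%S")
    return f"{base}.{millis:03d}"


if __name__ == "__main__":
    # Value that could then be stored in cf:insert_time instead of the raw number.
    print(to_hive_timestamp(1500000000000))
```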

