HBase (8): Integrating HBase with MapReduce — Examples

I. Features covered

1. Use importtsv to import a TSV file into HBase

2. Use importtsv to import a CSV file into HBase

3. Use importtsv together with completebulkload to import HFile data

II. Preparing the example

1. Requirement

The stu_info table has 20 columns of data. The task is to read out the name column under the info column family and write it into another table, tb02 (a sketch of a MapReduce job for this is given after the sample data below).

2. Create two new tables in HBase

create 'stu_info','info','degree','work'
create 'tb02','info'

3. Insert sample data

put 'stu_info','10001','degree:xueli','benke'
put 'stu_info','10001','info:age','18'
put 'stu_info','10001','info:sex','male'
put 'stu_info','10001','info:name','tom'
put 'stu_info','10001','work:job','bigdata'
put 'stu_info','10002','degree:xueli','gaozhong'
put 'stu_info','10002','info:age','22'
put 'stu_info','10002','info:sex','female'
put 'stu_info','10002','info:name','jack'
put 'stu_info','10003','info:age','22'
put 'stu_info','10003','info:name','leo'
put 'stu_info','10004','info:age','18'
put 'stu_info','10004','info:name','peter'
put 'stu_info','10005','info:age','19'
put 'stu_info','10005','info:name','jim'
put 'stu_info','10006','info:age','20'
put 'stu_info','10006','info:name','zhangsan'
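
The article does not show the MapReduce job for this requirement, so here is a minimal sketch of how it could be written with HBase's TableMapper/TableReducer API. The class and job names are illustrative, and it assumes the HBase 1.2 client jars used elsewhere in this post are on the classpath:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

// Illustrative class name: copies info:name from stu_info into tb02
public class NameToTb02 {

    // Mapper: for each row of stu_info, emit a Put containing only info:name
    static class NameMapper extends TableMapper<ImmutableBytesWritable, Put> {
        @Override
        protected void map(ImmutableBytesWritable rowKey, Result value, Context context)
                throws IOException, InterruptedException {
            byte[] name = value.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            if (name != null) {
                Put put = new Put(rowKey.get());
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), name);
                context.write(rowKey, put);
            }
        }
    }

    // Reducer: write the Puts into the target table tb02
    static class NameReducer extends TableReducer<ImmutableBytesWritable, Put, ImmutableBytesWritable> {
        @Override
        protected void reduce(ImmutableBytesWritable key, Iterable<Put> puts, Context context)
                throws IOException, InterruptedException {
            for (Put put : puts) {
                context.write(key, put);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "stu_info-name-to-tb02");
        job.setJarByClass(NameToTb02.class);

        // Only scan the info:name column of the source table
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"));

        TableMapReduceUtil.initTableMapperJob("stu_info", scan,
                NameMapper.class, ImmutableBytesWritable.class, Put.class, job);
        TableMapReduceUtil.initTableReducerJob("tb02", NameReducer.class, job);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The packaged jar would then be submitted with bin/yarn jar, with HADOOP_CLASSPATH pointing at the HBase lib directory, in the same way the importtsv commands below are run.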

III. Implementing each feature

1. Using importtsv

(1) Goal: import a TSV-format file into the stu_info table. Create the TSV file:

vi hbase-test.tsv

(2) Put the following data into hbase-test.tsv (fields separated by tabs)

10001   ngsan   12      male
10002   lisi    13      female
10003   wangwu  14      male
10004   zhaoliu 15      female
10005   xieqi   16      female

(3) Upload it to HDFS

bin/hdfs dfs -put /opt/datas/hbase-test.tsv /hadoop

(4) Run: import /hadoop/hbase-test.tsv on HDFS into the stu_info table

bin/yarn jar /opt/modules/hbase-1.2.0-cdh5.7.0/lib/hbase-server-1.2.0-cdh5.7.0.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:age,info:sex stu_info /hadoop/hbase-test.tsv

(5) Check the result with scan 'stu_info' in the HBase shell

 ROW                       COLUMN+CELL                                                             
 10001                    column=degree:xueli, timestamp=1543583981287, value=benke               
 10001                    column=info:age, timestamp=1543583981314, value=18                      
 10001                    column=info:name, timestamp=1543583981366, value=tom                    
 10001                    column=info:sex, timestamp=1543583981340, value=male                    
 10001                    column=work:job, timestamp=1543583981381, value=bigdata                 
 10002                    column=degree:xueli, timestamp=1543583981396, value=gaozhong            
 10002                    column=info:age, timestamp=1543583981410, value=22                      
 10002                    column=info:name, timestamp=1543583981438, value=jack                   
 10002                    column=info:sex, timestamp=1543583981425, value=female                  
 10003                    column=info:age, timestamp=1543583981457, value=22                      
 10003                    column=info:name, timestamp=1543583981484, value=leo                    
 10004                    column=info:age, timestamp=1543583981497, value=18                      
 10004                    column=info:name, timestamp=1543583981509, value=peter                  
 10005                    column=info:age, timestamp=1543583981533, value=19                      
 10005                    column=info:name, timestamp=1543583981547, value=jim                    
 10006                    column=info:age, timestamp=1543583981559, value=20                      
 10006                    column=info:name, timestamp=1543583982459, value=zhangsan   

2. Importing a CSV-format file into the stu_info table

(1) Create the CSV file hbase-test2.csv

10011,ngsan,12,male
10012,lisi,13,female
10013,wangwu,14,male
10014,zhaoliu,15,female
10015,xieqi,16,female

(2) Upload it to HDFS

bin/hdfs dfs -put /opt/datas/hbase-test2.csv /hadoop

(3) Run: the only difference from the TSV case is the -Dimporttsv.separator=, option, which tells importtsv to split fields on commas instead of tabs

$HADOOP_HOME/bin/yarn jar /opt/modules/hbase-1.2.0-cdh5.7.0/lib/hbase-server-1.2.0-cdh5.7.0.jar importtsv -Dimporttsv.separator=, -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:age,info:sex stu_info /hadoop/hbase-test2.csv

(4) Check the result with scan 'stu_info'

ROW                       COLUMN+CELL                                                             
10001                    column=degree:xueli, timestamp=1543583981287, value=benke               
10001                    column=info:age, timestamp=1543583981314, value=18                      
10001                    column=info:name, timestamp=1543583981366, value=tom                    
10001                    column=info:sex, timestamp=1543583981340, value=male                    
10001                    column=work:job, timestamp=1543583981381, value=bigdata                 
10002                    column=degree:xueli, timestamp=1543583981396, value=gaozhong            
10002                    column=info:age, timestamp=1543583981410, value=22                      
10002                    column=info:name, timestamp=1543583981438, value=jack                   
10002                    column=info:sex, timestamp=1543583981425, value=female                  
10003                    column=info:age, timestamp=1543583981457, value=22                      
10003                    column=info:name, timestamp=1543583981484, value=leo                    
10004                    column=info:age, timestamp=1543583981497, value=18                      
10004                    column=info:name, timestamp=1543583981509, value=peter                  
10005                    column=info:age, timestamp=1543583981533, value=19                      
10005                    column=info:name, timestamp=1543583981547, value=jim                    
10006                    column=info:age, timestamp=1543583981559, value=20                      
10006                    column=info:name, timestamp=1543583982459, value=zhangsan               
10011                    column=info:age, timestamp=1543585629390, value=12                      
10011                    column=info:name, timestamp=1543585629390, value=ngsan                  
10011                    column=info:sex, timestamp=1543585629390, value=male                    
10012                    column=info:age, timestamp=1543585629390, value=13                      
10012                    column=info:name, timestamp=1543585629390, value=lisi                   
10012                    column=info:sex, timestamp=1543585629390, value=female                  
10013                    column=info:age, timestamp=1543585629390, value=14                      
10013                    column=info:name, timestamp=1543585629390, value=wangwu                 
10013                    column=info:sex, timestamp=1543585629390, value=male                    
10014                    column=info:age, timestamp=1543585629390, value=15                      
10014                    column=info:name, timestamp=1543585629390, value=zhaoliu                
10014                    column=info:sex, timestamp=1543585629390, value=female                  
10015                    column=info:age, timestamp=1543585629390, value=16                      
10015                    column=info:name, timestamp=1543585629390, value=xieqi                  
10015                    column=info:sex, timestamp=1543585629390, value=female 

3. Importing HFile data with completebulkload

(1) Approach

Step 1: Instead of writing into HBase directly, the MapReduce job only generates HFile data; this decouples the MapReduce work from the import operation.
Step 2: The data files can be converted into the target table's HFiles during off-peak hours on the cluster; the finished HFiles then only need to be mapped (moved) into the table to complete a fast import.

(2) First convert the data file into StoreFiles (HFile format) and write them to the HDFS path given by -Dimporttsv.bulk.output

(3) Create the data file hbase-test3.csv

10011,ngsan,12,male
10012,lisi,13,female
10013,wangwu,14,male
10014,zhaoliu,15,female
10015,xieqi,16,female

(4) Upload it to HDFS

/opt/modules/apache/hadoop-2.7.3/bin/hdfs dfs -put /opt/datas/hbase-test3.csv /hbase-test3.csv

(5) Run: generate HFiles from /hbase-test3.csv and write them under /testHfile

export HBASE_HOME=/opt/modules/hbase-0.98.6-hadoop2
export HADOOP_HOME=/opt/modules/apache/hadoop-2.7.3
export HADOOP_CLASSPATH=/opt/modules/hbase-0.98.6-hadoop2/lib/*
$HADOOP_HOME/bin/yarn jar /opt/modules/hbase-0.98.6-hadoop2/lib/hbase-server-0.98.6-hadoop2.jar importtsv -Dimporttsv.separator=, -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:age,info:sex -Dimporttsv.bulk.output=/testHfile stu_info /hbase-test3.csv

Explanation:
    - -Dimporttsv.bulk.output=/testHfile : HDFS directory where the HFiles are written
    - stu_info : the table the HFiles are generated for
    - /hbase-test3.csv : the data source

(6) Then load the HFiles into HBase

At this point the files under /testHfile are moved into the table's corresponding column-family directories; it is essentially a move operation rather than a copy.

bin/yarn jar /opt/modules/hbase-0.98.6-hadoop2/lib/hbase-server-0.98.6-hadoop2.jar completebulkload /testHfile stu_info

Explanation:
    - /testHfile : the HDFS directory containing the HFiles
    - stu_info : the target table
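
For completeness, the same load step can also be done from Java using the LoadIncrementalHFiles class that backs the completebulkload tool. A minimal sketch, assuming the HBase client configuration and jars are on the classpath (the class name is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

// Programmatic equivalent of: completebulkload /testHfile stu_info
public class BulkLoadStuInfo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        // Moves the HFiles under /testHfile into the regions of stu_info
        try (HTable table = new HTable(conf, "stu_info")) {
            loader.doBulkLoad(new Path("/testHfile"), table);
        }
    }
}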
 

Reposted from blog.csdn.net/u010886217/article/details/84677481