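Chapter 6: Official HBase and MapReduce Integration Examples

First, prepare the environment.
Open hadoop-env.sh in Hadoop's configuration directory and add the following setting: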

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/hadoop/module/hbase-1.3.1/lib/*

The added setting looks like this:
[Screenshot: hadoop-env.sh with the added setting]
After the change, stop all Hadoop services and start them again:

[root@hadoop105 hadoop]# stop-all.sh
[root@hadoop105 hadoop]# start-all.sh

Stop HBase and start it again:

[root@hadoop105 hbase-1.3.1]# bin/stop-hbase.sh 
[root@hadoop105 hbase-1.3.1]# bin/start-hbase.sh

Open the HBase shell:

[root@hadoop105 hbase-1.3.1]# bin/hbase shell

# List the existing tables
hbase(main):001:0> list
TABLE                                                                                                                                                                      
atguigu:student                                                                                                                                                            
atguigu:user                                                                                                                                                               
student                                                                                                                                                                    
3 row(s) in 0.4370 seconds

=> ["atguigu:student", "atguigu:user", "student"]
hbase(main):002:0> 

# Count the rows in the student table (result: 2 rows)
hbase(main):005:0> count 'student'
2 row(s) in 0.0380 seconds

=> 2

1. Running the official MapReduce jobs

(1) Example 1: count the rows in the student table

[root@hadoop105 hbase-1.3.1]# /usr/local/hadoop/module/hadoop-2.7.2/bin/yarn jar ./lib/hbase-server-1.3.1.jar rowcounter student


# The output reports 2 rows
org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters
ROWS=2

(2) Example 2: import local data into HBase with MapReduce
First, create a local TSV file named fruit.tsv.
Edit it and add the data:

[root@hadoop105 hbase-1.3.1]# touch fruit.tsv
[root@hadoop105 hbase-1.3.1]# vim fruit.tsv 
1001	Apple	Red
1002	Pear		Yellow
1003	Pineapple	Yellow
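
Before uploading, it can help to confirm that every line has the expected number of tab-separated fields. As far as I understand, importtsv by default skips lines it cannot parse, including lines with more fields than the columns listed in -Dimporttsv.columns (three here). The sketch below is a stricter pre-flight check that flags any line whose field count differs from the expected one. The names TsvFieldCheck and badLines are made up for illustration and are not part of HBase; only the JDK is used.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: checks TSV lines before an import.
class TsvFieldCheck {

    /** Returns the 1-based numbers of lines whose tab-separated field count differs from expected. */
    static List<Integer> badLines(List<String> lines, int expectedFields) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            // A limit of -1 keeps empty fields, so two consecutive tabs yield an extra empty field.
            int fields = lines.get(i).split("\t", -1).length;
            if (fields != expectedFields) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, check that file; otherwise use the three lines shown above.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8)
                : Arrays.asList(
                        "1001\tApple\tRed",
                        "1002\tPear\t\tYellow",   // two tabs between Pear and Yellow
                        "1003\tPineapple\tYellow");
        // For the sample lines this prints: Lines with an unexpected field count: [2]
        System.out.println("Lines with an unexpected field count: " + badLines(lines, 3));
    }
}
```

For the file above, the check flags line 2, because the 1002 line contains two tabs between Pear and Yellow and therefore has four fields.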

Next, upload the file to the HDFS root directory:

[root@hadoop105 hbase-1.3.1]# hadoop fs -put fruit.tsv /

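The file is now in HDFS:
[Screenshot: fruit.tsv in the HDFS root directory]
(3) Run the MapReduce job to import the file into the HBase fruit table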

[root@hadoop105 hbase-1.3.1]# /usr/local/hadoop/module/hadoop-2.7.2/bin/yarn jar lib/hbase-server-1.3.1.jar  importtsv \
-Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:color fruit \
hdfs://hadoop105:9000/fruit.tsv

The first run fails:
[Screenshot: error output]
Fix: the import expects the target table to exist, so create the fruit table first:

hbase(main):006:0> create 'fruit','info'
0 row(s) in 2.6830 seconds

=> Hbase::Table - fruit
hbase(main):007:0>

Run the import again:

[root@hadoop105 hbase-1.3.1]# /usr/local/hadoop/module/hadoop-2.7.2/bin/yarn jar lib/hbase-server-1.3.1.jar  importtsv \
-Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:color fruit \
hdfs://hadoop105:9000/fruit.tsv

This time the job is submitted:
[Screenshot: job submission output]
Check the data:

# List the current tables
hbase(main):007:0> list
TABLE                                                                                                                                                                      
atguigu:student                                                                                                                                                            
atguigu:user                                                                                                                                                               
fruit                                                                                                                                                                      
student                                                                                                                                                                    
4 row(s) in 0.0720 seconds

=> ["atguigu:student", "atguigu:user", "fruit", "student"]

# Scan the fruit table to see the imported rows
hbase(main):008:0> scan 'fruit'
ROW                                         COLUMN+CELL                                                                                                                    
 1001                                       column=info:color, timestamp=1579890743538, value=Red                                                                          
 1001                                       column=info:name, timestamp=1579890743538, value=Apple                                                                         
 1003                                       column=info:color, timestamp=1579890743538, value=Yellow                                                                       
 1003                                       column=info:name, timestamp=1579890743538, value=Pineapple                                                                     
2 row(s) in 0.3170 seconds

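OK! HBase and MapReduce work together. Note that only rows 1001 and 1003 were imported: the 1002 line in fruit.tsv has two tabs between Pear and Yellow, which gives it four fields instead of three, so importtsv treated it as a bad line and skipped it.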

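2. Custom HBase-MapReduce 1

Goal: use MapReduce to migrate part of the data in the fruit table into the fruit_mr table.

Step by step:

Project structure:
[Screenshot: project structure]
Code:

FruitMapper.java

Implementation: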

package com.study.mrl;


import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

public class FruitMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {

        // Build a Put for the current row key
        Put put = new Put(key.get());
        // Iterate over the cells and keep only the "name" column
        Cell[] cells = value.rawCells();
        for (Cell cell : cells) {
            if ("name".equals(Bytes.toString(CellUtil.cloneQualifier(cell)))) {
                put.add(cell);
            }
        }
        // Skip rows without a "name" cell: HBase rejects a Put that has no columns
        if (!put.isEmpty()) {
            context.write(key, put);
        }
    }
}
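
To make the mapper's effect concrete, here is a plain-Java model of its filtering step, with no HBase classes involved. A row is represented as a map from qualifier to value, which is a simplification introduced for this sketch (real cells also carry a column family and a timestamp). QualifierFilterSketch and keepQualifier are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the mapper's filter; it does not use any HBase API.
class QualifierFilterSketch {

    /** Returns a copy of the row that contains only the entry for the wanted qualifier. */
    static Map<String, String> keepQualifier(Map<String, String> row, String wanted) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            // Same comparison as in FruitMapper: keep the cell only if its qualifier matches.
            if (wanted.equals(cell.getKey())) {
                kept.put(cell.getKey(), cell.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Row 1001 of the fruit table, reduced to qualifier -> value.
        Map<String, String> row1001 = new LinkedHashMap<>();
        row1001.put("color", "Red");
        row1001.put("name", "Apple");
        // Prints: {name=Apple}
        System.out.println(keepQualifier(row1001, "name"));
    }
}
```

This corresponds to the scan of fruit_mr at the end of the chapter, where each row carries only info:name.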

FruitReducer.java

Implementation:

package com.study.mrl;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.NullWritable;

import java.io.IOException;

public class FruitReducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable> {
    @Override
    protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context)
            throws IOException, InterruptedException {
        // Write every Put through unchanged to the output table
        for (Put value : values) {
            context.write(NullWritable.get(), value);
        }
    }
}

FruitDriver.java

Implementation:

package com.study.mrl;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;


public class FruitDriver implements Tool {
    private Configuration configuration = null;

    @Override
    public int run(String[] args) throws Exception {
        // Get the job object
        Job job = Job.getInstance(configuration);

        // Set the driver class
        job.setJarByClass(FruitDriver.class);

        // Set the Mapper: read from the fruit table
        TableMapReduceUtil.initTableMapperJob("fruit",
                new Scan(),
                FruitMapper.class,
                ImmutableBytesWritable.class,
                Put.class,
                job);

        // Set the Reducer: write to the fruit_mr table
        TableMapReduceUtil.initTableReducerJob("fruit_mr",
                FruitReducer.class,
                job);

        // Submit the job and wait for it to finish
        boolean b = job.waitForCompletion(true);

        return b ? 0 : 1;
    }

    @Override
    public void setConf(Configuration conf) {
        this.configuration = conf;
    }

    @Override
    public Configuration getConf() {
        // Return the stored configuration rather than null
        return configuration;
    }

    public static void main(String[] args) throws Exception {
        Configuration configuration = HBaseConfiguration.create();
        int exitCode = ToolRunner.run(configuration, new FruitDriver(), args);
        // Report the job result to the shell as the exit code
        System.exit(exitCode);
    }
}

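3. Packaging and testing the custom MR job

Steps:

[Screenshot: packaging steps]

I uploaded the packaged jar to the /usr/local/hadoop/module/hbase-1.3.1 directory.

Before running the job, create the fruit_mr table: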

hbase(main):009:0> create 'fruit_mr','info'
0 row(s) in 3.1550 seconds

=> Hbase::Table - fruit_mr

Run the job:

[root@hadoop107 hbase-1.3.1]# /usr/local/hadoop/module/hadoop-2.7.2/bin/yarn jar  HB20201024-1.0-SNAPSHOT.jar com.study.mrl.FruitDriver

.......
Physical memory (bytes) snapshot=306782208
Virtual memory (bytes) snapshot=4175613952
Total committed heap usage (bytes)=141197312
HBase Counters
.........
.........

The job succeeds!

Next, scan the data:

# The table was empty before; now it has data
hbase(main):010:0> scan 'fruit_mr'
ROW                                         COLUMN+CELL                                                                                                                    
 1001                                       column=info:name, timestamp=1579890743538, value=Apple                                                                         
 1003                                       column=info:name, timestamp=1579890743538, value=Pineapple                                                                     
2 row(s) in 0.2820 seconds

江湖侠客
