First, prepare the environment:
Open hadoop-env.sh under the Hadoop configuration directory and add the following parameter:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/hadoop/module/hbase-1.3.1/lib/*
Add the parameter as shown in the figure:
After the configuration is done, stop all Hadoop services and restart them with the following commands:
[root@hadoop105 hadoop]# stop-all.sh
[root@hadoop105 hadoop]# start-all.sh
Stop the HBase service and restart it:
[root@hadoop105 hbase-1.3.1]# bin/stop-hbase.sh
[root@hadoop105 hbase-1.3.1]# bin/start-hbase.sh
Enter the HBase shell client:
[root@hadoop105 hbase-1.3.1]# bin/hbase shell
# List the existing tables
hbase(main):001:0> list
TABLE
atguigu:student
atguigu:user
student
3 row(s) in 0.4370 seconds
=> ["atguigu:student", "atguigu:user", "student"]
hbase(main):002:0>
# Count the rows in the student table (result: 2 rows)
hbase(main):005:0> count 'student'
2 row(s) in 0.0380 seconds
=> 2
1. Running the official MapReduce jobs
(1) Case 1: count the rows in the student table
[root@hadoop105 hbase-1.3.1]# /usr/local/hadoop/module/hadoop-2.7.2/bin/yarn jar ./lib/hbase-server-1.3.1.jar rowcounter student
# The counter output shows 2 rows
org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters
ROWS=2
(2) Case 2: use MapReduce to import local data into HBase
First, create a tab-separated (TSV) file locally: fruit.tsv
Edit the file and add the data (fields separated by tabs):
[root@hadoop105 hbase-1.3.1]# touch fruit.tsv
[root@hadoop105 hbase-1.3.1]# vim fruit.tsv
1001 Apple Red
1002 Pear Yellow
1003 Pineapple Yellow
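importtsv splits each line on a literal tab character by default, so the three fields above must be separated by real tabs, not spaces. As a sanity check, here is a small stdlib-only Java sketch that writes the same fruit.tsv with explicit tab characters (file name and contents taken from this tutorial; the class name is just for illustration):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class WriteFruitTsv {
    public static void main(String[] args) throws IOException {
        // Each field is joined with '\t' — importtsv's default separator
        List<String> rows = List.of(
                "1001\tApple\tRed",
                "1002\tPear\tYellow",
                "1003\tPineapple\tYellow");
        Files.write(Path.of("fruit.tsv"), rows);
        System.out.println("wrote " + rows.size() + " rows to fruit.tsv");
    }
}
```

Writing the file programmatically avoids the common pitfall of an editor silently converting tabs to spaces.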
Next, upload the file to the HDFS root directory:
[root@hadoop105 hbase-1.3.1]# hadoop fs -put fruit.tsv /
The file is now on HDFS.
(3) Run the MapReduce importtsv job into the HBase fruit table
[root@hadoop105 hbase-1.3.1]# /usr/local/hadoop/module/hadoop-2.7.2/bin/yarn jar lib/hbase-server-1.3.1.jar importtsv \
-Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:color fruit \
hdfs://hadoop105:9000/fruit.tsv
This run fails because the target table does not exist yet.
Fix: create the fruit table:
hbase(main):006:0> create 'fruit','info'
0 row(s) in 2.6830 seconds
=> Hbase::Table - fruit
hbase(main):007:0>
Run it again:
[root@hadoop105 hbase-1.3.1]# /usr/local/hadoop/module/hadoop-2.7.2/bin/yarn jar lib/hbase-server-1.3.1.jar importtsv \
-Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:color fruit \
hdfs://hadoop105:9000/fruit.tsv
This output shows the job has been submitted.
Check the current data:
# List the current tables
hbase(main):007:0> list
TABLE
atguigu:student
atguigu:user
fruit
student
4 row(s) in 0.0720 seconds
=> ["atguigu:student", "atguigu:user", "fruit", "student"]
# Scan the fruit table to check the imported result
hbase(main):008:0> scan 'fruit'
ROW COLUMN+CELL
1001 column=info:color, timestamp=1579890743538, value=Red
1001 column=info:name, timestamp=1579890743538, value=Apple
1003 column=info:color, timestamp=1579890743538, value=Yellow
1003 column=info:name, timestamp=1579890743538, value=Pineapple
2 row(s) in 0.3170 seconds
OK! The Hadoop-HBase MapReduce interaction works.
2. Custom HBase-MapReduce (1)
Goal: use MR to migrate part of the data in the fruit table into a fruit_mr table.
Step-by-step implementation:
Project setup:
Code:
FruitMapper.java
Implementation:
package com.study.mrl;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

public class FruitMapper extends TableMapper<ImmutableBytesWritable, Put> {

    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
        // Build the Put object for this row key
        Put put = new Put(key.get());
        // Iterate over the cells, keeping only the info:name column
        Cell[] cells = value.rawCells();
        for (Cell cell : cells) {
            if ("name".equals(Bytes.toString(CellUtil.cloneQualifier(cell)))) {
                put.add(cell);
            }
        }
        context.write(key, put);
    }
}
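The heart of FruitMapper is the qualifier check: only cells whose column qualifier equals name survive into the Put, which is why fruit_mr ends up without the info:color column. The same filter, stripped of HBase types, as a stdlib-only sketch (the map keys stand in for cell qualifiers — an illustrative simplification, not the HBase API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class QualifierFilterDemo {
    // Keep only entries whose key equals "name", mirroring the
    // per-Cell qualifier check in FruitMapper
    static Map<String, String> keepName(Map<String, String> cells) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : cells.entrySet()) {
            if ("name".equals(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("name", "Apple");
        row.put("color", "Red");
        System.out.println(keepName(row)); // prints {name=Apple}
    }
}
```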
FruitReducer.java
Implementation:
package com.study.mrl;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.NullWritable;

import java.io.IOException;

public class FruitReducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable> {

    @Override
    protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context)
            throws IOException, InterruptedException {
        // Pass each Put through to the output table unchanged
        for (Put value : values) {
            context.write(NullWritable.get(), value);
        }
    }
}
FruitDriver.java
Implementation:
package com.study.mrl;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Implement Tool directly (do not extend Configuration);
// ToolRunner injects the configuration through setConf()
public class FruitDriver implements Tool {

    private Configuration configuration = null;

    public int run(String[] args) throws Exception {
        // Get the job instance
        Job job = Job.getInstance(configuration);
        // Set the driver class
        job.setJarByClass(FruitDriver.class);
        // Set the Mapper: scan the fruit table
        TableMapReduceUtil.initTableMapperJob("fruit",
                new Scan(),
                FruitMapper.class,
                ImmutableBytesWritable.class,
                Put.class,
                job);
        // Set the Reducer: write into the fruit_mr table
        TableMapReduceUtil.initTableReducerJob("fruit_mr",
                FruitReducer.class,
                job);
        // Submit the job and wait for completion
        boolean b = job.waitForCompletion(true);
        return b ? 0 : 1;
    }

    public void setConf(Configuration conf) {
        this.configuration = conf;
    }

    public Configuration getConf() {
        return configuration;
    }

    public static void main(String[] args) throws Exception {
        Configuration configuration = HBaseConfiguration.create();
        int exitCode = ToolRunner.run(configuration, new FruitDriver(), args);
        System.exit(exitCode);
    }
}
3. Packaging and testing the custom MR job
The steps are shown in the diagram:
I uploaded the packaged jar to /usr/local/hadoop/module/hbase-1.3.1:
Before running the command, create the fruit_mr table:
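The jar is assumed here to be built with Maven (the HB20201024-1.0-SNAPSHOT name in the run command below suggests a Maven artifact). A sketch of the dependencies such a pom.xml might declare — versions match this tutorial, and scope provided is a reasonable choice since the cluster already supplies the HBase and Hadoop jars at run time via HADOOP_CLASSPATH:

```xml
<!-- pom.xml fragment (a sketch; the coordinates of your own project will differ) -->
<dependencies>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-server</artifactId>
        <version>1.3.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.2</version>
        <scope>provided</scope>
    </dependency>
</dependencies>
```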
hbase(main):009:0> create 'fruit_mr','info'
0 row(s) in 3.1550 seconds
=> Hbase::Table - fruit_mr
Run the job:
[root@hadoop107 hbase-1.3.1]# /usr/local/hadoop/module/hadoop-2.7.2/bin/yarn jar HB20201024-1.0-SNAPSHOT.jar com.study.mrl.FruitDriver
.......
Physical memory (bytes) snapshot=306782208
Virtual memory (bytes) snapshot=4175613952
Total committed heap usage (bytes)=141197312
HBase Counters
.........
.........
It ran successfully!
Next, scan the data:
# fruit_mr goes from empty to populated
hbase(main):010:0> scan 'fruit_mr'
ROW COLUMN+CELL
1001 column=info:name, timestamp=1579890743538, value=Apple
1003 column=info:name, timestamp=1579890743538, value=Pineapple
2 row(s) in 0.2820 seconds