HBase MR


Requirement 1: Count the rowkeys of a table

Use yarn to invoke the rowcounter program that ships under hbase/lib (a client-side alternative is sketched after step 3 below).

1) Check the required JARs
Run hbase/bin/hbase mapredcp to list the JARs that are needed.
2) Export the JARs
export HBASE_HOME=/opt/hbase
export HADOOP_HOME=/opt/hadoop

export HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp`
(The ` is the backtick, the key above Tab; note that a shell export must have no spaces around the =.)
These environment variables can also be added to hbase-env.sh to make them permanent; the exports above only last for the current shell session.

3) Run the counter
cd /opt/hbase
/opt/hadoop/hadoop-2.8.4/bin/yarn jar lib/hbase-server-1.3.0.jar rowcounter a
(Here a is the name of the table whose rows are counted.)
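
For comparison, below is a minimal client-side counter, a sketch assuming an HBase 1.x client with hbase-site.xml on the classpath (the class name ClientRowCounter is illustrative, not from the original). It scans table a and counts rows on the client, which is fine for small tables; for large tables the MR rowcounter above scales better.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;

public class ClientRowCounter {
    public static void main(String[] args) throws IOException {
        // picks up hbase-site.xml from the classpath
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("a"))) {
            Scan scan = new Scan();
            // fetch only the first cell of each row to minimize I/O
            scan.setFilter(new FirstKeyOnlyFilter());
            long count = 0;
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    count++;
                }
            }
            System.out.println("row count: " + count);
        }
    }
}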

Requirement 2: Import local data into HBase
Approach: since HBase stores its data on HDFS, first upload the data to HDFS,
then create a corresponding table in HBase,
and finally use MR to import the data into the table.

Importing the data

Data:
b.tsv
001 xiaoming henshuai
002 xdaer good
003 wanger good

The file extension is .tsv (tab-separated values); by default importtsv expects the fields to be separated by tabs (the separator can be changed with -Dimporttsv.separator).
1) Upload the data file
hdfs dfs -mkdir /love
hdfs dfs -put b.tsv /love

2) Create the table
create 'love','info'

3) Run the import
cd /opt/hbase
/opt/hadoop/hadoop-2.8.4/bin/yarn jar lib/hbase-server-1.3.0.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:describe love hdfs://master:9000/love
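
To spot-check the import, here is a minimal sketch under the same assumptions (HBase 1.x client, hbase-site.xml on the classpath; the class name CheckImport is illustrative, not from the original). It fetches rowkey 001 from the love table and prints the two imported columns.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckImport {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("love"))) {
            // fetch the row imported from the first line of b.tsv
            Result r = table.get(new Get(Bytes.toBytes("001")));
            String name = Bytes.toString(r.getValue(Bytes.toBytes("info"), Bytes.toBytes("name")));
            String describe = Bytes.toString(r.getValue(Bytes.toBytes("info"), Bytes.toBytes("describe")));
            System.out.println("001 -> name=" + name + ", describe=" + describe);
        }
    }
}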

Requirement 3: Filter specified columns from the love table in HBase and import them into a lovemr table
1) Build a Mapper class that reads the data from the love table
2) Build a Reducer class that writes the data into the lovemr table
3) Build a Driver class
4) Package it and run the job on the cluster
Note: the target table must exist before the job runs: create 'lovemr','info'

Mapper:

import java.io.IOException;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadLoveMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context) throws IOException, InterruptedException {
        // 1. Read one row, keyed by its rowkey
        Put put = new Put(key.get());

        // 2. Iterate over the cells of the row
        for (Cell c : value.rawCells()) {
            // 3. Keep only cells from the info column family; everything else is dropped
            if ("info".equals(Bytes.toString(CellUtil.cloneFamily(c)))) {
                // 4. Keep only the name column
                if ("name".equals(Bytes.toString(CellUtil.cloneQualifier(c)))) {
                    put.add(c);
                }
            }
        }
        // Writing an empty Put would fail downstream, so skip rows without a name cell
        if (!put.isEmpty()) {
            context.write(key, put);
        }
    }
}

Reducer:

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.NullWritable;

public class WriteLoveReducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable> {
    @Override
    protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context) throws IOException, InterruptedException {
        // Pass each Put through unchanged; TableOutputFormat ignores the output key
        for (Put p : values) {
            context.write(NullWritable.get(), p);
        }
    }
}

Driver:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class LoveDriver implements Tool {

    private Configuration conf;

    // Business logic
    public int run(String[] strings) throws Exception {
        // 1. Create the job
        Job job = Job.getInstance(conf);
        // 2. Set the jar by the driver class
        job.setJarByClass(LoveDriver.class);
        // 3. Configure the job to scan the source table
        Scan scan = new Scan();

        // 4. Set the mapper class
        TableMapReduceUtil.initTableMapperJob("love",
                scan,
                ReadLoveMapper.class,
                ImmutableBytesWritable.class,
                Put.class,
                job);

        // 5. Set the reducer class
        TableMapReduceUtil.initTableReducerJob("lovemr",
                WriteLoveReducer.class,
                job);

        // Set the number of reduce tasks
        job.setNumReduceTasks(1);

        boolean rs = job.waitForCompletion(true);
        return rs ? 0 : 1;
    }

    // Set the configuration
    public void setConf(Configuration configuration) {
        this.conf = HBaseConfiguration.create(configuration);
    }

    // Get the configuration
    public Configuration getConf() {
        return this.conf;
    }

    public static void main(String[] args) {
        try {
            int status = ToolRunner.run(new LoveDriver(), args);
            System.exit(status);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Run:
/opt/hadoop/hadoop-2.8.4/bin/yarn jar Hbase-1.0-SNAPSHOT.jar com.itstaredu.hbasemr.LoveDriver
