18. Big Data: HBase Development

1. HBase Development

1.1. Configuration

HBaseConfiguration

Package: org.apache.hadoop.hbase.HBaseConfiguration

Purpose: this class is used to configure HBase.

Usage example:

Configuration config = HBaseConfiguration.create();

Note: by default, HBaseConfiguration.create() looks for hbase-site.xml on the classpath and uses it to initialize the Configuration.

Typical setup:

static Configuration config = null;
static {
     config = HBaseConfiguration.create();
     config.set("hbase.zookeeper.quorum", "slave1,slave2,slave3");
     config.set("hbase.zookeeper.property.clientPort", "2181");

}


1.2. Table Management Class

HBaseAdmin

Package: org.apache.hadoop.hbase.client.HBaseAdmin

Purpose: provides an interface for managing the tables in an HBase database.

 

Usage:

HBaseAdmin admin = new HBaseAdmin(config);
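
Beyond the constructor, HBaseAdmin exposes the usual table-management calls. A minimal sketch, assuming a table named "user" already exists:

if (admin.tableExists("user")) {                     // check whether a table exists
    HTableDescriptor[] tables = admin.listTables();  // list all tables
    for (HTableDescriptor t : tables) {
        System.out.println(t.getNameAsString());
    }
}
admin.close();  // release the admin's resources when done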


1.3. Table Descriptor Class

HTableDescriptor

Package: org.apache.hadoop.hbase.HTableDescriptor

Purpose: HTableDescriptor holds the table name and its column family information, i.e., the table's schema (design).

Usage:

HTableDescriptor htd = new HTableDescriptor(tablename);

htd.addFamily(new HColumnDescriptor("myFamily"));


1.4. Column Family Descriptor Class

HColumnDescriptor

Package: org.apache.hadoop.hbase.HColumnDescriptor

Purpose: HColumnDescriptor maintains the information about a column family.

 

Usage:

htd.addFamily(new HColumnDescriptor("myFamily"));


1.5. Creating a Table

CreateTable (in practice, tables are usually created with the HBase shell)

static Configuration config = null;

static {

     config = HBaseConfiguration.create();

     config.set("hbase.zookeeper.quorum", "slave1,slave2,slave3");

     config.set("hbase.zookeeper.property.clientPort", "2181");

}

 

HBaseAdmin admin = new HBaseAdmin(config);

HTableDescriptor desc = new HTableDescriptor(tableName);

HColumnDescriptor family1 = new HColumnDescriptor("f1");

HColumnDescriptor family2 = new HColumnDescriptor("f2");

desc.addFamily(family1);

desc.addFamily(family2);

admin.createTable(desc);


1.6. Deleting a Table

HBaseAdmin admin = new HBaseAdmin(config);

admin.disableTable(tableName);

admin.deleteTable(tableName);


1.7. Table Access Class

HTable

Package: org.apache.hadoop.hbase.client.HTable

Purpose: HTable communicates with a single HBase table.

Usage:

// Get a table handle directly

HTable table = new HTable(config, Bytes.toBytes(tablename));

// Get a table through a connection

Connection connection = ConnectionFactory.createConnection(config);

HTableInterface table = connection.getTable(TableName.valueOf("user"));
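
In newer client versions (HBase 1.0 and later), HTableInterface is deprecated in favor of Table; a sketch of the same lookup using try-with-resources so the handle is closed automatically:

Connection connection = ConnectionFactory.createConnection(config);
try (Table table = connection.getTable(TableName.valueOf("user"))) {
    // use the table here; it is closed automatically on exit
}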


1.8. Inserting a Single Row

Put

包:org.apache.hadoop.hbase.client.Put

Purpose: inserts data.

Usage:

Put put = new Put(row);

put.add(family, qualifier, value);

Note: adds the value identified by (family, qualifier, value) to table tablename.

 

Example code:

Connection connection = ConnectionFactory.createConnection(config);

HTableInterface table = connection.getTable(TableName.valueOf("user"));

Put put = new Put(Bytes.toBytes(rowKey));

put.add(Bytes.toBytes(family), Bytes.toBytes(qualifier),Bytes.toBytes(value));

table.put(put);


1.9. Batch Insert

List<Put> list = new ArrayList<Put>();

Put put = new Put(Bytes.toBytes(rowKey));// create a Put for this row

put.add(Bytes.toBytes(family), Bytes.toBytes(qualifier),Bytes.toBytes(value));// wrap the data

list.add(put);

table.put(list);// write the whole batch
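
For large batches, the old HTable API can buffer puts on the client to cut down RPC round trips. A sketch, assuming table is an HTable:

table.setAutoFlush(false);              // buffer puts on the client side
table.setWriteBufferSize(1024 * 1024);  // flush once the buffer reaches ~1 MB
table.put(list);                        // queued in the client buffer
table.flushCommits();                   // push any remaining buffered puts to the server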


1.10. Deleting Data

Delete

包:org.apache.hadoop.hbase.client.Delete

Purpose: deletes the data for a given rowkey.

Usage:

Delete del= new Delete(Bytes.toBytes(rowKey));

table.delete(del);

Example code:

Connection connection = ConnectionFactory.createConnection(config);

HTableInterface table = connection.getTable(TableName.valueOf("user"));

Delete del= new Delete(Bytes.toBytes(rowKey));

table.delete(del);
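
A Delete can also target a single column family or cell instead of the whole row. A sketch with the same old-style API; the family and qualifier names here are placeholders:

Delete del = new Delete(Bytes.toBytes(rowKey));
del.deleteFamily(Bytes.toBytes("f1"));                       // drop an entire column family, or
del.deleteColumn(Bytes.toBytes("f1"), Bytes.toBytes("c1"));  // drop the newest version of one cell
table.delete(del);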


1.11. Single-Row Query

Get

包:org.apache.hadoop.hbase.client.Get

Purpose: retrieves the data of a single row.

Usage:

HTable table = new HTable(config,Bytes.toBytes(tablename));

Get get = new Get(Bytes.toBytes(row));

Result result = table.get(get);

Note: fetches the row with key row from table tablename.

 

Example code:

Connection connection = ConnectionFactory.createConnection(config);

HTableInterface table = connection.getTable(TableName.valueOf("user"));

Get get = new Get(rowKey.getBytes());

Result row = table.get(get);

for (KeyValue kv : row.raw()) {

System.out.print(new String(kv.getRow()) + " ");

System.out.print(new String(kv.getFamily()) + ":");

System.out.print(new String(kv.getQualifier()) + " = ");

System.out.print(new String(kv.getValue()));

System.out.print(" timestamp = " + kv.getTimestamp() + "\n");

}
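
When only one cell is needed, the Get can be narrowed so the server returns less data. A sketch; family and qualifier are placeholders:

Get get = new Get(Bytes.toBytes(rowKey));
get.addColumn(Bytes.toBytes(family), Bytes.toBytes(qualifier));  // fetch just this cell
Result result = table.get(get);
byte[] value = result.getValue(Bytes.toBytes(family), Bytes.toBytes(qualifier));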


1.12. Batch Query

ResultScanner

包:org.apache.hadoop.hbase.client.ResultScanner

Purpose: an interface for iterating over the rows returned by a scan.

Usage:

ResultScanner scanner = table.getScanner(scan);

for (Result rowResult : scanner) {

        byte[] str = rowResult.getValue(family, column);

}

Note: loops over the rows and reads column values.

 

Example code:

Connection connection = ConnectionFactory.createConnection(config);

HTableInterface table = connection.getTable(TableName.valueOf("user"));

Scan scan = new Scan();

scan.setStartRow("a1".getBytes());

scan.setStopRow("a20".getBytes());

ResultScanner scanner = table.getScanner(scan);

for (Result row : scanner) {

System.out.println("\nRowkey: " + new String(row.getRow()));

for (KeyValue kv : row.raw()) {

     System.out.print(new String(kv.getRow()) + " ");

     System.out.print(new String(kv.getFamily()) + ":");

     System.out.print(new String(kv.getQualifier()) + " = ");

     System.out.print(new String(kv.getValue()));

     System.out.print(" timestamp = " + kv.getTimestamp() + "\n");

}

}
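
A performance note: by default a scanner may fetch only a few rows per RPC, so large scans pay one round trip per batch. Scan.setCaching raises the batch size; a sketch:

Scan scan = new Scan();
scan.setCaching(100);  // fetch 100 rows per RPC round trip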

1.13. HBase Filters

1.13.1. FilterList

FilterList represents a list of filters; several filters can be combined in one query. They are combined with one of two relationships:

AND (all must pass): FilterList.Operator.MUST_PASS_ALL

OR (any one may pass): FilterList.Operator.MUST_PASS_ONE

 

Usage:

FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ONE);

Scan s1 = new Scan();

filterList.addFilter(new SingleColumnValueFilter(Bytes.toBytes("f1"), Bytes.toBytes("c1"), CompareOp.EQUAL, Bytes.toBytes("v1")));

filterList.addFilter(new SingleColumnValueFilter(Bytes.toBytes("f1"), Bytes.toBytes("c2"), CompareOp.EQUAL, Bytes.toBytes("v2")));

// With the following line, only the specified cell is returned; other cells in the same row are not

s1.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("c1"));

s1.setFilter(filterList);  // set the filter

ResultScanner resultScannerFilterList = table.getScanner(s1);  // returns the result scanner


1.13.2. Filter Types

The main filter types:

Column value filter (SingleColumnValueFilter)

      filters on a column's value: equals, not-equals, ranges, etc.

Column name prefix filter (ColumnPrefixFilter)

      filters columns whose names start with a given prefix

Multiple column name prefix filter (MultipleColumnPrefixFilter)

      filters columns whose names start with any of several prefixes

RowKey filter (RowFilter)

      filters rowkey values, e.g., with a regular expression


1.13.3. Column Value Filter (SingleColumnValueFilter)

SingleColumnValueFilter compares column values:

equals (CompareOp.EQUAL),

not equals (CompareOp.NOT_EQUAL),

ranges (e.g., CompareOp.GREATER), and so on.

The following example checks whether the column value equals the string "values":

SingleColumnValueFilter f = new SingleColumnValueFilter(

        Bytes.toBytes("cFamily"), Bytes.toBytes("column"), CompareFilter.CompareOp.EQUAL,

        Bytes.toBytes("values"));

s1.setFilter(f);

Note: if the column the filter checks does not exist in a given row, the filter cannot filter that row out, so the row is still returned.
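
If rows lacking the checked column should be dropped instead, the filter provides setFilterIfMissing; a one-line sketch:

f.setFilterIfMissing(true);  // rows without the checked column are filtered out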


1.13.4. Column Name Prefix Filter (ColumnPrefixFilter)

ColumnPrefixFilter matches columns whose names start with the specified prefix:

ColumnPrefixFilter f = new ColumnPrefixFilter(Bytes.toBytes("values"));

s1.setFilter(f);


1.13.5. Multiple Column Name Prefix Filter (MultipleColumnPrefixFilter)

MultipleColumnPrefixFilter behaves like ColumnPrefixFilter, but accepts multiple prefixes:

byte[][] prefixes = new byte[][] {Bytes.toBytes("value1"),Bytes.toBytes("value2")};

Filter f = new MultipleColumnPrefixFilter(prefixes);

s1.setFilter(f);


1.13.6. RowKey Filter (RowFilter)

RowFilter filters on the rowkey.

When selecting a range of rowkeys, it is usually better to use the scanner's setStartRow and setStopRow methods (see the sketch after this example).

Filter f = new RowFilter(CompareFilter.CompareOp.EQUAL, new RegexStringComparator("^1234")); // match rowkeys starting with "1234"

s1.setFilter(f);
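
For comparison, the same prefix match written with the start/stop rows recommended above; a sketch (keys starting with "1234" fall in the range ["1234", "1235")):

Scan s2 = new Scan();
s2.setStartRow(Bytes.toBytes("1234"));  // first possible key with the prefix
s2.setStopRow(Bytes.toBytes("1235"));   // prefix with its last byte incremented, exclusive
ResultScanner rs = table.getScanner(s2);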

 

2. Operating on HBase with MapReduce

2.1. Implementation Approach

HBase provides MapReduce support through the TableMapper and TableReducer classes; we only need to extend these two classes.


1. Write a mapper that extends TableMapper<Text, IntWritable>.

Type parameters: Text is the mapper's output key type; IntWritable is the mapper's output value type.

      Its map method looks like this:

map(ImmutableBytesWritable key, Result value, Context context)

Parameters: key is the rowkey; value is the Result holding one row of data; context is the task context.


2. Write a reducer that extends TableReducer<Text, IntWritable, ImmutableBytesWritable>.

Type parameters: Text is the reducer's input key type; IntWritable is the reducer's input value type;

ImmutableBytesWritable is the type of the rowkey the reducer writes to HBase.

      Its reduce method looks like this:

reduce(Text key, Iterable<IntWritable> values, Context context)

Parameters: key is the reducer's input key; values are the reducer's input values. (The full implementation follows in section 2.3.)

 

2.2. Preparing the Tables

1. Create the source table 'word' with a single column family 'content'.

Add data to the table: put a column 'info' in the column family and store a short passage of text in it; insert several such rows, each with a different rowkey.

2. Create the output table 'stat' with a single column family 'content'.

3. Use MapReduce to read HBase's 'word' table, count word frequencies over the text in 'content:info', and write the results to 'content:info' of the 'stat' table, with the word as the rowkey.


2.3. Implementation

package com.itcast.hbase;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
/**
 * MapReduce operating on HBase
 * @author wilson
 *
 */
public class HBaseMr {
	/**
	 * Create the HBase configuration
	 */
	static Configuration config = null;
	static {
		config = HBaseConfiguration.create();
		config.set("hbase.zookeeper.quorum", "slave1,slave2,slave3");
		config.set("hbase.zookeeper.property.clientPort", "2181");
	}
	/**
	 * Table information
	 */
	public static final String tableName = "word";// source table name
	public static final String colf = "content";// column family
	public static final String col = "info";// column
	public static final String tableName2 = "stat";// output table name
	/**
	 * Initialize the table structures and their data
	 */
	public static void initTB() {
		HTable table=null;
		HBaseAdmin admin=null;
		try {
			admin = new HBaseAdmin(config);// create the table admin
			/* drop the tables if they already exist */
			if (admin.tableExists(tableName)) {
				System.out.println("table already exists, dropping it!");
				admin.disableTable(tableName);
				admin.deleteTable(tableName);
			}
			if (admin.tableExists(tableName2)) {
				admin.disableTable(tableName2);
				admin.deleteTable(tableName2);
			}
			/* create the tables */
				HTableDescriptor desc = new HTableDescriptor(tableName);
				HColumnDescriptor family = new HColumnDescriptor(colf);
				desc.addFamily(family);
				admin.createTable(desc);
				HTableDescriptor desc2 = new HTableDescriptor(tableName2);
				HColumnDescriptor family2 = new HColumnDescriptor(colf);
				desc2.addFamily(family2);
				admin.createTable(desc2);
			/* insert data */
				table = new HTable(config,tableName);
				table.setAutoFlush(false);
				table.setWriteBufferSize(5);
				List<Put> lp = new ArrayList<Put>();
				Put p1 = new Put(Bytes.toBytes("1"));
				p1.add(colf.getBytes(), col.getBytes(),	("The Apache Hadoop software library is a framework").getBytes());
				lp.add(p1);
				Put p2 = new Put(Bytes.toBytes("2"));
				p2.add(colf.getBytes(), col.getBytes(), ("The common utilities that support the other Hadoop modules").getBytes());
				lp.add(p2);
				Put p3 = new Put(Bytes.toBytes("3"));
				p3.add(colf.getBytes(), col.getBytes(),("Hadoop by reading the documentation").getBytes());
				lp.add(p3);
				Put p4 = new Put(Bytes.toBytes("4"));
				p4.add(colf.getBytes(), col.getBytes(),("Hadoop from the release page").getBytes());
				lp.add(p4);
				Put p5 = new Put(Bytes.toBytes("5"));
				p5.add(colf.getBytes(), col.getBytes(),("Hadoop on the mailing list").getBytes());
				lp.add(p5);
				table.put(lp);
				table.flushCommits();
				lp.clear();
		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			try {
				if(table!=null){
					table.close();
				}
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}
	/**
	 * MyMapper extends TableMapper
	 * TableMapper<Text,IntWritable>
	 * Text: the output key type
	 * IntWritable: the output value type
	 */
	public static class MyMapper extends TableMapper<Text, IntWritable> {
		private static IntWritable one = new IntWritable(1);
		private static Text word = new Text();
		@Override
		// input types: key is the rowkey; value is the Result holding one row of data
		protected void map(ImmutableBytesWritable key, Result value,
				Context context) throws IOException, InterruptedException {
			// read colf:col from the row
			String words = Bytes.toString(value.getValue(Bytes.toBytes(colf), Bytes.toBytes(col)));// the table has only one column family, so read the value directly
			// split on spaces
			String itr[] = words.split(" ");
			// emit (word, 1) for each word
			for (int i = 0; i < itr.length; i++) {
				word.set(itr[i]);
				context.write(word, one);
			}
		}
	}
	/**
	 * MyReducer extends TableReducer
	 * TableReducer<Text,IntWritable>
	 * Text: the input key type
	 * IntWritable: the input value type
	 * ImmutableBytesWritable: the output type, i.e., the rowkey type written to HBase
	 */
	public static class MyReducer extends
			TableReducer<Text, IntWritable, ImmutableBytesWritable> {
		@Override
		protected void reduce(Text key, Iterable<IntWritable> values,
				Context context) throws IOException, InterruptedException {
			// sum the counts emitted by the mapper
			int sum = 0;
			for (IntWritable val : values) {// accumulate
				sum += val.get();
			}
			// create a Put with the word as the rowkey
			Put put = new Put(Bytes.toBytes(key.toString()));
			// wrap the data
			put.add(Bytes.toBytes(colf), Bytes.toBytes(col),Bytes.toBytes(String.valueOf(sum)));
			// write to HBase: requires the rowkey and the Put
			context.write(new ImmutableBytesWritable(Bytes.toBytes(key.toString())),put);
		}
	}
	
	public static void main(String[] args) throws IOException,
			ClassNotFoundException, InterruptedException {
		config.set("df.default.name", "hdfs://master:9000/");//设置hdfs的默认路径
		config.set("hadoop.job.ugi", "hadoop,hadoop");//用户名,组
		config.set("mapred.job.tracker", "master:9001");//设置jobtracker在哪
		//初始化表
		initTB();//初始化表
		//创建job
		Job job = new Job(config, "HBaseMr");//job
		job.setJarByClass(HBaseMr.class);//主类
		//创建scan
		Scan scan = new Scan();
		//可以指定查询某一列
		scan.addColumn(Bytes.toBytes(colf), Bytes.toBytes(col));
		//创建查询hbase的mapper,设置表名、scan、mapper类、mapper的输出key、mapper的输出value
		TableMapReduceUtil.initTableMapperJob(tableName, scan, MyMapper.class,Text.class, IntWritable.class, job);
		//创建写入hbase的reducer,指定表名、reducer类、job
		TableMapReduceUtil.initTableReducerJob(tableName2, MyReducer.class, job);
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}

