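HBase (Part 2): Common Operations and How Reads and Writes Work
HBase Commands
HBase stores all data as raw bytes.
1. Entering and leaving the client
# Enter the client:
hbase shell
# Exit the client:
quit
# Help
help
2. Namespace operations
Note: a namespace named default exists out of the box.
#1. List namespaces
list_namespace
#2. Create a namespace
create_namespace "namespace_name"
#3. Drop a namespace
drop_namespace "namespace_name"
3. Table operations
# 1. List all tables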
hbase(main):024:0> list
TABLE
user_namespace:user # namespace:table
t_person # table in the default namespace; the "default:" prefix is omitted
2 row(s) in 0.1140 seconds
# 2. List all tables in a given namespace
hbase(main):027:0> list_namespace_tables "user_namespace"
TABLE
user
1 row(s) in 0.3970 seconds
# 3. Create a table
Syntax: create "namespace:table","column_family1","column_family2"
hbase(main):023:0> create "user_namespace:user","info","edu"
0 row(s) in 9.9000 seconds
# 4. Describe the table schema
hbase(main):030:0> desc "user_namespace:user"
Table user_namespace:user is ENABLED
user_namespace:user
COLUMN FAMILIES DESCRIPTION
{NAME => 'edu', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 1.6400 seconds
# 5. Disable and drop a table (a table must be disabled before it can be dropped)
hbase(main):002:0> disable "namespace:table"
0 row(s) in 4.4790 seconds
hbase(main):002:0> drop "namespace:table"
0 row(s) in 4.4790 seconds
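4. Creating, reading, updating, and deleting data
# 1. Add data (each put writes a single column)
put "namespace:table","rowkey","column_family1:column1","value"
# 2. Get a row by rowkey
get "namespace:table","rowkey"
# 3. Get a single column of a row by rowkey and column
get "namespace:table","rowkey","column_family:column"
# 4. scan: read all data in the table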
hbase(main):019:0> scan "user_namespace:user"
ROW COLUMN+CELL
1001 column=info:age, timestamp=1586790192297, value=18
1001 column=info:name, timestamp=1586790138031, value=zhangsan1
1002 column=info:age, timestamp=1586790893380, value=20
1002 column=info:name, timestamp=1586790884872, value=zhangsan2
# 5. scan: read the first 2 rows of the table
hbase(main):022:0> scan "user_namespace:user",{LIMIT=>2}
ROW COLUMN+CELL
1001 column=info:age, timestamp=1586790192297, value=18
1001 column=info:name, timestamp=1586790138031, value=zhangsan1
1002 column=info:age, timestamp=1586790893380, value=20
1002 column=info:name, timestamp=1586790884872, value=zhangsan2
1 row(s) in 0.5400 seconds
# 6. Range scan with a start row and an end row (STARTROW is inclusive, ENDROW is exclusive)
hbase(main):029:0> scan "user_namespace:user",{STARTROW=>"1001",ENDROW=>"1002"}
ROW COLUMN+CELL
1001 column=info:age, timestamp=1586790192297, value=18
1001 column=info:name, timestamp=1586790138031, value=zhangsan1
1 row(s) in 0.4420 seconds
# 7. Scan from a start row with a limit
hbase(main):032:0> scan "user_namespace:user",{STARTROW=>"1001",LIMIT=>2}
ROW COLUMN+CELL
1001 column=info:age, timestamp=1586790192297, value=18
1001 column=info:name, timestamp=1586790138031, value=zhangsan1
1002 column=info:age, timestamp=1586790893380, value=20
1002 column=info:name, timestamp=1586790884872, value=zhangsan2
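# 8. Update data (in effect an overwrite: a put to the same cell with a newer timestamp)
put "namespace:table","rowkey","column_family:column","value"
# 9. Delete data (a single cell)
delete "namespace:table","rowkey","column_family:column"
# 10. Delete all data for a rowkey
deleteall "namespace:table","rowkey"
# 11. Count the rows in the table
count "namespace:table"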
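The range scans in steps 6 and 7 rely on rows being kept sorted by rowkey, with the start row included and the end row excluded. The following is a minimal in-memory sketch of that behavior, not HBase code: the class ScanRangeSketch and its methods are hypothetical names, and Java String ordering stands in for HBase's byte ordering (the two agree for plain ASCII keys such as "1001").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of a table sorted by rowkey (not the HBase API).
public class ScanRangeSketch {
    // Rows are kept sorted by rowkey, as in an HBase table.
    private final NavigableMap<String, String> rows = new TreeMap<>();

    public void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Returns rowkeys in [startRow, stopRow), at most `limit` of them.
    // A null stopRow means "scan to the end of the table".
    public List<String> scan(String startRow, String stopRow, int limit) {
        List<String> out = new ArrayList<>();
        for (String key : rows.tailMap(startRow, true).keySet()) {
            if (stopRow != null && key.compareTo(stopRow) >= 0) {
                break; // the stop row itself is excluded
            }
            if (out.size() >= limit) {
                break;
            }
            out.add(key);
        }
        return out;
    }

    public static void main(String[] args) {
        ScanRangeSketch table = new ScanRangeSketch();
        table.put("1001", "zhangsan1");
        table.put("1002", "zhangsan2");
        table.put("10010", "extra row"); // sorts between "1001" and "1002"
        System.out.println(table.scan("1001", "1002", Integer.MAX_VALUE)); // [1001, 10010]
        System.out.println(table.scan("1001", null, 2));                   // [1001, 10010]
    }
}
```

The extra row "10010" shows why rowkey design matters: ordering is lexicographic, not numeric, so it falls inside the range "1001" to "1002".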
5. Multiple versions
# 1. Create the table
hbase(main):013:0> create "user_namespace:user","info"
# 2. Change the number of versions kept
hbase(main):016:0> alter "user_namespace:user",{NAME=>'info',VERSIONS=>2}
# 3. Write to the same cell twice
hbase(main):014:0> put "user_namespace:user","1001","info:name","aaa"
0 row(s) in 0.2620 seconds
hbase(main):015:0> put "user_namespace:user","1001","info:name","bb"
0 row(s) in 0.0290 seconds
# 4. Read multiple versions
hbase(main):017:0> get "user_namespace:user","1001",{COLUMN=>'info:name',VERSIONS=>3}
COLUMN CELL
info:name timestamp=1586795010367, value=bb
info:name timestamp=1586795004085, value=aaa
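# VERSIONS=>2 on a column family means that up to 2 versions of each cell in that family are kept. After 3 puts to the same cell, only the 2 newest versions remain.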
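To make the retention rule concrete, here is a small sketch of a single cell that keeps at most a fixed number of versions, newest first. It is an illustration only: CellVersionsSketch is a hypothetical class, not part of HBase, and it trims old versions eagerly for simplicity, whereas HBase cleans up surplus versions in the background. The third timestamp in main is made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of one cell with a bounded number of versions (not the HBase API).
public class CellVersionsSketch {
    private final int maxVersions;
    // timestamp -> value, ordered with the newest timestamp first
    private final TreeMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());

    public CellVersionsSketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // Stores a new version and drops the oldest ones beyond maxVersions.
    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry in reverse order = oldest timestamp
        }
    }

    // Returns up to `requested` values, newest first.
    public List<String> get(int requested) {
        List<String> out = new ArrayList<>();
        for (String value : versions.values()) {
            if (out.size() >= requested) {
                break;
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        CellVersionsSketch cell = new CellVersionsSketch(2); // like VERSIONS=>2
        cell.put(1586795004085L, "aaa");
        cell.put(1586795010367L, "bb");
        System.out.println(cell.get(3)); // [bb, aaa]
        cell.put(1586795020000L, "cc");  // hypothetical third put
        System.out.println(cell.get(3)); // [cc, bb]
    }
}
```

Asking for 3 versions still returns only 2, which mirrors the shell output above where VERSIONS=>3 was requested on a family that keeps 2.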
HBase API
Environment setup
- Dependencies
<properties>
    <hbase.version>1.5.0</hbase.version>
</properties>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>${hbase.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-common</artifactId>
    <version>${hbase.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-protocol</artifactId>
    <version>${hbase.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>${hbase.version}</version>
</dependency>
<!-- Only needed if a jackson mapper exception occurs -->
<dependency>
    <groupId>org.codehaus.jackson</groupId>
    <artifactId>jackson-mapper-asl</artifactId>
    <version>1.9.13</version>
</dependency>
- Initial configuration
Copy hbase-site.xml from the conf directory of the HBase installation into the project's resource directory, then load it:
conf.addResource("/hbase-site.xml")
- On Windows, configure the IP mappings for the cluster hosts
API overview
API | Meaning | How to create |
---|---|---|
Configuration | Configuration | HBaseConfiguration.create(); |
Connection | Connection, used to operate on data | ConnectionFactory.createConnection(conf); |
Admin | Client for metadata operations (namespaces and table schemas) | conn.getAdmin(); |
NamespaceDescriptor | Namespace, comparable to a database | NamespaceDescriptor.create("user_namespace").build(); |
TableName | Table name | TableName.valueOf("user_namespace:user"); |
HTableDescriptor | Table | new HTableDescriptor(tablename); |
HColumnDescriptor | Column family | new HColumnDescriptor("info"); |
Put | Adds data | new Put(Bytes.toBytes("1001")); |
Delete | Delete condition keyed by rowkey | new Delete(Bytes.toBytes("1001")); |
Get | Single-row query by rowkey | new Get(Bytes.toBytes("1001")); |
Scan | Multi-row scan query | new Scan(); |
Result | Query result (a single row) | table.get(get); |
ResultScanner | Query results (N rows) | table.getScanner(scan); |
Bytes | Type-conversion utility. HBase stores everything as bytes, so every value is converted to bytes on write and converted back on read. | Static methods such as Bytes.toBytes(...) and Bytes.toString(...) |
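HBase client connection
// Obtain a client
// 1. Load the configuration
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum","192.168.242.30");
// Print log output (log4j)
BasicConfigurator.configure();
// 2. Open the connection
Connection conn = ConnectionFactory.createConnection(conf);
// 3. Obtain the Admin client from the connection
Admin admin = conn.getAdmin();
// 4. Release resources
admin.close();
// Close the connection as well once no further operations are needed
conn.close();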
Common API usage
1. Create a namespace
// 1. Build the namespace descriptor
NamespaceDescriptor namespace = NamespaceDescriptor.create("student_namespace").build();
// 2. Create the namespace
admin.createNamespace(namespace);
2. Table operations
Table-level operations go through Admin.
- Create a table
// 1. Initialize the table name
TableName student = TableName.valueOf("student_namespace:student");
// 2. Initialize the column families
HColumnDescriptor info = new HColumnDescriptor("info");
HColumnDescriptor edu = new HColumnDescriptor("edu");
// 3. Bind the table name and the column families
HTableDescriptor hTableDescriptor = new HTableDescriptor(student);
hTableDescriptor.addFamily(info);
hTableDescriptor.addFamily(edu);
// 4. Create the table
admin.createTable(hTableDescriptor);
- Check whether a table exists
// 1. Build the table name
TableName tableName = TableName.valueOf("student_namespace:student");
// 2. Check whether the table exists
boolean b = admin.tableExists(tableName);
System.out.println(b);
3. Insert
Data operations go through a Table obtained from the Connection.
// 1. Get the table to operate on
Table table = conn.getTable(TableName.valueOf("student_namespace:student"));
// 2. Build the data to add
Put put = new Put(Bytes.toBytes("1001")); // the rowkey
// Bytes is the HBase utility class for converting between bytes and Java types
put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("张三"));
put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("age"), Bytes.toBytes(18));
// 3. Write the put to the table
table.put(put);
// 4. Release resources
table.close();
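Because everything is stored as bytes, the age written above with Bytes.toBytes(18) is not the same bytes as the string "18" written from the shell. The sketch below uses only the JDK to show the difference. It assumes, as a stand-in for Bytes.toBytes(int), a 4-byte big-endian encoding; the class BytesEncodingSketch and its method names are hypothetical and are not the HBase Bytes class.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// JDK-only stand-in for the byte conversions discussed above (not the HBase Bytes class).
public class BytesEncodingSketch {
    // An int becomes 4 bytes, most significant byte first (ByteBuffer's default order).
    public static byte[] intToBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Decodes 4 big-endian bytes back into an int; any other length is rejected.
    public static int bytesToInt(byte[] bytes) {
        if (bytes.length != 4) {
            throw new IllegalArgumentException("expected 4 bytes, got " + bytes.length);
        }
        return ByteBuffer.wrap(bytes).getInt();
    }

    // A string becomes its UTF-8 bytes.
    public static byte[] stringToBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(intToBytes(18)));      // [0, 0, 0, 18]
        System.out.println(Arrays.toString(stringToBytes("18"))); // [49, 56]
        System.out.println(bytesToInt(intToBytes(18)));           // 18
    }
}
```

The practical consequence is that a value should be decoded with the same type it was encoded with: an age written as an int is read back as an int, and an age written as the text "18" is read back as a string.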
4. Update
// 1. Get the table to operate on
Table table = conn.getTable(TableName.valueOf("student_namespace:student"));
// 2. An update is just another put: the newer timestamp overrides the old value
Put put = new Put(Bytes.toBytes("1001")); // the rowkey
put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("zhangsan"));
// 3. Write the put to the table
table.put(put);
// 4. Close the table
table.close();
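5. Delete
// 1. Get the table to operate on
Table table = conn.getTable(TableName.valueOf("student_namespace:student"));
// 2. Build the delete condition, keyed by rowkey
Delete delete = new Delete(Bytes.toBytes("1001"));
// 3. Run the delete
table.delete(delete);
// 4. Release resources
table.close();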
6. Query
- Single-row query by rowkey
// 1. Get the table to operate on
Table table = conn.getTable(TableName.valueOf("student_namespace:student"));
// 2. Use the rowkey as the query condition
Get get = new Get(Bytes.toBytes("1001"));
// 3. Run the query
Result result = table.get(get);
// 4. Read the result with result.getValue()
byte[] nameBytes = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
byte[] ageBytes = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("age"));
// Get the rowkey
byte[] rowBytes = result.getRow();
System.out.println(Bytes.toString(nameBytes));
System.out.println(Bytes.toInt(ageBytes));
System.out.println(Bytes.toString(rowBytes));
// 5. Release resources
table.close();
- Multi-row query
// 1. Get the table to operate on
Table table = conn.getTable(TableName.valueOf("student_namespace:student"));
// 2. Create a Scan for a multi-row query
Scan scan = new Scan();
// 3. Select the column family to return
scan.addFamily(Bytes.toBytes("info"));
// 4. Set the start row and the maximum number of rows
scan.withStartRow(Bytes.toBytes("1001"));
scan.setLimit(10);
// 5. Run the query
ResultScanner scanner = table.getScanner(scan);
// 6. Process the results; each Result is one row, so read values from res, not from the scanner
for (Result res : scanner) {
    byte[] nameBytes = res.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
    byte[] ageBytes = res.getValue(Bytes.toBytes("info"), Bytes.toBytes("age"));
    byte[] rowBytes = res.getRow();
    String name = Bytes.toString(nameBytes);
    int age = Bytes.toInt(ageBytes);
    String rowKey = Bytes.toString(rowBytes);
    System.out.println(rowKey + ":" + name + ":" + age);
}
// 7. Release resources
scanner.close();
table.close();
How Reads and Writes Work
Reading data
Writing data