Author of the article: foochane
Original link: https://foochane.cn/article/2019062801.html
1 HBase basic introduction
HBase is a distributed database that provides real-time random reads and writes.
Unlike relational databases such as MySQL, Oracle, DB2, and SQL Server, HBase is a NoSQL (non-relational) database with the following characteristics:
- The HBase table model differs from the relational table model.
- An HBase table has no fixed field definitions.
- Each row of an HBase table stores a set of key-value pairs.
- An HBase table is divided into column families; when inserting a key-value pair, the user specifies which column family it goes into.
- Physically, a table is stored by column family: data from different column families is always stored in different files.
- Every row has a fixed row key, and row keys cannot repeat within a table.
- All data in HBase, including row keys, keys, and values, is of type byte[]; HBase does not maintain user data types.
- HBase has poor support for transactions.
Compared with other NoSQL databases (MongoDB, Redis, Cassandra, Hazelcast), HBase has this distinguishing characteristic:
Because HBase table data is stored on the HDFS file system, storage capacity can be extended linearly, and the stored data is highly safe and reliable.
2 HBase table structure
rowkey (row key) | base_info | extra_info |
---|---|---|
001 | name:zs,age:22,sex:male | hobby:read,addr:beijing |
002 | name:laowang,sex:male | |
The HBase table model differs greatly from the table model of a relational database such as MySQL.
HBase tables have the concept of rows, but no concept of fields.
Each row stores key-value pairs, and the keys can vary widely from one row to the next.
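As a rough illustration of this model, the following minimal sketch stores the two sample rows from the table above in nested sorted maps. It is plain Java rather than the HBase API, and the class `SchemaFreeTableSketch` and its methods are hypothetical names chosen for this example.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory model of an HBase-style table (not the HBase API).
class SchemaFreeTableSketch {
    // rowkey -> column family -> key -> value
    private final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    void put(String rowKey, String family, String key, String value) {
        rows.computeIfAbsent(rowKey, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(key, value);
    }

    // Keys stored in one column family of one row; empty if there are none.
    Set<String> keysOf(String rowKey, String family) {
        Map<String, Map<String, String>> row =
            rows.getOrDefault(rowKey, Collections.emptyMap());
        return row.getOrDefault(family, Collections.emptyMap()).keySet();
    }

    // Loads the two sample rows shown in the table above.
    static SchemaFreeTableSketch sample() {
        SchemaFreeTableSketch t = new SchemaFreeTableSketch();
        t.put("001", "base_info", "name", "zs");
        t.put("001", "base_info", "age", "22");
        t.put("001", "base_info", "sex", "male");
        t.put("001", "extra_info", "hobby", "read");
        t.put("001", "extra_info", "addr", "beijing");
        t.put("002", "base_info", "name", "laowang");
        t.put("002", "base_info", "sex", "male");
        return t;
    }

    public static void main(String[] args) {
        SchemaFreeTableSketch t = sample();
        System.out.println(t.keysOf("001", "base_info"));  // [age, name, sex]
        System.out.println(t.keysOf("002", "base_info"));  // [name, sex]
        System.out.println(t.keysOf("002", "extra_info")); // []
    }
}
```

Row 002 simply has fewer keys than row 001; nothing in the structure declares which keys a row must contain.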
Key points of the HBase table model:
- A table has a table name.
- A table can be divided into multiple column families (data of different column families is stored in different files).
- Each row in a table has a row key (rowkey), and row keys are unique within the table.
- Each key-value pair in a table is called a cell.
- HBase can store multiple historical versions of a cell (the number of versions kept is configurable); by default a read returns the latest version.
- Because an entire table can hold a very large amount of data, it is split horizontally into multiple regions (each identified by a rowkey range), and data of different regions is also stored in different files.
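The multi-version behaviour of a cell can be sketched as follows. This is a simplified model under stated assumptions, not the HBase implementation: `VersionedCellSketch` is a hypothetical class, and it trims old versions immediately on write, which a real store need not do.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of one multi-version cell (not the HBase API).
class VersionedCellSketch {
    private final int maxVersions;
    // timestamp -> value, ordered with the newest timestamp first
    private final TreeMap<Long, String> versions =
        new TreeMap<>(Collections.reverseOrder());

    VersionedCellSketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        // Drop the oldest versions once the configured limit is exceeded.
        while (versions.size() > maxVersions) {
            versions.pollLastEntry();
        }
    }

    // Default read: only the latest version.
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // Read up to n versions, newest first.
    List<String> newest(int n) {
        List<String> out = new ArrayList<>();
        for (String v : versions.values()) {
            if (out.size() == n) {
                break;
            }
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        VersionedCellSketch age = new VersionedCellSketch(3);
        age.put(1L, "18");
        age.put(2L, "19");
        age.put(3L, "20");
        age.put(4L, "21");
        System.out.println(age.latest());  // 21
        System.out.println(age.newest(3)); // [21, 20, 19]
    }
}
```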
HBase stores inserted data in sorted order:
- Rows are sorted by row key first.
- Within a row, key-value pairs are sorted by column family, and then by key.
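The ordering described above can be sketched with a comparator that compares row key, column family, and key byte by byte. This is an illustrative sketch in plain Java; `CellSortSketch` and its members are hypothetical names, and cells are represented as simple string triples.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of HBase-style cell ordering (not the HBase API).
class CellSortSketch {
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Unsigned, byte-by-byte lexicographic comparison.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // A cell is {rowkey, column family, key}; compare the parts in that order.
    static final Comparator<String[]> CELL_ORDER = (x, y) -> {
        for (int i = 0; i < 3; i++) {
            int c = compareBytes(utf8(x[i]), utf8(y[i]));
            if (c != 0) {
                return c;
            }
        }
        return 0;
    };

    static List<String> sorted(List<String[]> cells) {
        List<String[]> copy = new ArrayList<>(cells);
        copy.sort(CELL_ORDER);
        List<String> out = new ArrayList<>();
        for (String[] c : copy) {
            out.add(c[0] + " " + c[1] + ":" + c[2]);
        }
        return out;
    }

    // Cells listed in an arbitrary insertion order.
    static List<String[]> sample() {
        return Arrays.asList(
            new String[] {"001", "base_info", "username"},
            new String[] {"001", "base_info", "age"},
            new String[] {"001", "base_info", "sex"},
            new String[] {"001", "extra_info", "career"},
            new String[] {"002", "extra_info", "career"},
            new String[] {"002", "base_info", "username"});
    }

    public static void main(String[] args) {
        for (String line : sorted(sample())) {
            System.out.println(line);
        }
    }
}
```

Because the comparison is lexicographic rather than numeric, a row key such as "10" sorts before "2", which is worth keeping in mind when choosing row keys.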
HBase table data types:
HBase only supports byte[]. This applies to the rowkey, key, value, column family names, and table names.
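Since the store only sees byte arrays, the client has to encode and decode typed values itself. The sketch below shows one possible way to do that with the Java standard library; `ByteCodecSketch` and its helper methods are hypothetical names for this illustration (the HBase client ships its own `Bytes` utility, used later in this article).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side codec: the store only ever sees byte[].
class ByteCodecSketch {
    static byte[] fromString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String toStringValue(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    static byte[] fromInt(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
    }

    static int toInt(byte[] b) {
        return ByteBuffer.wrap(b).getInt();
    }

    public static void main(String[] args) {
        byte[] asText = fromString("18"); // the age stored as text
        byte[] asInt = fromInt(18);       // the age stored as a 4-byte integer
        System.out.println(asText.length);         // 2
        System.out.println(asInt.length);          // 4
        System.out.println(toStringValue(asText)); // 18
        System.out.println(toInt(asInt));          // 18
    }
}
```

The two encodings of the same age are different byte arrays, so the reader must know which encoding the writer used.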
A table is divided into different regions.
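One way to picture how a rowkey maps to a region is a sorted map keyed by each region's start key. This is only a sketch of the idea, not how HBase implements region location; the class `RegionLookupSketch`, the region names, and the split points are all made up for the example, and ASCII row keys are assumed.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical rowkey-range lookup (not the HBase API).
class RegionLookupSketch {
    // region start key -> region name; a region covers [startKey, nextStartKey)
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String name) {
        regionsByStartKey.put(startKey, name);
    }

    // The owning region is the one with the greatest start key <= rowKey.
    String regionFor(String rowKey) {
        Map.Entry<String, String> e = regionsByStartKey.floorEntry(rowKey);
        return e == null ? null : e.getValue();
    }

    // Three made-up regions; the first one starts at the empty key.
    static RegionLookupSketch sample() {
        RegionLookupSketch r = new RegionLookupSketch();
        r.addRegion("", "region-1");
        r.addRegion("3000", "region-2");
        r.addRegion("6000", "region-3");
        return r;
    }

    public static void main(String[] args) {
        RegionLookupSketch r = sample();
        System.out.println(r.regionFor("1234")); // region-1
        System.out.println(r.regionFor("3000")); // region-2
        System.out.println(r.regionFor("59"));   // region-2 (lexicographic order)
        System.out.println(r.regionFor("9"));    // region-3
    }
}
```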
3 HBase working mechanism
An HBase distributed system consists of two roles:
- Management role: HMaster (usually two, one active and one standby)
- Data node role: HRegionServer (many, co-located with the HDFS datanodes)
HBase does not do data processing, so it does not need YARN. YARN handles scheduling for MapReduce computation, whereas HBase is only responsible for data management.
4 HBase installation
4.1 Installation preparation
First, there must be a running HDFS cluster; the HBase regionservers should be co-located with the HDFS datanodes.
Second, there must be a running ZooKeeper cluster, so ZooKeeper has to be installed before HBase. Here ZooKeeper has already been installed.
Then, install HBase.
4.2 Node layout
Roles are assigned to the nodes as follows:
Node | Installed services |
---|---|
Master | namenode datanode regionserver hmaster zookeeper |
Slave01 | datanode regionserver zookeeper |
Slave02 | datanode regionserver zookeeper |
4.3 Installing HBase
Extract the HBase installation package hbase-2.0.5-bin.tar.gz.
Modify hbase-env.sh:
export JAVA_HOME=/usr/local/bigdata/java/jdk1.8.0_211
# Do not start the ZooKeeper bundled with HBase; we have already installed our own
export HBASE_MANAGES_ZK=false
Modify hbase-site.xml:
<configuration>
  <!-- Path where HBase stores its data on HDFS -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://Master:9000/hbase</value>
  </property>
  <!-- Run HBase in distributed mode -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- ZooKeeper addresses; separate multiple entries with "," -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>Master:2181,Slave01:2181,Slave02:2181</value>
  </property>
</configuration>
Modify regionservers:
Master
Slave01
Slave02
After editing, copy the installation folder to the /usr/local/bigdata/ directory on all three nodes.
6 Starting the HBase cluster
Check whether HDFS and ZooKeeper have started normally.
Master:
hadoop@Master:~$ jps
4918 DataNode
2744 QuorumPeerMain
4748 NameNode
9949 Jps
5167 SecondaryNameNode
hadoop@Master:~$ /usr/local/bigdata/zookeeper-3.4.6/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/local/bigdata/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
Slave01:
hadoop@Slave1:~$ jps
3235 QuorumPeerMain
3779 DataNode
5546 Jps
hadoop@Slave1:~$ /usr/local/bigdata/zookeeper-3.4.6/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/local/bigdata/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader
Slave02:
hadoop@Slave2:~$ jps
11958 DataNode
13656 Jps
11390 QuorumPeerMain
hadoop@Slave2:~$ /usr/local/bigdata/zookeeper-3.4.6/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/local/bigdata/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
Then run start-hbase.sh:
$ bin/start-hbase.sh
The command above starts a regionserver on every machine listed in the regionservers configuration file. To start one of them manually, use:
$ bin/hbase-daemon.sh start regionserver
After startup, Master runs two services, HRegionServer and HMaster, while Slave01 and Slave02 each run the HRegionServer service.
A highly available HBase cluster should be configured with two masters, one in the active state and one in the standby state; the master monitors the regionservers.
We can therefore start an additional HMaster service on one of the other two machines:
$ bin/hbase-daemon.sh start master
The newly started master acts as the backup.
7 Starting the HBase command-line client
Use the hbase shell command:
$ bin/hbase shell
hbase> list      # list the tables
hbase> status    # show the cluster status
hbase> version   # show the cluster version
Problem:
ERROR: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
at org.apache.hadoop.hbase.master.HMaster.checkServiceStarted(HMaster.java:2932)
at org.apache.hadoop.hbase.master.MasterRpcServices.isMasterRunning(MasterRpcServices.java:1084)
at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
Solution: make HDFS leave safe mode:
$ hdfs dfsadmin -safemode leave
8 HBase command-line client operations
8.1 Create a table
create 't_user_info','base_info','extra_info'
(arguments: table name, column family name, column family name)
8.2 Insert data
hbase(main):011:0> put 't_user_info','001','base_info:username','zhangsan'
0 row(s) in 0.2420 seconds
hbase(main):012:0> put 't_user_info','001','base_info:age','18'
0 row(s) in 0.0140 seconds
hbase(main):013:0> put 't_user_info','001','base_info:sex','female'
0 row(s) in 0.0070 seconds
hbase(main):014:0> put 't_user_info','001','extra_info:career','it'
0 row(s) in 0.0090 seconds
hbase(main):015:0> put 't_user_info','002','extra_info:career','actoress'
0 row(s) in 0.0090 seconds
hbase(main):016:0> put 't_user_info','002','base_info:username','liuyifei'
0 row(s) in 0.0060 seconds
8.3 Querying data, method 1: scan
hbase(main):017:0> scan 't_user_info'
ROW COLUMN+CELL
001 column=base_info:age, timestamp=1496567924507, value=18
001 column=base_info:sex, timestamp=1496567934669, value=female
001 column=base_info:username, timestamp=1496567889554, value=zhangsan
001 column=extra_info:career, timestamp=1496567963992, value=it
002 column=base_info:username, timestamp=1496568034187, value=liuyifei
002 column=extra_info:career, timestamp=1496568008631, value=actoress
2 row(s) in 0.0420 seconds
8.4 Querying data, method 2: get a single row
hbase(main):020:0> get 't_user_info','001'
COLUMN CELL
base_info:age timestamp=1496568160192, value=19
base_info:sex timestamp=1496567934669, value=female
base_info:username timestamp=1496567889554, value=zhangsan
extra_info:career timestamp=1496567963992, value=it
4 row(s) in 0.0770 seconds
8.5 Delete a key-value pair
hbase(main):021:0> delete 't_user_info','001','base_info:sex'
0 row(s) in 0.0390 seconds
Delete an entire row:
hbase(main):024:0> deleteall 't_user_info','001'
0 row(s) in 0.0090 seconds
hbase(main):025:0> get 't_user_info','001'
COLUMN CELL
0 row(s) in 0.0110 seconds
Delete an entire table:
hbase(main):028:0> disable 't_user_info'
0 row(s) in 2.3640 seconds
hbase(main):029:0> drop 't_user_info'
0 row(s) in 1.2950 seconds
hbase(main):030:0> list
TABLE
0 row(s) in 0.0130 seconds
=> []
8.6 An important HBase feature: sorting (by row key)
HBase automatically sorts the data inserted into it before storing it.
Sort order: first by row key, then by column family name, then by column (key) name, all in lexicographic order.
This feature has a great deal to do with query efficiency.
For example, consider a table that stores user information such as name, place of residence, age, and occupation.
The business system often needs to:
- query all users in a given province
- query all users with a given surname in a given province
Idea: if users from the same province are stored contiguously in the HBase storage files, and users with the same surname within a province are also stored contiguously, both queries become more efficient.
Practice: concatenate the query conditions into the rowkey.
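A minimal sketch of this rowkey design is shown below, using a sorted map as a stand-in for an HBase table. The `province_surname_userId` layout, the class `RowKeyDesignSketch`, and the sample data are assumptions made for illustration, and ASCII row keys are assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for a table sorted by rowkey (not the HBase API).
class RowKeyDesignSketch {
    private final TreeMap<String, String> rows = new TreeMap<>();

    // Assumed layout: the query conditions come first, then a unique id.
    static String rowKey(String province, String surname, String userId) {
        return province + "_" + surname + "_" + userId;
    }

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // All row keys starting with the prefix form one contiguous sorted range.
    List<String> scanPrefix(String prefix) {
        String stop = prefix + Character.MAX_VALUE; // sorts after every key with this prefix
        return new ArrayList<>(rows.subMap(prefix, true, stop, false).keySet());
    }

    static RowKeyDesignSketch sample() {
        RowKeyDesignSketch t = new RowKeyDesignSketch();
        t.put(rowKey("hebei", "zhang", "001"), "user 001");
        t.put(rowKey("beijing", "li", "002"), "user 002");
        t.put(rowKey("hebei", "wang", "003"), "user 003");
        t.put(rowKey("hebei", "zhang", "004"), "user 004");
        t.put(rowKey("beijing", "zhang", "005"), "user 005");
        return t;
    }

    public static void main(String[] args) {
        RowKeyDesignSketch t = sample();
        // All users in one province
        System.out.println(t.scanPrefix("hebei_"));
        // [hebei_wang_003, hebei_zhang_001, hebei_zhang_004]
        // All users with one surname in one province
        System.out.println(t.scanPrefix("hebei_zhang_"));
        // [hebei_zhang_001, hebei_zhang_004]
    }
}
```

Both queries become a single range read over adjacent keys instead of a filter over the whole table.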
9 HBase client API operations
9.1 DDL operations
Code flow:
- Create a connection:
Connection conn = ConnectionFactory.createConnection(conf);
- Get a DDL operator, the table manager Admin:
Admin admin = conn.getAdmin();
- Use the table manager's API to create tables, delete tables, and modify table definitions:
admin.createTable(HTableDescriptor descriptor);
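import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.regionserver.BloomType;
import org.junit.Before;
import org.junit.Test;

Connection conn = null;

@Before
public void getConn() throws Exception {
    // Build a connection object
    Configuration conf = HBaseConfiguration.create(); // automatically loads hbase-site.xml
    conf.set("hbase.zookeeper.quorum", "192.168.233.200:2181,192.168.233.201:2181,192.168.233.202:2181");
    conn = ConnectionFactory.createConnection(conf);
}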
/**
 * DDL: create a table
 * @throws Exception
 */
@Test
public void testCreateTable() throws Exception {
    // Get a DDL operator from the connection
    Admin admin = conn.getAdmin();
    // Create a table descriptor
    HTableDescriptor hTableDescriptor = new HTableDescriptor(TableName.valueOf("user_info"));
    // Create the column family descriptors
    HColumnDescriptor hColumnDescriptor_1 = new HColumnDescriptor("base_info");
    hColumnDescriptor_1.setMaxVersions(3); // maximum number of versions kept in this column family; the default is 1
    HColumnDescriptor hColumnDescriptor_2 = new HColumnDescriptor("extra_info");
    // Add the column family descriptors to the table descriptor
    hTableDescriptor.addFamily(hColumnDescriptor_1);
    hTableDescriptor.addFamily(hColumnDescriptor_2);
    // Create the table with the DDL operator (admin)
    admin.createTable(hTableDescriptor);
    // Close the connection
    admin.close();
    conn.close();
}
/**
 * Drop a table
 * @throws Exception
 */
@Test
public void testDropTable() throws Exception {
    Admin admin = conn.getAdmin();
    // Disable the table
    admin.disableTable(TableName.valueOf("user_info"));
    // Delete the table
    admin.deleteTable(TableName.valueOf("user_info"));
    admin.close();
    conn.close();
}
// Modify the table definition: add a column family
@Test
public void testAlterTable() throws Exception {
    Admin admin = conn.getAdmin();
    // Fetch the existing table definition
    HTableDescriptor tableDescriptor = admin.getTableDescriptor(TableName.valueOf("user_info"));
    // Build a new column family definition
    HColumnDescriptor hColumnDescriptor = new HColumnDescriptor("other_info");
    hColumnDescriptor.setBloomFilterType(BloomType.ROWCOL); // set the Bloom filter type of this column family
    // Add the column family definition to the table definition
    tableDescriptor.addFamily(hColumnDescriptor);
    // Hand the modified table definition to admin to submit
    admin.modifyTable(TableName.valueOf("user_info"), tableDescriptor);
    admin.close();
    conn.close();
}
9.2 DML operations
HBase CRUD (create, read, update, delete):
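import java.util.ArrayList;
import java.util.Iterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellScanner;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Before;
import org.junit.Test;

Connection conn = null;

@Before
public void getConn() throws Exception {
    // Build a connection object
    Configuration conf = HBaseConfiguration.create(); // automatically loads hbase-site.xml
    conf.set("hbase.zookeeper.quorum", "Master:2181,Slave01:2181,Slave02:2181");
    conn = ConnectionFactory.createConnection(conf);
}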
/**
 * Insert.
 * Update: a put to an existing cell overwrites it.
 * @throws Exception
 */
@Test
public void testPut() throws Exception {
    // Get a Table object for the given table to run DML operations
    Table table = conn.getTable(TableName.valueOf("user_info"));
    // Wrap the data to insert in Put objects (one Put corresponds to exactly one rowkey)
    Put put = new Put(Bytes.toBytes("001"));
    put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("username"), Bytes.toBytes("张三"));
    put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("age"), Bytes.toBytes("18"));
    put.addColumn(Bytes.toBytes("extra_info"), Bytes.toBytes("addr"), Bytes.toBytes("北京"));
    Put put2 = new Put(Bytes.toBytes("002"));
    put2.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("username"), Bytes.toBytes("李四"));
    put2.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("age"), Bytes.toBytes("28"));
    put2.addColumn(Bytes.toBytes("extra_info"), Bytes.toBytes("addr"), Bytes.toBytes("上海"));
    ArrayList<Put> puts = new ArrayList<>();
    puts.add(put);
    puts.add(put2);
    // Insert both rows
    table.put(puts);
    table.close();
    conn.close();
}
/**
 * Insert a large amount of data in a loop
 * @throws Exception
 */
@Test
public void testManyPuts() throws Exception {
    Table table = conn.getTable(TableName.valueOf("user_info"));
    ArrayList<Put> puts = new ArrayList<>();
    for (int i = 0; i < 100000; i++) {
        Put put = new Put(Bytes.toBytes("" + i));
        put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("username"), Bytes.toBytes("张三" + i));
        put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("age"), Bytes.toBytes((18 + i) + ""));
        put.addColumn(Bytes.toBytes("extra_info"), Bytes.toBytes("addr"), Bytes.toBytes("北京"));
        puts.add(put);
    }
    table.put(puts);
    // Release resources, as the other tests do
    table.close();
    conn.close();
}
/**
 * Delete
 * @throws Exception
 */
@Test
public void testDelete() throws Exception {
    Table table = conn.getTable(TableName.valueOf("user_info"));
    // Build objects describing the data to delete
    Delete delete1 = new Delete(Bytes.toBytes("001")); // the whole row 001
    Delete delete2 = new Delete(Bytes.toBytes("002"));
    delete2.addColumn(Bytes.toBytes("extra_info"), Bytes.toBytes("addr")); // only one cell of row 002
    ArrayList<Delete> dels = new ArrayList<>();
    dels.add(delete1);
    dels.add(delete2);
    table.delete(dels);
    table.close();
    conn.close();
}
/**
 * Query a single row
 * @throws Exception
 */
@Test
public void testGet() throws Exception {
    Table table = conn.getTable(TableName.valueOf("user_info"));
    Get get = new Get(Bytes.toBytes("002"));
    Result result = table.get(get);
    // Fetch the value of one specific key from the result
    byte[] value = result.getValue(Bytes.toBytes("base_info"), Bytes.toBytes("age"));
    System.out.println(Bytes.toString(value));
    System.out.println("-------------------------");
    // Iterate over all key-value cells of the row
    CellScanner cellScanner = result.cellScanner();
    while (cellScanner.advance()) {
        Cell cell = cellScanner.current();
        byte[] rowArray = cell.getRowArray();             // byte array holding the row key of this cell
        byte[] familyArray = cell.getFamilyArray();       // byte array holding the column family name
        byte[] qualifierArray = cell.getQualifierArray(); // byte array holding the column name
        byte[] valueArray = cell.getValueArray();         // byte array holding the value
        System.out.println("row key: " + Bytes.toString(rowArray, cell.getRowOffset(), cell.getRowLength()));
        System.out.println("column family: " + Bytes.toString(familyArray, cell.getFamilyOffset(), cell.getFamilyLength()));
        System.out.println("column: " + Bytes.toString(qualifierArray, cell.getQualifierOffset(), cell.getQualifierLength()));
        System.out.println("value: " + Bytes.toString(valueArray, cell.getValueOffset(), cell.getValueLength()));
    }
    table.close();
    conn.close();
}
/**
 * Query data by row key range
 * @throws Exception
 */
@Test
public void testScan() throws Exception {
    Table table = conn.getTable(TableName.valueOf("user_info"));
    // The start row key is included and the stop row key is excluded.
    // To include the stop row key as well, append an invisible byte (\000) to it.
    Scan scan = new Scan(Bytes.toBytes("10"), Bytes.toBytes("10000\000"));
    ResultScanner scanner = table.getScanner(scan);
    Iterator<Result> iterator = scanner.iterator();
    while (iterator.hasNext()) {
        Result result = iterator.next();
        // Iterate over all key-value cells of the row
        CellScanner cellScanner = result.cellScanner();
        while (cellScanner.advance()) {
            Cell cell = cellScanner.current();
            byte[] rowArray = cell.getRowArray();             // byte array holding the row key of this cell
            byte[] familyArray = cell.getFamilyArray();       // byte array holding the column family name
            byte[] qualifierArray = cell.getQualifierArray(); // byte array holding the column name
            byte[] valueArray = cell.getValueArray();         // byte array holding the value
            System.out.println("row key: " + Bytes.toString(rowArray, cell.getRowOffset(), cell.getRowLength()));
            System.out.println("column family: " + Bytes.toString(familyArray, cell.getFamilyOffset(), cell.getFamilyLength()));
            System.out.println("column: " + Bytes.toString(qualifierArray, cell.getQualifierOffset(), cell.getQualifierLength()));
            System.out.println("value: " + Bytes.toString(valueArray, cell.getValueOffset(), cell.getValueLength()));
        }
        System.out.println("----------------------");
    }
    // Release resources, as the other tests do
    scanner.close();
    table.close();
    conn.close();
}
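// Shows that appending \0 to a key yields a different key that is one byte longer
@Test
public void test() {
    String a = "000";
    String b = "000\0";
    System.out.println(a);
    System.out.println(b);
    byte[] bytes = a.getBytes();
    byte[] bytes2 = b.getBytes();
    System.out.println(bytes.length + " vs " + bytes2.length);
}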
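The start-inclusive, stop-exclusive range used by testScan above, and the trick of appending a zero byte to the stop key, can be tried out without a cluster. The following sketch uses a sorted map as a stand-in for a table; `ScanRangeSketch` and its sample row keys are hypothetical, and ASCII row keys are assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for a range scan over sorted row keys (not the HBase API).
class ScanRangeSketch {
    // Returns the row keys in [start, stop): start included, stop excluded.
    static List<String> scan(TreeMap<String, String> rows, String start, String stop) {
        return new ArrayList<>(rows.subMap(start, true, stop, false).keySet());
    }

    static TreeMap<String, String> sample() {
        TreeMap<String, String> rows = new TreeMap<>();
        for (String key : new String[] {"10", "100", "1000", "10000", "2"}) {
            rows.put(key, "row " + key);
        }
        return rows;
    }

    public static void main(String[] args) {
        TreeMap<String, String> rows = sample();
        // The stop key itself is not returned
        System.out.println(scan(rows, "10", "10000"));   // [10, 100, 1000]
        // "10000\0" is the smallest key after "10000", so "10000" is now included
        System.out.println(scan(rows, "10", "10000\0")); // [10, 100, 1000, 10000]
    }
}
```

Note that row key "2" is outside both ranges because it sorts after "10000" lexicographically.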