1. Overview
HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system.
HBase start commands
Method one:
Start each daemon separately
bin/hbase-daemon.sh start master
bin/hbase-daemon.sh start regionserver
Method two:
// Start
bin/start-hbase.sh
// Stop
bin/stop-hbase.sh
2. HBase architecture
1) Client
The client contains the interfaces for accessing HBase. It also maintains a cache to speed up access, for example cached .META. metadata.
2) ZooKeeper
HBase uses ZooKeeper for master high availability, RegionServer monitoring, the metadata entry point, and cluster configuration maintenance. Specifically:
ZooKeeper ensures that only one master is running in the cluster. If the master fails, a new master is chosen through a competition mechanism to take over service.
ZooKeeper monitors the status of each RegionServer. When a RegionServer fails, the master is notified of the RegionServer's online/offline status through a callback.
ZooKeeper stores the unified entry address for the metadata.
3) HMaster (NameNode)
The main duties of the master node are as follows:
Assign regions to RegionServers
Maintain load balancing across the whole cluster
Maintain the cluster's metadata
Detect failed regions and reassign them to healthy RegionServers
When a RegionServer fails, coordinate the splitting of the corresponding HLog
4) HRegionServer (DataNode)
HRegionServer directly serves users' read and write requests.
Its functions are summarized as follows:
Manage the regions assigned to it by the master
Handle read and write requests from clients
Interact with the underlying HDFS and store data in HDFS
Split regions after they grow too large
Merge StoreFiles
5) HDFS
HDFS provides the underlying data storage service for HBase and also provides high-availability support for it (the HLog is stored in HDFS). Its functions are summarized as follows:
Provide the underlying distributed storage service for metadata and table data
Keep multiple replicas of the data to ensure high reliability and high availability
3. Basic shell commands
1. Enter the client
bin/hbase shell
2. View the tables in the current namespace
list
3. Create a table
A column family must be declared when creating a table
create 'student','info'
4. Insert data
hbase(main):003:0> put 'student','1001','info:sex','male'
hbase(main):004:0> put 'student','1001','info:age','18'
hbase(main):005:0> put 'student','1002','info:name','Janna'
hbase(main):006:0> put 'student','1002','info:sex','female'
hbase(main):007:0> put 'student','1002','info:age','20'
5. Scan to view data
hbase(main):008:0> scan 'student'
hbase(main):009:0> scan 'student',{STARTROW => '1001', STOPROW => '1001'}
hbase(main):010:0> scan 'student',{STARTROW => '1001'}
6. View the table structure
hbase(main):011:0> describe 'student'
7. Update the value of a specified column (the existing data is not modified in place; a new record is inserted, and HBase tells the versions apart by timestamp)
hbase(main):012:0> put 'student','1001','info:name','Nick'
hbase(main):013:0> put 'student','1001','info:age','100'
8. View the data of a specified row or a specified column (column family:column)
hbase(main):014:0> get 'student','1001'
hbase(main):015:0> get 'student','1001','info:name'
9. Count the rows in a table
hbase(main):021:0> count 'student'
10. Delete data
Delete all data for a row key
hbase(main):016:0> deleteall 'student','1001'
Delete one column's data for a row key
hbase(main):017:0> delete 'student','1002','info:sex'
11. Clear table data (a table is cleared by disabling it first and then truncating it; the shell's truncate command runs both steps)
hbase(main):018:0> truncate 'student'
12. Delete a table
First, the table must be put into the disabled state:
hbase(main):019:0> disable 'student'
Only then can the table be dropped:
hbase(main):020:0> drop 'student'
13. Change table information
Keep up to 3 versions of the data in the info column family, retaining the ones with the latest timestamps:
hbase(main):022:0> alter 'student',{NAME=>'info',VERSIONS=>3}
14. Other commands
// Create a namespace
create_namespace '_'
// Create a table in the specified namespace
create 'namespace:table_name','column_family'
4. HBase data structure
1、RowKey
RowKey is the primary key used to retrieve records. There are three ways to access rows in an HBase table:
1. Access by a single RowKey (get)
2. Access by a range of RowKeys (similar to a LIKE query)
3. Full table scan (scan)
A RowKey can be any string. In HBase, the RowKey is stored as a byte array.
When stored, data is sorted in the lexicographical (byte) order of the RowKey. When designing a RowKey, take full advantage of this sorted storage and place rows that are frequently read together next to each other (locality).
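The effect of byte-order sorting is easy to miss with numeric-looking keys. The following is a minimal Python sketch, not HBase code: it sorts a few made-up row keys by their raw bytes and then selects a range from the sorted keys, treating the start row as inclusive and the stop row as exclusive. The helper names `sort_row_keys` and `range_scan` and the sample keys are assumptions made for illustration.

```python
import bisect

def sort_row_keys(keys):
    """Sort row keys by their raw bytes (lexicographical byte order)."""
    return sorted(k.encode("utf-8") for k in keys)

def range_scan(sorted_keys, start_row, stop_row=None):
    """Return keys in [start_row, stop_row) from an already sorted list (hypothetical helper)."""
    lo = bisect.bisect_left(sorted_keys, start_row)
    hi = len(sorted_keys) if stop_row is None else bisect.bisect_left(sorted_keys, stop_row)
    return sorted_keys[lo:hi]

# Unpadded numeric keys do not sort numerically in byte order.
unpadded = sort_row_keys(["1", "2", "10", "9"])
# Zero-padding to a fixed width restores the numeric order.
padded = sort_row_keys(["0001", "0002", "0010", "0009"])

print(unpadded)
print(padded)
print(range_scan(padded, b"0002", b"0010"))
```

In this sketch the unpadded key "10" lands between "1" and "2", which is why fixed-width or otherwise carefully designed row keys are usually preferred when rows need to be read in a meaningful order.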
2、Column Family
Column family: Every column in an HBase table belongs to a column family.
The column family is part of the table's schema (the column is not) and must be defined before the table is used.
3、Cell
A cell is the unit uniquely determined by {rowkey, column family:column, version}. The data in a cell has no type and is stored entirely as raw bytes.
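Because a cell holds untyped bytes, the writer and the reader have to agree on an encoding. The Python sketch below illustrates this with the age value used in the shell examples: the same number 18 is encoded once as text and once as an 8-byte big-endian integer. The second encoding and both decoder names are assumptions chosen for this sketch, not something the shell examples in this article use.

```python
import struct

# put 'student','1001','info:age','18' stores the text "18" as bytes.
age_as_text = "18".encode("utf-8")

# A client could instead store the number as an 8-byte big-endian integer
# (an assumed encoding, picked only for this illustration).
age_as_long = struct.pack(">q", 18)

def decode_text_int(raw):
    """Decode a cell value that the writer stored as decimal text."""
    return int(raw.decode("utf-8"))

def decode_long(raw):
    """Decode a cell value that the writer stored as an 8-byte big-endian integer."""
    return struct.unpack(">q", raw)[0]

print(age_as_text, len(age_as_text))
print(age_as_long, len(age_as_long))
print(decode_text_int(age_as_text), decode_long(age_as_long))
```

Both decoders return 18, but the stored bytes differ, so reading a value with the wrong decoder gives a wrong result or an error.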
4、Time Stamp
Each cell can store multiple versions of the same data. Versions are indexed by timestamp.
The timestamp is a 64-bit integer. It can be assigned by HBase automatically when data is written, in which case it is the current system time with millisecond precision. It can also be assigned explicitly by the client. An application that wants to avoid version conflicts must generate unique timestamps itself.
Within each cell, versions are sorted in reverse chronological order, so the latest data comes first.
HBase provides two ways to reclaim old data versions:
One is to keep the last n versions of the data.
The other is to keep only the versions from a recent period (for example, the last seven days).
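The versioning rules above can be sketched with a toy model in Python. This is only an illustration of the described behavior, not how HBase stores data internally; the class `VersionedCell` and its parameters `max_versions` and `ttl_ms` are hypothetical names introduced here.

```python
class VersionedCell:
    """Toy model of one cell's versions, keyed by timestamp in milliseconds."""

    def __init__(self, max_versions=None, ttl_ms=None):
        self.max_versions = max_versions  # keep at most the n newest versions
        self.ttl_ms = ttl_ms              # keep only versions younger than this
        self.versions = {}                # timestamp -> value bytes

    def put(self, value, timestamp_ms):
        # Writing with an existing timestamp replaces that version,
        # which is why clients need unique timestamps to avoid conflicts.
        self.versions[timestamp_ms] = value

    def read(self, now_ms):
        """Return (timestamp, value) pairs, newest first, after applying both limits."""
        items = sorted(self.versions.items(), reverse=True)
        if self.ttl_ms is not None:
            items = [(ts, v) for ts, v in items if now_ms - ts < self.ttl_ms]
        if self.max_versions is not None:
            items = items[: self.max_versions]
        return items

cell = VersionedCell(max_versions=3)
for ts, name in [(1000, b"Ann"), (2000, b"Bob"), (3000, b"Cid"), (4000, b"Nick")]:
    cell.put(name, ts)

print(cell.read(now_ms=5000))
```

With `max_versions=3`, the oldest of the four writes is no longer returned, and the remaining versions come back newest first.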
5、NameSpace
1) Table: Every table is a member of a namespace, that is, a table must belong to a namespace. If none is specified, the table is placed in the default namespace.
2) RegionServer group: A namespace contains a default RegionServer group.
3) Permission: A namespace lets us define an access control list (ACL), for example for creating tables, reading tables, deleting, and updating.
4) Quota: A quota can limit the number of regions that a namespace may contain.
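The rule that an unqualified table name falls into the default namespace can be sketched as a small Python helper. It assumes the 'namespace:table' naming form used in the shell commands earlier and that the default namespace is named `default`; the function name `split_table_name` and the namespace `school` are hypothetical.

```python
DEFAULT_NAMESPACE = "default"

def split_table_name(name):
    """Split 'namespace:table' into (namespace, table), falling back to the default namespace."""
    namespace, sep, table = name.partition(":")
    if not sep:
        # No namespace prefix was given, so the table belongs to the default namespace.
        return DEFAULT_NAMESPACE, name
    return namespace, table

print(split_table_name("student"))
print(split_table_name("school:student"))
```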