Introduction to HBase and basic command usage

1. Overview

HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system.

Start commands for the HBase services

Method one:

Start each daemon separately:
bin/hbase-daemon.sh start master
bin/hbase-daemon.sh start regionserver

Method two:

// Start
bin/start-hbase.sh
// Stop
bin/stop-hbase.sh

2. HBase architecture

1) Client

The client contains the interfaces for accessing HBase. It also maintains a cache to speed up access, for example caching the metadata it reads from the .META. table.
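As a rough illustration of why such a cache helps, here is a minimal sketch in Python. It is a toy model, not the HBase client API: the class name RegionLocationCache, the region boundaries, and the server names are all hypothetical. It assumes each cached entry maps a region's start key to the server hosting it, so a lookup is a binary search and needs no round trip to the metadata table.

```python
from bisect import bisect_right, insort


class RegionLocationCache:
    """Toy client-side cache mapping region start keys to server names."""

    def __init__(self):
        self._start_keys = []  # sorted region start keys (bytes)
        self._servers = {}     # start key -> server name

    def add(self, start_key: bytes, server: str) -> None:
        """Remember that the region beginning at start_key lives on server."""
        if start_key not in self._servers:
            insort(self._start_keys, start_key)
        self._servers[start_key] = server

    def locate(self, row_key: bytes):
        """Return the cached server for row_key, or None on a cache miss."""
        idx = bisect_right(self._start_keys, row_key) - 1
        if idx < 0:
            return None
        return self._servers[self._start_keys[idx]]


cache = RegionLocationCache()
cache.add(b"", "rs1.example")      # hypothetical region starting at the empty key
cache.add(b"2000", "rs2.example")  # hypothetical region starting at "2000"
print(cache.locate(b"1001"))  # rs1.example
print(cache.locate(b"2500"))  # rs2.example
```

On a miss (None), a real client would have to consult the metadata and then fill the cache; that part is left out here.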

2) ZooKeeper

HBase uses ZooKeeper for master high availability, RegionServer monitoring, the metadata entry point, and cluster configuration maintenance. Specifically:

ZooKeeper ensures that only one master is running in the cluster. If the master fails, a new master is chosen through a competition mechanism and takes over service.

ZooKeeper monitors the status of the RegionServers. When a RegionServer fails, the master is notified through a callback that the RegionServer has gone online or offline.

ZooKeeper stores the unified entry address for the metadata.

3) HMaster (comparable to the NameNode)

The main duties of the master node are as follows:
Assign Regions to RegionServers
Maintain load balancing across the cluster
Maintain the cluster's metadata
Detect failed Regions and reassign them to a healthy RegionServer
When a RegionServer fails, coordinate the splitting of the corresponding HLog

4) HRegionServer (comparable to the DataNode)

HRegionServer directly handles users' read and write requests.

Its functions are summarized as follows:
Manage the Regions assigned to it by the master
Process read and write requests from clients
Interact with the underlying HDFS and store data in HDFS
Split Regions after they grow too large
Merge StoreFiles

5) HDFS

HDFS provides the underlying data storage service for HBase and also supports HBase's high availability (the HLog is stored in HDFS). Its functions are summarized as follows:
Provide the underlying distributed storage service for metadata and table data
Keep multiple replicas of the data to ensure high reliability and high availability

3. Basic instructions

1. Enter the client

bin/hbase shell

2. List the tables

list

3. Create a table

A column family must be declared when creating a table:

create 'student','info'

4. Insert data

hbase(main):004:0> put 'student','1001','info:sex','male'
hbase(main):004:0> put 'student','1001','info:age','18'
hbase(main):005:0> put 'student','1002','info:name','Janna'
hbase(main):006:0> put 'student','1002','info:sex','female'
hbase(main):007:0> put 'student','1002','info:age','20'

5. Scan to view data

hbase(main):008:0> scan 'student'
hbase(main):009:0> scan 'student',{STARTROW => '1001', STOPROW  => '1001'}
hbase(main):010:0> scan 'student',{STARTROW => '1001'}

6. View the table structure

hbase(main):011:0> describe 'student'

7. Update the value of a specified column (the existing data is not actually modified; a new record is inserted, and HBase tracks versions by timestamp)

hbase(main):012:0> put 'student','1001','info:name','Nick'

hbase(main):013:0> put 'student','1001','info:age','100'
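The "update is really an insert" behavior can be sketched with a small in-memory model. This is a toy in Python, not HBase code: the class VersionedTable and the explicit timestamps 1 and 2 are assumptions made for illustration. Each put adds a new timestamped version, and a read returns the version with the largest timestamp.

```python
class VersionedTable:
    """Toy model: every put adds a timestamped version instead of overwriting."""

    def __init__(self):
        self._cells = {}  # (row, column) -> {timestamp: value}

    def put(self, row, column, value, ts):
        """Store value under (row, column) at timestamp ts."""
        self._cells.setdefault((row, column), {})[ts] = value

    def get(self, row, column):
        """Return the value with the newest timestamp, or None if absent."""
        versions = self._cells.get((row, column))
        if not versions:
            return None
        return versions[max(versions)]

    def history(self, row, column):
        """Return all (timestamp, value) pairs, newest first."""
        versions = self._cells.get((row, column), {})
        return sorted(versions.items(), reverse=True)


table = VersionedTable()
table.put("1001", "info:age", "18", ts=1)   # original value
table.put("1001", "info:age", "100", ts=2)  # the "update"
print(table.get("1001", "info:age"))      # 100
print(table.history("1001", "info:age"))  # [(2, '100'), (1, '18')]
```

The older value is still present in the history; how many old versions are retained depends on the column family settings.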

8. View the data of a specified row or a specified "column family:column"

hbase(main):014:0> get 'student','1001'
hbase(main):015:0> get 'student','1001','info:name'

9. Count the rows in a table

hbase(main):021:0> count 'student'

10. Delete data

Delete all data of a rowkey

hbase(main):016:0> deleteall 'student','1001'

Delete a column of data in a rowkey

hbase(main):017:0> delete 'student','1002','info:sex'

11. Clear table data (truncate first disables the table and then clears it)

hbase(main):018:0> truncate 'student'

12. Delete a table

The table must first be put into the disabled state:
hbase(main):019:0> disable 'student'
Only then can the table be dropped:
hbase(main):020:0> drop 'student'

13. Change table information

Keep up to 3 versions (the most recent by timestamp) of the data in the info column family:

hbase(main):022:0> alter 'student',{NAME=>'info',VERSIONS=>3}

14. Other commands

// Create a namespace
create_namespace '_'
// Create a table in the specified namespace
create 'namespace:table_name','column_family'

4. HBase data structure

1、RowKey

RowKey is the primary key used to retrieve records. There are three ways to access rows in an HBase table:

1. Access through a single RowKey (get)

2. Access through a range of RowKeys (range scan)

3. Full table scan (scan)
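The three access paths can be sketched over a sorted key list. This is a toy Python model rather than HBase code; the class SortedRows and the sample values are hypothetical, and the stop key of the range scan is treated as exclusive.

```python
from bisect import bisect_left


class SortedRows:
    """Toy table that keeps row keys sorted in byte order."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self._keys = sorted(self._rows)  # bytes sort lexicographically

    def get(self, key):
        """1. Look up a single row by its exact key."""
        return self._rows.get(key)

    def range_scan(self, start, stop):
        """2. Return rows with start <= key < stop, found by binary search."""
        lo = bisect_left(self._keys, start)
        hi = bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

    def full_scan(self):
        """3. Walk every row in key order."""
        return [(k, self._rows[k]) for k in self._keys]


rows = SortedRows({b"1002": "row-b", b"1001": "row-a", b"1003": "row-c"})
print(rows.get(b"1002"))                  # row-b
print(rows.range_scan(b"1001", b"1003"))  # [(b'1001', 'row-a'), (b'1002', 'row-b')]
print(len(rows.full_scan()))              # 3
```

The point of the sketch is cost: the first two paths touch only the rows they need, while the full scan touches everything.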

A RowKey can be any string. In HBase, the RowKey is stored as a byte array.

Data is stored in the lexicographical (byte) order of the RowKey. When designing RowKeys, take full advantage of this sorted storage and place rows that are frequently read together next to each other (locality).
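Byte order is not numeric order, which matters when numeric IDs are used as RowKeys. The following Python sketch shows the effect and one common workaround, zero-padding to a fixed width. The function name and the width of 4 are assumptions chosen for illustration.

```python
def as_sorted_rowkeys(ids, width=None):
    """Encode integer ids as byte strings and sort them in byte order.

    If width is given, each id is zero-padded to that many digits first.
    """
    keys = []
    for i in ids:
        text = str(i) if width is None else str(i).zfill(width)
        keys.append(text.encode("ascii"))
    return sorted(keys)


ids = [1, 2, 10, 100, 9]
print(as_sorted_rowkeys(ids))
# [b'1', b'10', b'100', b'2', b'9']
print(as_sorted_rowkeys(ids, width=4))
# [b'0001', b'0002', b'0009', b'0010', b'0100']
```

Without padding, row 10 sorts before row 2; with padding, byte order and numeric order agree, so a range scan over consecutive IDs reads adjacent rows.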

2、Column Family

Column family: each column in an HBase table belongs to a column family.

Column families are part of the table's schema (columns are not) and must be defined before the table is used.

3、Cell

A cell is the unit uniquely determined by {rowkey, column family:column, version}. The data in a cell has no type; it is all stored as raw bytes.
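Because cells hold untyped bytes, the writer and the reader have to agree on an encoding. The Python sketch below stores the same logical value 18 in two ways: as the text "18" and as an 8-byte big-endian integer. The helper names and the tuple layout of the cell key are illustrative assumptions, not an HBase API.

```python
import struct


def encode_text(value: str) -> bytes:
    """Encode a string as UTF-8 bytes."""
    return value.encode("utf-8")


def encode_long(value: int) -> bytes:
    """Encode an integer as 8 big-endian bytes."""
    return struct.pack(">q", value)


def decode_long(raw: bytes) -> int:
    """Decode 8 big-endian bytes back into an integer."""
    return struct.unpack(">q", raw)[0]


# A cell is addressed by (rowkey, column family, column, version).
cells = {
    (b"1001", b"info", b"age", 1): encode_text("18"),
    (b"1001", b"info", b"age", 2): encode_long(18),
}

print(cells[(b"1001", b"info", b"age", 1)])  # b'18'
print(cells[(b"1001", b"info", b"age", 2)])  # b'\x00\x00\x00\x00\x00\x00\x00\x12'
print(decode_long(cells[(b"1001", b"info", b"age", 2)]))  # 18
```

Nothing in the stored bytes says which encoding was used, so reading a cell with the wrong decoder gives a wrong value or an error.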

4、Time Stamp

Each cell can store multiple versions of the same data. Versions are indexed by timestamp.

The timestamp is a 64-bit integer. It can be assigned by HBase automatically when data is written, in which case it is the current system time in milliseconds. It can also be assigned explicitly by the client. If the application wants to avoid data version conflicts, it must generate unique timestamps itself.

In each cell, the versions are sorted in reverse chronological order, that is, the latest data comes first.

HBase provides two ways of reclaiming old data versions:

One is to keep the last n versions of the data.

The other is to keep the versions from a recent period (such as the last seven days).
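The two retention rules can be sketched as a single pruning function. This is a toy Python model, not how HBase implements cleanup; the function name, parameter names, and sample timestamps are assumptions.

```python
def prune_versions(versions, now_ms, max_versions=None, ttl_ms=None):
    """Apply retention rules to a cell's versions.

    versions: iterable of (timestamp_ms, value) pairs.
    Returns the surviving pairs, newest first.
    """
    kept = sorted(versions, key=lambda v: v[0], reverse=True)
    if ttl_ms is not None:
        # Rule 2: drop versions older than the time-to-live window.
        kept = [v for v in kept if now_ms - v[0] <= ttl_ms]
    if max_versions is not None:
        # Rule 1: keep only the newest max_versions entries.
        kept = kept[:max_versions]
    return kept


versions = [(1000, "a"), (3000, "c"), (2000, "b"), (4000, "d")]
print(prune_versions(versions, now_ms=5000, max_versions=3))
# [(4000, 'd'), (3000, 'c'), (2000, 'b')]
print(prune_versions(versions, now_ms=5000, ttl_ms=2000))
# [(4000, 'd'), (3000, 'c')]
```

Sorting newest first also mirrors the reverse chronological order described above, so the first element is always what a plain read would return.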

5、NameSpace

1) Table: every table is a member of a namespace. If no namespace is specified, the table is placed in the default namespace.

2) RegionServer group: a namespace contains a default RegionServer group.

3) Permission: a namespace lets us define an access control list (ACL), for example for creating tables, reading tables, deleting, updating, and so on.

4) Quota: a namespace can enforce a limit on the number of regions it contains.
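The "namespace:table" naming rule, including the fallback to the default namespace, can be sketched as a small parser. This Python helper is illustrative only; the function name and the sample namespace "school" are hypothetical.

```python
def split_table_name(name: str):
    """Split 'namespace:table' into its parts.

    A name without a namespace prefix is placed in the 'default' namespace.
    """
    if ":" in name:
        namespace, table = name.split(":", 1)
    else:
        namespace, table = "default", name
    return namespace, table


print(split_table_name("student"))         # ('default', 'student')
print(split_table_name("school:student"))  # ('school', 'student')
```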

 

 


Origin blog.csdn.net/QJQJLOVE/article/details/107210293