HBase installation and use

Author of the article: foochane

Original link: https://foochane.cn/article/2019062801.html

1 Introduction to HBase

HBase is a distributed database that supports real-time random reads and writes.

Unlike relational databases such as MySQL, Oracle, DB2, and SQL Server, HBase is a NoSQL (non-relational) database with the following characteristics:

The HBase table model differs from the relational table model:
An HBase table has no fixed field definitions;
Each row in an HBase table stores a set of key-value pairs;
An HBase table is divided into column families, and the user specifies which column family each key-value pair is inserted into;
In physical storage, an HBase table is split by column family, so data from different column families is always stored in different files;
Every row in an HBase table has a fixed row key, and row keys cannot repeat within a table;
All HBase data, including row keys, keys, and values, is of type byte[]; HBase does not maintain user data types;
HBase has poor support for transactions.

Compared with other NoSQL databases (MongoDB, Redis, Cassandra, Hazelcast), HBase has the following characteristic:
Because HBase table data is stored in the HDFS file system, storage capacity can scale linearly, and the stored data is highly safe and reliable.

2 HBase table structure

rowkey (row key)	base_info	extra_info
001	name:zs,age:22,sex:male	hobby:read,addr:beijing
002	name:laowang,sex:male

The HBase table model is very different from the table model of a relational database such as MySQL.

The HBase table model has the concept of a row, but no concept of a field.

A row stores key-value pairs, and the keys can differ widely from row to row.

Key points of the HBase table model

A table has a table name
A table can be divided into multiple column families (data from different column families is stored in different files)
Each row in a table has a row key (rowkey), and row keys are unique within the table
Each key-value pair in a table is called a cell
HBase can store multiple historical versions of a value (the number of versions kept is configurable); by default the latest version is returned
Because a whole table can hold a large amount of data, it is split horizontally into multiple regions (each identified by a rowkey range), and data from different regions is also stored in different files
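
The splitting of a table into regions by rowkey range can be pictured with a small sketch. This is not HBase code: the class name RegionLookupSketch, the region names, and the split keys are all made up for illustration, and row keys are modeled as Strings rather than byte[]. Each region is recorded under its start key, and the region responsible for a row key is the one with the greatest start key that is not larger than that row key.

```java
import java.util.TreeMap;

public class RegionLookupSketch {
    // Start key of each region -> region name (all hypothetical values).
    // The empty string is the start key of the first region.
    static final TreeMap<String, String> REGIONS = new TreeMap<>();
    static {
        REGIONS.put("", "region-1");     // rows in ["", "300")
        REGIONS.put("300", "region-2");  // rows in ["300", "600")
        REGIONS.put("600", "region-3");  // rows from "600" onwards
    }

    // Returns the region whose rowkey range contains the given row key.
    public static String regionFor(String rowKey) {
        return REGIONS.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        for (String rowKey : new String[] {"001", "300", "450", "999"}) {
            System.out.println(rowKey + " -> " + regionFor(rowKey));
        }
    }
}
```

Because the start key is inclusive, row key "300" lands in region-2 in this sketch, while "001" lands in region-1.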

HBase stores inserted data in sorted order:

Rows are sorted by row key first
Within a row, key-value pairs are sorted by column family first, and then by key
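
One way to picture this ordering is as a sorted map nested three levels deep: row key, then column family, then key. The sketch below is only a mental model built on java.util.TreeMap, not how HBase stores data internally; the class name SortedStorageSketch is hypothetical and Strings stand in for byte[].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortedStorageSketch {
    // row key -> column family -> key -> value, with every level kept sorted.
    private final TreeMap<String, TreeMap<String, TreeMap<String, String>>> rows = new TreeMap<>();

    public void put(String rowKey, String family, String key, String value) {
        rows.computeIfAbsent(rowKey, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(key, value);
    }

    // Lists every cell in storage order, formatted as "rowKey family:key=value".
    public List<String> scan() {
        List<String> cells = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, TreeMap<String, String>>> row : rows.entrySet()) {
            for (Map.Entry<String, TreeMap<String, String>> family : row.getValue().entrySet()) {
                for (Map.Entry<String, String> kv : family.getValue().entrySet()) {
                    cells.add(row.getKey() + " " + family.getKey() + ":" + kv.getKey() + "=" + kv.getValue());
                }
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        SortedStorageSketch table = new SortedStorageSketch();
        // Inserted out of order on purpose.
        table.put("002", "base_info", "name", "laowang");
        table.put("001", "extra_info", "hobby", "read");
        table.put("001", "base_info", "sex", "male");
        table.put("001", "base_info", "age", "22");
        table.scan().forEach(System.out::println);
    }
}
```

Regardless of insertion order, the scan lists row 001 before row 002, base_info before extra_info, and age before sex.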

HBase table data types:

HBase only supports byte[]. This applies to row keys, keys, values, column family names, and table names.
A table is divided into different regions.
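
Since everything is byte[], the client decides how a value is encoded and must decode it the same way when reading. The following sketch uses only the JDK to show two possible encodings of an age; the class and method names are hypothetical. In real client code the HBase Bytes utility class (used in the API examples later in this article) plays this role.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ByteEncodingSketch {
    // Encodes an age as text, e.g. 22 becomes the UTF-8 bytes of "22".
    public static byte[] encodeAsText(int age) {
        return String.valueOf(age).getBytes(StandardCharsets.UTF_8);
    }

    // Encodes an age as a 4-byte big-endian integer.
    public static byte[] encodeAsInt(int age) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(age).array();
    }

    // Decodes bytes that were written by encodeAsText.
    public static int decodeText(byte[] stored) {
        return Integer.parseInt(new String(stored, StandardCharsets.UTF_8));
    }

    // Decodes bytes that were written by encodeAsInt.
    public static int decodeInt(byte[] stored) {
        return ByteBuffer.wrap(stored).getInt();
    }

    public static void main(String[] args) {
        byte[] asText = encodeAsText(22);
        byte[] asInt = encodeAsInt(22);
        System.out.println(asText.length + " bytes as text, " + asInt.length + " bytes as int");
        System.out.println(decodeText(asText) + " " + decodeInt(asInt));
    }
}
```

The stored bytes carry no type information, so reading a text-encoded value with the integer decoder (or the reverse) gives a wrong result or an exception. The application has to keep track of which encoding each column uses.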

3 HBase working mechanism

Overall schematic of the HBase working mechanism

An HBase distributed system consists of two roles:

Management role: HMaster (usually two, one active and one standby)
Data node role: HRegionServer (many, deployed together with the HDFS DataNodes)

HBase does not process data, so it does not need YARN. YARN is responsible for MapReduce computation, whereas HBase is only responsible for data management.

4 HBase installation

4.1 Installation preparation

First, an HDFS cluster must be up and running; the HBase region servers should be deployed together with the HDFS DataNodes.
Second, a ZooKeeper cluster must also be up and running, so ZooKeeper has to be installed before HBase. Here ZooKeeper has already been installed.
Then, install HBase.

4.2 Node layout

The roles are assigned to the nodes as follows:

Node	Installed services
Master	namenode datanode regionserver hmaster zookeeper
Slave01	datanode regionserver zookeeper
Slave02	datanode regionserver zookeeper

4.3 Installing HBase

Extract the HBase installation package hbase-2.0.5-bin.tar.gz

Modify hbase-env.sh

export JAVA_HOME=/usr/local/bigdata/java/jdk1.8.0_211

# Do not start the ZooKeeper bundled with HBase; we have already installed our own
export HBASE_MANAGES_ZK=false

Modify hbase-site.xml

<configuration>
    <!-- Path where HBase stores its data on HDFS -->
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://Master:9000/hbase</value>
    </property>
    <!-- Run HBase in distributed mode -->
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <!-- ZooKeeper addresses; separate multiple addresses with "," -->
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>Master:2181,Slave01:2181,Slave02:2181</value>
    </property>
</configuration>

Modify regionservers

Master
Slave01
Slave02

After editing, copy the installation folder to the /usr/local/bigdata/ directory on all three nodes.

6 Start the HBase cluster

Check that HDFS and ZooKeeper have started normally.
Master:

hadoop@Master:~$ jps
4918 DataNode
2744 QuorumPeerMain
4748 NameNode
9949 Jps
5167 SecondaryNameNode
hadoop@Master:~$ /usr/local/bigdata/zookeeper-3.4.6/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/local/bigdata/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower

Slave01:

hadoop@Slave1:~$ jps
3235 QuorumPeerMain
3779 DataNode
5546 Jps
hadoop@Slave1:~$  /usr/local/bigdata/zookeeper-3.4.6/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/local/bigdata/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader

Slave02:

hadoop@Slave2:~$ jps
11958 DataNode
13656 Jps
11390 QuorumPeerMain
hadoop@Slave2:~$  /usr/local/bigdata/zookeeper-3.4.6/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/local/bigdata/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower

Then run start-hbase.sh

$ bin/start-hbase.sh

The command above starts a region server on every machine listed in the regionservers configuration file. To start a region server manually on a single machine, use:

$ bin/hbase-daemon.sh start regionserver

After startup, the Master node runs two services, HRegionServer and HMaster, while Slave01 and Slave02 each run the HRegionServer service.

A highly available HBase cluster should have two masters, one in the active state and one in the standby state; the masters are responsible for monitoring the region servers.

We can therefore pick one of the other two machines and start an additional HMaster service on it.

$ bin/hbase-daemon.sh start master

This starts a new master, which acts as the backup.

7 Starting the HBase command-line client

Use the command hbase shell

bin/hbase shell
hbase> list     # list tables
hbase> status   # show cluster status
hbase> version  # show cluster version

Problem: shell commands may fail with the following error

ERROR: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
        at org.apache.hadoop.hbase.master.HMaster.checkServiceStarted(HMaster.java:2932)
        at org.apache.hadoop.hbase.master.MasterRpcServices.isMasterRunning(MasterRpcServices.java:1084)
        at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)

Solution: in this case HDFS was still in safe mode, so make it leave safe mode

$ hdfs dfsadmin -safemode leave

8 HBase command-line client operations

8.1 Create a table

create 't_user_info','base_info','extra_info'
#       table name    column family  column family

8.2 Insert data

hbase(main):011:0> put 't_user_info','001','base_info:username','zhangsan'
0 row(s) in 0.2420 seconds

hbase(main):012:0> put 't_user_info','001','base_info:age','18'
0 row(s) in 0.0140 seconds

hbase(main):013:0> put 't_user_info','001','base_info:sex','female'
0 row(s) in 0.0070 seconds

hbase(main):014:0> put 't_user_info','001','extra_info:career','it'
0 row(s) in 0.0090 seconds

hbase(main):015:0> put 't_user_info','002','extra_info:career','actoress'
0 row(s) in 0.0090 seconds

hbase(main):016:0> put 't_user_info','002','base_info:username','liuyifei'
0 row(s) in 0.0060 seconds

8.3 Query data, method 1: scan

hbase(main):017:0> scan 't_user_info'
ROW                               COLUMN+CELL                                                                                     
 001                              column=base_info:age, timestamp=1496567924507, value=18                                         
 001                              column=base_info:sex, timestamp=1496567934669, value=female                                     
 001                              column=base_info:username, timestamp=1496567889554, value=zhangsan                              
 001                              column=extra_info:career, timestamp=1496567963992, value=it                                     
 002                              column=base_info:username, timestamp=1496568034187, value=liuyifei                              
 002                              column=extra_info:career, timestamp=1496568008631, value=actoress                               
2 row(s) in 0.0420 seconds

8.4 Query data, method 2: get a single row

hbase(main):020:0> get 't_user_info','001'
COLUMN                            CELL                                                                                            
 base_info:age                    timestamp=1496568160192, value=19                                                               
 base_info:sex                    timestamp=1496567934669, value=female                                                           
 base_info:username               timestamp=1496567889554, value=zhangsan                                                         
 extra_info:career                timestamp=1496567963992, value=it                                                               
4 row(s) in 0.0770 seconds

8.5 Delete a key-value pair

hbase(main):021:0> delete 't_user_info','001','base_info:sex'
0 row(s) in 0.0390 seconds

Delete an entire row:
hbase(main):024:0> deleteall 't_user_info','001'
0 row(s) in 0.0090 seconds

hbase(main):025:0> get 't_user_info','001'
COLUMN                            CELL                                                                                            
0 row(s) in 0.0110 seconds

Delete the entire table:
hbase(main):028:0> disable 't_user_info'
0 row(s) in 2.3640 seconds

hbase(main):029:0> drop 't_user_info'
0 row(s) in 1.2950 seconds

hbase(main):030:0> list
TABLE                                                                                                                             
0 row(s) in 0.0130 seconds

=> []

8.6 An important HBase feature: sorted storage (row keys)

HBase automatically sorts the data inserted into it before storing it.
Sort order: first by row key, then by column family name, then by column (key) name, all in lexicographic order.

This feature has a large impact on query efficiency.
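
Lexicographic order can be surprising when row keys are numbers written as text. The sketch below uses plain Java String sorting as a stand-in (HBase compares raw bytes, which gives the same order for ASCII digits); the class name, the helper names, and the padding width are assumptions made for illustration.

```java
import java.util.Arrays;

public class RowKeyOrderSketch {
    // Returns the row keys sorted character by character, like a lexicographic sort.
    public static String[] sorted(String... rowKeys) {
        String[] copy = rowKeys.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Left-pads a numeric id with zeros to a fixed width (the width is an arbitrary choice).
    public static String padded(int id, int width) {
        return String.format("%0" + width + "d", id);
    }

    public static void main(String[] args) {
        // Unpadded keys: "10" and "100" sort before "2".
        System.out.println(Arrays.toString(sorted("1", "2", "10", "100")));
        // Zero-padded keys sort in numeric order.
        System.out.println(Arrays.toString(
                sorted(padded(1, 3), padded(2, 3), padded(10, 3), padded(100, 3))));
    }
}
```

The first line printed is [1, 10, 100, 2] and the second is [001, 002, 010, 100]. Padding numeric row keys to a fixed width is one common way to make the stored order match the numeric order.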

For example, consider a table that stores user information: name, place of residence, age, occupation, and so on.
The business system frequently needs to:
query all users in a given province;
query all users with a given surname in a given province.

Idea: if users from the same province are stored contiguously in the HBase storage files, and within a province users with the same surname are also stored contiguously, both queries become much more efficient.

Practice: concatenate the query conditions into the rowkey.
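
A minimal sketch of this idea, using a TreeMap as a stand-in for a sorted HBase table. The rowkey layout province_surname_userId, the "_" separator, and all names and sample data are assumptions for illustration, not something prescribed by HBase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class RowKeyDesignSketch {
    // Row keys kept in sorted order, standing in for an HBase table.
    private final TreeMap<String, String> rows = new TreeMap<>();

    // Hypothetical rowkey layout: province_surname_userId.
    public static String rowKey(String province, String surname, String userId) {
        return province + "_" + surname + "_" + userId;
    }

    public void addUser(String province, String surname, String userId, String name) {
        rows.put(rowKey(province, surname, userId), name);
    }

    // All rows whose key starts with the prefix form one contiguous range,
    // so a single range read [prefix, prefix + '\uffff') returns them.
    public List<String> scanPrefix(String prefix) {
        return new ArrayList<>(rows.subMap(prefix, prefix + Character.MAX_VALUE).values());
    }

    public static void main(String[] args) {
        RowKeyDesignSketch table = new RowKeyDesignSketch();
        table.addUser("hebei", "zhang", "001", "zhang san");
        table.addUser("hebei", "li", "002", "li si");
        table.addUser("hunan", "zhang", "003", "zhang wu");
        table.addUser("hebei", "zhang", "004", "zhang liu");

        System.out.println(table.scanPrefix("hebei_"));        // all users in one province
        System.out.println(table.scanPrefix("hebei_zhang_"));  // one surname within that province
    }
}
```

Both queries become a read of one contiguous key range rather than a filter over the whole table. The order of the parts matters: with the province first, a query by surname alone would not map to a single range.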

9 HBase client API operations

9.1 DDL operations

Code flow:

Create a connection: Connection conn = ConnectionFactory.createConnection(conf);
Get a DDL operator (the table manager, admin): Admin admin = conn.getAdmin();
Use the table manager API to create tables, delete tables, and modify table definitions: admin.createTable(HTableDescriptor descriptor);

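// Imports used by the snippets in this section
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.regionserver.BloomType;
import org.junit.Before;
import org.junit.Test;

// Connection shared by the tests; created before each test runs
Connection conn = null;

@Before
public void getConn() throws Exception{
    // Build a connection object
    Configuration conf = HBaseConfiguration.create(); // automatically loads hbase-site.xml
    conf.set("hbase.zookeeper.quorum", "192.168.233.200:2181,192.168.233.201:2181,192.168.233.202:2181");
    
    conn = ConnectionFactory.createConnection(conf);
}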


/**
 * DDL: create a table
 * @throws Exception 
 */
@Test
public void testCreateTable() throws Exception{

    // Get a DDL operator from the connection
    Admin admin = conn.getAdmin();
    
    // Create a table descriptor
    HTableDescriptor hTableDescriptor = new HTableDescriptor(TableName.valueOf("user_info"));
    
    // Create the column family descriptors
    HColumnDescriptor hColumnDescriptor_1 = new HColumnDescriptor("base_info");
    hColumnDescriptor_1.setMaxVersions(3); // maximum number of versions kept in this column family; the default is 1
    
    HColumnDescriptor hColumnDescriptor_2 = new HColumnDescriptor("extra_info");
    
    // Add the column family descriptors to the table descriptor
    hTableDescriptor.addFamily(hColumnDescriptor_1);
    hTableDescriptor.addFamily(hColumnDescriptor_2);
    
    
    // Create the table with the DDL operator (admin)
    admin.createTable(hTableDescriptor);
    
    // Close the connection
    admin.close();
    conn.close();
    
}


/**
 * Drop a table
 * @throws Exception 
 */
@Test
public void testDropTable() throws Exception{
    
    Admin admin = conn.getAdmin();
    
    // Disable the table (a table must be disabled before it can be deleted)
    admin.disableTable(TableName.valueOf("user_info"));
    // Delete the table
    admin.deleteTable(TableName.valueOf("user_info"));
    
    
    admin.close();
    conn.close();
}

// Modify a table definition: add a column family
@Test
public void testAlterTable() throws Exception{
    
    Admin admin = conn.getAdmin();
    
    // Fetch the existing table definition
    HTableDescriptor tableDescriptor = admin.getTableDescriptor(TableName.valueOf("user_info"));
    
    
    // Build a new column family definition
    HColumnDescriptor hColumnDescriptor = new HColumnDescriptor("other_info");
    hColumnDescriptor.setBloomFilterType(BloomType.ROWCOL); // set the Bloom filter type for this column family
    
    // Add the column family definition to the table definition
    tableDescriptor.addFamily(hColumnDescriptor);
    
    
    // Submit the modified table definition through admin
    admin.modifyTable(TableName.valueOf("user_info"), tableDescriptor);
    
    
    admin.close();
    conn.close();
    
}

9.2 DML operations

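HBase CRUD

    // Imports used by the snippets in this section
    import java.util.ArrayList;
    import java.util.Iterator;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellScanner;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.junit.Before;
    import org.junit.Test;

    Connection conn = null;
    
    @Before
    public void getConn() throws Exception{
        // Build a connection object
        Configuration conf = HBaseConfiguration.create(); // automatically loads hbase-site.xml
        conf.set("hbase.zookeeper.quorum", "Master:2181,Slave01:2181,Slave02:2181");
        
        conn = ConnectionFactory.createConnection(conf);
    }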
    
    
    /**
     * Insert
     * Update: a put overwrites the existing value
     * @throws Exception 
     */
    @Test
    public void testPut() throws Exception{
        
        // Get a Table object for the given table to perform DML operations
        Table table = conn.getTable(TableName.valueOf("user_info"));
        
        // Wrap the data to insert in a Put object (one Put corresponds to exactly one rowkey)
        Put put = new Put(Bytes.toBytes("001"));
        put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("username"), Bytes.toBytes("张三"));
        put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("age"), Bytes.toBytes("18"));
        put.addColumn(Bytes.toBytes("extra_info"), Bytes.toBytes("addr"), Bytes.toBytes("北京"));
        
        
        Put put2 = new Put(Bytes.toBytes("002"));
        put2.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("username"), Bytes.toBytes("李四"));
        put2.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("age"), Bytes.toBytes("28"));
        put2.addColumn(Bytes.toBytes("extra_info"), Bytes.toBytes("addr"), Bytes.toBytes("上海"));
    
        
        ArrayList<Put> puts = new ArrayList<>();
        puts.add(put);
        puts.add(put2);
        
        
        // Insert both rows in one call
        table.put(puts);
        
        table.close();
        conn.close();
        
    }
    
    
    /**
     * Insert a large amount of data in a loop
     * @throws Exception 
     */
    @Test
    public void testManyPuts() throws Exception{
        
        Table table = conn.getTable(TableName.valueOf("user_info"));
        ArrayList<Put> puts = new ArrayList<>();
        
        for(int i=0;i<100000;i++){
            // The rowkey is the decimal text of i, so rows are stored in lexicographic order
            Put put = new Put(Bytes.toBytes(""+i));
            put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("username"), Bytes.toBytes("张三"+i));
            put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("age"), Bytes.toBytes((18+i)+""));
            put.addColumn(Bytes.toBytes("extra_info"), Bytes.toBytes("addr"), Bytes.toBytes("北京"));
            
            puts.add(put);
        }
        
        table.put(puts);
        
        // Release resources, as the other tests do
        table.close();
        conn.close();
        
    }
    
    /**
     * Delete
     * @throws Exception 
     */
    @Test
    public void testDelete() throws Exception{
        Table table = conn.getTable(TableName.valueOf("user_info"));
        
        // Build objects describing the data to delete
        // delete1 removes the whole row 001
        Delete delete1 = new Delete(Bytes.toBytes("001"));
        
        // delete2 removes only the extra_info:addr cell of row 002
        Delete delete2 = new Delete(Bytes.toBytes("002"));
        delete2.addColumn(Bytes.toBytes("extra_info"), Bytes.toBytes("addr"));
        
        ArrayList<Delete> dels = new ArrayList<>();
        dels.add(delete1);
        dels.add(delete2);
        
        table.delete(dels);
        
        
        table.close();
        conn.close();
    }
    
    /**
     * Query a single row
     * @throws Exception 
     */
    @Test
    public void testGet() throws Exception{
        
        Table table = conn.getTable(TableName.valueOf("user_info"));
        
        Get get = new Get("002".getBytes());
        
        Result result = table.get(get);
        
        // Read the value of one specific key from the result
        byte[] value = result.getValue("base_info".getBytes(), "age".getBytes());
        System.out.println(new String(value));
        
        System.out.println("-------------------------");
        
        // Iterate over all key-value cells in the row
        CellScanner cellScanner = result.cellScanner();
        while(cellScanner.advance()){
            Cell cell = cellScanner.current();
            
            byte[] rowArray = cell.getRowArray();  // byte array holding the row key of this cell
            byte[] familyArray = cell.getFamilyArray();  // byte array holding the column family name
            byte[] qualifierArray = cell.getQualifierArray();  // byte array holding the column name
            byte[] valueArray = cell.getValueArray(); // byte array holding the value
            
            System.out.println("row key: "+new String(rowArray,cell.getRowOffset(),cell.getRowLength()));
            System.out.println("column family: "+new String(familyArray,cell.getFamilyOffset(),cell.getFamilyLength()));
            System.out.println("column: "+new String(qualifierArray,cell.getQualifierOffset(),cell.getQualifierLength()));
            System.out.println("value: "+new String(valueArray,cell.getValueOffset(),cell.getValueLength()));
            
        }
        
        table.close();
        conn.close();
        
    }
    
    
    /**
     * Query data by row key range
     * @throws Exception 
     */
    @Test
    public void testScan() throws Exception{
        
        Table table = conn.getTable(TableName.valueOf("user_info"));
        
        // The start row key is inclusive and the stop row key is exclusive. To include the last row key
        // in the results, append an invisible byte (\000) to the stop row key
        Scan scan = new Scan("10".getBytes(), "10000\000".getBytes());
        
        ResultScanner scanner = table.getScanner(scan);
        
        Iterator<Result> iterator = scanner.iterator();
        
        while(iterator.hasNext()){
            
            Result result = iterator.next();
            // Iterate over all key-value cells in the row
            CellScanner cellScanner = result.cellScanner();
            while(cellScanner.advance()){
                Cell cell = cellScanner.current();
                
                byte[] rowArray = cell.getRowArray();  // byte array holding the row key of this cell
                byte[] familyArray = cell.getFamilyArray();  // byte array holding the column family name
                byte[] qualifierArray = cell.getQualifierArray();  // byte array holding the column name
                byte[] valueArray = cell.getValueArray(); // byte array holding the value
                
                System.out.println("row key: "+new String(rowArray,cell.getRowOffset(),cell.getRowLength()));
                System.out.println("column family: "+new String(familyArray,cell.getFamilyOffset(),cell.getFamilyLength()));
                System.out.println("column: "+new String(qualifierArray,cell.getQualifierOffset(),cell.getQualifierLength()));
                System.out.println("value: "+new String(valueArray,cell.getValueOffset(),cell.getValueLength()));
            }
            System.out.println("----------------------");
        }
        
        // Release resources, as the other tests do
        scanner.close();
        table.close();
        conn.close();
    }
    
    @Test
    public void test(){
        String a = "000";
        String b = "000\0";
        
        System.out.println(a);
        System.out.println(b);
        
        
        byte[] bytes = a.getBytes();
        byte[] bytes2 = b.getBytes();
        
        System.out.println("");
        
    }
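
The effect of the trailing zero byte on a scan range can be checked without a cluster. The sketch below is plain Java, not HBase code: the class StopRowSketch and its helper names are hypothetical, and compare is a hand-written unsigned byte-by-byte comparison intended to mirror lexicographic row key order. It shows that a key is excluded when it is used as the stop row of a half-open range, and included once a single 0x00 byte is appended to the stop row.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StopRowSketch {
    public static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Compares two keys byte by byte as unsigned values;
    // a key that is a prefix of a longer key sorts first.
    public static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // True if row lies in the half-open range [start, stop).
    public static boolean inRange(byte[] row, byte[] start, byte[] stop) {
        return compare(row, start) >= 0 && compare(row, stop) < 0;
    }

    // Copy of the key with one 0x00 byte appended: the smallest key greater than the original.
    public static byte[] justAfter(byte[] key) {
        return Arrays.copyOf(key, key.length + 1);
    }

    public static void main(String[] args) {
        byte[] start = bytes("10");
        byte[] stop = bytes("10000");
        System.out.println(inRange(stop, start, stop));             // stop row itself is excluded
        System.out.println(inRange(stop, start, justAfter(stop)));  // included after appending 0x00
    }
}
```

This prints false and then true. Because the appended byte is the smallest possible one, no other key fits between "10000" and the new stop row, so the range grows by exactly one row key.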