Why HBase?
As data volumes grow, traditional relational databases can no longer keep up with storage needs. Hive can handle the storage, but it cannot store unstructured or semi-structured data, nor serve efficient random queries over it.
What is HBase?
Apache HBase™ is the Hadoop database, a distributed, scalable, big data store.
Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project’s goal is the hosting of very large tables – billions of rows X millions of columns – atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google’s Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
HBase is an open-source, distributed, multi-versioned (each value can keep several versions), scalable, non-relational database.
HBase is an open-source Java implementation of Bigtable. Built on top of HDFS, it is a highly reliable, high-performance, column-family-oriented, scalable NoSQL database with real-time reads and writes.
RDBMS: MySQL, SQL Server, Oracle, DB2, Access, etc.
NoSQL: HBase, MongoDB, Redis, Memcached, etc.
When to use HBase:
You need to store massive amounts of unstructured or semi-structured data and read/write it randomly in near real time.
HBase and Hadoop
HBase is built on Hadoop; its storage depends on HDFS.
HBase architecture
Client, ZooKeeper, HMaster
HRegionServer, HLog, HRegion, Store, MemStore, StoreFile, HFile
Client:
The HBase client, providing the interfaces for accessing HBase (the Linux shell and the Java API).
The client maintains caches to speed up access to HBase, e.g. region location information.
ZooKeeper:
Monitors master status and guarantees there is exactly one active master at any time (high availability).
Stores the addressing entry point for all regions, i.e. which server hosts the meta table (hbase:meta; very old versions used a -ROOT- table).
Monitors HRegionServer liveness in real time and notifies the master when region servers come online or go offline.
Stores HBase metadata (the HBase schema), including table names and column families.
HMaster (the HBase "boss"):
Assigns regions to region servers (e.g. when a new table is created).
Balances load across region servers.
Reassigns regions (when a region server fails, or when a grown region is split in two).
Garbage-collects obsolete files on HDFS.
Handles schema update requests.
HRegionServer (the HBase "worker"):
Maintains the regions the master assigns to it (region management).
Serves client I/O requests against those regions and interacts with HDFS.
Splits regions that grow too large at runtime.
HLog:
The write-ahead log (WAL) of HBase operations: every write goes to the HLog first and only then into the MemStore, so the data can be replayed and recovered if the server crashes before a flush.
HRegion:
The smallest unit of distributed storage and load balancing in HBase; a table, or a contiguous slice of a table.
Store:
Corresponds to one column family of a region.
MemStore:
An in-memory write buffer whose contents are flushed to HDFS in batches.
HStoreFile:
HBase data is persisted to HDFS in the HFile format.
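The write path described above (HLog first, then MemStore, then a batch flush to a store file) can be sketched in plain Python. This is an illustrative model, not the HBase implementation; all class and field names here are made up:

```python
# Minimal sketch of the HBase write path: every put is appended to the
# write-ahead log (HLog) first, then buffered in the MemStore; when the
# MemStore fills up it is flushed as a sorted, immutable "store file".

class MiniStore:
    def __init__(self, flush_threshold=3):
        self.wal = []           # HLog: every mutation lands here first
        self.memstore = {}      # in-memory buffer, sorted at flush time
        self.storefiles = []    # flushed, immutable snapshots (like HFiles)
        self.flush_threshold = flush_threshold

    def put(self, rowkey, column, value):
        self.wal.append((rowkey, column, value))   # WAL first: crash-safe
        self.memstore[(rowkey, column)] = value    # then the MemStore
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # a flush writes a sorted, immutable snapshot; the MemStore empties
        self.storefiles.append(dict(sorted(self.memstore.items())))
        self.memstore = {}

s = MiniStore()
s.put('rk0001', 'f1:name', 'zhangsan')
s.put('rk0001', 'f1:age', '18')
s.put('rk0002', 'f1:name', 'lisi')   # the third put triggers a flush
```

If the process dies after the WAL append but before the flush, the WAL still holds all three mutations, which is exactly what makes recovery possible.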
Cardinality between the components:
hmaster:hregionserver=1:n
hregionserver:hlog=1:1
hregionserver:hregion=1:n
hregion:store=1:n
store:memstore=1:1
store:storefile=1:n
storefile:hfile=1:1
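The ratios above describe a containment hierarchy, which can be sketched with a few illustrative Python classes (not HBase code):

```python
# Containment hierarchy implied by the ratios above: a RegionServer owns
# one HLog and many Regions; a Region owns one Store per column family;
# a Store owns exactly one MemStore and many StoreFiles.

class Store:
    def __init__(self, column_family):
        self.column_family = column_family
        self.memstore = {}      # store : memstore  = 1 : 1
        self.storefiles = []    # store : storefile = 1 : n

class Region:
    def __init__(self, column_families):
        # region : store = 1 : n (one store per column family)
        self.stores = {cf: Store(cf) for cf in column_families}

class RegionServer:
    def __init__(self):
        self.hlog = []          # regionserver : hlog   = 1 : 1 (shared by all regions)
        self.regions = []       # regionserver : region = 1 : n

rs = RegionServer()
rs.regions.append(Region(['f1', 'f2']))
```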
HBase characteristics:
Schema: schema-less (rows in the same table may have different columns)
Data types: a single type only; everything is stored as byte[]
Multi-version: each value can keep multiple versions
Column-family storage: each column family's data is stored in its own files
Sparse storage: a cell with no value simply does not exist on disk, so null cells take no storage space at all
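Two of these characteristics, multi-versioning and sparsity, can be sketched in plain Python (an illustrative model, not the real storage format; all names are made up):

```python
import time

# Sketch of multi-versioned, sparse cells: each (rowkey, column) keeps up
# to VERSIONS timestamped values, newest first; absent cells simply have
# no entry, so they cost nothing.

class VersionedTable:
    def __init__(self, versions=3):
        self.versions = versions
        self.cells = {}   # (rowkey, column) -> [(timestamp, value), ...] newest first

    def put(self, rowkey, column, value, ts=None):
        ts = ts if ts is not None else int(time.time() * 1000)
        hist = self.cells.setdefault((rowkey, column), [])
        hist.insert(0, (ts, value))
        del hist[self.versions:]              # keep only the newest VERSIONS

    def get(self, rowkey, column):
        hist = self.cells.get((rowkey, column))
        return hist[0][1] if hist else None   # newest value, or None if absent

t = VersionedTable(versions=2)
t.put('rk0001', 'f1:name', 'zhangsan', ts=1)
t.put('rk0001', 'f1:name', 'zs1', ts=2)
t.put('rk0001', 'f1:name', 'zs2', ts=3)      # oldest version falls off
```

A get returns the newest version; older versions remain readable up to the configured VERSIONS limit, matching the VERSIONS option used in the create/alter examples later in these notes.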
HBase key concepts
rowkey: row key (analogous to a MySQL primary key: unique, and rows are kept sorted by it)
column family: a group of columns
column: a column
timestamp: the version timestamp of a cell (defaults to the server time at write)
version: version number
cell: the intersection of row, column, and timestamp that holds one value
Ordering
1. Rows are ordered by rowkey, in ascending lexicographic (byte) order.
2. Column families are ordered lexicographically.
3. Columns within a family are ordered lexicographically.
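Lexicographic (bytewise) order is not numeric order, which often surprises people designing rowkeys. A quick check in plain Python, using rowkeys like the ones in the put examples later in these notes:

```python
# HBase sorts rowkeys bytewise, not numerically: 'rk0001111' sorts
# BEFORE 'rk0002', because the comparison is character by character.
rowkeys = ['rk0002', 'rk0001111', 'rk0003', 'rk0001']
ordered = sorted(rowkeys)   # same order HBase would store them in
```

This is why numeric rowkeys are usually zero-padded to a fixed width.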
Installing HBase
1. Standalone HBase
(1) Unpack and configure environment variables
tar -zxvf hbase-1.2.1-bin.tar.gz -C /usr/local
vi /etc/profile
export HBASE_HOME=/usr/local/hbase-1.2.1
export PATH=$PATH:$HBASE_HOME/bin
source /etc/profile
(2) Configure HBase parameters
cd conf
vi hbase-env.sh
JAVA_HOME=/usr/local/jdk1.8.0_181
vi hbase-site.xml
<property>
<name>hbase.rootdir</name>
<value>file:///usr/local/hbasedata</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/zkdata</value>
</property>
Verify the installation:
hbase version
Start HBase:
bin/start-hbase.sh
Connect a shell client:
hbase shell
2. Pseudo-Distributed (outline only)
Set in hbase-site.xml:
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
See the official quickstart: http://hbase.apache.org/book.html#quickstart
3. Advanced - Fully Distributed
(1) Unpack and configure environment variables
tar -zxvf hbase-1.2.1-bin.tar.gz -C /usr/local
rm -rf docs
vi /etc/profile
export HBASE_HOME=/usr/local/hbase-1.2.1
export PATH=$PATH:$HBASE_HOME/bin
source /etc/profile
(2) Configure HBase parameters
cd ./conf
vi hbase-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_181
export HBASE_MANAGES_ZK=false
Note: with JDK 8+, comment out the following two lines (PermGen was removed in JDK 8):
# Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
vi regionservers     # list the region server hosts: hadoop01, hadoop02, hadoop03
vi backup-masters    # list the backup master hosts: hadoop02, hadoop03
vi hbase-site.xml
<!-- HBase root directory on HDFS -->
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop01:9000/hbase</value>
</property>
<!-- run HBase in distributed (cluster) mode -->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
<!-- ZooKeeper data directory -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/zkdata</value>
</property>
<!-- port of the HMaster web UI -->
<property>
<name>hbase.master.info.port</name>
<value>60010</value>
</property>
(3) Note: if HDFS runs as an HA cluster, copy hdfs-site.xml and core-site.xml into HBase's conf directory; otherwise skip this step.
[root@hadoop01 hadoop]# cp hdfs-site.xml core-site.xml $HBASE_HOME/conf
(4) Distribute HBase to the other two machines
scp -r hbase-1.2.1 root@hadoop02:$PWD
scp -r hbase-1.2.1 root@hadoop03:$PWD
(5) Start the HBase cluster
HBase depends on ZooKeeper and HDFS: start ZooKeeper first, then HDFS, and finally HBase.
zkServer.sh start
zkServer.sh status
start-dfs.sh
start-hbase.sh
Check the running processes:
[root@hadoop01 conf]# jps
56903 HRegionServer
55960 QuorumPeerMain
62328 Jps
56760 HMaster
56186 NameNode
56333 DataNode
59309 Main
Web UI ports:
HMaster: 60010 (as configured above; the default since HBase 1.0 is 16010)
HRegionServer: 16030
Internal RPC port: 16020
Note:
Keep the cluster clocks synchronized (e.g. via NTP); large clock skew will prevent region servers from joining the cluster.
HBase shell commands
Connect a client:
hbase shell
Learn the shell through help:
help
help 'COMMAND'
help 'COMMAND_GROUP'
hbase(main):004:0> help
HBase Shell, version 1.2.1, r8d8a7107dc4ccbf36a92f64675dc60392f85c015, Wed Mar 30 11:19:21 CDT 2016
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
COMMAND GROUPS:
Group name: general
Commands: status, table_help, version, whoami
Group name: ddl
Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, locate_region, show_filters
Group name: namespace
Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables
Group name: dml
Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve
Group name: tools
Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_rs, flush, major_compact, merge_region, move, normalize, normalizer_enabled, normalizer_switch, split, trace, unassign, wal_roll, zk_dump
Group name: replication
Commands: add_peer, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs
Group name: snapshots
Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot
Group name: configuration
Commands: update_all_config, update_config
Group name: quotas
Commands: list_quotas, set_quota
Group name: security
Commands: grant, list_security_capabilities, revoke, user_permission
Group name: procedures
Commands: abort_procedure, list_procedures
Group name: visibility labels
Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility
SHELL USAGE:
Quote all names in HBase Shell such as table and column names. Commas delimit
command parameters. Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:
{'key1' => 'value1', 'key2' => 'value2', ...}
and are opened and closed with curley-braces. Key/values are delimited by the
'=>' character combination. Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type
'Object.constants' to see a (messy) list of all constants in the environment.
If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:
hbase> get 't1', "key\x03\x3f\xcd"
hbase> get 't1', "key\003\023\011"
hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/book.html
Inspecting:
list
list_namespace
namespace: a namespace is a group of tables, roughly analogous to a database in an RDBMS (HBase has no real database concept).
hbase(main):002:0> list_namespace
NAMESPACE
default
hbase
2 row(s) in 0.1020 seconds
HBase ships with two default namespaces:
default
hbase
Check the usage:
help 'namespace'
Command: create_namespace
Create namespace; pass namespace name,
and optionally a dictionary of namespace configuration.
Examples:
hbase> create_namespace 'ns1'
hbase> create_namespace 'ns1', {'PROPERTY_NAME'=>'PROPERTY_VALUE'}
Command: describe_namespace
Describe the named namespace. For example:
hbase> describe_namespace 'ns1'
Command: drop_namespace
Drop the named namespace. The namespace must be empty.
Command: list_namespace
List all namespaces in hbase. Optional regular expression parameter could
be used to filter the output. Examples:
hbase> list_namespace
hbase> list_namespace 'abc.*'
Command: list_namespace_tables
List all tables that are members of the namespace.
Examples:
hbase> list_namespace_tables 'ns1'
Try it out:
hbase(main):008:0> create_namespace 'ns1'
0 row(s) in 0.0810 seconds
hbase(main):011:0> list_namespace
NAMESPACE
default
hbase
ns1
3 row(s) in 0.0390 seconds
hbase(main):012:0> alter_namespace 'ns1',{METHOD => 'set','NAME' => 'gaoyuanyuan'}
0 row(s) in 0.0900 seconds
hbase(main):013:0> describe_namespace 'ns1'
DESCRIPTION
{NAME => 'ns1', NAME => 'gaoyuanyuan'}
1 row(s) in 0.0040 seconds
hbase(main):014:0> alter_namespace 'ns1',{METHOD => 'set','NAME' => 'gaoyuan'}
0 row(s) in 0.0340 seconds
hbase(main):015:0> describe_namespace 'ns1'
DESCRIPTION
{NAME => 'ns1', NAME => 'gaoyuan'}
1 row(s) in 0.0030 seconds
hbase(main):016:0> alter_namespace 'ns1',{METHOD => 'unset',NAME =>'NAME'}
0 row(s) in 0.0310 seconds
hbase(main):017:0> describe_namespace 'ns1'
DESCRIPTION
{NAME => 'ns1'}
1 row(s) in 0.0110 seconds
hbase(main):018:0> drop_namespace 'ns1'
0 row(s) in 0.0540 seconds
hbase(main):019:0> list_namespace
NAMESPACE
default
hbase
2 row(s) in 0.0310 seconds
create_namespace 'ns1'                                              # create a namespace
list_namespace                                                      # list namespaces
list_namespace_tables 'ns1'                                         # list the tables in a namespace
alter_namespace 'ns1', {METHOD => 'set', 'NAME' => 'GAOYUANYUAN'}   # add or update a property
alter_namespace 'ns1', {METHOD => 'unset', NAME => 'NAME'}          # remove a property
drop_namespace 'ns1'                                                # must be empty; cannot be force-dropped
DDL
Group name: ddl
Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, locate_region, show_filters
Create a table:
create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}, {NAME => 'f2', VERSIONS => 3}
create 'ns1:t2', 'f1', SPLITS => ['10', '20', '30', '40']
hbase(main):024:0> create_namespace 'ns1'
0 row(s) in 0.0530 seconds
hbase(main):025:0> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5},{NAME => 'f2', VERSIONS => 3}
0 row(s) in 4.3940 seconds
=> Hbase::Table - ns1:t1
hbase(main):026:0> list_namespace_tables 'ns1'
TABLE
t1
1 row(s) in 0.0200 seconds
hbase(main):027:0> create 'ns1:t2','f1',SPLITS => ['10','20','30','40']   # pre-split into 5 regions
0 row(s) in 2.2900 seconds
=> Hbase::Table - ns1:t2
hbase(main):028:0> list_namespace_tables 'ns1'
TABLE
t1
t2
2 row(s) in 0.0280 seconds
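With SPLITS => ['10', '20', '30', '40'], the table is pre-split into 5 regions, each a half-open rowkey range. A sketch of how a rowkey maps onto those regions (plain Python, not HBase code; the lookup itself is a simple bytewise binary search):

```python
import bisect

# Region boundaries from the create statement above. Region 0 holds keys
# below '10'; region 4 holds keys at or above '40'; the rest are half-open
# ranges ['10','20'), ['20','30'), ['30','40').
splits = ['10', '20', '30', '40']

def region_of(rowkey):
    # bisect_right gives the index of the first split key > rowkey,
    # which is exactly the region index under bytewise comparison
    return bisect.bisect_right(splits, rowkey)
```

Note the comparison is lexicographic, so a key like '5' lands in the last region ('5' > '40' bytewise), another reason to zero-pad numeric rowkeys.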
The tables can also be seen in the web UI.
Alter a table (existing column families are updated, new ones are created):
alter 'ns1:t1', 'f1', {NAME => 'f2', VERSIONS => 3, BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 'f3', VERSIONS => 6, BLOOMFILTER => 'ROWCOL', TTL => 24*60*60}
hbase(main):029:0> alter 'ns1:t1','f1',{NAME => 'f2', VERSIONS => 3,BLOOMFILTER => 'ROWCOL',IN_MEMORY => 'true'},{NAME => 'f3', VERSIONS => 6,BLOOMFILTER => 'ROWCOL',TTL => 24*60*60}
Updating all regions with the new schema...
0/1 regions updated.
1/1 regions updated.
Done.
Updating all regions with the new schema...
1/1 regions updated.
Done.
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 7.0220 seconds
Delete a column family:
alter 'ns1:t1', NAME => 'f1', METHOD => 'delete'
View a table definition:
describe 'ns1:t1'
Delete a table (it must be disabled first):
disable 'ns1:t1'
drop 'ns1:t1'
DML
Group name: dml
Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve
Insert data (each put writes a single cell; you cannot insert multiple columns in one put):
put 'table_name', 'rowkey', 'column_family:column', 'value'
put 'ns1:t1','rk0001','f1:name','zhangsan'
put 'ns1:t1','rk0001','f1:age','18'
put 'ns1:t1','rk0001','f1:sex','1'
put 'ns1:t1','rk0002','f1:name','gaoyuanyuan'
put 'ns1:t1','rk0002','f1:age','18'
put 'ns1:t1','rk0002','f1:sex','2'
put 'ns1:t1','rk0003','f1:name','jiajingwen'
put 'ns1:t1','rk0003','f1:age','18'
put 'ns1:t1','rk0003','f1:sex','2'
put 'ns1:t1','rk0001111','f1:name','canglaoshi'
put 'ns1:t1','rk0001111','f1:age','18'
put 'ns1:t1','rk0001111','f1:sex','1'
put 'ns1:t1','rk0001','f2:addr','beijing'
put 'ns1:t1','rk0001','f1:size','123'
Update data (a put on an existing cell writes a new version):
put 'ns1:t1','rk0001','f1:name','zs1'
Scan data:
scan 'ns1:t1'
scan 'ns1:t1',{COLUMNS => 'f1:name'}
scan 'ns1:t1',{COLUMNS => ['f1:name','f2:addr']}
scan 'ns1:t1', {RAW => true, VERSIONS => 10}
scan 'ns1:t1', {COLUMNS => 'f1:name', TIMERANGE => [1539173350832,1539173421219], VERSIONS => 3}   # TIMERANGE is start-inclusive, end-exclusive
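The half-open TIMERANGE semantics (start included, end excluded) can be checked with a plain Python sketch over some timestamped values (illustrative data, not real scan output):

```python
# TIMERANGE => [start, end] keeps cells with start <= ts < end:
# the start timestamp matches, the end timestamp does not.
cells = [(1539173350832, 'a'), (1539173400000, 'b'), (1539173421219, 'c')]
start, end = 1539173350832, 1539173421219
in_range = [v for ts, v in cells if start <= ts < end]
```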
Read a single row: get
get 'ns1:t1','rk0001'
Delete data: delete / deleteall
delete 'ns1:t1','rk0001','f1:age'
deleteall 'ns1:t1','rk0001'
Note: incr can only increment columns stored as an 8-byte long (counter cells).
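A sketch of why incr needs an 8-byte long: HBase counters are stored as a big-endian 8-byte value, so the cell bytes must unpack to exactly one long before they can be incremented. A string cell like '18' (2 bytes) cannot be. This is an illustrative model in plain Python, not the HBase implementation:

```python
import struct

# HBase-style counter: the cell's bytes are interpreted as a big-endian
# 8-byte signed long, incremented, and written back.
def incr(cell_bytes, amount=1):
    (value,) = struct.unpack('>q', cell_bytes)   # requires exactly 8 bytes
    return struct.pack('>q', value + amount)

counter = struct.pack('>q', 0)   # a fresh counter cell holding 0
counter = incr(counter, 5)
```

Calling incr on a cell that holds a textual number (as written by an ordinary put) fails for the same reason struct.unpack rejects a 2-byte input.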