Why HBase?
As data volumes grow, traditional relational databases can no longer keep up with storage needs. Hive can handle the storage, but it cannot store unstructured or semi-structured data, nor serve efficient random queries over it.
What is HBase?
Apache HBase™ is the Hadoop database, a distributed, scalable, big data store.
Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project’s goal is the hosting of very large tables – billions of rows X millions of columns – atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google’s Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
HBase is an open-source, distributed, multi-versioned (each value can keep several versions), scalable, non-relational database.
HBase is an open-source Java implementation of Bigtable. Built on top of HDFS, it is a highly reliable, high-performance, column-family-oriented, scalable NoSQL database with real-time reads and writes.
RDBMS: MySQL, SQL Server, Oracle, DB2, Access, etc.
NoSQL: HBase, MongoDB, Redis, Memcached, etc.
When to use HBase:
You need to store massive amounts of unstructured or semi-structured data and read/write it randomly in near real time.
HBase and Hadoop
HBase is built on Hadoop; its storage depends on HDFS.
HBase architecture
Client, ZooKeeper, HMaster
HRegionServer, HLog, HRegion, Store, MemStore, StoreFile, HFile
Client:
The HBase client, providing the interfaces for accessing HBase (the Linux shell and the Java API).
The client maintains caches to speed up access to HBase, e.g. region location information.
ZooKeeper:
Monitors master status and guarantees there is exactly one active master at any time (high availability).
Stores the addressing entry point for all regions, i.e. which server hosts the meta table (hbase:meta; very old versions used a -ROOT- table).
Monitors HRegionServer liveness in real time and notifies the master when region servers come online or go offline.
Stores HBase metadata (the HBase schema), including table names and column families.
HMaster (the HBase "boss"):
Assigns regions to region servers (e.g. when a new table is created).
Balances load across region servers.
Reassigns regions (when a region server fails, or when a grown region is split in two).
Garbage-collects obsolete files on HDFS.
Handles schema update requests.
HRegionServer (the HBase "worker"):
Maintains the regions the master assigns to it (region management).
Serves client I/O requests against those regions and interacts with HDFS.
Splits regions that grow too large at runtime.
HLog:
The write-ahead log (WAL) of HBase operations: every write goes to the HLog first and only then into the MemStore, so the data can be replayed and recovered if the server crashes before a flush.
HRegion:
The smallest unit of distributed storage and load balancing in HBase; a table, or a contiguous slice of a table.
Store:
Corresponds to one column family of a region.
MemStore:
An in-memory write buffer whose contents are flushed to HDFS in batches.
HStoreFile:
HBase data is persisted to HDFS in the HFile format.
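The write path described above (HLog first, then MemStore, then a batch flush to a store file) can be sketched in plain Python. This is an illustrative model, not the HBase implementation; all class and field names here are made up:

```python
# Minimal sketch of the HBase write path: every put is appended to the
# write-ahead log (HLog) first, then buffered in the MemStore; when the
# MemStore fills up it is flushed as a sorted, immutable "store file".

class MiniStore:
    def __init__(self, flush_threshold=3):
        self.wal = []           # HLog: every mutation lands here first
        self.memstore = {}      # in-memory buffer, sorted at flush time
        self.storefiles = []    # flushed, immutable snapshots (like HFiles)
        self.flush_threshold = flush_threshold

    def put(self, rowkey, column, value):
        self.wal.append((rowkey, column, value))   # WAL first: crash-safe
        self.memstore[(rowkey, column)] = value    # then the MemStore
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # a flush writes a sorted, immutable snapshot; the MemStore empties
        self.storefiles.append(dict(sorted(self.memstore.items())))
        self.memstore = {}

s = MiniStore()
s.put('rk0001', 'f1:name', 'zhangsan')
s.put('rk0001', 'f1:age', '18')
s.put('rk0002', 'f1:name', 'lisi')   # the third put triggers a flush
```

If the process dies after the WAL append but before the flush, the WAL still holds all three mutations, which is exactly what makes recovery possible.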
Cardinality between the components:
hmaster:hregionserver=1:n
hregionserver:hlog=1:1
hregionserver:hregion=1:n
hregion:store=1:n
store:memstore=1:1
store:storefile=1:n
storefile:hfile=1:1
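The ratios above describe a containment hierarchy, which can be sketched with a few illustrative Python classes (not HBase code):

```python
# Containment hierarchy implied by the ratios above: a RegionServer owns
# one HLog and many Regions; a Region owns one Store per column family;
# a Store owns exactly one MemStore and many StoreFiles.

class Store:
    def __init__(self, column_family):
        self.column_family = column_family
        self.memstore = {}      # store : memstore  = 1 : 1
        self.storefiles = []    # store : storefile = 1 : n

class Region:
    def __init__(self, column_families):
        # region : store = 1 : n (one store per column family)
        self.stores = {cf: Store(cf) for cf in column_families}

class RegionServer:
    def __init__(self):
        self.hlog = []          # regionserver : hlog   = 1 : 1 (shared by all regions)
        self.regions = []       # regionserver : region = 1 : n

rs = RegionServer()
rs.regions.append(Region(['f1', 'f2']))
```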
HBase characteristics:
Schema: schema-less (rows in the same table may have different columns)
Data types: a single type only; everything is stored as byte[]
Multi-version: each value can keep multiple versions
Column-family storage: each column family's data is stored in its own files
Sparse storage: a cell with no value simply does not exist on disk, so null cells take no storage space at all
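Two of these characteristics, multi-versioning and sparsity, can be sketched in plain Python (an illustrative model, not the real storage format; all names are made up):

```python
import time

# Sketch of multi-versioned, sparse cells: each (rowkey, column) keeps up
# to VERSIONS timestamped values, newest first; absent cells simply have
# no entry, so they cost nothing.

class VersionedTable:
    def __init__(self, versions=3):
        self.versions = versions
        self.cells = {}   # (rowkey, column) -> [(timestamp, value), ...] newest first

    def put(self, rowkey, column, value, ts=None):
        ts = ts if ts is not None else int(time.time() * 1000)
        hist = self.cells.setdefault((rowkey, column), [])
        hist.insert(0, (ts, value))
        del hist[self.versions:]              # keep only the newest VERSIONS

    def get(self, rowkey, column):
        hist = self.cells.get((rowkey, column))
        return hist[0][1] if hist else None   # newest value, or None if absent

t = VersionedTable(versions=2)
t.put('rk0001', 'f1:name', 'zhangsan', ts=1)
t.put('rk0001', 'f1:name', 'zs1', ts=2)
t.put('rk0001', 'f1:name', 'zs2', ts=3)      # oldest version falls off
```

A get returns the newest version; older versions remain readable up to the configured VERSIONS limit, matching the VERSIONS option used in the create/alter examples later in these notes.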
HBase key concepts
rowkey: row key (analogous to a MySQL primary key: unique, and rows are kept sorted by it)
column family: a group of columns
column: a column
timestamp: the version timestamp of a cell (defaults to the server time at write)
version: version number
cell: the intersection of row, column, and timestamp that holds one value
Ordering
1. Rows are ordered by rowkey, in ascending lexicographic (byte) order.
2. Column families are ordered lexicographically.
3. Columns within a family are ordered lexicographically.
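Lexicographic (bytewise) order is not numeric order, which often surprises people designing rowkeys. A quick check in plain Python, using rowkeys like the ones in the put examples later in these notes:

```python
# HBase sorts rowkeys bytewise, not numerically: 'rk0001111' sorts
# BEFORE 'rk0002', because the comparison is character by character.
rowkeys = ['rk0002', 'rk0001111', 'rk0003', 'rk0001']
ordered = sorted(rowkeys)   # same order HBase would store them in
```

This is why numeric rowkeys are usually zero-padded to a fixed width.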
Installing HBase
1. Standalone HBase
(1) Unpack and configure environment variables
tar -zxvf hbase-1.2.1-bin.tar.gz -C /usr/local
vi /etc/profile
export HBASE_HOME=/usr/local/hbase-1.2.1
export PATH=$PATH:$HBASE_HOME/bin
source /etc/profile
(2) Configure HBase parameters
cd conf
vi hbase-env.sh
JAVA_HOME=/usr/local/jdk1.8.0_181
vi hbase-site.xml
<property>
<name>hbase.rootdir</name>
<value>file:///usr/local/hbasedata</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/zkdata</value>
</property>
Verify the installation:
hbase version
Start HBase:
bin/start-hbase.sh
Connect a shell client:
hbase shell
2. Pseudo-Distributed (outline only)
Set in hbase-site.xml:
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
See the official quickstart: http://hbase.apache.org/book.html#quickstart
3. Advanced - Fully Distributed
(1) Unpack and configure environment variables
tar -zxvf hbase-1.2.1-bin.tar.gz -C /usr/local
rm -rf docs
vi /etc/profile
export HBASE_HOME=/usr/local/hbase-1.2.1
export PATH=$PATH:$HBASE_HOME/bin
source /etc/profile
(2) Configure HBase parameters
cd ./conf
vi hbase-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_181
export HBASE_MANAGES_ZK=false
Note: with JDK 8+, comment out the following two lines (PermGen was removed in JDK 8):
# Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
vi regionservers     # list the region server hosts: hadoop01, hadoop02, hadoop03
vi backup-masters    # list the backup master hosts: hadoop02, hadoop03
vi hbase-site.xml
<!-- HBase root directory on HDFS -->
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop01:9000/hbase</value>
</property>
<!-- run HBase in distributed (cluster) mode -->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
<!-- ZooKeeper data directory -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/zkdata</value>
</property>
<!-- port of the HMaster web UI -->
<property>
<name>hbase.master.info.port</name>
<value>60010</value>
</property>
(3) Note: if HDFS runs as an HA cluster, copy hdfs-site.xml and core-site.xml into HBase's conf directory; otherwise skip this step.
[root@hadoop01 hadoop]# cp hdfs-site.xml core-site.xml $HBASE_HOME/conf
(4) Distribute HBase to the other two machines
scp -r hbase-1.2.1 root@hadoop02:$PWD
scp -r hbase-1.2.1 root@hadoop03:$PWD
(5) Start the HBase cluster
HBase depends on ZooKeeper and HDFS: start ZooKeeper first, then HDFS, and finally HBase.
zkServer.sh start
zkServer.sh status
start-dfs.sh
start-hbase.sh
Check the running processes:
[root@hadoop01 conf]# jps
56903 HRegionServer
55960 QuorumPeerMain
62328 Jps
56760 HMaster
56186 NameNode
56333 DataNode
59309 Main
Web UI ports:
HMaster: 60010 (as configured above; the default since HBase 1.0 is 16010)
HRegionServer: 16030
Internal RPC port: 16020
Note:
Keep the cluster clocks synchronized (e.g. via NTP); large clock skew will prevent region servers from joining the cluster.
HBase shell commands
Connect a client:
hbase shell
Learn the shell through help:
help
help 'COMMAND'
help 'COMMAND_GROUP'
hbase(main):004:0> help
HBase Shell, version 1.2.1, r8d8a7107dc4ccbf36a92f64675dc60392f85c015, Wed Mar 30 11:19:21 CDT 2016
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
COMMAND GROUPS:
Group name: general
Commands: status, table_help, version, whoami
Group name: ddl
Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, locate_region, show_filters
Group name: namespace
Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables
Group name: dml
Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve
Group name: tools
Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_rs, flush, major_compact, merge_region, move, normalize, normalizer_enabled, normalizer_switch, split, trace, unassign, wal_roll, zk_dump
Group name: replication
Commands: add_peer, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs
Group name: snapshots
Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot
Group name: configuration
Commands: update_all_config, update_config
Group name: quotas
Commands: list_quotas, set_quota
Group name: security
Commands: grant, list_security_capabilities, revoke, user_permission
Group name: procedures
Commands: abort_procedure, list_procedures
Group name: visibility labels
Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility
SHELL USAGE:
Quote all names in HBase Shell such as table and column names. Commas delimit
command parameters. Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:
{'key1' => 'value1', 'key2' => 'value2', ...}
and are opened and closed with curley-braces. Key/values are delimited by the
'=>' character combination. Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type
'Object.constants' to see a (messy) list of all constants in the environment.
If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:
hbase> get 't1', "key\x03\x3f\xcd"
hbase> get 't1', "key\003\023\011"
hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/book.html
Inspecting:
list
list_namespace
namespace: a namespace is a group of tables, roughly analogous to a database in an RDBMS (HBase has no real database concept).
hbase(main):002:0> list_namespace
NAMESPACE
default
hbase
2 row(s) in 0.1020 seconds
HBase ships with two default namespaces:
default
hbase
Check the usage:
help 'namespace'
Command: create_namespace
Create namespace; pass namespace name,
and optionally a dictionary of namespace configuration.
Examples:
hbase> create_namespace 'ns1'
hbase> create_namespace 'ns1', {'PROPERTY_NAME'=>'PROPERTY_VALUE'}
Command: describe_namespace
Describe the named namespace. For example:
hbase> describe_namespace 'ns1'
Command: drop_namespace
Drop the named namespace. The namespace must be empty.
Command: list_namespace
List all namespaces in hbase. Optional regular expression parameter could
be used to filter the output. Examples:
hbase> list_namespace
hbase> list_namespace 'abc.*'
Command: list_namespace_tables
List all tables that are members of the namespace.
Examples:
hbase> list_namespace_tables 'ns1'
Try it out:
hbase(main):008:0> create_namespace 'ns1'
0 row(s) in 0.0810 seconds
hbase(main):011:0> list_namespace
NAMESPACE
default
hbase
ns1
3 row(s) in 0.0390 seconds
hbase(main):012:0> alter_namespace 'ns1',{METHOD => 'set','NAME' => 'gaoyuanyuan'}
0 row(s) in 0.0900 seconds
hbase(main):013:0> describe_namespace 'ns1'
DESCRIPTION
{NAME => 'ns1', NAME => 'gaoyuanyuan'}
1 row(s) in 0.0040 seconds
hbase(main):014:0> alter_namespace 'ns1',{METHOD => 'set','NAME' => 'gaoyuan'}
0 row(s) in 0.0340 seconds
hbase(main):015:0> describe_namespace 'ns1'
DESCRIPTION
{NAME => 'ns1', NAME => 'gaoyuan'}
1 row(s) in 0.0030 seconds
hbase(main):016:0> alter_namespace 'ns1',{METHOD => 'unset',NAME =>'NAME'}
0 row(s) in 0.0310 seconds
hbase(main):017:0> describe_namespace 'ns1'
DESCRIPTION
{NAME => 'ns1'}
1 row(s) in 0.0110 seconds
hbase(main):018:0> drop_namespace 'ns1'
0 row(s) in 0.0540 seconds
hbase(main):019:0> list_namespace
NAMESPACE
default
hbase
2 row(s) in 0.0310 seconds
create_namespace 'ns1'                                              # create a namespace
list_namespace                                                      # list namespaces
list_namespace_tables 'ns1'                                         # list the tables in a namespace
alter_namespace 'ns1', {METHOD => 'set', 'NAME' => 'GAOYUANYUAN'}   # add or update a property
alter_namespace 'ns1', {METHOD => 'unset', NAME => 'NAME'}          # remove a property
drop_namespace 'ns1'                                                # must be empty; cannot be force-dropped
DDL
Group name: ddl
Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, locate_region, show_filters
Create a table:
create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}, {NAME => 'f2', VERSIONS => 3}
create 'ns1:t2', 'f1', SPLITS => ['10', '20', '30', '40']
hbase(main):024:0> create_namespace 'ns1'
0 row(s) in 0.0530 seconds
hbase(main):025:0> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5},{NAME => 'f2', VERSIONS => 3}
0 row(s) in 4.3940 seconds
=> Hbase::Table - ns1:t1
hbase(main):026:0> list_namespace_tables 'ns1'
TABLE
t1
1 row(s) in 0.0200 seconds
hbase(main):027:0> create 'ns1:t2','f1',SPLITS => ['10','20','30','40']   # pre-split into 5 regions
0 row(s) in 2.2900 seconds
=> Hbase::Table - ns1:t2
hbase(main):028:0> list_namespace_tables 'ns1'
TABLE
t1
t2
2 row(s) in 0.0280 seconds
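With SPLITS => ['10', '20', '30', '40'], the table is pre-split into 5 regions, each a half-open rowkey range. A sketch of how a rowkey maps onto those regions (plain Python, not HBase code; the lookup itself is a simple bytewise binary search):

```python
import bisect

# Region boundaries from the create statement above. Region 0 holds keys
# below '10'; region 4 holds keys at or above '40'; the rest are half-open
# ranges ['10','20'), ['20','30'), ['30','40').
splits = ['10', '20', '30', '40']

def region_of(rowkey):
    # bisect_right gives the index of the first split key > rowkey,
    # which is exactly the region index under bytewise comparison
    return bisect.bisect_right(splits, rowkey)
```

Note the comparison is lexicographic, so a key like '5' lands in the last region ('5' > '40' bytewise), another reason to zero-pad numeric rowkeys.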
The tables can also be seen in the web UI.
Alter a table (existing column families are updated, new ones are created):
alter 'ns1:t1', 'f1', {NAME => 'f2', VERSIONS => 3, BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 'f3', VERSIONS => 6, BLOOMFILTER => 'ROWCOL', TTL => 24*60*60}
hbase(main):029:0> alter 'ns1:t1','f1',{NAME => 'f2', VERSIONS => 3,BLOOMFILTER => 'ROWCOL',IN_MEMORY => 'true'},{NAME => 'f3', VERSIONS => 6,BLOOMFILTER => 'ROWCOL',TTL => 24*60*60}
Updating all regions with the new schema...
0/1 regions updated.
1/1 regions updated.
Done.
Updating all regions with the new schema...
1/1 regions updated.
Done.
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 7.0220 seconds
Delete a column family:
alter 'ns1:t1', NAME => 'f1', METHOD => 'delete'
View a table definition:
describe 'ns1:t1'
Delete a table (it must be disabled first):
disable 'ns1:t1'
drop 'ns1:t1'
DML
Group name: dml
Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve
Insert data (each put writes a single cell; you cannot insert multiple columns in one put):
put 'table_name', 'rowkey', 'column_family:column', 'value'
put 'ns1:t1','rk0001','f1:name','zhangsan'
put 'ns1:t1','rk0001','f1:age','18'
put 'ns1:t1','rk0001','f1:sex','1'
put 'ns1:t1','rk0002','f1:name','gaoyuanyuan'
put 'ns1:t1','rk0002','f1:age','18'
put 'ns1:t1','rk0002','f1:sex','2'
put 'ns1:t1','rk0003','f1:name','jiajingwen'
put 'ns1:t1','rk0003','f1:age','18'
put 'ns1:t1','rk0003','f1:sex','2'
put 'ns1:t1','rk0001111','f1:name','canglaoshi'
put 'ns1:t1','rk0001111','f1:age','18'
put 'ns1:t1','rk0001111','f1:sex','1'
put 'ns1:t1','rk0001','f2:addr','beijing'
put 'ns1:t1','rk0001','f1:size','123'
Update data (a put on an existing cell writes a new version):
put 'ns1:t1','rk0001','f1:name','zs1'
Scan data:
scan 'ns1:t1'
scan 'ns1:t1',{COLUMNS => 'f1:name'}
scan 'ns1:t1',{COLUMNS => ['f1:name','f2:addr']}
scan 'ns1:t1', {RAW => true, VERSIONS => 10}
scan 'ns1:t1', {COLUMNS => 'f1:name', TIMERANGE => [1539173350832,1539173421219], VERSIONS => 3}   # TIMERANGE is start-inclusive, end-exclusive
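The half-open TIMERANGE semantics (start included, end excluded) can be checked with a plain Python sketch over some timestamped values (illustrative data, not real scan output):

```python
# TIMERANGE => [start, end] keeps cells with start <= ts < end:
# the start timestamp matches, the end timestamp does not.
cells = [(1539173350832, 'a'), (1539173400000, 'b'), (1539173421219, 'c')]
start, end = 1539173350832, 1539173421219
in_range = [v for ts, v in cells if start <= ts < end]
```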
Read a single row: get
get 'ns1:t1','rk0001'
Delete data: delete / deleteall
delete 'ns1:t1','rk0001','f1:age'
deleteall 'ns1:t1','rk0001'
Note: incr can only increment columns stored as an 8-byte long (counter cells).
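A sketch of why incr needs an 8-byte long: HBase counters are stored as a big-endian 8-byte value, so the cell bytes must unpack to exactly one long before they can be incremented. A string cell like '18' (2 bytes) cannot be. This is an illustrative model in plain Python, not the HBase implementation:

```python
import struct

# HBase-style counter: the cell's bytes are interpreted as a big-endian
# 8-byte signed long, incremented, and written back.
def incr(cell_bytes, amount=1):
    (value,) = struct.unpack('>q', cell_bytes)   # requires exactly 8 bytes
    return struct.pack('>q', value + amount)

counter = struct.pack('>q', 0)   # a fresh counter cell holding 0
counter = incr(counter, 5)
```

Calling incr on a cell that holds a textual number (as written by an ordinary put) fails for the same reason struct.unpack rejects a 2-byte input.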