HBase Operations

Video notes
Video: HBase tutorial
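1. Differences from traditional relational databases
HBase                                          Traditional RDBMS
Distributed                                    Single machine
Columns added or removed dynamically           Columns fixed at table creation
Single data type (bytes, treated as strings)   Numeric, character
Null values are not stored                     Null values are stored
Column-oriented (by column family)             Row-oriented
Unstructured or semi-structured, e.g. JSON     Structured
HBase has no native SQL support.
HBase query paths are limited: by rowkey, by rowkey range, or by full table scan.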

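2. HBase characteristics:
Distributed
Fast random writes and simple key-based reads (are single-row updates supported?)
Billions of rows and millions of columns; relational databases limit the number of columns
Column-oriented storage
No native SQL; accessed through the Java API (a wrapper layer such as Apache Phoenix can add SQL access)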

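3. Can HBase replace a relational database?
No general transaction support (atomicity is limited to a single row), so transactional data belongs in a database such as MySQL
No rich queries such as joins
It can only serve as a complement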

4. Role of the HMaster
1. Manages the RegionServers
2. Handles DDL, i.e. metadata (schema) definitions
 
5. Role of the RegionServer
1. Handles DML (reads and writes of data)
2. Maintains the WAL (write-ahead log)
 
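6. Basic concepts:
DML (Data Manipulation Language) statements let users query a database and manipulate the data already in it.
Examples are insert, delete, update, and select.
DDL (Data Definition Language) statements define and manage the objects in a database, e.g. create, alter, and drop.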
 
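7. HBase logical view:
A table behaves like a sorted map whose key is the three-dimensional coordinate (rowkey, column, version). A lookup must supply the rowkey; column and version are optional, depending on the granularity wanted.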
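
The sorted-map view can be sketched in a few lines of Python. This is only a toy model for intuition, not how HBase is implemented; the class name LogicalTable and its methods are made up for the illustration.

```python
from bisect import bisect_left, insort


class LogicalTable:
    """Toy model of the logical view: a map sorted by (rowkey, column, version)."""

    def __init__(self):
        # Keys are stored as (rowkey, column, -version) so newer versions sort first.
        self._keys = []
        self._values = {}

    def put(self, rowkey, column, version, value):
        key = (rowkey, column, -version)
        if key not in self._values:
            insort(self._keys, key)
        self._values[key] = value

    def get(self, rowkey, column=None, version=None):
        """rowkey is mandatory; column and version only narrow the result."""
        start = bisect_left(self._keys, (rowkey,))
        result = []
        for key in self._keys[start:]:
            row, col, neg_version = key
            if row != rowkey:
                break  # keys are sorted, so the row has ended
            if column is not None and col != column:
                continue
            if version is not None and -neg_version != version:
                continue
            result.append((col, -neg_version, self._values[key]))
        return result


table = LogicalTable()
table.put("123", "info:col1", 1, "v1")
table.put("123", "info:col1", 2, "v1b")
table.put("123", "info:col2", 1, "v3")
table.put("456", "info:col1", 12, "v2")

whole_row = table.get("123")
one_version = table.get("123", "info:col1", version=1)
print(whole_row)
print(one_version)
```

Negating the version in the key is just a convenient way to make the newest version of a cell come first within its column.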
 
8. HBase physical storage:
1. Table = n regions, split horizontally by rowkey range
2. Region = n stores, one store per column family
3. Store = 1 memstore (in memory) + n HFiles (files on HDFS); each flush of the memstore produces one HFile
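
A minimal sketch of the store layout described in item 3, assuming a plain dict stands in for the memstore and a list of frozen dicts stands in for the HFiles. The Store class is hypothetical and ignores versions, the WAL, and HDFS entirely.

```python
class Store:
    """Toy model of one store (one column family inside one region)."""

    def __init__(self):
        self.memstore = {}  # in-memory writes: (rowkey, column) -> value
        self.hfiles = []    # immutable sorted snapshots standing in for HFiles

    def put(self, rowkey, column, value):
        self.memstore[(rowkey, column)] = value

    def flush(self):
        """Each flush of a non-empty memstore produces exactly one new HFile."""
        if self.memstore:
            self.hfiles.append(dict(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, rowkey, column):
        # Look in the memstore first, then in the HFiles from newest to oldest.
        key = (rowkey, column)
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            if key in hfile:
                return hfile[key]
        return None


store = Store()
store.put("123", "info:col1", "v1")
store.flush()
store.put("123", "info:col2", "v3")
store.flush()
store.put("456", "info:col1", "v2")  # still only in the memstore
print(len(store.hfiles), len(store.memstore))
```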

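9. HBase design suggestions
1. Define your own namespace (the equivalent of a database)
2. Define a sensible schema
3. Pre-split regions sensibly when creating a table (pre-split, auto-split, force-split)
4. Choose a suitable field as the rowkey, e.g. phone number or IMSI
5. Keep column family and column names short to save storage space
6. Set an appropriate number of versions; keeping 3 is recommended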

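10. HBase operations
1. put: single or batch; there is no separate update method, so a put to an existing cell writes a new version, much like assigning to a map key
2. delete: single or batch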
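
The map-like behaviour of put can be illustrated with a small sketch. It assumes a counter in place of the server timestamp and a max_versions setting that plays the role of VERSIONS on a column family; the VersionedCells class is invented for this example.

```python
import itertools


class VersionedCells:
    """Toy cell store: put() both inserts and 'updates' by adding a new version."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.cells = {}                    # (rowkey, column) -> [(ts, value), ...], newest first
        self._clock = itertools.count(1)   # stand-in for the server timestamp

    def put(self, rowkey, column, value, ts=None):
        ts = next(self._clock) if ts is None else ts
        versions = self.cells.setdefault((rowkey, column), [])
        # A put with an already used timestamp replaces that version.
        versions[:] = [tv for tv in versions if tv[0] != ts]
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]   # keep only the newest max_versions

    def get(self, rowkey, column, versions=1):
        return self.cells.get((rowkey, column), [])[:versions]


cells = VersionedCells(max_versions=3)
for value in ("v1", "v2", "v3", "v4"):
    cells.put("123", "info:col1", value)   # same cell every time: acts as an update

latest = cells.get("123", "info:col1")
history = cells.get("123", "info:col1", versions=10)
print(latest)
print(history)
```

Four puts to the same cell leave three retained versions, and a default get returns only the newest one.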
 
11. Hands-on walkthrough:
./hbase shell
1) Basic status queries
hbase(main):006:0> status
1 active master, 0 backup masters, 2 servers, 0 dead, 1.0000 average load
Took 0.0175 seconds 
                                                            
hbase(main):007:0> whoami
hadoop (auth:SIMPLE)
    groups: hadoop
Took 0.0006 seconds

2) Show the usage of a specific command

hbase(main):012:0> help "status"
Show cluster status. Can be 'summary', 'simple', 'detailed', or 'replication'. The
default is 'summary'. Examples:
  hbase> status
  hbase> status 'simple'
  hbase> status 'summary'
  hbase> status 'detailed'
  hbase> status 'replication'
  hbase> status 'replication', 'source'
  hbase> status 'replication', 'sink'
hbase(main):013:0> 

3) List namespaces (tab completion is available)

hbase(main):013:0> list_namespace
NAMESPACE                                                                       
default                                                                         
hbase                                                                           
2 row(s)
Took 0.1524 seconds                                                             
hbase(main):014:0> 

4) Create a namespace

create             create_namespace   
hbase(main):019:0> create_namespace 'gp'
Took 0.2463 seconds                                                             
hbase(main):020:0> 
hbase(main):020:0> list_namespace
NAMESPACE                                                                       
default                                                                         
gp                                                                              
hbase                                                                           
3 row(s)
Took 0.0270 seconds    

5) Create a pre-split table:

create 'namespace:table_name','column_family',...
hbase(main):024:0>  create 'gp:test','info',{NUMREGIONS => 4, SPLITALGO => 'HexStringSplit'}
Created table gp:test
Took 2.6835 seconds                                                             
=> Hbase::Table - gp:test
hbase(main):025:0> desc 'gp:test'
Table gp:test is ENABLED                                                        
gp:test                                                                         
COLUMN FAMILIES DESCRIPTION                                                     
{NAME => 'info', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_
BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'fals
e', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLIC
ATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_ME
MORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'f
alse', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}       
1 row(s)
Took 0.3126 seconds                                                             
hbase(main):026:0>
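
To get a feel for what NUMREGIONS => 4 with a hex-string split means, the sketch below computes evenly spaced split keys over an 8-digit hex keyspace and looks up which region a rowkey would fall into. It is modeled on the idea of uniform hex splitting and is not HBase's own code; the function names and the 00000000 to FFFFFFFF range are assumptions of the example.

```python
from bisect import bisect_right


def hex_split_points(num_regions, first=0x00000000, last=0xFFFFFFFF, width=8):
    """Return num_regions - 1 evenly spaced, zero-padded hex split keys."""
    if num_regions < 2:
        return []
    step = (last - first + 1) // num_regions
    return [format(first + step * i, "0{}x".format(width)) for i in range(1, num_regions)]


def region_index(rowkey, split_points):
    """Regions are [start, end) ranges, so a key equal to a split point starts the next region."""
    return bisect_right(split_points, rowkey)


splits = hex_split_points(4)
print(splits)
print(region_index("123", splits), region_index("456", splits))
```

Rowkeys are compared as plain strings here, so keys such as '123' and '456' from the walkthrough land in the first two regions only. Uniform hex splitting pays off when rowkeys are themselves evenly distributed hex strings, for example hashes.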

6) Alter the table: change the number of stored versions from 1 to 3

hbase(main):028:0> alter 'gp:test',{NAME=>'info',VERSIONS=>'3'}
Updating all regions with the new schema...
4/4 regions updated.
Done.
Took 2.3734 seconds                                                             
hbase(main):029:0> desc 'gp:test'
Table gp:test is ENABLED                                                        
gp:test                                                                         
COLUMN FAMILIES DESCRIPTION                                                     
{NAME => 'info', VERSIONS => '3', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_
BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'fals
e', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLIC
ATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_ME
MORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'f
alse', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}       
1 row(s)
Took 0.0597 seconds                                                             
hbase(main):030:0>

7) Insert data:

Syntax: put 'namespace:tablename','rowkey','columnfamily:column','value',timestamp (the timestamp serves as the version; it is optional and defaults to the current time)
hbase(main):030:0>  put 'gp:test','123','info:col1','v1'
Took 0.2623 seconds                                                                                                                       
hbase(main):033:0> scan 'gp:test'
ROW                   COLUMN+CELL                                               
 123                  column=info:col1, timestamp=1534082352792, value=v1       
1 row(s)
Took 0.1840 seconds 

8) Query data with get (a second row is first inserted with an explicit timestamp of 12):

hbase(main):035:0>  put 'gp:test','456','info:col1','v2',12
Took 0.0188 seconds                                                             
hbase(main):036:0> scan 'gp:test'
ROW                   COLUMN+CELL                                               
 123                  column=info:col1, timestamp=1534082352792, value=v1       
 456                  column=info:col1, timestamp=12, value=v2                  
2 row(s)
Took 0.0526 seconds                                                             
hbase(main):037:0> get 'gp:test','123'
COLUMN                CELL                                                      
 info:col1            timestamp=1534082352792, value=v1                         
1 row(s)
Took 0.0783 seconds                                                             
hbase(main):038:0> 

9) Get a specific column of rowkey='123'

hbase(main):038:0>  put 'gp:test','123','info:col2','v3'
Took 0.0487 seconds                                                             
hbase(main):039:0> get 'gp:test','123','info:col1'
COLUMN                CELL                                                      
 info:col1            timestamp=1534082352792, value=v1                         
1 row(s)
Took 0.0104 seconds                                                             
hbase(main):040:0>

10) Delete a specific column of a row:

hbase(main):022:0> delete 'gp:test','123','info:col1'                                                    
hbase(main):043:0> scan 'gp:test'
ROW                   COLUMN+CELL                                               
 123                  column=info:col2, timestamp=1534082891558, value=v3       
 456                  column=info:col1, timestamp=12, value=v2                  
2 row(s)
Took 0.0606 seconds                                                             
hbase(main):044:0>

11) Delete an entire row:

hbase(main):044:0> deleteall 'gp:test','456'
Took 0.0225 seconds                                                             
hbase(main):045:0> scan 'gp:test'
ROW                   COLUMN+CELL                                               
 123                  column=info:col2, timestamp=1534082891558, value=v3       
1 row(s)
Took 0.0687 seconds                                                             
hbase(main):046:0> 

A delete does not remove data immediately; it only writes a delete marker.
The markers can be seen with a raw scan:
hbase(main):050:0> scan 'gp:test', {RAW => true, VERSIONS => 10}
ROW                   COLUMN+CELL                                               
 123                  column=info:col1, timestamp=1534082352792, type=Delete    
 123                  column=info:col1, timestamp=1534082352792, value=v1       
 123                  column=info:col2, timestamp=1534082891558, value=v3       
 456                  column=info:, timestamp=1534083246672, type=DeleteFamily  
 456                  column=info:col1, timestamp=12, value=v2                  
2 row(s)
Took 0.1143 seconds                                                             
hbase(main):051:0> 
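A delete is really a write, like a put: it inserts a cell with type=Delete or type=DeleteFamily.
At this point the data is still in the memstore and has not been flushed to an HFile.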
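
The marker mechanism can be mimicked with an append-only list. This is a simplified sketch that models only the version-level Delete marker seen for rowkey 123 above (not DeleteFamily); the TombstoneTable class is hypothetical.

```python
class TombstoneTable:
    """Toy append-only cell log where delete() writes a marker instead of erasing."""

    def __init__(self):
        self.cells = []  # (rowkey, column, timestamp, type, value)

    def put(self, rowkey, column, ts, value):
        self.cells.append((rowkey, column, ts, "Put", value))

    def delete(self, rowkey, column, ts):
        # Nothing is removed; a marker for that exact version is appended.
        self.cells.append((rowkey, column, ts, "Delete", None))

    def raw_scan(self):
        """Everything, markers included, like scan with RAW => true."""
        return sorted(self.cells, key=lambda c: (c[0], c[1], -c[2], c[3]))

    def scan(self):
        """Normal scan: hide markers and the versions they mask."""
        masked = {(r, c, ts) for r, c, ts, kind, _ in self.cells if kind == "Delete"}
        return [(r, c, ts, v) for r, c, ts, kind, v in self.raw_scan()
                if kind == "Put" and (r, c, ts) not in masked]


t = TombstoneTable()
t.put("123", "info:col1", 1534082352792, "v1")
t.put("123", "info:col2", 1534082891558, "v3")
t.delete("123", "info:col1", 1534082352792)

visible = t.scan()
raw = t.raw_scan()
print(visible)
print(len(raw))
```

The normal scan no longer shows info:col1, while the raw view still holds both the original cell and its marker.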

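12) After flush and major_compact the deleted data is physically removed: in the session below the deleted cells are gone after the flush, and the delete markers are gone after the major compaction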

hbase(main):051:0> flush 'gp:test'
Took 0.8562 seconds                                                             
hbase(main):055:0> scan 'gp:test', {RAW => true, VERSIONS => 10}
ROW                   COLUMN+CELL                                               
 123                  column=info:col1, timestamp=1534082352792, type=Delete    
 123                  column=info:col2, timestamp=1534082891558, value=v3       
 456                  column=info:, timestamp=1534083246672, type=DeleteFamily  
2 row(s)
Took 0.0718 seconds 
hbase(main):002:0> major_compact 'gp:test'
Took 0.3532 seconds
hbase(main):001:0> scan 'gp:test', {RAW => true, VERSIONS => 10}
ROW                   COLUMN+CELL                                               
 123                  column=info:col2, timestamp=1534082891558, value=v3       
1 row(s)
Took 0.8065 seconds                                                             
hbase(main):002:0> 
Manual major compaction is rarely run in production: it is I/O-intensive and can stall reads and writes
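
What a major compaction does to delete markers can be sketched as a merge of several files into one. The following is a rough illustration under simplifying assumptions: cells are tuples of (rowkey, column, timestamp, type, value), only version-level Delete markers are handled, and TTL and version limits are ignored. The function major_compact is a made-up name, not an HBase API.

```python
def major_compact(hfiles):
    """Merge all HFiles into one, dropping delete markers and the cells they mask."""
    cells = [cell for hfile in hfiles for cell in hfile]
    masked = {(r, c, ts) for r, c, ts, kind, _ in cells if kind == "Delete"}
    survivors = [cell for cell in cells
                 if cell[3] == "Put" and (cell[0], cell[1], cell[2]) not in masked]
    # The result is a single sorted file replacing all inputs.
    return [sorted(survivors, key=lambda cell: (cell[0], cell[1], -cell[2]))]


hfiles = [
    [("123", "info:col1", 1534082352792, "Put", "v1"),
     ("123", "info:col2", 1534082891558, "Put", "v3")],
    [("123", "info:col1", 1534082352792, "Delete", None)],
]

compacted = major_compact(hfiles)
print(compacted)
```

Because every file has to be read and rewritten, the cost grows with the total amount of data in the store, which is one reason the operation is scheduled carefully.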

13) Truncate the table, then drop the table and the namespace

hbase(main):003:0> truncate 'gp:test'
Truncating 'gp:test' table (it may take a while):
Disabling table...
Truncating table...
Took 2.1177 seconds                                                             
hbase(main):004:0> scan 'gp:test'
ROW                   COLUMN+CELL                                               
0 row(s)
Took 1.1058 seconds                                                             
hbase(main):005:0> disable 'gp:test'
Took 0.5193 seconds                                                             
hbase(main):006:0> scan 'gp:test'
ROW                   COLUMN+CELL                                               
org.apache.hadoop.hbase.TableNotEnabledException: gp:test is disabled.
 at org.apache.hadoop.hbase.client.ConnectionImplementation.relocateRegion(ConnectionImplementation.java:714)
 at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:328)
 at org.apache.hadoop.hbase.client.ScannerCallable.prepare(ScannerCallable.java:139)
 at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.prepare(ScannerCallableWithReplicas.java:399)
 at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
 at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
ERROR: Table gp:test is disabled!
For usage try 'help "scan"'
Took 0.1323 seconds                                                             
hbase(main):007:0> drop 'gp:test'
Took 0.3581 seconds                                                             
hbase(main):008:0> drop
drop             drop_all         drop_namespace   
hbase(main):008:0> list
list                         list_deadservers             
list_labels                  list_locks                   
list_namespace               list_namespace_tables        
list_peer_configs            list_peers                   
list_procedures              list_quota_snapshots         
list_quota_table_sizes       list_quotas                  
list_regions                 list_replicated_tables       
list_rsgroups                list_security_capabilities   
list_snapshot_sizes          list_snapshots               
list_table_snapshots         
hbase(main):008:0> list_namespace
list_namespace          list_namespace_tables   
hbase(main):008:0> list_namespace 'gp'
NAMESPACE                                                                       
gp                                                                              
1 row(s)
Took 0.1517 seconds                                                             
hbase(main):009:0> drop
drop             drop_all         drop_namespace   
hbase(main):009:0> drop_namespace 'gp'
Took 0.2719 seconds                                                             
hbase(main):010:0> list
list                         list_deadservers             
list_labels                  list_locks                   
list_namespace               list_namespace_tables        
list_peer_configs            list_peers                   
list_procedures              list_quota_snapshots         
list_quota_table_sizes       list_quotas                  
list_regions                 list_replicated_tables       
list_rsgroups                list_security_capabilities   
list_snapshot_sizes          list_snapshots               
list_table_snapshots         
hbase(main):010:0> list_namespace
list_namespace          list_namespace_tables   
hbase(main):010:0> list_namespace
NAMESPACE                                                                       
default                                                                         
hbase                                                                           
2 row(s)
Took 0.0322 seconds                                                             
hbase(main):011:0>
 
 
 
 
 


Reposted from www.cnblogs.com/jason-dong/p/hbasenote.html