hbase(main):007:0> list
TABLE                                                                                                                                                                       
0 row(s) in 0.0080 seconds

=> []

2. create: creates a table. Running it without arguments prints the HBase shell help for the command:

hbase(main):008:0> create

ERROR: wrong number of arguments (0 for 1)

Examples:

Create a table with namespace=ns1 and table qualifier=t1
  hbase> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}

Create a table with namespace=default and table qualifier=t1
  hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
  hbase> # The above in shorthand would be the following:
  hbase> create 't1', 'f1', 'f2', 'f3'
  hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
  hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}
  
Table configuration options can be put at the end.

create takes the table name first (t1 in the examples above), followed by one or more column family names (f1, f2, ...).

hbase(main):009:0> create 'student','info1','info2'
0 row(s) in 2.3310 seconds

=> Hbase::Table - student

Run list again to confirm that the table has been created:

hbase(main):010:0> list
TABLE                                                                                                                                                                      
student                                                                                                                                                                    
1 row(s) in 0.0100 seconds

=> ["student"]

3. View the table structure (describe 'table name'):

hbase(main):011:0> describe 'student'
Table student is ENABLED                                                                                                                                                                                               
student                                                                                                                                                                                                                
COLUMN FAMILIES DESCRIPTION                                                                                                                                                                                            
{NAME => 'info1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE =
> 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                                                                                                                                              
{NAME => 'info2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE =
> 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                                                                                                                                              
2 row(s) in 0.0420 seconds

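The output contains several attributes, for example:

NAME: the column family name. Each column family gets one line of attributes.

BLOOMFILTER: the Bloom filter type. The default, ROW, builds the filter on row keys; look up Bloom filters if you want the details.

VERSIONS: the maximum number of versions kept per cell, that is, how many older values HBase retains. Versions are covered in detail later.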

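4. How do you create a table in another namespace? Prefix the table name with the namespace and a colon ('namespace:table'), as in the create 'ns1:t1' help example above.

First list all namespaces: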

hbase(main):012:0> list_namespace
NAMESPACE                                                                                                                                                                                                              
default                                                                                                                                                                                                                
hbase                                                                                                                                                                                                                  
2 row(s) in 0.0360 seconds

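Create a table. Note that the name below uses a dot instead of a colon, so HBase treats hbase.student as a plain table name in the default namespace rather than as a table in the hbase namespace: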

hbase(main):013:0> create 'hbase.student','info'
0 row(s) in 2.2410 seconds

=> Hbase::Table - hbase.student

Check the result:

hbase(main):014:0> list
TABLE                                                                                                                                                                                                                  
hbase.student                                                                                                                                                                                                          
student                                                                                                                                                                                                                
2 row(s) in 0.0080 seconds

=> ["hbase.student", "student"]

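list shows the tables of every namespace, not only default. A table outside default is displayed with a 'namespace:' prefix (see test:stu in section 7); hbase.student contains no colon, so it belongs to default. The shell has no command for switching the current namespace, so you always address a table by its full name.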
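To make the naming rule concrete, here is a minimal Python sketch of how a full table name splits into namespace and qualifier. The function split_table_name is a hypothetical helper written for this post, not part of HBase; it only models the colon rule described above.

```python
def split_table_name(full_name):
    """Split 'namespace:qualifier'; a name without a colon is in 'default'."""
    namespace, sep, qualifier = full_name.partition(":")
    if not sep:
        # No colon at all: the whole string is the table qualifier.
        return ("default", full_name)
    return (namespace, qualifier)


for name in ("student", "hbase.student", "test:stu"):
    print(name, "->", split_table_name(name))
```

Under this model, a dot is just an ordinary character in the table name, which matches what the list output above shows.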

5. Create a namespace

hbase(main):015:0> create_namespace 'test'
0 row(s) in 0.9420 seconds

hbase(main):016:0> list_namespace
NAMESPACE                                                                                                                                                                                                              
default                                                                                                                                                                                                                
hbase                                                                                                                                                                                                                  
test                                                                                                                                                                                                                   
3 row(s) in 0.0220 seconds

6. Drop a table

Before dropping a table, you must first run disable 'table name' to take the table offline, for example:

hbase(main):018:0> disable 'student'
0 row(s) in 2.2520 seconds

hbase(main):019:0> drop 'student'
0 row(s) in 1.2430 seconds

hbase(main):020:0> list
TABLE                                                                                                                                                                                                                  
hbase.student                                                                                                                                                                                                          
1 row(s) in 0.0090 seconds

=> ["hbase.student"]

What happens when you drop a table without disabling it first is not shown here; try it yourself if you are curious.

7. Drop a namespace

A namespace can be dropped only after every table in it has been dropped, for example:

hbase(main):021:0> create 'test:stu','info'
0 row(s) in 1.2360 seconds

=> Hbase::Table - test:stu
hbase(main):022:0> list
TABLE                                                                                                                                                                                                                  
hbase.student                                                                                                                                                                                                          
test:stu                                                                                                                                                                                                               
2 row(s) in 0.0050 seconds

=> ["hbase.student", "test:stu"]
hbase(main):023:0> disable 'test:stu'
0 row(s) in 2.2360 seconds

hbase(main):024:0> drop 'test:stu'
0 row(s) in 1.2460 seconds

hbase(main):025:0> list
TABLE                                                                                                                                                                                                                  
hbase.student                                                                                                                                                                                                          
1 row(s) in 0.0060 seconds

=> ["hbase.student"]
hbase(main):026:0> drop_namespace 'test'
0 row(s) in 0.9080 seconds

hbase(main):027:0> list_namespace
NAMESPACE                                                                                                                                                                                                              
default                                                                                                                                                                                                                
hbase                                                                                                                                                                                                                  
2 row(s) in 0.0410 seconds

II. DML

1. Insert data

The usage of put:

hbase(main):028:0> put

ERROR: wrong number of arguments (0 for 4)

Here is some help for this command:
Put a cell 'value' at specified table/row/column and optionally
timestamp coordinates.  To put a cell value into table 'ns1:t1' or 't1'
at row 'r1' under column 'c1' marked with the time 'ts1', do:

  hbase> put 'ns1:t1', 'r1', 'c1', 'value'
  hbase> put 't1', 'r1', 'c1', 'value'
  hbase> put 't1', 'r1', 'c1', 'value', ts1
  hbase> put 't1', 'r1', 'c1', 'value', {ATTRIBUTES=>{'mykey'=>'myvalue'}}
  hbase> put 't1', 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
  hbase> put 't1', 'r1', 'c1', 'value', ts1, {VISIBILITY=>'PRIVATE|SECRET'}

The same commands also can be run on a table reference. Suppose you had a reference
t to table 't1', the corresponding command would be:

  hbase> t.put 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}

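put needs at least four arguments: the table name, the row key (r1), the column in family:qualifier form (c1), and the value. ts1 is an optional timestamp.

Insert data (this assumes that a table stu with column families info1 and info2 already exists):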

hbase(main):035:0> put 'stu','1001','info1:name','zhangsan'
0 row(s) in 0.0590 seconds

hbase(main):036:0> put 'stu','1002','info1:gender','mail'
0 row(s) in 0.0080 seconds

hbase(main):037:0> put 'stu','1002','info2:age','20'
0 row(s) in 0.0150 seconds

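Note: HBase stores data sparsely. Using a column in one row does not require every other row to have that column, which is completely different from MySQL.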
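One way to picture sparse storage is a dictionary per row that holds only the columns actually written. The sketch below is a simplified Python model, not HBase code; the table dictionary and the put/get helpers are names made up for illustration, and timestamps are ignored here.

```python
# Each row key maps to its own dict of "family:qualifier" -> value.
# A row stores only the columns that were actually put into it.
table = {}


def put(row, column, value):
    table.setdefault(row, {})[column] = value


def get(row, column):
    """Return the value, or None if this row never stored that column."""
    return table.get(row, {}).get(column)


put("1001", "info1:name", "zhangsan")
put("1002", "info1:gender", "mail")
put("1002", "info2:age", "20")

print(sorted(table["1001"]))  # columns present in row 1001
print(sorted(table["1002"]))  # columns present in row 1002
```

Row 1001 has no info2:age entry at all; there is no empty placeholder for it, unlike a NULL in a relational table.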

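2. Read data

There are two ways to read data: scan and get.

scan reads the whole table (or a range of rows), while get reads a single row, optionally limited to specific column families or columns.

①Usage of scan: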

scan

ERROR: wrong number of arguments (0 for 1)

Some examples:

  hbase> scan 'hbase:meta'
  hbase> scan 'hbase:meta', {COLUMNS => 'info:regioninfo'}
Also for experts, there is an advanced option -- RAW -- which instructs the
scanner to return all cells (including delete markers and uncollected deleted
cells). This option cannot be combined with requesting specific COLUMNS.
Disabled by default.  Example:

  hbase> scan 't1', {RAW => true, VERSIONS => 10}

The help output is long, so only the most commonly used examples are kept here.

Scan the whole table:

hbase(main):039:0> scan 'stu'
ROW                                                    COLUMN+CELL                                                                                                                                                     
 1001                                                  column=info1:name, timestamp=1602914628219, value=zhangsan                                                                                                      
 1002                                                  column=info1:gender, timestamp=1602914659577, value=mail                                                                                                        
 1002                                                  column=info2:age, timestamp=1602914683317, value=20                                                                                                             
2 row(s) in 0.0240 seconds

Scan a specific column:

hbase(main):041:0> scan 'stu',{COLUMNS=>'info1:name'}
ROW                                                    COLUMN+CELL                                                                                                                                                     
 1001                                                  column=info1:name, timestamp=1602914628219, value=zhangsan                                                                                                      
1 row(s) in 0.0180 seconds

Scan a range of rows (STARTROW is inclusive, STOPROW is exclusive):

hbase(main):045:0> scan 'stu',{STARTROW=>'1001',STOPROW=>'1003'}
ROW                                                    COLUMN+CELL                                                                                                                                                     
 1001                                                  column=info1:name, timestamp=1602914628219, value=zhangsan                                                                                                      
 1002                                                  column=info1:gender, timestamp=1602914659577, value=mail                                                                                                        
 1002                                                  column=info2:age, timestamp=1602914683317, value=20                                                                                                             
2 row(s) in 0.0090 seconds
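The range semantics can be sketched in a few lines of Python. This is only a model of the half-open interval [STARTROW, STOPROW) over sorted row keys; scan_range is a hypothetical helper, and row 1003 is added purely for illustration (it is not in the stu table above).

```python
from bisect import bisect_left


def scan_range(sorted_keys, start_row, stop_row):
    """Return the keys k with start_row <= k < stop_row (keys must be sorted)."""
    lo = bisect_left(sorted_keys, start_row)  # first key >= start_row
    hi = bisect_left(sorted_keys, stop_row)   # first key >= stop_row, excluded
    return sorted_keys[lo:hi]


keys = ["1001", "1002", "1003"]  # "1003" is a made-up extra row
print(scan_range(keys, "1001", "1003"))  # ['1001', '1002']
print(scan_range(keys, "1002", "1002"))  # []
```

Row keys compare as byte strings, so the ordering is lexicographic rather than numeric: a key such as "10010" would sort between "1001" and "1002".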

②Usage of get:

hbase(main):047:0> get

ERROR: wrong number of arguments (0 for 2)

Here is some help for this command:
Get row or cell contents; pass table name, row, and optionally
a dictionary of column(s), timestamp, timerange and versions. Examples:

  hbase> get 'ns1:t1', 'r1'
  hbase> get 't1', 'r1'
  hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
  hbase> get 't1', 'r1', {COLUMN => 'c1'}
  hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
  hbase> get 't1', 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
  hbase> get 't1', 'r1', 'c1'
  hbase> get 't1', 'r1', 'c1', 'c2'
  hbase> get 't1', 'r1', ['c1', 'c2']
  hbase> get 't1', 'r1', {COLUMN => 'c1', ATTRIBUTES => {'mykey'=>'myvalue'}}
  hbase> get 't1', 'r1', {COLUMN => 'c1', AUTHORIZATIONS => ['PRIVATE','SECRET']}
  hbase> get 't1', 'r1', {CONSISTENCY => 'TIMELINE'}
  hbase> get 't1', 'r1', {CONSISTENCY => 'TIMELINE', REGION_REPLICA_ID => 1}

get needs at least two arguments, the table name and the row key:

hbase(main):048:0> get 'stu','1002'
COLUMN                                                 CELL                                                                                                                                                            
 info1:gender                                          timestamp=1602914659577, value=mail                                                                                                                             
 info2:age                                             timestamp=1602914683317, value=20                                                                                                                               
1 row(s) in 0.0510 seconds

hbase(main):049:0> get 'stu','1002','info1'
COLUMN                                                 CELL                                                                                                                                                            
 info1:gender                                          timestamp=1602914659577, value=mail                                                                                                                             
1 row(s) in 0.0060 seconds

hbase(main):050:0> get 'stu','1002','info1:gender'
COLUMN                                                 CELL                                                                                                                                                            
 info1:gender                                          timestamp=1602914659577, value=mail                                                                                                                             
1 row(s) in 0.0030 seconds

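3. Update data

Updates also use put. An update is simply another put whose timestamp is larger than that of the existing value, so reads return the new value: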

hbase(main):051:0> put 'stu','1001','info1:name','lisi'
0 row(s) in 0.0130 seconds

hbase(main):054:0> scan 'stu'
ROW                                                    COLUMN+CELL                                                                                                                                                     
 1001                                                  column=info1:name, timestamp=1602915862649, value=lisi                                                                                                          
 1002                                                  column=info1:gender, timestamp=1602914659577, value=mail                                                                                                        
 1002                                                  column=info2:age, timestamp=1602914683317, value=20                                                                                                             
2 row(s) in 0.0170 seconds

zhangsan has been replaced by lisi. Since the old value is not actually removed, how can we see it?

hbase(main):056:0> scan 'stu',{RAW=>true,VERSIONS=>10}
ROW                                                    COLUMN+CELL                                                                                                                                                     
 1001                                                  column=info1:name, timestamp=1602915862649, value=lisi                                                                                                          
 1001                                                  column=info1:name, timestamp=1602914628219, value=zhangsan                                                                                                      
 1002                                                  column=info1:gender, timestamp=1602914659577, value=mail                                                                                                        
 1002                                                  column=info2:age, timestamp=1602914683317, value=20                                                                                                             
2 row(s) in 0.0170 seconds

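VERSIONS is the maximum number of versions to return.

The original value is still there.

What if we put a value with an explicit timestamp smaller than that of the current value? Will it be shown? Let's try: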

hbase(main):057:0> put 'stu','1001','info1:name','wangwu',1602915862640
0 row(s) in 0.0190 seconds

hbase(main):058:0> scan 'stu'
ROW                                                    COLUMN+CELL                                                                                                                                                     
 1001                                                  column=info1:name, timestamp=1602915862649, value=lisi                                                                                                          
 1002                                                  column=info1:gender, timestamp=1602914659577, value=mail                                                                                                        
 1002                                                  column=info2:age, timestamp=1602914683317, value=20                                                                                                             
2 row(s) in 0.0120 seconds

hbase(main):059:0> scan 'stu',{RAW=>true,VERSIONS=>10}
ROW                                                    COLUMN+CELL                                                                                                                                                     
 1001                                                  column=info1:name, timestamp=1602915862649, value=lisi                                                                                                          
 1001                                                  column=info1:name, timestamp=1602915862640, value=wangwu                                                                                                        
 1001                                                  column=info1:name, timestamp=1602914628219, value=zhangsan                                                                                                      
 1002                                                  column=info1:gender, timestamp=1602914659577, value=mail                                                                                                        
 1002                                                  column=info2:age, timestamp=1602914683317, value=20                                                                                                             
2 row(s) in 0.0130 seconds

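The value we inserted has timestamp 1602915862640, while lisi has timestamp 1602915862649, so a normal scan still shows lisi rather than the newly inserted wangwu. The raw scan shows that wangwu was in fact stored.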
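The behaviour above can be reproduced with a small Python model in which every put adds a (timestamp, value) version and a normal read returns the version with the largest timestamp. This is a sketch under simplifying assumptions: the helper names are made up, and the column family's VERSIONS limit and compaction, which may eventually discard older versions in a real cluster, are ignored.

```python
cells = {}  # (row, column) -> {timestamp: value}


def put(row, column, value, ts):
    cells.setdefault((row, column), {})[ts] = value


def get_latest(row, column):
    """What a normal read shows: the value with the largest timestamp."""
    versions = cells.get((row, column), {})
    return versions[max(versions)] if versions else None


def get_all_versions(row, column):
    """All stored versions, newest first."""
    versions = cells.get((row, column), {})
    return [(ts, versions[ts]) for ts in sorted(versions, reverse=True)]


put("1001", "info1:name", "zhangsan", 1602914628219)
put("1001", "info1:name", "lisi", 1602915862649)
put("1001", "info1:name", "wangwu", 1602915862640)  # written last, older timestamp

print(get_latest("1001", "info1:name"))
print(get_all_versions("1001", "info1:name"))
```

The order of the writes does not matter in this model; only the timestamps decide which value is visible.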

4. Delete data

delete removes a single cell; deleteall removes a whole row, that is, all data under one row key.

Usage of delete:

delete

ERROR: wrong number of arguments (0 for 3)

Here is some help for this command:
Put a delete cell value at specified table/row/column and optionally
timestamp coordinates.  Deletes must match the deleted cell's
coordinates exactly.  When scanning, a delete cell suppresses older
versions. To delete a cell from  't1' at row 'r1' under column 'c1'
marked with the time 'ts1', do:

  hbase> delete 'ns1:t1', 'r1', 'c1', ts1
  hbase> delete 't1', 'r1', 'c1', ts1
  hbase> delete 't1', 'r1', 'c1', ts1, {VISIBILITY=>'PRIVATE|SECRET'}

The same command can also be run on a table reference. Suppose you had a reference
t to table 't1', the corresponding command would be:

  hbase> t.delete 'r1', 'c1',  ts1
  hbase> t.delete 'r1', 'c1',  ts1, {VISIBILITY=>'PRIVATE|SECRET'}

As the help shows, delete needs at least three arguments.

Let's delete one cell:

hbase(main):061:0> delete 'stu','1002','info1:gender'
0 row(s) in 0.0290 seconds

hbase(main):062:0> scan 'stu'
ROW                                                    COLUMN+CELL                                                                                                                                                     
 1001                                                  column=info1:name, timestamp=1602915862649, value=lisi                                                                                                          
 1002                                                  column=info2:age, timestamp=1602914683317, value=20                                                                                                             
2 row(s) in 0.0160 seconds

A natural question: if we delete a name that has been put several times, will one of the older names show up again? Let's test:

hbase(main):063:0> delete 'stu','1001','info1:name'
0 row(s) in 0.0150 seconds

hbase(main):064:0> scan 'stu'
ROW                                                    COLUMN+CELL                                                                                                                                                     
 1002                                                  column=info2:age, timestamp=1602914683317, value=20                                                                                                             
1 row(s) in 0.0070 seconds

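It does not. So were all versions physically removed? If delete wiped everything, put could just as well overwrite old values, and keeping old versions (which exists to guard against mistakes) would be pointless. A raw scan shows what really happened: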

hbase(main):065:0> scan 'stu',{RAW=>true,VERSIONS=>10}
ROW                                                    COLUMN+CELL                                                                                                                                                     
 1001                                                  column=info1:name, timestamp=1602916724847, type=DeleteColumn                                                                                                   
 1001                                                  column=info1:name, timestamp=1602915862649, value=lisi                                                                                                          
 1001                                                  column=info1:name, timestamp=1602915862640, value=wangwu                                                                                                        
 1001                                                  column=info1:name, timestamp=1602914628219, value=zhangsan                                                                                                      
 1002                                                  column=info1:gender, timestamp=1602916630661, type=DeleteColumn                                                                                                 
 1002                                                  column=info1:gender, timestamp=1602914659577, value=mail                                                                                                        
 1002                                                  column=info2:age, timestamp=1602914683317, value=20                                                                                                             
2 row(s) in 0.0140 seconds

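Each delete inserted a new cell that has no value but has type=DeleteColumn, with a timestamp larger than those of the existing values. The old values are still stored; when reading, HBase sees the DeleteColumn marker and hides every version of that column at or before the marker's timestamp. So a delete is really an insert: it writes a marker.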
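The marker mechanism can be sketched as an append-only list in Python. This is a simplified model, not HBase's implementation: put, delete and read are hypothetical helpers, and only the DeleteColumn marker type seen in the raw scan above is modelled.

```python
PUT, DELETE_COLUMN = "Put", "DeleteColumn"
cells = []  # append-only: (row, column, timestamp, type, value)


def put(row, column, value, ts):
    cells.append((row, column, ts, PUT, value))


def delete(row, column, ts):
    # A delete stores nothing but a marker; existing cells are untouched.
    cells.append((row, column, ts, DELETE_COLUMN, None))


def read(row, column):
    """Newest put that is not hidden by a DeleteColumn marker, else None."""
    entries = [c for c in cells if c[0] == row and c[1] == column]
    marker_ts = max((c[2] for c in entries if c[3] == DELETE_COLUMN), default=None)
    live = [c for c in entries
            if c[3] == PUT and (marker_ts is None or c[2] > marker_ts)]
    return max(live, key=lambda c: c[2])[4] if live else None


put("1001", "info1:name", "zhangsan", 1602914628219)
put("1001", "info1:name", "lisi", 1602915862649)
put("1001", "info1:name", "wangwu", 1602915862640)
before = read("1001", "info1:name")
delete("1001", "info1:name", 1602916724847)
after = read("1001", "info1:name")

print(before, after, len(cells))
```

After the delete, the list holds four entries (three puts plus one marker), which mirrors the four info1:name lines for row 1001 in the raw scan.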

deleteall behaves similarly:

hbase(main):066:0> deleteall 'stu','1002'
0 row(s) in 0.0150 seconds

hbase(main):067:0> scan 'stu'
ROW                                                    COLUMN+CELL                                                                                                                                                     
0 row(s) in 0.0090 seconds

hbase(main):068:0> scan 'stu',{RAW=>true,VERSIONS=>10}
ROW                                                    COLUMN+CELL                                                                                                                                                     
 1001                                                  column=info1:name, timestamp=1602916724847, type=DeleteColumn                                                                                                   
 1001                                                  column=info1:name, timestamp=1602915862649, value=lisi                                                                                                          
 1001                                                  column=info1:name, timestamp=1602915862640, value=wangwu                                                                                                        
 1001                                                  column=info1:name, timestamp=1602914628219, value=zhangsan                                                                                                      
 1002                                                  column=info1:, timestamp=1602917111993, type=DeleteFamily                                                                                                       
 1002                                                  column=info1:gender, timestamp=1602916630661, type=DeleteColumn                                                                                                 
 1002                                                  column=info1:gender, timestamp=1602914659577, value=mail                                                                                                        
 1002                                                  column=info2:, timestamp=1602917111993, type=DeleteFamily                                                                                                       
 1002                                                  column=info2:age, timestamp=1602914683317, value=20                                                                                                             
2 row(s) in 0.0090 seconds

Row 1002 now has an extra marker of type DeleteFamily for each column family (info1 and info2), which marks the whole row as deleted.
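A row-level delete can be modelled the same way, with one marker per column family. The Python sketch below is an illustration under assumptions: FAMILIES, deleteall and get_row are names invented here, and the earlier DeleteColumn marker on info1:gender is left out to keep the example short.

```python
FAMILIES = ("info1", "info2")  # column families of the table
puts = []                      # (row, family, qualifier, timestamp, value)
family_markers = {}            # (row, family) -> DeleteFamily timestamp


def put(row, family, qualifier, value, ts):
    puts.append((row, family, qualifier, ts, value))


def deleteall(row, ts):
    # One DeleteFamily marker per column family; stored puts are untouched.
    for family in FAMILIES:
        family_markers[(row, family)] = ts


def get_row(row):
    """Latest visible value per column, skipping cells hidden by a marker."""
    latest = {}
    for r, family, qualifier, ts, value in puts:
        if r != row:
            continue
        marker_ts = family_markers.get((r, family))
        if marker_ts is not None and ts <= marker_ts:
            continue
        column = family + ":" + qualifier
        if column not in latest or ts > latest[column][0]:
            latest[column] = (ts, value)
    return {column: value for column, (ts, value) in latest.items()}


put("1002", "info2", "age", "20", 1602914683317)
before = get_row("1002")
deleteall("1002", 1602917111993)
after = get_row("1002")

print(before, after, len(puts))
```

The put is still in the list after deleteall, just as value=20 still appears in the raw scan; only normal reads skip it.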

5. Clear table data

truncate: unlike dropping a table, this command disables the table itself, so you do not need to run disable first.

hbase(main):069:0> disable 'stu'
0 row(s) in 2.3480 seconds

hbase(main):071:0> scan 'stu',{RAW=>true,VERSIONS=>10}
ROW                                                    COLUMN+CELL                                                                                                                                                     

ERROR: stu is disabled.
