HBase
Chapter 4: Using the HBase shell
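Preface
The commands fall into two groups: DDL, which operates on namespaces and tables, and DML, which operates on the data stored in tables.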
I. DDL
-
1. list: list all user tables. None exist yet, so the result is empty:
hbase(main):007:0> list
TABLE
0 row(s) in 0.0080 seconds
=> []
-
2. create: create a table. Below is the HBase shell's help for create, printed by running create with no arguments:
hbase(main):008:0> create
ERROR: wrong number of arguments (0 for 1)
Examples:
Create a table with namespace=ns1 and table qualifier=t1
hbase> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}
Create a table with namespace=default and table qualifier=t1
hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
hbase> # The above in shorthand would be the following:
hbase> create 't1', 'f1', 'f2', 'f3'
hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}
Table configuration options can be put at the end.
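create takes at least one argument, the table name (t1 above), followed by the column family names (f1, f2, ...). In practice at least one column family is also required, because HBase will not create a table without one.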
hbase(main):009:0> create 'student','info1','info2'
0 row(s) in 2.3310 seconds
=> Hbase::Table - student
Running list confirms that the table has been created:
hbase(main):010:0> list
TABLE
student
1 row(s) in 0.0100 seconds
=> ["student"]
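The create help above says the long form {NAME => 'f1'} and the bare shorthand 'f1' are equivalent. Here is a small plain-Ruby sketch of that equivalence. It is a toy model, not HBase code, and `normalize_create` is a hypothetical helper. It uses string keys such as 'NAME' in place of the shell's bare NAME constant, treats any hash without 'NAME' as a table-level option (the "options at the end" from the help), and rejects a table with no column families.

```ruby
# Toy model (hypothetical normalize_create, not HBase code) of how create
# arguments map to a schema: a bare 'f1' is shorthand for {'NAME' => 'f1'},
# and a hash without 'NAME' is treated as a table-level option.
def normalize_create(table, *args)
  families = []
  options = {}
  args.each do |arg|
    case arg
    when String
      families << { 'NAME' => arg }   # shorthand form
    when Hash
      if arg.key?('NAME')
        families << arg               # long form, e.g. {'NAME' => 'f1', 'VERSIONS' => 5}
      else
        options.merge!(arg)           # table-level option
      end
    else
      raise ArgumentError, "unsupported argument: #{arg.inspect}"
    end
  end
  raise ArgumentError, 'Table must have at least one column family' if families.empty?
  { table: table, families: families, options: options }
end

short_form = normalize_create('student', 'info1', 'info2')
long_form  = normalize_create('student', { 'NAME' => 'info1' }, { 'NAME' => 'info2' })
p short_form == long_form # => true
```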
-
3. View a table's schema (describe 'table_name'):
hbase(main):011:0> describe 'student'
Table student is ENABLED
student
COLUMN FAMILIES DESCRIPTION
{NAME => 'info1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'info2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.0420 seconds
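The output lists a number of attributes for each column family, for example:
NAME: the name of the column family; each family gets its own line.
BLOOMFILTER: the Bloom filter type. The default, ROW, builds the filter on row keys so that reads can skip store files that cannot contain the requested row. Interested readers can look up Bloom filters for the details.
VERSIONS: the maximum number of versions kept for each cell. In effect, HBase keeps earlier values for you. We won't go deeper here, because versions are covered in detail later.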
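Here is a rough plain-Ruby model of the VERSIONS limit. The `VersionedCell` class is hypothetical. Each put adds a (timestamp, value) pair, and only the newest `max_versions` pairs by timestamp are kept. This is a simplification: real HBase does not trim at write time. Excess versions are hidden from normal reads and purged later in the background, which is why the raw scans later in this chapter can still show them.

```ruby
# Toy model (hypothetical VersionedCell, not HBase code) of one cell's
# version history under a VERSIONS limit.
class VersionedCell
  attr_reader :versions

  def initialize(max_versions)
    @max_versions = max_versions
    @versions = [] # [timestamp, value] pairs, newest first
  end

  def put(timestamp, value)
    @versions << [timestamp, value]
    @versions.sort_by! { |ts, _| -ts }          # newest timestamp first
    @versions = @versions.first(@max_versions)  # keep at most max_versions
    self
  end

  def latest
    @versions.first&.last
  end
end

cell = VersionedCell.new(1) # VERSIONS => '1', as in describe 'student'
cell.put(100, 'zhangsan').put(200, 'lisi')
p cell.versions # => [[200, "lisi"]]
```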
-
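4. How do you create a table in another namespace? Prefix the table name with the namespace and a colon, namespace:table, as in the help example create 'ns1:t1'. Note that the attempt below mistakenly uses a dot instead.
First, list all namespaces:
hbase(main):012:0> list_namespace
NAMESPACE
default
hbase
2 row(s) in 0.0360 seconds
Create the table:
hbase(main):013:0> create 'hbase.student','info'
0 row(s) in 2.2410 seconds
=> Hbase::Table - hbase.student
Check the result:
hbase(main):014:0> list
TABLE
hbase.student
student
2 row(s) in 0.0080 seconds
=> ["hbase.student", "student"]
A dot is a legal character in a table name, so 'hbase.student' did not go into the hbase namespace. It is an ordinary table named hbase.student in the default namespace. To create a table in another namespace, use the colon form, e.g. create 'test:stu','info' as in section 7 below. list then shows it as test:stu, because list displays user tables from every namespace and prefixes those outside default with namespace:. Also avoid the hbase namespace itself, which is reserved for HBase's system tables such as hbase:meta.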
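The naming rule can be expressed as a small parser. This plain-Ruby sketch uses a hypothetical `parse_table_name`. It splits on the first colon and falls back to the default namespace. The character classes are a simplified, ASCII-only assumption based on HBase's naming rules: letters, digits and underscore for namespaces; additionally '-' and '.' for table qualifiers, which may not start with either. The sketch shows why 'hbase.student' ends up in default.

```ruby
# Simplified model (hypothetical parse_table_name, not HBase code) of how a
# table name string maps to [namespace, qualifier]. The colon is the
# separator; a name without one lives in the 'default' namespace.
NAMESPACE_RE = /\A[A-Za-z0-9_]+\z/
QUALIFIER_RE = /\A[A-Za-z0-9_][A-Za-z0-9_.\-]*\z/

def parse_table_name(name)
  namespace, qualifier = name.include?(':') ? name.split(':', 2) : ['default', name]
  raise ArgumentError, "bad namespace: #{namespace.inspect}" unless namespace.match?(NAMESPACE_RE)
  raise ArgumentError, "bad qualifier: #{qualifier.inspect}" unless qualifier.match?(QUALIFIER_RE)
  [namespace, qualifier]
end

p parse_table_name('test:stu')      # => ["test", "stu"]
p parse_table_name('hbase.student') # => ["default", "hbase.student"]
```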
-
5. Create a namespace
hbase(main):015:0> create_namespace 'test'
0 row(s) in 0.9420 seconds
hbase(main):016:0> list_namespace
NAMESPACE
default
hbase
test
3 row(s) in 0.0220 seconds
-
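6. Drop a table
Before a table can be dropped, it must first be disabled with disable 'table_name', which takes the whole table offline. For example: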
hbase(main):018:0> disable 'student'
0 row(s) in 2.2520 seconds
hbase(main):019:0> drop 'student'
0 row(s) in 1.2430 seconds
hbase(main):020:0> list
TABLE
hbase.student
1 row(s) in 0.0090 seconds
=> ["hbase.student"]
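I won't demonstrate what happens if you drop a table without disabling it first (the command fails with an error). Interested readers can try it themselves.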
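The disable-before-drop rule is essentially a small state machine. Below is a plain-Ruby model of it. The `ToyAdmin` class and its error messages are invented for illustration and do not reproduce HBase's exact output.

```ruby
# Toy model (hypothetical ToyAdmin, not HBase code): a table must be
# disabled before it can be dropped.
class ToyAdmin
  def initialize
    @tables = {} # table name => :enabled or :disabled
  end

  def create(name)
    raise ArgumentError, "Table already exists: #{name}" if @tables.key?(name)
    @tables[name] = :enabled
  end

  def disable(name)
    state_of(name)
    @tables[name] = :disabled
  end

  def drop(name)
    raise "Table #{name} is enabled. Disable it first." if state_of(name) == :enabled
    @tables.delete(name)
  end

  def list
    @tables.keys.sort
  end

  private

  def state_of(name)
    @tables.fetch(name) { raise ArgumentError, "Unknown table: #{name}" }
  end
end

admin = ToyAdmin.new
admin.create('student')
begin
  admin.drop('student')
rescue RuntimeError => e
  puts e.message # => Table student is enabled. Disable it first.
end
admin.disable('student')
admin.drop('student')
p admin.list # => []
```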
-
7. Drop a namespace
A namespace can only be dropped after every table in it has been dropped. For example:
hbase(main):021:0> create 'test:stu','info'
0 row(s) in 1.2360 seconds
=> Hbase::Table - test:stu
hbase(main):022:0> list
TABLE
hbase.student
test:stu
2 row(s) in 0.0050 seconds
=> ["hbase.student", "test:stu"]
hbase(main):023:0> disable 'test:stu'
0 row(s) in 2.2360 seconds
hbase(main):024:0> drop 'test:stu'
0 row(s) in 1.2460 seconds
hbase(main):025:0> list
TABLE
hbase.student
1 row(s) in 0.0060 seconds
=> ["hbase.student"]
hbase(main):026:0> drop_namespace 'test'
0 row(s) in 0.9080 seconds
hbase(main):027:0> list_namespace
NAMESPACE
default
hbase
2 row(s) in 0.0410 seconds
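The rule that a namespace must be empty before it is dropped can also be modelled in plain Ruby. The `ToyNamespaces` class is hypothetical. It skips the disable step from section 6 and uses its own error text.

```ruby
require 'set'

# Toy model (hypothetical ToyNamespaces, not HBase code): a namespace can
# only be dropped once it contains no tables.
class ToyNamespaces
  def initialize
    @namespaces = { 'default' => Set.new, 'hbase' => Set.new }
  end

  def create_namespace(ns)
    raise ArgumentError, "Namespace already exists: #{ns}" if @namespaces.key?(ns)
    @namespaces[ns] = Set.new
  end

  def create_table(full_name)
    ns, table = split_name(full_name)
    tables_in(ns) << table
  end

  def drop_table(full_name)
    ns, table = split_name(full_name)
    tables_in(ns).delete(table)
  end

  def drop_namespace(ns)
    tables = tables_in(ns)
    raise "Namespace #{ns} is not empty: #{tables.to_a.sort.join(', ')}" unless tables.empty?
    @namespaces.delete(ns)
  end

  def list_namespace
    @namespaces.keys.sort
  end

  private

  # 'test:stu' => ['test', 'stu']; a name without a colon goes to 'default'.
  def split_name(full_name)
    full_name.include?(':') ? full_name.split(':', 2) : ['default', full_name]
  end

  def tables_in(ns)
    @namespaces.fetch(ns) { raise ArgumentError, "Unknown namespace: #{ns}" }
  end
end

nss = ToyNamespaces.new
nss.create_namespace('test')
nss.create_table('test:stu')
begin
  nss.drop_namespace('test')
rescue RuntimeError => e
  puts e.message # => Namespace test is not empty: stu
end
nss.drop_table('test:stu') # the real shell also requires disable 'test:stu' first
nss.drop_namespace('test')
p nss.list_namespace # => ["default", "hbase"]
```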
II. DML
1. Insert data
put usage:
hbase(main):028:0> put
ERROR: wrong number of arguments (0 for 4)
Here is some help for this command:
Put a cell 'value' at specified table/row/column and optionally
timestamp coordinates. To put a cell value into table 'ns1:t1' or 't1'
at row 'r1' under column 'c1' marked with the time 'ts1', do:
hbase> put 'ns1:t1', 'r1', 'c1', 'value'
hbase> put 't1', 'r1', 'c1', 'value'
hbase> put 't1', 'r1', 'c1', 'value', ts1
hbase> put 't1', 'r1', 'c1', 'value', {ATTRIBUTES=>{'mykey'=>'myvalue'}}
hbase> put 't1', 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
hbase> put 't1', 'r1', 'c1', 'value', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
The same commands also can be run on a table reference. Suppose you had a reference
t to table 't1', the corresponding command would be:
hbase> t.put 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
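put requires at least 4 arguments: the table name, the row key (r1), the column in family:qualifier form (c1), and the value. ts1 is an optional timestamp; if it is omitted, HBase assigns the current time.
Insert some data. The table stu, with column families info1 and info2, was created in a step not shown here (e.g. create 'stu','info1','info2'):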
hbase(main):035:0> put 'stu','1001','info1:name','zhangsan'
0 row(s) in 0.0590 seconds
hbase(main):036:0> put 'stu','1002','info1:gender','mail'
0 row(s) in 0.0080 seconds
hbase(main):037:0> put 'stu','1002','info2:age','20'
0 row(s) in 0.0150 seconds
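Note: HBase storage is sparse. Only column families are declared in the schema. Columns come into existence when they are written, and each row stores only the columns it actually has: here row 1001 has only info1:name, while row 1002 has info1:gender and info2:age. This is completely different from MySQL, where every row has every column defined in the table.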
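A plain-Ruby nested Hash is a rough stand-in for this sparse layout. The `SparseTable` class is hypothetical and ignores timestamps. Only cells that were written are stored, so rows 1001 and 1002 end up with different column sets. Writing to a family that was not declared is rejected, but a new qualifier inside a declared family is not.

```ruby
# Toy sparse table (hypothetical SparseTable, not HBase code):
# row key => { "family:qualifier" => value }. Only written cells are stored.
class SparseTable
  def initialize(families)
    @families = families
    @rows = Hash.new { |h, k| h[k] = {} }
  end

  def put(row, column, value)
    family = column.split(':', 2).first
    raise ArgumentError, "Unknown column family: #{family}" unless @families.include?(family)
    @rows[row][column] = value
  end

  def columns(row)
    @rows.fetch(row, {}).keys
  end
end

stu = SparseTable.new(%w[info1 info2])
stu.put('1001', 'info1:name', 'zhangsan')
stu.put('1002', 'info1:gender', 'mail')
stu.put('1002', 'info2:age', '20')
p stu.columns('1001') # => ["info1:name"]
p stu.columns('1002') # => ["info1:gender", "info2:age"]
```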
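2. Read data
There are two ways to read data: scan and get.
scan reads many rows (the whole table by default, or a row range and/or selected columns), while get reads a single row, optionally restricted to certain column families or columns.
① scan usage: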
scan
ERROR: wrong number of arguments (0 for 1)
Some examples:
hbase> scan 'hbase:meta'
hbase> scan 'hbase:meta', {COLUMNS => 'info:regioninfo'}
Also for experts, there is an advanced option -- RAW -- which instructs the
scanner to return all cells (including delete markers and uncollected deleted
cells). This option cannot be combined with requesting specific COLUMNS.
Disabled by default. Example:
hbase> scan 't1', {RAW => true, VERSIONS => 10}
The full help output is long, so only the most commonly used examples are shown above.
Scan the whole table:
hbase(main):039:0> scan 'stu'
ROW COLUMN+CELL
1001 column=info1:name, timestamp=1602914628219, value=zhangsan
1002 column=info1:gender, timestamp=1602914659577, value=mail
1002 column=info2:age, timestamp=1602914683317, value=20
2 row(s) in 0.0240 seconds
Scan a specific column:
hbase(main):041:0> scan 'stu',{COLUMNS=>'info1:name'}
ROW COLUMN+CELL
1001 column=info1:name, timestamp=1602914628219, value=zhangsan
1 row(s) in 0.0180 seconds
Scan a range of rows. STARTROW is inclusive and STOPROW is exclusive, i.e. the range is [start, stop):
hbase(main):045:0> scan 'stu',{STARTROW=>'1001',STOPROW=>'1003'}
ROW COLUMN+CELL
1001 column=info1:name, timestamp=1602914628219, value=zhangsan
1002 column=info1:gender, timestamp=1602914659577, value=mail
1002 column=info2:age, timestamp=1602914683317, value=20
2 row(s) in 0.0090 seconds
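Row keys are compared as raw bytes in lexicographic order, and a range scan returns the rows with STARTROW <= key < STOPROW. The plain-Ruby sketch below uses a hypothetical `range_scan` and relies on Ruby's String comparison, which is byte-wise and therefore matches this ordering for ASCII keys. It shows both properties, including the sometimes surprising fact that '10010' sorts between '1001' and '1002'. The keys '1003' and '10010' are hypothetical additions.

```ruby
# Toy range scan (hypothetical range_scan, not HBase code):
# STARTROW inclusive, STOPROW exclusive, keys ordered byte by byte.
def range_scan(rows, start_row, stop_row)
  rows.keys.sort.select { |key| key >= start_row && key < stop_row }
end

# '1001' and '1002' are from the transcript; '1003' and '10010' are hypothetical.
rows = { '1001' => 'zhangsan', '1002' => 'mail', '1003' => 'x', '10010' => 'y' }
p range_scan(rows, '1001', '1003') # => ["1001", "10010", "1002"]
```

This is also why numeric row keys are often zero-padded to a fixed width: byte order then agrees with numeric order.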
② get usage:
hbase(main):047:0> get
ERROR: wrong number of arguments (0 for 2)
Here is some help for this command:
Get row or cell contents; pass table name, row, and optionally
a dictionary of column(s), timestamp, timerange and versions. Examples:
hbase> get 'ns1:t1', 'r1'
hbase> get 't1', 'r1'
hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
hbase> get 't1', 'r1', {COLUMN => 'c1'}
hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
hbase> get 't1', 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
hbase> get 't1', 'r1', 'c1'
hbase> get 't1', 'r1', 'c1', 'c2'
hbase> get 't1', 'r1', ['c1', 'c2']
hbase> get 't1', 'r1', {COLUMN => 'c1', ATTRIBUTES => {'mykey'=>'myvalue'}}
hbase> get 't1', 'r1', {COLUMN => 'c1', AUTHORIZATIONS => ['PRIVATE','SECRET']}
hbase> get 't1', 'r1', {CONSISTENCY => 'TIMELINE'}
hbase> get 't1', 'r1', {CONSISTENCY => 'TIMELINE', REGION_REPLICA_ID => 1}
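get takes at least 2 arguments, the table name and the row key. An optional further argument restricts the result to a column family or a single column: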
hbase(main):048:0> get 'stu','1002'
COLUMN CELL
info1:gender timestamp=1602914659577, value=mail
info2:age timestamp=1602914683317, value=20
1 row(s) in 0.0510 seconds
hbase(main):049:0> get 'stu','1002','info1'
COLUMN CELL
info1:gender timestamp=1602914659577, value=mail
1 row(s) in 0.0060 seconds
hbase(main):050:0> get 'stu','1002','info1:gender'
COLUMN CELL
info1:gender timestamp=1602914659577, value=mail
1 row(s) in 0.0030 seconds
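The three get calls above differ only in how the column argument is matched. With no column, get returns the whole row. A bare family name ('info1') matches every qualifier in that family. 'family:qualifier' matches exactly one column. Here is a plain-Ruby sketch of that matching rule. `toy_get` is hypothetical and ignores timestamps and versions.

```ruby
# Toy get (hypothetical toy_get, not HBase code): return one row's cells,
# optionally filtered by column family or by a single family:qualifier.
def toy_get(table, row, column = nil)
  cells = table.fetch(row, {})
  return cells if column.nil?                                # whole row
  if column.include?(':')
    cells.select { |col, _| col == column }                  # one column
  else
    cells.select { |col, _| col.start_with?("#{column}:") }  # whole family
  end
end

stu = { '1002' => { 'info1:gender' => 'mail', 'info2:age' => '20' } }
p toy_get(stu, '1002').keys                 # => ["info1:gender", "info2:age"]
p toy_get(stu, '1002', 'info1').keys        # => ["info1:gender"]
p toy_get(stu, '1002', 'info1:gender').keys # => ["info1:gender"]
```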
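3. Update data
Updates also use put. An update is really just another put: the new cell has a larger timestamp than the old one, so it is the value that reads return.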
hbase(main):051:0> put 'stu','1001','info1:name','lisi'
0 row(s) in 0.0130 seconds
hbase(main):054:0> scan 'stu'
ROW COLUMN+CELL
1001 column=info1:name, timestamp=1602915862649, value=lisi
1002 column=info1:gender, timestamp=1602914659577, value=mail
1002 column=info2:age, timestamp=1602914683317, value=20
2 row(s) in 0.0170 seconds
zhangsan has been replaced by lisi. But as mentioned earlier, the original value is not actually deleted. So how can we see it?
hbase(main):056:0> scan 'stu',{RAW=>true,VERSIONS=>10}
ROW COLUMN+CELL
1001 column=info1:name, timestamp=1602915862649, value=lisi
1001 column=info1:name, timestamp=1602914628219, value=zhangsan
1002 column=info1:gender, timestamp=1602914659577, value=mail
1002 column=info2:age, timestamp=1602914683317, value=20
2 row(s) in 0.0170 seconds
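VERSIONS sets the maximum number of versions to return per cell. RAW => true additionally returns cells that are still physically stored but normally hidden, such as delete markers and old versions that have not been cleaned up yet.
The original value is still there.
What happens if I insert a value with a timestamp smaller than the current one? Will it be displayed? Let's try: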
hbase(main):057:0> put 'stu','1001','info1:name','wangwu',1602915862640
0 row(s) in 0.0190 seconds
hbase(main):058:0> scan 'stu'
ROW COLUMN+CELL
1001 column=info1:name, timestamp=1602915862649, value=lisi
1002 column=info1:gender, timestamp=1602914659577, value=mail
1002 column=info2:age, timestamp=1602914683317, value=20
2 row(s) in 0.0120 seconds
hbase(main):059:0> scan 'stu',{RAW=>true,VERSIONS=>10}
ROW COLUMN+CELL
1001 column=info1:name, timestamp=1602915862649, value=lisi
1001 column=info1:name, timestamp=1602915862640, value=wangwu
1001 column=info1:name, timestamp=1602914628219, value=zhangsan
1002 column=info1:gender, timestamp=1602914659577, value=mail
1002 column=info2:age, timestamp=1602914683317, value=20
2 row(s) in 0.0130 seconds
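The value I inserted has timestamp 1602915862640, while lisi's is 1602915862649. So scan still shows lisi rather than the newly written wangwu, even though the raw scan confirms that wangwu was stored. In other words, a read returns the version with the largest timestamp, not the one that was written most recently.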
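Here is a plain-Ruby sketch of that rule, using a hypothetical `ToyColumn` class. It stores every put in write order and resolves a normal read by the maximum timestamp. A raw listing returns all versions newest first, loosely mirroring RAW => true. Fed the three puts from the transcript, it reproduces the result: wangwu was written last but stays hidden behind lisi.

```ruby
# Toy column (hypothetical ToyColumn, not HBase code) that keeps every
# version; reads resolve by timestamp, not by write order.
class ToyColumn
  def initialize
    @cells = [] # [timestamp, value] pairs in write order
  end

  def put(value, timestamp)
    @cells << [timestamp, value]
    self
  end

  # What a normal scan/get shows: the value with the largest timestamp.
  def current
    @cells.max_by { |ts, _| ts }&.last
  end

  # Rough analogue of scan with RAW => true: every version, newest first.
  def raw
    @cells.sort_by { |ts, _| -ts }
  end
end

name = ToyColumn.new
name.put('zhangsan', 1602914628219)
name.put('lisi', 1602915862649)
name.put('wangwu', 1602915862640) # written last, but with an older timestamp
puts name.current      # => lisi
p name.raw.map(&:last) # => ["lisi", "wangwu", "zhangsan"]
```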
4. Delete data
delete removes the data in one column of a row, while deleteall removes an entire row, i.e. all data under one row key.
delete syntax:
delete
ERROR: wrong number of arguments (0 for 3)
Here is some help for this command:
Put a delete cell value at specified table/row/column and optionally
timestamp coordinates. Deletes must match the deleted cell's
coordinates exactly. When scanning, a delete cell suppresses older
versions. To delete a cell from 't1' at row 'r1' under column 'c1'
marked with the time 'ts1', do:
hbase> delete 'ns1:t1', 'r1', 'c1', ts1
hbase> delete 't1', 'r1', 'c1', ts1
hbase> delete 't1', 'r1', 'c1', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
The same command can also be run on a table reference. Suppose you had a reference
t to table 't1', the corresponding command would be:
hbase> t.delete 'r1', 'c1', ts1
hbase> t.delete 'r1', 'c1', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
As shown, delete takes at least 3 arguments: the table name, the row key, and the column.
Let's try deleting a cell:
hbase(main):061:0> delete 'stu','1002','info1:gender'
0 row(s) in 0.0290 seconds
hbase(main):062:0> scan 'stu'
ROW COLUMN+CELL
1001 column=info1:name, timestamp=1602915862649, value=lisi
1002 column=info2:age, timestamp=1602914683317, value=20
2 row(s) in 0.0160 seconds
You may be wondering: name has been written several times, so if I delete it, will an older name value show up? Let's test it:
hbase(main):063:0> delete 'stu','1001','info1:name'
0 row(s) in 0.0150 seconds
hbase(main):064:0> scan 'stu'
ROW COLUMN+CELL
1002 column=info2:age, timestamp=1602914683317, value=20
1 row(s) in 0.0070 seconds
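It does not. So did delete remove all the versions? If it did, put might as well simply overwrite old values: keeping them is meant as a safeguard against mistakes, and that would be pointless if a delete wiped them all out. Let's look at all the data with a raw scan: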
hbase(main):065:0> scan 'stu',{RAW=>true,VERSIONS=>10}
ROW COLUMN+CELL
1001 column=info1:name, timestamp=1602916724847, type=DeleteColumn
1001 column=info1:name, timestamp=1602915862649, value=lisi
1001 column=info1:name, timestamp=1602915862640, value=wangwu
1001 column=info1:name, timestamp=1602914628219, value=zhangsan
1002 column=info1:gender, timestamp=1602916630661, type=DeleteColumn
1002 column=info1:gender, timestamp=1602914659577, value=mail
1002 column=info2:age, timestamp=1602914683317, value=20
2 row(s) in 0.0140 seconds
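Each delete wrote a new cell with no value, type=DeleteColumn, and a timestamp larger than those of the existing versions. The original values are all still stored. Because the marker is the newest entry and its type is DeleteColumn, HBase hides every version of that column whose timestamp is less than or equal to the marker's. So a delete is really an insert too: it inserts a marker (a tombstone) rather than removing data.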
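The tombstone mechanism fits in a few lines of plain Ruby. `ToyMaskedColumn` is hypothetical and mirrors only the DeleteColumn behaviour seen in the transcript, not HBase's internals. It also leaves out the eventual cleanup of markers and masked cells, which HBase performs during major compaction. Deleting appends a marker. A normal read ignores every put whose timestamp is less than or equal to the newest marker's. The raw view still contains everything. The final put ('zhaoliu') is a hypothetical value written with a newer timestamp.

```ruby
# Toy column with delete markers (hypothetical ToyMaskedColumn, not HBase
# code): a DeleteColumn marker hides every put with timestamp <= its own.
class ToyMaskedColumn
  Cell = Struct.new(:timestamp, :type, :value)

  def initialize
    @cells = []
  end

  def put(value, timestamp)
    @cells << Cell.new(timestamp, :Put, value)
    self
  end

  def delete_column(timestamp)
    @cells << Cell.new(timestamp, :DeleteColumn, nil) # tombstone, no value
    self
  end

  # Normal read: the newest put not covered by any DeleteColumn marker.
  def current
    marker_ts = @cells.select { |c| c.type == :DeleteColumn }.map(&:timestamp).max
    visible = @cells.select { |c| c.type == :Put && (marker_ts.nil? || c.timestamp > marker_ts) }
    visible.max_by(&:timestamp)&.value
  end

  # Raw view: markers and masked versions are all still there, newest first.
  def raw
    @cells.sort_by { |c| -c.timestamp }.map { |c| [c.timestamp, c.type, c.value] }
  end
end

name = ToyMaskedColumn.new
name.put('zhangsan', 1602914628219)
name.put('lisi', 1602915862649)
name.delete_column(1602916724847)
p name.current  # => nil
p name.raw.size # => 3
name.put('zhaoliu', 1602916724900) # hypothetical put with a newer timestamp
p name.current  # => "zhaoliu"
```

A consequence worth noting: a later put whose timestamp is older than the marker stays hidden as well.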
deleteall works the same way:
hbase(main):066:0> deleteall 'stu','1002'
0 row(s) in 0.0150 seconds
hbase(main):067:0> scan 'stu'
ROW COLUMN+CELL
0 row(s) in 0.0090 seconds
hbase(main):068:0> scan 'stu',{RAW=>true,VERSIONS=>10}
ROW COLUMN+CELL
1001 column=info1:name, timestamp=1602916724847, type=DeleteColumn
1001 column=info1:name, timestamp=1602915862649, value=lisi
1001 column=info1:name, timestamp=1602915862640, value=wangwu
1001 column=info1:name, timestamp=1602914628219, value=zhangsan
1002 column=info1:, timestamp=1602917111993, type=DeleteFamily
1002 column=info1:gender, timestamp=1602916630661, type=DeleteColumn
1002 column=info1:gender, timestamp=1602914659577, value=mail
1002 column=info2:, timestamp=1602917111993, type=DeleteFamily
1002 column=info2:age, timestamp=1602914683317, value=20
2 row(s) in 0.0090 seconds
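Row 1002 now has an extra marker of type DeleteFamily for each of its column families (info1: and info2:). Together they hide every cell in the row, i.e. the whole row is deleted.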
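deleteall can be modelled the same way, with one DeleteFamily marker per column family, as the raw scan shows. In the plain-Ruby sketch below, `toy_visible_cells` is hypothetical and works on a flat list of cells. A put is hidden if a DeleteFamily marker for the same row and family has a timestamp greater than or equal to the put's. Since every family in the row gets a marker, the whole row disappears from normal reads.

```ruby
# Toy visibility rule for deleteall (hypothetical toy_visible_cells, not
# HBase code). Each cell is [row, column, timestamp, type, value]; a
# DeleteFamily marker uses the bare "family:" as its column and hides that
# family's puts in the same row whose timestamp is <= the marker's.
def toy_visible_cells(cells)
  markers = cells.select { |c| c[3] == :DeleteFamily }
  cells.select do |row, column, ts, type, _value|
    next false unless type == :Put
    family = column.split(':', 2).first
    markers.none? { |m_row, m_column, m_ts| m_row == row && m_column == "#{family}:" && m_ts >= ts }
  end
end

row_1002 = [
  ['1002', 'info1:gender', 1602914659577, :Put, 'mail'],
  ['1002', 'info2:age',    1602914683317, :Put, '20']
]
p toy_visible_cells(row_1002).size # => 2

# deleteall 'stu','1002' adds one DeleteFamily marker per column family.
family_markers = [
  ['1002', 'info1:', 1602917111993, :DeleteFamily, nil],
  ['1002', 'info2:', 1602917111993, :DeleteFamily, nil]
]
p toy_visible_cells(row_1002 + family_markers).size # => 0
```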
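5. Truncate a table
truncate removes all data from a table. Unlike drop, it handles disabling itself (it disables, drops, and re-creates the table with the same schema), so you do not need to run disable first.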
hbase(main):069:0> disable 'stu'
0 row(s) in 2.3480 seconds
hbase(main):071:0> scan 'stu',{RAW=>true,VERSIONS=>10}
ROW COLUMN+CELL
ERROR: stu is disabled.