(A) HBase basic introduction
1. HBase is built on top of HDFS and provides a highly reliable, high-performance, column-oriented, scalable database system with real-time reads and writes.
2. HBase features:
Everything in HBase is stored as bytes.
Rows are sorted by RowKey in byte order, and the RowKey serves as the index.
HBase automatically splits a table into Regions as it grows, which maintains load balancing and redundancy.
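Byte-order sorting has a practical consequence for RowKey design, which the following minimal Scala sketch illustrates. It does not use the HBase client; `RowKeyOrder`, `compareBytes`, `sortKeys`, and `padded` are hypothetical names introduced only for this illustration, and the padding width is an arbitrary choice.

```scala
object RowKeyOrder {
  // Compare two byte arrays as unsigned bytes, left to right;
  // when one array is a prefix of the other, the shorter one sorts first.
  def compareBytes(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val diff = (a(i) & 0xff) - (b(i) & 0xff)
      if (diff != 0) return diff
      i += 1
    }
    a.length - b.length
  }

  // Sort string keys by the byte order of their UTF-8 encoding.
  def sortKeys(keys: Seq[String]): Seq[String] =
    keys.sortWith((x, y) => compareBytes(x.getBytes("UTF-8"), y.getBytes("UTF-8")) < 0)

  // Left-pad a numeric id with zeros so that byte order matches numeric order
  // (the width of 4 is an illustrative assumption).
  def padded(id: Int, width: Int = 4): String = s"row%0${width}d".format(id)
}
```

With this ordering, "row10" sorts before "row2" because the comparison is made byte by byte, not numerically. Zero-padding numeric parts of a RowKey is one common way to keep byte order and numeric order aligned.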
3. HBase storage structure:
RowKey: a byte array that acts as the "primary key" of each record in the table and makes lookups fast; RowKey design is very important.
Column Family: has a name (a string) and contains one or more related columns; columns in the same column family share the same properties.
Column: belongs to one column family and is addressed as familyName:columnName; columns can be added dynamically for each record.
Cell: the value stored for a given RowKey and column, together with a timestamp that identifies its version.
hbase(main):009:0> scan 'User'
ROW COLUMN+CELL
id001 column=personInfo:name, timestamp=1502368030841, value=xiaoming
id001 column=personInfo:age, timestamp=1502368069926, value=18
id001 column=personInfo:sex, timestamp=1502368093636, value=man
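One way to picture this structure is as nested sorted maps: RowKey to column to timestamp to value. The Scala sketch below is only a conceptual model under that assumption, not how HBase stores data on disk; strings stand in for byte arrays, and `CellModel`, `put`, and `get` are hypothetical names that are not part of the HBase API.

```scala
import scala.collection.immutable.TreeMap

object CellModel {
  // rowkey -> ("family:qualifier" -> (timestamp -> value))
  type Versions = TreeMap[Long, String]
  type Row = TreeMap[String, Versions]
  type Table = TreeMap[String, Row]

  val empty: Table = TreeMap.empty

  // Store one cell version; earlier versions of the same cell are kept.
  def put(t: Table, row: String, column: String, ts: Long, value: String): Table = {
    val r = t.getOrElse(row, TreeMap.empty[String, Versions])
    val v = r.getOrElse(column, TreeMap.empty[Long, String])
    t.updated(row, r.updated(column, v.updated(ts, value)))
  }

  // Read the value with the highest timestamp for a cell, if the cell exists.
  def get(t: Table, row: String, column: String): Option[String] =
    t.get(row).flatMap(_.get(column)).flatMap(_.lastOption).map(_._2)
}
```

Using the values from the scan output above, `CellModel.put(CellModel.empty, "id001", "personInfo:name", 1502368030841L, "xiaoming")` yields a table in which `get` for `id001` and `personInfo:name` returns the stored name, while a column that was never written returns `None`.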
(B) HBase common commands
1. Enter the shell: hbase shell
[hadoop@indb-3-136-hzifc bin]$ echo $HBASE_HOME
/data/program/hbase
[hadoop@indb-3-136-hzifc bin]$ /data/program/hbase/bin/hbase shell
2. List all tables: list
hbase(main):003:0> list
TABLE
SYSTEM.CATALOG
SYSTEM.FUNCTION
SYSTEM.SEQUENCE
SYSTEM.STATS
TEST.USER
User
6 row(s) in 0.0340 seconds
3. View the details of a table: describe
hbase(main):004:0> describe 'User'
Table User is ENABLED
User
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.1410 seconds
4. Create a table: create
Syntax: create <table>, {NAME => <family>, VERSIONS => <VERSIONS>}
Create a User table; it can have one or more column families, here a single family named info:
hbase(main):002:0> create 'User', 'info'
0 row(s) in 1.5890 seconds
5. Delete the specified column family: alter with 'delete'
Syntax: alter <table>, 'delete' => <family>
hbase(main):002:0> alter 'User', 'delete' => 'info'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.5340 seconds
6. Insert data: put
Syntax: put <table>, <rowkey>, <family:column>, <value>
hbase(main):005:0> put 'User', 'row1', 'info:name', 'xiaoming'
0 row(s) in 0.1200 seconds
hbase(main):006:0> put 'User', 'row2', 'info:age', '18'
0 row(s) in 0.0170 seconds
hbase(main):007:0> put 'User', 'row3', 'info:sex', 'man'
0 row(s) in 0.0030 seconds
7. Query a record by RowKey: get
Syntax: get <table>, <rowkey>, [<family:column>, ...]
hbase(main):008:0> get 'User', 'row2'
COLUMN CELL
info:age timestamp=1502368069926, value=18
1 row(s) in 0.0280 seconds
hbase(main):028:0> get 'User', 'row3', 'info:sex'
COLUMN CELL
info:sex timestamp=1502368093636, value=man
hbase(main):036:0> get 'User', 'row1', {COLUMN => 'info:name'}
COLUMN CELL
info:name timestamp=1502368030841, value=xiaoming
1 row(s) in 0.0120 seconds
8. Query all records: scan
Syntax: scan <table>, {COLUMNS => [<family:column>, ...], LIMIT => num}
Scan all records
hbase(main):009:0> scan 'User'
ROW COLUMN+CELL
row1 column=info:name, timestamp=1502368030841, value=xiaoming
row2 column=info:age, timestamp=1502368069926, value=18
row3 column=info:sex, timestamp=1502368093636, value=man
3 row(s) in 0.0380 seconds
Scan the first 2 records
hbase(main):037:0> scan 'User', {LIMIT => 2}
ROW COLUMN+CELL
row1 column=info:name, timestamp=1502368030841, value=xiaoming
row2 column=info:age, timestamp=1502368069926, value=18
2 row(s) in 0.0170 seconds
Range query
hbase(main):011:0> scan 'User', {STARTROW => 'row2'}
ROW COLUMN+CELL
row2 column=info:age, timestamp=1502368069926, value=18
row3 column=info:sex, timestamp=1502368093636, value=man
2 row(s) in 0.0170 seconds
hbase(main):012:0> scan 'User', {STARTROW => 'row2', ENDROW => 'row2'}
ROW COLUMN+CELL
row2 column=info:age, timestamp=1502368069926, value=18
1 row(s) in 0.0110 seconds
hbase(main):013:0> scan 'User', {STARTROW => 'row2', ENDROW => 'row3'}
ROW COLUMN+CELL
row2 column=info:age, timestamp=1502368069926, value=18
1 row(s) in 0.0120 seconds
In addition, you can add advanced options such as TIMERANGE and FILTER.
STARTROW and ENDROW must be written in upper case, otherwise an error is raised. The result set does not include the row equal to ENDROW; the earlier example in which STARTROW and ENDROW are both 'row2' is the exception and returns that single row.
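The range behaviour can be sketched as a half-open interval over sorted keys: the start row is included and the end row is excluded. The Scala snippet below models this with a `TreeMap` and does not call HBase; `ScanRange` and `scan` are hypothetical names. It relies on Scala's default `String` ordering, which matches byte order for the ASCII keys used here, and it does not model the special case where the start and end rows are equal.

```scala
import scala.collection.immutable.TreeMap

object ScanRange {
  // Return the keys k with startRow <= k < stopRow, in sorted order.
  def scan(rows: TreeMap[String, String], startRow: String, stopRow: String): List[String] =
    rows.range(startRow, stopRow).keysIterator.toList
}
```

For a table holding row1, row2, and row3, scanning from "row2" to "row3" yields only row2, which mirrors the shell output above. To include row3, the end row has to be a key that sorts after it, for example "row4".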
9. Count the number of records: count
Syntax: count <table>, {INTERVAL => intervalNum, CACHE => cacheNum}
INTERVAL sets how many rows pass between progress lines that show the current count and RowKey (default 1000); CACHE is the number of rows fetched per scanner call (default 10), and raising it can speed up the count.
hbase(main):020:0> count 'User'
3 row(s) in 0.0360 seconds
10. Delete data: delete
Delete a column
hbase(main):008:0> delete 'User', 'row1', 'info:age'
0 row(s) in 0.0290 seconds
Delete a row
hbase(main):014:0> deleteall 'User', 'row2'
0 row(s) in 0.0090 seconds
Clear all data in a table
hbase(main):016:0> truncate 'User'
Truncating 'User' table (it may take a while):
- Disabling table...
- Truncating table...
0 row(s) in 3.6610 seconds
11. Check whether a table exists: exists
hbase(main):022:0> exists 'User'
Table User does exist
0 row(s) in 0.0150 seconds
12. Disable a table: disable
hbase(main):014:0> disable 'User'
0 row(s) in 2.2660 seconds
13. Enable a table: enable
hbase(main):017:0> enable 'User'
0 row(s) in 1.3470 seconds
14. Delete a table: drop
A table must be disabled before it can be dropped.
hbase(main):031:0> disable 'TEST.USER'
0 row(s) in 2.2640 seconds
hbase(main):033:0> drop 'TEST.USER'
0 row(s) in 1.2490 seconds
(C) Operating HBase with the Scala API
import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Delete, Get, Put, Scan}
import org.apache.hadoop.hbase.util.Bytes
import scala.collection.JavaConversions._
import java.util

val conf = HBaseConfiguration.create()
// Creating a Connection is heavyweight; it is thread-safe and is the entry point for working with HBase
val conn = ConnectionFactory.createConnection(conf)
// Get an Admin object from the Connection (equivalent to the older HBaseAdmin)
val admin = conn.getAdmin
// Name of the table used in this example
val userTable = TableName.valueOf("user_score_table")
val cf1 = "scoreInfo"
val cf2 = "addressInfo"
val cn1 = "math"
val cn2 = "physics"
val cn3 = "Addr"

if (admin.tableExists(userTable)) {
  println("Table exists!")
  // admin.disableTable(userTable)
  // admin.deleteTable(userTable)
  // exit()
} else {
  // Describe the table with its two column families, then create it
  val tableDesc = new HTableDescriptor(userTable)
  tableDesc.addFamily(new HColumnDescriptor("scoreInfo".getBytes))
  tableDesc.addFamily(new HColumnDescriptor("addressInfo".getBytes))
  admin.createTable(tableDesc)
  println("Create table success!")
}

// Get a Table handle from the Connection for reads and writes
val table = conn.getTable(userTable)

// Insert one row whose rowkey is IromMan
val p = new Put("IromMan".getBytes())
// Specify column and value for the put (the older put.add method is deprecated)
p.addColumn(cf1.getBytes, cn1.getBytes, "98".getBytes) // scoreInfo:math 98
p.addColumn(cf1.getBytes, cn2.getBytes, "87".getBytes) // scoreInfo:physics 87
p.addColumn(cf2.getBytes, cn3.getBytes, "Beijing".getBytes) // addressInfo:Addr Beijing
table.put(p)

// Query rows by rowkey with a batch of Get requests
val listGet = new util.ArrayList[Get]
val get = new Get(Bytes.toBytes("id002_Thor"))
val get2 = new Get(Bytes.toBytes("id003_jack"))
listGet.add(get)
listGet.add(get2)
// Skip rows that were not found, then flatten each Result into (rowkey, (qualifier, value)) tuples
val resultArr = table.get(listGet).filter(!_.isEmpty).flatMap(z => {
  val cellArr = z.rawCells()
  cellArr.map(n => (Bytes.toString(z.getRow()), (Bytes.toString(CellUtil.cloneQualifier(n)), Bytes.toString(CellUtil.cloneValue(n)))))
})

// Release the table handle first, then the connection
table.close()
conn.close()