The core of HBase, and also the hardest part to understand, is its data model. Like a traditional relational database, HBase has tables (Table), rows (Row), and columns (Column). Unlike a relational database, HBase also has the concept of a column family (Column Family), which groups one or more columns together. Every HBase column must belong to a column family.
The intersection of a row and a column is called a cell (Cell), and cells are versioned. The content of a cell, that is, the value of the column, is an uninterpreted array of bytes.
HBase has no data types; every column value is converted to a byte array for storage.
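Because HBase only sees bytes, the client is responsible for encoding and decoding values. The following is a minimal sketch using only the JDK; the class and method names are hypothetical, and the choice of UTF-8 for strings and 8-byte big-endian for longs is an assumption made for illustration, not a statement of what any particular HBase utility does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: illustrates the kind of conversion a client performs
// before writing a value and after reading it back.
class ByteEncoding {
    static byte[] encodeString(String s) {
        return s.getBytes(StandardCharsets.UTF_8); // explicit charset, not the platform default
    }

    static String decodeString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array(); // big-endian by default
    }

    static long decodeLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    public static void main(String[] args) {
        byte[] name = encodeString("John Doe");
        byte[] count = encodeLong(42L);
        System.out.println(name.length + " bytes -> " + decodeString(name));
        System.out.println(count.length + " bytes -> " + decodeLong(count));
    }
}
```

The store never learns whether a value was a string or a number, so the reader has to decode with the same convention the writer used.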
Rows in an HBase table are distinguished by a row key (Rowkey). The row key uniquely identifies a row.
HBase rows are sorted by Rowkey in lexicographic (dictionary) order.
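Lexicographic ordering on bytes has a consequence that often surprises newcomers: "row10" sorts before "row2". The sketch below models this with a JDK TreeMap and an unsigned byte-wise comparator (Arrays.compareUnsigned, Java 9+). The class name is hypothetical and the code only imitates the ordering; it does not use the HBase client.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of row key ordering; not part of the HBase API.
class RowKeyOrder {
    // Unsigned byte-by-byte comparison; a shorter key sorts before any key it prefixes.
    static final Comparator<byte[]> BYTE_ORDER = Arrays::compareUnsigned;

    // Returns the given row keys in the order a byte-sorted table would hold them.
    static List<String> sorted(String... keys) {
        TreeMap<byte[], String> rows = new TreeMap<>(BYTE_ORDER);
        for (String k : keys) {
            rows.put(k.getBytes(StandardCharsets.UTF_8), k);
        }
        return new ArrayList<>(rows.values());
    }

    public static void main(String[] args) {
        System.out.println(sorted("row2", "row10", "row1", "abc1"));
        // prints [abc1, row1, row10, row2]
        System.out.println(sorted("row02", "row10", "row01"));
        // prints [row01, row02, row10]
    }
}
```

Zero-padding numeric parts of a key, as in the second call, is one common way to make byte order agree with numeric order.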
The above describes the logical structure of HBase; its physical structure is very different from that of a traditional relational database.
Logical Model
HBase's logical model comes from Google's BigTable model. It can be understood as a sparse, persistent, multi-dimensional, sorted map.
The following example is a slightly modified form of the one on page 2 of the BigTable paper. There is a table called webtable that contains two rows (com.cnn.www and com.example.www) and three column families named contents, anchor, and people.
In this example, for the first row (com.cnn.www), anchor contains two columns (anchor:cnnsi.com, anchor:my.look.ca) and contents contains one column (contents:html). The example contains 5 versions of the row with the row key com.cnn.www, and one version of the row with the row key com.example.www. The contents:html column qualifier contains the entire HTML of a given website. Each qualifier of the anchor column family contains an external site that links to the site represented by the row, along with the text used in the anchor of that link. The people column family represents people associated with the site.
In a tabular view of this data, cells that appear empty take up no space in HBase; in fact, they do not exist at all. This is what makes HBase "sparse". A tabular view is not the only way to look at data in HBase, nor even the most accurate one. The following represents the same information as a multi-dimensional map. It is only a mock-up for illustrative purposes and may not be strictly accurate.
{
  "com.cnn.www": {
    contents: {
      t6: contents:html: "<html>..."
      t5: contents:html: "<html>..."
      t3: contents:html: "<html>..."
    }
    anchor: {
      t9: anchor:cnnsi.com = "CNN"
      t8: anchor:my.look.ca = "CNN.com"
    }
    people: {}
  }
  "com.example.www": {
    contents: {
      t5: contents:html: "<html>..."
    }
    anchor: {}
    people: {
      t5: people:author: "John Doe"
    }
  }
}
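The same mock-up can be written as nested sorted maps. The sketch below is a hypothetical in-memory model built from JDK TreeMaps, with timestamps t3..t9 represented as the numbers 3..9 and row keys held as strings for readability (real row keys are byte arrays). The cell values are illustrative placeholders. It shows the shape of the logical model only and says nothing about how HBase stores data internally.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of the webtable example.
class WebTableModel {
    // row key -> column family -> qualifier -> timestamp (newest first) -> value
    final NavigableMap<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> rows =
            new TreeMap<>();

    void put(String row, String family, String qualifier, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .computeIfAbsent(qualifier, q -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest value of a column, or null if no such cell was ever written.
    String latest(String row, String family, String qualifier) {
        var families = rows.get(row);
        if (families == null) return null;
        var qualifiers = families.get(family);
        if (qualifiers == null) return null;
        var versions = qualifiers.get(qualifier);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Builds the two rows of the example; empty families are simply never written.
    static WebTableModel sample() {
        WebTableModel t = new WebTableModel();
        t.put("com.cnn.www", "contents", "html", 6, "<html>... (t6)");
        t.put("com.cnn.www", "contents", "html", 5, "<html>... (t5)");
        t.put("com.cnn.www", "contents", "html", 3, "<html>... (t3)");
        t.put("com.cnn.www", "anchor", "cnnsi.com", 9, "CNN");
        t.put("com.cnn.www", "anchor", "my.look.ca", 8, "CNN.com");
        t.put("com.example.www", "contents", "html", 5, "<html>... (t5)");
        t.put("com.example.www", "people", "author", 5, "John Doe");
        return t;
    }

    public static void main(String[] args) {
        WebTableModel t = sample();
        System.out.println(t.latest("com.cnn.www", "contents", "html")); // newest version, written at t6
        System.out.println(t.latest("com.cnn.www", "people", "author")); // null: the cell does not exist
        System.out.println(t.rows.firstKey());                           // rows are kept in sorted order
    }
}
```

Nothing is stored for com.cnn.www in the people family, which is the sparseness described above: an absent cell costs nothing.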
Physical Model
Although an HBase table can be viewed as a set of sparse rows, physically it is stored by column family. A new column can therefore be added to an existing column family at any time.
HBase is column-oriented: the columns of a row are stored in different physical files according to their column family. The data of one column family is stored in multiple HFiles and, most importantly, the data of a column family for a given range of rows is managed by a single Region.
Empty cells take up no physical storage space. Thus, a request for the value of the contents:html column at timestamp t8 returns no value. Similarly, a request for the value of anchor:my.look.ca at timestamp t9 returns no value. However, if no timestamp is supplied, the most recent value of the column is returned. Given multiple versions, the most recent one is also the first one found, because timestamps are stored in descending order. Thus, a request for the values of all columns in the row com.cnn.www with no timestamp specified returns: the value of contents:html from timestamp t6, the value of anchor:cnnsi.com from timestamp t9, and the value of anchor:my.look.ca from timestamp t8.
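The lookup rules described above can be sketched for a single column as a map sorted by descending timestamp. This is a hypothetical JDK-only model with timestamps t3, t5, and t6 written as 3, 5, and 6 and placeholder values; it mirrors the described behavior and is not the HBase read path.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical model of the versions of one column (contents:html of com.cnn.www).
class VersionLookup {
    static NavigableMap<Long, String> contentsHtml() {
        NavigableMap<Long, String> versions = new TreeMap<>(Comparator.reverseOrder());
        versions.put(3L, "html@t3");
        versions.put(5L, "html@t5");
        versions.put(6L, "html@t6");
        return versions;
    }

    // Read with an explicit timestamp: only a version stored at exactly that timestamp matches.
    static Optional<String> at(NavigableMap<Long, String> versions, long timestamp) {
        return Optional.ofNullable(versions.get(timestamp));
    }

    // Read without a timestamp: the first entry is the newest version.
    static Optional<String> latest(NavigableMap<Long, String> versions) {
        return versions.isEmpty()
                ? Optional.empty()
                : Optional.of(versions.firstEntry().getValue());
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> versions = contentsHtml();
        System.out.println(at(versions, 8L));   // prints Optional.empty
        System.out.println(at(versions, 5L));   // prints Optional[html@t5]
        System.out.println(latest(versions));   // prints Optional[html@t6]
    }
}
```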
Data Model Operations
The four primary data model operations are Get, Put, Scan, and Delete. They are applied through a Table instance.
A note on versions: in HBase, the tuple of Rowkey, Column (column family plus qualifier), and Version together identifies exactly one cell.
Rowkey and Column are expressed as byte arrays, while Version is expressed as a long integer.
Get
Get returns the attributes of a specified row. Get is implemented on top of Scan. By default, that is, if no version is specified, a Get returns the most recent version of the cell.
To return more than one version, see Get.setMaxVersions().
To return versions other than the latest, see Get.setTimeRange().
Default Get example
public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
...
Get get = new Get(Bytes.toBytes("row1"));
Result r = table.get(get);
byte[] b = r.getValue(CF, ATTR); // returns current version of value
Versioned Get example
public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
...
Get get = new Get(Bytes.toBytes("row1"));
get.setMaxVersions(3); // will return last 3 versions of row
Result r = table.get(get);
byte[] b = r.getValue(CF, ATTR); // returns current version of value
List<Cell> cells = r.getColumnCells(CF, ATTR); // returns all versions of this column
Put
Executing a Put always creates a new version of a cell at a certain timestamp. By default the system uses the server's currentTimeMillis, but you can specify the version (a long integer) yourself for each column. This means you can assign a time in the past or the future, or use the long value for purposes unrelated to time.
Implicit version example
The following Put is implicitly versioned by HBase with the current time.
public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
...
Put put = new Put(Bytes.toBytes(row));
put.addColumn(CF, ATTR, Bytes.toBytes(data));
table.put(put);
Explicit version example
public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
...
Put put = new Put(Bytes.toBytes(row));
long explicitTimeInMs = 555; // just an example
put.addColumn(CF, ATTR, explicitTimeInMs, Bytes.toBytes(data));
table.put(put);
Delete
Deletes are performed via Table.delete.
Internally, there are three different types of delete markers.
- Delete: for a specific version of a column.
- Delete column: for all versions of a column.
- Delete family: for all columns of a particular ColumnFamily.
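One way to picture the three marker types is as tombstones that hide matching cells at read time rather than erasing them on the spot. The sketch below is a simplified, hypothetical model of that idea using only the JDK. It assumes that column and family markers hide every version regardless of timestamp, which ignores the timestamp rules a real implementation would apply.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of delete markers for a single row; not the HBase implementation.
class DeleteMarkers {
    enum Type { VERSION, COLUMN, FAMILY }

    static final class Marker {
        final Type type;
        final String family;
        final String qualifier; // ignored for FAMILY markers
        final long timestamp;   // only used by VERSION markers in this simplified model

        Marker(Type type, String family, String qualifier, long timestamp) {
            this.type = type;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }

        // True if this marker hides the cell at (family, qualifier, timestamp).
        boolean masks(String f, String q, long ts) {
            if (!family.equals(f)) return false;
            switch (type) {
                case FAMILY: return true;                                  // every column of the family
                case COLUMN: return qualifier.equals(q);                   // every version of one column
                default:     return qualifier.equals(q) && timestamp == ts; // one exact version
            }
        }
    }

    private final List<Marker> markers = new ArrayList<>();

    void delete(Type type, String family, String qualifier, long timestamp) {
        markers.add(new Marker(type, family, qualifier, timestamp));
    }

    boolean isVisible(String family, String qualifier, long timestamp) {
        for (Marker m : markers) {
            if (m.masks(family, qualifier, timestamp)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        DeleteMarkers row = new DeleteMarkers();
        row.delete(Type.VERSION, "contents", "html", 5L);
        row.delete(Type.COLUMN, "anchor", "cnnsi.com", 0L);
        System.out.println(row.isVisible("contents", "html", 5L));     // false: that version is masked
        System.out.println(row.isVisible("contents", "html", 6L));     // true: other versions remain
        System.out.println(row.isVisible("anchor", "cnnsi.com", 9L));  // false: whole column is masked
        System.out.println(row.isVisible("anchor", "my.look.ca", 8L)); // true: different column
    }
}
```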
Scan
A Scan iterates over multiple rows of a table for the specified attributes.
The following is an example of a Scan on a Table instance. Assume that a table is populated with rows with the keys "row1", "row2", and "row3", and then another set of rows with the keys "abc1", "abc2", and "abc3". The following example shows how to set up a Scan instance to return the rows whose keys begin with "row".
public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
...
Table table = ... // instantiate a Table instance
Scan scan = new Scan();
scan.addColumn(CF, ATTR);
scan.setRowPrefixFilter(Bytes.toBytes("row"));
ResultScanner rs = table.getScanner(scan);
try {
  for (Result r = rs.next(); r != null; r = rs.next()) {
    // process result...
  }
} finally {
  rs.close(); // always close the ResultScanner!
}
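Because rows are kept in sorted order, a prefix scan can be thought of as a range scan from the prefix itself up to the smallest key that no longer starts with it. The sketch below models that idea with a JDK TreeMap over the six example keys. The class and method names are hypothetical, and treating a prefix scan as a start/stop range is an illustration of the concept, not a description of what setRowPrefixFilter does internally.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of a prefix scan over a byte-sorted table.
class PrefixScanModel {
    static final Comparator<byte[]> BYTE_ORDER = Arrays::compareUnsigned;

    // Smallest key that sorts after every key starting with the prefix,
    // or null if there is none (empty prefix, or all bytes are 0xFF).
    static byte[] stopRowFor(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++;
                return stop;
            }
        }
        return null;
    }

    // Returns the row keys in [prefix, stopRow), in sorted order.
    static List<String> scan(TreeMap<byte[], String> table, String prefix) {
        byte[] start = prefix.getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowFor(start);
        NavigableMap<byte[], String> range = (stop == null)
                ? table.tailMap(start, true)
                : table.subMap(start, true, stop, false);
        return new ArrayList<>(range.values());
    }

    static TreeMap<byte[], String> sampleTable() {
        TreeMap<byte[], String> table = new TreeMap<>(BYTE_ORDER);
        for (String k : new String[] {"row1", "row2", "row3", "abc1", "abc2", "abc3"}) {
            table.put(k.getBytes(StandardCharsets.UTF_8), k);
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(scan(sampleTable(), "row")); // prints [row1, row2, row3]
    }
}
```

For the prefix "row" the computed stop key is "rox", so the "abc" rows are never visited and the scan touches only the contiguous block of matching keys.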