HBase Introduction (3): Data Model

The data model is the core of HBase and also its hardest part to understand, because it differs from that of a traditional relational database. HBase has tables (Table), rows (Row), and columns (Column), but unlike a relational database it also has the concept of a column family, which groups one or more columns together. Every HBase column must belong to a column family.

The intersection of a row and a column is called a cell (Cell), and cells are versioned. The content of a cell, that is, the value of the column, is an indivisible array of bytes.

HBase has no data types; every column value is converted to a byte array for storage.
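Because every value is just bytes, the client has to encode and decode typed values itself. The following is a minimal sketch using only the JDK; the class and method names are illustrative and are not part of the HBase API (HBase ships its own Bytes utility with comparable helpers).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative only: encodes typed values to byte arrays and back with the JDK.
final class ByteEncodingSketch {
    static byte[] encodeString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decodeString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // 8-byte big-endian encoding (ByteBuffer's default byte order).
    static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    static long decodeLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    public static void main(String[] args) {
        byte[] name = encodeString("John Doe");
        byte[] count = encodeLong(42L);
        // The store only ever sees the byte arrays; the types live in the client.
        System.out.println(decodeString(name) + " / " + decodeLong(count));
    }
}
```

The reader of a cell must already know which encoding the writer used, since nothing in the stored bytes records the type.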

Rows in an HBase table are distinguished by a row key (Rowkey). The row key uniquely identifies a row.

HBase rows are sorted by Rowkey in lexicographic (byte-wise) order.
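Lexicographic ordering can be surprising for keys with numeric suffixes. The sketch below imitates the ordering with a plain unsigned byte comparison; it uses only the JDK, and the names RowKeyOrderSketch, compareBytes, and sortedKeys are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: mimics byte-wise lexicographic ordering of row keys.
final class RowKeyOrderSketch {
    // Compares two byte arrays as unsigned bytes, left to right; an array
    // that is a prefix of the other sorts first.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    static List<String> sortedKeys(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        copy.sort((x, y) -> compareBytes(
                x.getBytes(StandardCharsets.UTF_8),
                y.getBytes(StandardCharsets.UTF_8)));
        return copy;
    }

    public static void main(String[] args) {
        // Prints [abc3, row1, row10, row2]: "row10" sorts before "row2".
        System.out.println(sortedKeys(Arrays.asList("row2", "row10", "row1", "abc3")));
    }
}
```

A common workaround when numeric order matters is to zero-pad the numeric part of the key (for example "row02" and "row10").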

The above describes the logical structure of HBase; its physical structure is very different from that of a traditional relational database.

Logical Model

HBase's logical model comes from Google's BigTable model. It can be understood as a sparse, persistent, multi-dimensional sorted map.

The following example is a slightly modified form of the one on page 2 of the BigTable paper. There is a table called webtable that contains two rows (com.cnn.www and com.example.www) and three column families named contents, anchor, and people.

In this example, for the first row (com.cnn.www), anchor contains two columns (anchor:cnnsi.com, anchor:my.look.ca) and contents contains one column (contents:html). The example contains 5 versions of the row with the row key com.cnn.www, and one version of the row with the row key com.example.www. The contents:html column qualifier contains the entire HTML of a given website. Each qualifier of the anchor column family names an external site that links to the site represented by the row, and its value is the text used in the anchor of that link. The people column family represents people associated with the site.

Cells that appear empty in this table take up no space in HBase; in fact they do not exist at all. This is what makes HBase "sparse". A tabular view is not the only way to look at data in HBase, nor even the most accurate one. The following represents the same information as a multi-dimensional map. It is only a mock-up for illustrative purposes and may not be strictly accurate.

{
  "com.cnn.www": {
    contents: {
      t6: contents:html: "<html>..."
      t5: contents:html: "<html>..."
      t3: contents:html: "<html>..."
    }
    anchor: {
      t9: anchor:cnnsi.com = "CNN"
      t8: anchor:my.look.ca = "CNN.com"
    }
    people: {}
  }
  "com.example.www": {
    contents: {
      t5: contents:html: "<html>..."
    }
    anchor: {}
    people: {
      t5: people:author: "John Doe"
    }
  }
} 
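The same map can be imitated with nested sorted maps. This is a toy in-memory model built on the JDK only, not how HBase stores data internally; the class WebTableSketch and its methods are hypothetical, and the timestamps t3 to t9 are modelled as the longs 3 to 9.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// A toy in-memory model of the map above; names and layout are illustrative.
final class WebTableSketch {
    // row key -> column family -> qualifier -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> rows =
            new TreeMap<>();

    void put(String row, String family, String qualifier, long ts, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .computeIfAbsent(qualifier, q -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(ts, value);
    }

    // Number of cells that actually exist; empty cells are simply absent.
    int storedCells() {
        int n = 0;
        for (NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> families : rows.values()) {
            for (NavigableMap<String, NavigableMap<Long, String>> qualifiers : families.values()) {
                for (NavigableMap<Long, String> versions : qualifiers.values()) {
                    n += versions.size();
                }
            }
        }
        return n;
    }

    // Versions of one column, newest first, or null if the column has no cells.
    NavigableMap<Long, String> versions(String row, String family, String qualifier) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> families = rows.get(row);
        if (families == null) {
            return null;
        }
        NavigableMap<String, NavigableMap<Long, String>> qualifiers = families.get(family);
        return qualifiers == null ? null : qualifiers.get(qualifier);
    }

    static WebTableSketch sample() {
        WebTableSketch t = new WebTableSketch();
        t.put("com.cnn.www", "contents", "html", 6L, "<html>...");
        t.put("com.cnn.www", "contents", "html", 5L, "<html>...");
        t.put("com.cnn.www", "contents", "html", 3L, "<html>...");
        t.put("com.cnn.www", "anchor", "cnnsi.com", 9L, "CNN");
        t.put("com.cnn.www", "anchor", "my.look.ca", 8L, "CNN.com");
        t.put("com.example.www", "contents", "html", 5L, "<html>...");
        t.put("com.example.www", "people", "author", 5L, "John Doe");
        return t;
    }

    public static void main(String[] args) {
        WebTableSketch t = sample();
        System.out.println("stored cells: " + t.storedCells());
        System.out.println("newest contents:html timestamp: "
                + t.versions("com.cnn.www", "contents", "html").firstKey());
    }
}
```

Nothing is stored for the empty people family of com.cnn.www or the empty anchor family of com.example.www, which is the sparseness the mock-up illustrates.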

Physical Model

Although an HBase table can be viewed conceptually as a sparse set of rows, physically it is stored by column family. This is why new columns (qualifiers) can be added to an existing column family at any time.

HBase is column-oriented: physically, data is stored in files per column family rather than per row. A column family is stored in multiple HFiles and, most importantly, the data of a column family for a given range of rows is managed by the same Region.

Empty cells do not occupy any physical storage space. Thus, a request for the value of the contents:html column at timestamp t8 returns no value. Similarly, a request for the value of anchor:my.look.ca at timestamp t9 returns no value. If no timestamp is supplied, however, the most recent value of the column is returned. Given multiple versions, the most recent one is also the first one found, because timestamps are stored in descending order. So a request for the values of all columns in the row com.cnn.www without a timestamp returns the value of contents:html from timestamp t6, the value of anchor:cnnsi.com from timestamp t9, and the value of anchor:my.look.ca from timestamp t8.
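The lookup rules above can be sketched for the row com.cnn.www as follows. This is a JDK-only model, not HBase code; the class VersionLookupSketch and its methods are hypothetical, and timestamps t3 to t9 are modelled as the longs 3 to 9.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative only: exact-timestamp and latest-version lookups for one row.
final class VersionLookupSketch {
    // column ("family:qualifier") -> timestamp (newest first) -> value
    static Map<String, NavigableMap<Long, String>> cnnRow() {
        Map<String, NavigableMap<Long, String>> row = new LinkedHashMap<>();
        add(row, "contents:html", 6L, "<html>...");
        add(row, "contents:html", 5L, "<html>...");
        add(row, "contents:html", 3L, "<html>...");
        add(row, "anchor:cnnsi.com", 9L, "CNN");
        add(row, "anchor:my.look.ca", 8L, "CNN.com");
        return row;
    }

    private static void add(Map<String, NavigableMap<Long, String>> row,
                            String column, long ts, String value) {
        row.computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
           .put(ts, value);
    }

    // Value stored at exactly this timestamp, or null because empty cells do not exist.
    static String valueAt(Map<String, NavigableMap<Long, String>> row, String column, long ts) {
        NavigableMap<Long, String> versions = row.get(column);
        return versions == null ? null : versions.get(ts);
    }

    // Newest timestamp per column: the first key of each descending map.
    static Map<String, Long> latestTimestamps(Map<String, NavigableMap<Long, String>> row) {
        Map<String, Long> latest = new LinkedHashMap<>();
        for (Map.Entry<String, NavigableMap<Long, String>> e : row.entrySet()) {
            latest.put(e.getKey(), e.getValue().firstKey());
        }
        return latest;
    }

    public static void main(String[] args) {
        Map<String, NavigableMap<Long, String>> row = cnnRow();
        System.out.println(valueAt(row, "contents:html", 8L));  // prints null
        System.out.println(latestTimestamps(row));
    }
}
```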

Data Model Operations

The four primary data model operations are Get, Put, Scan, and Delete. They are applied through a Table instance.

Versions: a {Rowkey, Column (column family plus qualifier), Version} tuple identifies a cell in HBase.

Rowkey and Column are represented as byte arrays, while Version is represented as a long integer.

Get

Get returns the attributes of a specified row. Internally, Get is implemented on top of Scan. By default, if no version is specified, a Get returns the most recent version of the cell.

To return more than one version, use Get.setMaxVersions().

To return versions other than the latest one, see Get.setTimeRange().
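How a version cap and a time range combine can be sketched without HBase. The model below assumes that the time range is the half-open interval [minStamp, maxStamp) and that versions are visited newest first; the class VersionSelectionSketch and its methods are hypothetical, and the three versions mirror contents:html from the webtable example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative only: which versions a read would see under a cap and a time range.
final class VersionSelectionSketch {
    // Returns timestamps newest first, restricted to [minStamp, maxStamp)
    // and capped at maxVersions entries.
    static List<Long> select(NavigableMap<Long, String> newestFirst,
                             long minStamp, long maxStamp, int maxVersions) {
        List<Long> picked = new ArrayList<>();
        for (long ts : newestFirst.keySet()) {
            if (picked.size() == maxVersions) {
                break;
            }
            if (ts >= minStamp && ts < maxStamp) {
                picked.add(ts);
            }
        }
        return picked;
    }

    static NavigableMap<Long, String> contentsHtml() {
        NavigableMap<Long, String> v = new TreeMap<Long, String>(Comparator.reverseOrder());
        v.put(3L, "<html>...");
        v.put(5L, "<html>...");
        v.put(6L, "<html>...");
        return v;
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> v = contentsHtml();
        System.out.println(select(v, 0L, Long.MAX_VALUE, 1));  // default: newest only
        System.out.println(select(v, 0L, Long.MAX_VALUE, 3));  // up to 3 versions
        System.out.println(select(v, 3L, 6L, 3));              // only versions in [3, 6)
    }
}
```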

Default Get example

public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
...
Get get = new Get(Bytes.toBytes("row1"));
Result r = table.get(get);
byte[] b = r.getValue(CF, ATTR);  // returns current version of value 

Versioned Get example

public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
...
Get get = new Get(Bytes.toBytes("row1"));
get.setMaxVersions(3);  // will return last 3 versions of row
Result r = table.get(get);
byte[] b = r.getValue(CF, ATTR);  // returns current version of value
List<Cell> kv = r.getColumnCells(CF, ATTR);  // returns all versions of this column

Put

A Put always creates a new version of a cell at a certain timestamp. By default the system uses the server's currentTimeMillis, but you can specify the version (a long integer) yourself for each column. This means you can assign a time in the past or in the future, or use the long value for purposes other than time.
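The effect of implicit and explicit timestamps on a single cell can be sketched like this. It is a JDK-only model with hypothetical names (PutVersionSketch, put, latest, versionCount); it ignores any limit on the number of retained versions.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative only: versions of one cell, keyed by timestamp, newest first.
final class PutVersionSketch {
    private final NavigableMap<Long, String> versions =
            new TreeMap<Long, String>(Comparator.reverseOrder());

    // Implicit version: the current time is used as the timestamp.
    void put(String value) {
        put(System.currentTimeMillis(), value);
    }

    // Explicit version: the caller chooses the timestamp.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    int versionCount() {
        return versions.size();
    }

    public static void main(String[] args) {
        PutVersionSketch cell = new PutVersionSketch();
        cell.put("now");               // implicit timestamp
        cell.put(555L, "backdated");   // explicit timestamp in the past
        System.out.println(cell.latest());        // prints now
        cell.put(555L, "rewritten");   // same timestamp: replaces that version
        System.out.println(cell.versionCount());  // prints 2
    }
}
```

A write with an older explicit timestamp does not become the value that a default read returns, and a write at an existing timestamp replaces that version rather than adding one.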

Implicit version example

HBase implicitly versions the following Put with the current time.

public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
...
Put put = new Put(Bytes.toBytes(row));
put.addColumn(CF, ATTR, Bytes.toBytes(data));
table.put(put); 

Explicit version example

public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
...
Put put = new Put(Bytes.toBytes(row));
long explicitTimeInMs = 555;  // just an example
put.addColumn(CF, ATTR, explicitTimeInMs, Bytes.toBytes(data));
table.put(put); 

Delete

Deletes are performed via Table.delete.

Internally, there are three different types of delete markers.

  • Delete: for a specific version of a column.
  • Delete column: for all versions of a column.
  • Delete family: for all columns of a particular ColumnFamily.
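The scope of the three marker types can be sketched as a predicate that decides whether a marker hides a given cell. This is a JDK-only illustration with hypothetical names; it assumes that column and family markers cover every version at or below the marker's timestamp, which the list above does not spell out.

```java
// Illustrative only: which cells each kind of delete marker hides.
final class DeleteMarkerSketch {
    enum MarkerType { DELETE, DELETE_COLUMN, DELETE_FAMILY }

    // True if a marker of the given type masks the cell (family, qualifier, timestamp).
    // markerQualifier is ignored for DELETE_FAMILY.
    static boolean masks(MarkerType type, String markerFamily, String markerQualifier, long markerTs,
                         String cellFamily, String cellQualifier, long cellTs) {
        if (!markerFamily.equals(cellFamily)) {
            return false;
        }
        switch (type) {
            case DELETE:         // one specific version of one column
                return markerQualifier.equals(cellQualifier) && cellTs == markerTs;
            case DELETE_COLUMN:  // all versions of one column (assumed: up to markerTs)
                return markerQualifier.equals(cellQualifier) && cellTs <= markerTs;
            case DELETE_FAMILY:  // all columns of the family (assumed: up to markerTs)
                return cellTs <= markerTs;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(masks(MarkerType.DELETE, "contents", "html", 5L, "contents", "html", 6L));
        System.out.println(masks(MarkerType.DELETE_COLUMN, "contents", "html", 5L, "contents", "html", 3L));
        System.out.println(masks(MarkerType.DELETE_FAMILY, "anchor", null, 9L, "anchor", "my.look.ca", 8L));
    }
}
```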

Scan

Scanning Table

The following is an example of a table scan. Assume a table is populated with rows with the keys "row1", "row2", and "row3", and then another set of rows with the keys "abc1", "abc2", and "abc3". The example shows how to set up a Scan instance to return the rows whose keys begin with "row".

public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
...

Table table = ...      // instantiate a Table instance

Scan scan = new Scan();
scan.addColumn(CF, ATTR);
scan.setRowPrefixFilter(Bytes.toBytes("row"));
ResultScanner rs = table.getScanner(scan);
try {
  for (Result r = rs.next(); r != null; r = rs.next()) {
    // process result...
  }
} finally {
  rs.close();  // always close the ResultScanner!
} 
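Because rows are sorted by key, a prefix scan can be thought of as a range scan from the prefix (inclusive) up to the smallest key that no longer starts with it (exclusive). The sketch below shows that idea on a JDK TreeMap; it is not HBase code, the names PrefixScanSketch, stopKeyFor, and scanPrefix are hypothetical, and it assumes ASCII keys so that String order matches byte order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative only: a prefix scan expressed as a key range over a sorted map.
final class PrefixScanSketch {
    // Smallest key greater than every key that starts with the prefix.
    // Assumes a non-empty prefix whose last character can be incremented.
    static String stopKeyFor(String prefix) {
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    // Keys in [prefix, stopKeyFor(prefix)), in sorted order.
    static List<String> scanPrefix(TreeMap<String, String> table, String prefix) {
        return new ArrayList<>(table.subMap(prefix, true, stopKeyFor(prefix), false).keySet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        for (String key : new String[] {"row1", "row2", "row3", "abc1", "abc2", "abc3"}) {
            table.put(key, "value");
        }
        // Prints [row1, row2, row3]
        System.out.println(scanPrefix(table, "row"));
    }
}
```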

Origin www.cnblogs.com/tree1123/p/11611062.html