HBase --- Data Model (Part 1)

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/yangguosb/article/details/81866208

Table


Row

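  1. Each row is uniquely identified by its RowKey;
  2. Rows are stored sorted by RowKey, in lexicographic byte order (which matches alphabetical order for plain ASCII keys);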

A row in HBase consists of a row key and one or more columns with values associated with them. Rows are sorted alphabetically by the row key as they are stored. For this reason, the design of the row key is very important. The goal is to store data in such a way that related rows are near each other.
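To see why row key design matters, here is a small Python sketch. It is only a toy illustration, not the HBase client API: it relies on the fact that HBase compares row keys as unsigned byte arrays, which is also how Python compares bytes objects. The key format "user<N>" and the helper padded_key are hypothetical.

```python
# Toy illustration, not the HBase client API: HBase orders row keys by
# comparing them as unsigned byte arrays, which is also how Python compares
# bytes objects.
row_keys = [b"user10", b"user2", b"user1", b"user100"]
print(sorted(row_keys))
# -> [b'user1', b'user10', b'user100', b'user2']


def padded_key(user_id: int, width: int = 5) -> bytes:
    """Build a hypothetical zero-padded row key such as b'user00002'."""
    return f"user{user_id:0{width}d}".encode("ascii")


# With zero-padding, byte order now matches numeric order.
padded = sorted(padded_key(i) for i in [10, 2, 1, 100])
print(padded)
# -> [b'user00001', b'user00002', b'user00010', b'user00100']
```

Unpadded numeric keys sort as strings, so "user10" lands before "user2"; padding to a fixed width is one common way to keep numerically adjacent rows next to each other.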

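Each row in an HBase table can be viewed as a multidimensional map, as shown below:
[Figure: an HBase row viewed as a multidimensional map]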
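The same "multidimensional map" view can be sketched with nested Python dicts: row key, then column family, then column qualifier, then timestamp, then value. This is a conceptual model only; the table contents and the get_value helper are hypothetical and are not how HBase physically stores data.

```python
# Toy model (hypothetical data, not an HBase class): one table as nested maps
# row key -> column family -> column qualifier -> timestamp -> value.
table = {
    b"com.example.www": {
        "contents": {
            "html": {6: b"<html>v6", 5: b"<html>v5", 3: b"<html>v3"},
        },
        "anchor": {
            "news.example.org": {9: b"Example"},
        },
    },
}


def get_value(row: bytes, family: str, qualifier: str, ts: int) -> bytes:
    """Walk the four map levels down to one stored value."""
    return table[row][family][qualifier][ts]


print(get_value(b"com.example.www", "contents", "html", 5))  # -> b'<html>v5'
```

Each level of nesting corresponds to one of the coordinates discussed in the sections below.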

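Column Family

  A column family is a collection of columns, used to group a table's many columns. Its characteristics:

  1. Every row in the table has the same column families, even though a given row may store nothing in a given family; this is why HBase tables are sparse;
  2. Column families are declared when the table is created and, unlike column qualifiers, cannot be added on the fly by writes;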

Each row in a table has the same column families, though a given row might not store anything in a given column family.
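A minimal sketch of these two properties, assuming a toy in-memory table: the family set is fixed at construction time, and each row only stores the cells actually written to it, so an empty family costs nothing. ToyTable and its put method are hypothetical names, not the HBase client API; real HBase also rejects a write to an undeclared family, but with its own exception type rather than KeyError.

```python
class ToyTable:
    """Hypothetical stand-in for an HBase table; not the HBase client API."""

    def __init__(self, families):
        # Column families are declared once, when the table is created.
        self.families = frozenset(families)
        # Only cells that were actually written are stored:
        # row key -> {(family, qualifier): value}
        self.rows = {}

    def put(self, row, family, qualifier, value):
        if family not in self.families:
            # Real HBase also rejects writes to an undeclared column family.
            raise KeyError(f"no such column family: {family}")
        self.rows.setdefault(row, {})[(family, qualifier)] = value


t = ToyTable(["info", "stats"])
t.put(b"r1", "info", "name", b"alice")
t.put(b"r2", "stats", "visits", b"3")  # r2 stores nothing under "info"
print(t.rows[b"r2"])  # -> {('stats', 'visits'): b'3'}
```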

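Column Qualifier

  1. A column family and a column qualifier together uniquely identify a column, written as family:qualifier;
  2. Qualifiers can change freely at runtime, i.e. columns can be added and removed at any time;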

A column qualifier is added to a column family to provide the index for a given piece of data. Given a column family content, a column qualifier might be content:html, and another might be content:pdf. Though column families are fixed at table creation, column qualifiers are mutable and may differ greatly between rows.
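The following sketch, again a toy model with hypothetical helpers (split_column, put, delete), shows the family:qualifier notation and how qualifiers come into existence simply by being written, so two rows in the same family can carry entirely different columns.

```python
# Hypothetical helpers, not the HBase API: qualifiers are created simply by
# writing to them, so two rows in the same family can hold different columns.
rows = {}  # row key -> {family: {qualifier: value}}


def split_column(column: str) -> tuple[str, str]:
    """Split 'content:html' into ('content', 'html') at the first colon."""
    family, sep, qualifier = column.partition(":")
    if not sep:
        raise ValueError(f"expected family:qualifier, got {column!r}")
    return family, qualifier


def put(row: bytes, column: str, value: bytes) -> None:
    family, qualifier = split_column(column)
    rows.setdefault(row, {}).setdefault(family, {})[qualifier] = value


def delete(row: bytes, column: str) -> None:
    family, qualifier = split_column(column)
    rows.get(row, {}).get(family, {}).pop(qualifier, None)


put(b"doc1", "content:html", b"<p>hi</p>")
put(b"doc1", "content:pdf", b"%PDF")
put(b"doc2", "content:txt", b"hi")  # a qualifier doc1 never had
delete(b"doc1", "content:pdf")      # columns can be removed just as freely
print(sorted(rows[b"doc1"]["content"]), sorted(rows[b"doc2"]["content"]))
# -> ['html'] ['txt']
```

Splitting at the first colon is deliberate: a family name cannot contain a colon, but a qualifier is arbitrary bytes.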

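Cell

  1. The row key, column family, and column qualifier together uniquely identify a cell;
  2. A cell holds a value together with a timestamp (i.e. its version number);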
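In HBase's storage layer each cell is stored under a key made of row, family, qualifier and timestamp (plus a key type, omitted here). Cells are ordered by row, family and qualifier ascending, with the timestamp descending, so the newest version of a column comes first. The sketch below reproduces that ordering with a hypothetical Cell tuple; it is an illustration of the sort order, not HBase's actual KeyValue class.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    """Toy cell: its coordinates plus the stored value (names are illustrative)."""
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int
    value: bytes


def sort_key(c: Cell):
    # Row, family and qualifier ascending; timestamp descending (newest first).
    return (c.row, c.family, c.qualifier, -c.timestamp)


cells = [
    Cell(b"r1", b"cf", b"a", 100, b"old"),
    Cell(b"r1", b"cf", b"a", 200, b"new"),
    Cell(b"r0", b"cf", b"z", 150, b"x"),
]
for c in sorted(cells, key=sort_key):
    print(c.row, c.family + b":" + c.qualifier, c.timestamp, c.value)
# -> b'r0' b'cf:z' 150 b'x'
# -> b'r1' b'cf:a' 200 b'new'
# -> b'r1' b'cf:a' 100 b'old'
```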

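Timestamp

  The value in a cell is versioned, and each version is identified by a timestamp. Characteristics:

  1. When data is written to a cell without an explicit timestamp, the RegionServer's current time (in milliseconds) is used by default;
  2. When a cell is read without specifying a version, the latest value is returned;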
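Both defaults can be mimicked in a few lines. VersionedCell and its methods are hypothetical names for this sketch, not HBase classes. One simplification to keep in mind: this toy keeps every version, whereas HBase retains only as many versions as the column family's VERSIONS setting allows.

```python
import time


class VersionedCell:
    """Toy versioned cell (hypothetical, not an HBase class): {timestamp_ms: value}."""

    def __init__(self):
        self.versions = {}

    def put(self, value, timestamp=None):
        # No explicit timestamp: use the current time in milliseconds,
        # mimicking what the RegionServer does by default.
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        self.versions[timestamp] = value

    def get(self, timestamp=None):
        # No version requested: return the value with the largest timestamp.
        if timestamp is None:
            return self.versions[max(self.versions)]
        return self.versions[timestamp]

    def get_versions(self, max_versions):
        """Return up to max_versions (timestamp, value) pairs, newest first."""
        newest = sorted(self.versions, reverse=True)[:max_versions]
        return [(ts, self.versions[ts]) for ts in newest]


cell = VersionedCell()
cell.put(b"v1", timestamp=1000)
cell.put(b"v2", timestamp=2000)
cell.put(b"v3")                              # stamped with "now", far later than 2000
print(cell.get())                            # -> b'v3'
print(cell.get(1000))                        # -> b'v1'
print([v for _, v in cell.get_versions(2)])  # -> [b'v3', b'v2']
```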

Regions

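  A Region is the result of splitting an HBase table horizontally: each table is split by row-key range into one or more Regions. A Region does not have a fixed size; the size at which a Region is split is governed by hbase.hregion.max.filesize (10 GB by default in recent HBase releases);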

HBase Tables are divided horizontally by row key range into “Regions.” A region contains all rows in the table between the region’s start key and end key.
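Finding the Region that holds a row key is a range lookup over sorted start keys: a Region's start key is inclusive and its end key is exclusive, the first Region starts at the empty key, and the last Region has no end key. The sketch below shows only that key-range arithmetic with made-up boundaries; the real client locates Regions through the hbase:meta table, and region_start_keys, region_names and locate_region are hypothetical names.

```python
import bisect

# Hypothetical region boundaries: each region covers [start_key, next start_key);
# the first start key is empty (b"") and the last region has no end key.
region_start_keys = [b"", b"g", b"n", b"t"]
region_names = ["region-1", "region-2", "region-3", "region-4"]


def locate_region(row_key: bytes) -> str:
    """Return the name of the region whose key range contains row_key."""
    # bisect_right counts start keys <= row_key, so subtracting one gives the
    # last region whose (inclusive) start key is not greater than row_key.
    idx = bisect.bisect_right(region_start_keys, row_key) - 1
    return region_names[idx]


for key in [b"apple", b"g", b"melon", b"zebra"]:
    print(key, locate_region(key))
# -> b'apple' region-1
# -> b'g' region-2
# -> b'melon' region-2
# -> b'zebra' region-4
```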

References:

  1. http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/9353-login1210_khurana.pdf
  2. Apache HBase Reference Guide (official site): http://hbase.apache.org/book.html#_namespace
  3. https://mapr.com/blog/in-depth-look-hbase-architecture/
