HBase physical storage and logical architecture

HBase physical storage

All rows of an HBase table are sorted lexicographically by row key. Because a table can contain a very large number of rows, sometimes hundreds of millions, its data needs to be distributed across multiple servers.
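To make the ordering concrete, here is a minimal sketch of lexicographic (byte-wise) sorting. The row keys are invented for illustration; the point is only that keys compare byte by byte rather than numerically.

```python
# Row keys are compared byte by byte, not as numbers.
row_keys = [b"row2", b"row10", b"row1", b"row20"]
sorted_keys = sorted(row_keys)
print(sorted_keys)

# Zero-padding the numeric part is one common way to make
# byte order agree with numeric order.
padded_keys = sorted(f"row{n:04d}".encode() for n in (2, 10, 1, 20))
print(padded_keys)
```

Note that b"row10" sorts before b"row2" in the first list, which is why row-key design matters when range scans are expected to follow numeric order.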

Therefore, when a table has too many rows, HBase partitions it by row-key value. Each row-key interval forms a partition called a Region, which holds all the data whose row keys fall within that interval, as shown in Figure 1.

Figure 1: HBase Region storage model
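The range-based partitioning described above can be sketched as a lookup over sorted Region start keys. This is an illustration only: the split points and Region names are hypothetical, and it does not reproduce how HBase itself locates Regions.

```python
from bisect import bisect_right

# Hypothetical split points. Each Region covers the interval
# [its start key, the next Region's start key). The first Region
# starts at the empty key and the last one has no upper bound.
region_start_keys = [b"", b"g", b"n", b"t"]
region_names = ["region-0", "region-1", "region-2", "region-3"]


def find_region(row_key: bytes) -> str:
    """Return the name of the Region whose key interval contains row_key."""
    index = bisect_right(region_start_keys, row_key) - 1
    return region_names[index]


for key in (b"apple", b"g", b"monkey", b"zebra"):
    print(key, "->", find_region(key))
```

Because the start keys are sorted, a binary search is enough to find the one Region responsible for any row key.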


Regions are split based on size. Each table starts with a single Region. As data is continually inserted into the table, the Region grows, and once it reaches a threshold it is split into two new Regions. As the number of rows in the table grows, there are more and more Regions, as shown in Figure 2.

Figure 2: Region splitting in HBase
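The splitting behaviour can be illustrated with a toy model. Everything here is an assumption made for the sketch: the threshold counts rows rather than bytes, the split point is the median key, and the lower half simply stays in the original Region object instead of creating two fresh Regions.

```python
from dataclasses import dataclass, field

SPLIT_THRESHOLD = 4  # hypothetical limit in rows; HBase measures Region size in bytes


@dataclass(eq=False)
class Region:
    start_key: str
    rows: dict = field(default_factory=dict)


class Table:
    def __init__(self):
        # A new table starts with a single Region covering every row key.
        self.regions = [Region(start_key="")]

    def put(self, row_key, value):
        # Regions are kept sorted by start key, so the owner is the last
        # Region whose start key is <= the row key.
        region = [r for r in self.regions if r.start_key <= row_key][-1]
        region.rows[row_key] = value
        if len(region.rows) > SPLIT_THRESHOLD:
            self._split(region)

    def _split(self, region):
        keys = sorted(region.rows)
        mid_key = keys[len(keys) // 2]
        # Rows at or above the midpoint move to a new Region.
        upper_rows = {k: region.rows.pop(k) for k in keys if k >= mid_key}
        upper = Region(start_key=mid_key, rows=upper_rows)
        self.regions.insert(self.regions.index(region) + 1, upper)


table = Table()
for key in "abcdefg":
    table.put(key, "value-" + key)
print([(r.start_key, sorted(r.rows)) for r in table.regions])
```

After seven inserts the single initial Region has become three, each covering a contiguous range of row keys.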


A Region is the smallest unit of data distribution and load balancing in HBase. Its default size is given here as 100 MB to 200 MB; the actual split threshold is configurable through hbase.hregion.max.filesize, and its default in recent HBase releases is far larger (10 GB). Different Regions can be placed on different Region Servers, but a single Region is never split across multiple Region Servers. Each Region Server manages a set of Regions, as shown in Figure 3.

Figure 3: Distribution of Regions across Region Servers
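The placement rule (every Region lives on exactly one Region Server, and each server holds a set of Regions) can be sketched as follows. The least-loaded strategy and the names are assumptions for illustration; HBase's real balancer takes more factors into account.

```python
def assign_regions(regions, servers):
    """Place each Region on exactly one server, always choosing the
    server that currently holds the fewest Regions."""
    assignment = {server: [] for server in servers}
    for region in regions:
        target = min(servers, key=lambda s: len(assignment[s]))
        assignment[target].append(region)
    return assignment


layout = assign_regions(["r1", "r2", "r3", "r4", "r5"], ["rs1", "rs2"])
print(layout)
```

The resulting mapping is a partition of the Regions: no Region appears on two servers, and the Region counts differ by at most one.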


A Region is the smallest unit by which HBase distributes data across Region Servers, but it is not the smallest unit of storage. In fact, each Region consists of one or more Stores, and each Store holds the data of one column family. Each Store in turn consists of one MemStore and zero or more StoreFiles, as shown in Figure 4. StoreFiles are stored on HDFS in the HFile format.

Figure 4: Storage structure inside a Region
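The containment hierarchy above (Region, then one Store per column family, then one MemStore plus zero or more StoreFiles) can be modelled in a few lines. The flush step and its threshold are assumptions added so that StoreFiles actually appear; the in-memory tuples stand in for HFiles on HDFS.

```python
class Store:
    """One column family inside a Region: one MemStore, zero or more StoreFiles."""

    def __init__(self, flush_threshold=3):  # hypothetical threshold, counted in rows
        self.memstore = {}
        self.store_files = []  # immutable sorted snapshots, oldest first
        self.flush_threshold = flush_threshold

    def put(self, row_key, value):
        self.memstore[row_key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the MemStore out as a new sorted, immutable StoreFile.
        self.store_files.append(tuple(sorted(self.memstore.items())))
        self.memstore = {}

    def get(self, row_key):
        # Check the newest data first: MemStore, then StoreFiles newest to oldest.
        if row_key in self.memstore:
            return self.memstore[row_key]
        for store_file in reversed(self.store_files):
            for key, value in store_file:
                if key == row_key:
                    return value
        return None


class Region:
    def __init__(self, column_families):
        # One Store per column family.
        self.stores = {cf: Store() for cf in column_families}

    def put(self, row_key, column_family, value):
        self.stores[column_family].put(row_key, value)


region = Region(["info", "stats"])
for i in range(4):
    region.put(f"row{i}", "info", f"name-{i}")
print(len(region.stores["info"].store_files), region.stores["info"].memstore)
```

A freshly created Store has no StoreFiles at all, which matches the "zero or more" wording above.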

HBase logical architecture

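In a distributed production environment, HBase runs on top of HDFS and uses HDFS as its underlying storage. Above HBase sits the Java API layer, through which applications access the data stored in HBase. An HBase cluster consists mainly of the Master, the Region Servers, and Zookeeper; the individual modules are shown in Figure 5.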

Figure 5: HBase system architecture

1)Master

The Master is mainly responsible for managing tables and Regions.

Table management covers operations such as creating, deleting, modifying, and querying tables.

Region management is more involved. The Master assigns Regions to Region Servers, coordinates the Region Servers, monitors the state of each Region Server, and balances the load among them.

After a Region is split or merged, the Master readjusts the layout of the Regions. If a Region Server fails, the Master moves the Regions hosted on the failed Region Server to other Region Servers.
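The failure handling described above can be sketched as a small function. The least-loaded placement and all names are assumptions for illustration and are not HBase's actual recovery procedure.

```python
def reassign_after_failure(assignment, failed_server):
    """Return a new layout in which the failed server's Regions are
    moved onto the surviving servers, least-loaded server first."""
    survivors = {
        server: list(regions)
        for server, regions in assignment.items()
        if server != failed_server
    }
    if not survivors:
        raise RuntimeError("no Region Server left to take over the Regions")
    for region in assignment.get(failed_server, []):
        target = min(survivors, key=lambda s: len(survivors[s]))
        survivors[target].append(region)
    return survivors


before = {"rs1": ["r1", "r2"], "rs2": ["r3"], "rs3": ["r4", "r5"]}
after = reassign_after_failure(before, "rs1")
print(after)
```

Every Region that was on the failed server ends up on exactly one survivor, and the original mapping is left unmodified.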

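HBase allows multiple Master nodes to coexist, but this requires Zookeeper for coordination. When several Master nodes coexist, only one of them serves requests, and the others remain on standby. When the active Master goes down, one of the standby Masters takes over the HBase cluster.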
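The active/standby arrangement can be illustrated with a toy election. The class below is a hypothetical stand-in for the coordination that Zookeeper provides; it is not the Zookeeper API, and picking the earliest registered Master is an assumption of the sketch.

```python
class MasterElection:
    """Toy stand-in for Zookeeper-based coordination between Masters."""

    def __init__(self):
        self.candidates = []  # live Masters, in registration order

    def register(self, master):
        self.candidates.append(master)

    def fail(self, master):
        self.candidates.remove(master)

    @property
    def active(self):
        # Only one Master serves requests at any time.
        return self.candidates[0] if self.candidates else None

    @property
    def standby(self):
        return self.candidates[1:]


election = MasterElection()
for name in ("master-a", "master-b", "master-c"):
    election.register(name)
print(election.active, election.standby)

election.fail("master-a")  # the active Master goes down
print(election.active, election.standby)
```

At every step there is at most one active Master, and a standby is promoted as soon as the active one disappears.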

2)Region Server

HBase has many Region Servers, and each Region Server hosts multiple Regions. The Region Server is the core module of HBase: it maintains the set of Regions that the Master assigns to it and handles read and write operations on those Regions. Clients connect directly to the Region Servers and communicate with them to obtain data from HBase.

HBase uses HDFS as its underlying file storage system. Region Servers write their data to HDFS and rely on HDFS for reliable, stable storage. As a result, a Region Server does not need to provide data replication or maintain copies of the data itself.

3)Zookeeper

Zookeeper plays a very important role in HBase. First, it is the basis of HBase's high-availability (HA) solution for the Master. In other words, Zookeeper ensures that at least one HBase Master is running.

Zookeeper is also responsible for the registration of Regions and Region Servers. The Master is the manager of the entire HBase cluster, so it must know the state of every Region Server.

HBase uses Zookeeper to track the state of the Region Servers. Every Region Server registers itself with Zookeeper, which monitors the state of each Region Server in real time and notifies the Master. In this way, the Master can always learn the working state of every Region Server through Zookeeper.
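The register-and-notify pattern can be sketched with a small in-memory registry. The classes, method names, and event labels are hypothetical stand-ins for the role Zookeeper plays here; they do not correspond to the Zookeeper or HBase APIs.

```python
class Registry:
    """Toy stand-in for Zookeeper: tracks live servers and notifies watchers."""

    def __init__(self):
        self.live_servers = set()
        self.watchers = []

    def watch(self, callback):
        self.watchers.append(callback)

    def register(self, server):
        self.live_servers.add(server)
        self._notify("joined", server)

    def session_expired(self, server):
        # Called when a server stops responding and its registration is dropped.
        self.live_servers.discard(server)
        self._notify("left", server)

    def _notify(self, event, server):
        for callback in self.watchers:
            callback(event, server)


class Master:
    def __init__(self, registry):
        self.known_servers = set()
        self.events = []
        registry.watch(self.on_change)

    def on_change(self, event, server):
        # Keep the Master's view of the cluster in step with the registry.
        self.events.append((event, server))
        if event == "joined":
            self.known_servers.add(server)
        else:
            self.known_servers.discard(server)


registry = Registry()
master = Master(registry)
registry.register("rs1")
registry.register("rs2")
registry.session_expired("rs1")
print(master.known_servers, master.events)
```

The Master never polls the Region Servers directly in this sketch; it only reacts to notifications from the registry, which mirrors the division of responsibilities described above.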

Origin www.cnblogs.com/zkteam/p/11877286.html