Hadoop Notes: HBase Architecture

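HBase follows a master/slave server architecture:

Master: HBase Master

Slaves: a cluster of HRegion Servers

All servers in HBase are coordinated through ZooKeeper, which also handles errors that may occur at runtime.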

I. Logical Model

Take the table "webpage", which nutch-2.0 uses to store its data in HBase, as an example:

Output of describe:

{NAME => 'webpage',

FAMILIES => [

{NAME => 'f', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
{NAME => 'h', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
{NAME => 'il', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
{NAME => 'mk', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
{NAME => 'mtdt', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
{NAME => 'ol', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
{NAME => 'p', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
{NAME => 's', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}

]

}

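The table description (describe) above gives the structure of the table:

Table name: webpage

Column families (FAMILIES): 'f','h','il','mk','mtdt','ol','p','s'

A family can contain multiple columns. For the webpage table, the mapping of columns to families is defined in nutch-2.0/conf/gora-hbase-mapping.xml, shown below: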

<?xml version="1.0" encoding="UTF-8"?>
<!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at
  
  http://www.apache.org/licenses/LICENSE-2.0
  
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<gora-orm>
    
    <table name="webpage">
        <family name="p" maxVersions="1"/> <!-- This can also have params like compression, bloom filters -->
        <family name="f" maxVersions="1"/>
        <family name="s" maxVersions="1"/>
        <family name="il" maxVersions="1"/>
        <family name="ol" maxVersions="1"/>
        <family name="h" maxVersions="1"/>
        <family name="mtdt" maxVersions="1"/>
        <family name="mk" maxVersions="1"/>
    </table>
    <class table="webpage" keyClass="java.lang.String" name="org.apache.nutch.storage.WebPage">
        
        <!-- fetch fields                                       -->
        <field name="baseUrl" family="f" qualifier="bas"/>
        <field name="status" family="f" qualifier="st"/>
        <field name="prevFetchTime" family="f" qualifier="pts"/>
        <field name="fetchTime" family="f" qualifier="ts"/>
        <field name="fetchInterval" family="f" qualifier="fi"/>
        <field name="retriesSinceFetch" family="f" qualifier="rsf"/>
        <field name="reprUrl" family="f" qualifier="rpr"/>
        <field name="content" family="f" qualifier="cnt"/>
        <field name="contentType" family="f" qualifier="typ"/>
        <field name="protocolStatus" family="f" qualifier="prot"/>
        <field name="modifiedTime" family="f" qualifier="mod"/>
        
        <!-- parse fields                                       -->
        <field name="title" family="p" qualifier="t"/>
        <field name="text" family="p" qualifier="c"/>
        <field name="parseStatus" family="p" qualifier="st"/>
        <field name="signature" family="p" qualifier="sig"/>
        <field name="prevSignature" family="p" qualifier="psig"/>
        
        <!-- score fields                                       -->
        <field name="score" family="s" qualifier="s"/>
        <field name="headers" family="h"/>
        <field name="inlinks" family="il"/>
        <field name="outlinks" family="ol"/>
        <field name="metadata" family="mtdt"/>
        <field name="markers" family="mk"/>
    </class>
    
    <table name="host">
      <family name="mtdt" maxVersions="1"/>
      <family name="il" maxVersions="1"/>
      <family name="ol" maxVersions="1"/>
    </table>
    
    <class table="host" keyClass="java.lang.String" name="org.apache.nutch.storage.Host">
      <field name="metadata" family="mtdt"/>
      <field name="inlinks" family="il"/>
      <field name="outlinks" family="ol"/>
    </class>
    
</gora-orm>
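
To see at a glance which family:qualifier each WebPage field lands in, the mapping can be read programmatically. The following is a minimal sketch using only the Python standard library; the XML it parses is a shortened copy of the file above, and the helper name columns_by_family is made up for illustration. Treating a field without a qualifier as spanning the whole family is an assumption about how Gora handles map-like fields.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the mapping above; the real file has more fields.
MAPPING_XML = """
<gora-orm>
  <table name="webpage">
    <family name="p" maxVersions="1"/>
    <family name="f" maxVersions="1"/>
    <family name="il" maxVersions="1"/>
  </table>
  <class table="webpage" keyClass="java.lang.String" name="org.apache.nutch.storage.WebPage">
    <field name="baseUrl" family="f" qualifier="bas"/>
    <field name="status" family="f" qualifier="st"/>
    <field name="title" family="p" qualifier="t"/>
    <field name="inlinks" family="il"/>
  </class>
</gora-orm>
"""


def columns_by_family(xml_text, table):
    """Return {family: [(field_name, qualifier_or_None), ...]} for one table."""
    root = ET.fromstring(xml_text)
    result = {}
    for cls in root.findall("class"):
        if cls.get("table") != table:
            continue
        for field in cls.findall("field"):
            result.setdefault(field.get("family"), []).append(
                (field.get("name"), field.get("qualifier"))
            )
    return result


mapping = columns_by_family(MAPPING_XML, "webpage")
for family, fields in sorted(mapping.items()):
    for name, qualifier in fields:
        # Assumption: a field with no qualifier spans the whole family.
        column = f"{family}:{qualifier}" if qualifier else f"{family}:*"
        print(f"{name:10s} -> {column}")
```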

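The logical structure of webpage is shown in the following figure:

row key: the primary key of the table. Records in a table are sorted by row key. In webpage the row key is the page URL with its host name reversed (of the form com.example.www:http/), so that pages from the same domain sort next to each other.

timestamp: the timestamp of each data operation; it can be regarded as the version number of the data.

Column Family: in the horizontal direction a table consists of one or more column families. A column family can hold any number of columns, i.e. a column family is dynamically extensible: neither the number nor the types of its columns have to be defined in advance. All columns are stored as raw bytes, and users must do any type conversion themselves. In short, column families are defined up front, while columns are added dynamically.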

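In fact, whenever the data in some row, column family, and column is updated at a time tn, the logical view gains an entry with timestamp=tn, and the new content is stored under that row, family, and column. If several family:column values of the same row are written at that moment, all of the new content shares the same timestamp.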
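
The versioning behaviour can be modelled with nested dictionaries. The sketch below is a toy illustration of the logical view, not HBase client code; the class LogicalTable, its methods, and the sample row key are all invented for this example.

```python
from collections import defaultdict


class LogicalTable:
    """Toy logical view: row -> "family:qualifier" -> {timestamp: value}."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, columns, timestamp):
        # All columns written in one put share the same timestamp.
        for column, value in columns.items():
            self.rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (latest if None)."""
        versions = self.rows.get(row, {}).get(column, {})
        candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
        return versions[max(candidates)] if candidates else None


row = "com.example.www:http/"  # hypothetical row key
table = LogicalTable()
table.put(row, {"f:st": b"1", "p:t": b"Example"}, timestamp=1)
table.put(row, {"f:st": b"2"}, timestamp=2)  # only f:st gets a new version

latest_status = table.get(row, "f:st")
old_status = table.get(row, "f:st", timestamp=1)
title = table.get(row, "p:t")
```

After the second put, f:st holds two versions while p:t still holds only the one written at timestamp 1.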

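Question: what happens if the content of some row:family:column is deleted at some moment?

In my own tests the table kept no visible record of the operation; the value was simply gone! (To be verified.) What a client sees is consistent with how HBase handles deletes: it writes a delete marker (tombstone) that hides the cell from reads, and the hidden data is physically removed later, during compaction.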

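II. Physical Model

A table in HBase is divided into multiple HRegions, which are stored across the cluster of HRegion Servers. The HBase Master assigns regions to HRegion Servers; the mapping from regions to servers is recorded in the .META. catalog table rather than held by the Master alone.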

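In the physical model the smallest unit of a table is the cell: the unit identified by a row and a family:column, which holds a timestamp and a value. The logical view above pulls the timestamp out as a separate dimension, which leaves blank "virtual cells". These blank cells are never actually stored. Physically, data is stored per column family: when a column in a family receives new content, a timestamp and a value are saved under that family:column. The timestamp here really denotes the version, because an update does not delete the old content; it only adds the new content. Updating some other column family writes nothing under this one, since there is nothing to write.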
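
The difference between the two views can be sketched by grouping writes per column family. This is a simplified illustration, not the real on-disk layout; the function to_physical and the sample rows are invented here, and sorting newest-first within a column is an assumption about how versions are ordered.

```python
from collections import defaultdict


def to_physical(puts):
    """Group (row, family, qualifier, timestamp, value) writes by column family.

    Each family gets its own list sorted by row, qualifier, then newest
    timestamp first. Cells that were never written simply do not appear.
    """
    stores = defaultdict(list)
    for row, family, qualifier, ts, value in puts:
        stores[family].append((row, qualifier, ts, value))
    for cells in stores.values():
        cells.sort(key=lambda c: (c[0], c[1], -c[2]))
    return dict(stores)


puts = [
    ("row1", "f", "st", 1, b"1"),
    ("row1", "p", "t", 1, b"Example"),
    ("row1", "f", "st", 2, b"2"),  # update: the version at ts=1 is kept
]
stores = to_physical(puts)
for family, cells in sorted(stores.items()):
    print(family, cells)
```

The update at timestamp 2 touches only family f, so family p still holds a single cell, and families that were never written to (such as il) have no entry at all.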

I may have made that sound complicated, but the idea behind it is really quite simple.

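III. HRegion

When the size of a table exceeds the configured value (hbase.hregion.max.filesize), HBase automatically divides the table into regions. A table is the collection of these regions, which are distinguished by row key. Each region is an HRegion and is described by a range [startkey, endkey). The Master assigns the regions to RegionServers, which manage them: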

The figure shows three regionservers: rs1, rs2, and rs3. The table is cut by row key into several parts, each assigned to a regionserver; a single regionserver can manage several HRegions.
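
Finding the region that owns a row key comes down to a search over sorted start keys. The sketch below is a toy routing table, not how the HBase client actually locates regions; the class RegionMap, the split points, and the server assignments are made up, with rs1 deliberately holding two regions.

```python
import bisect


class RegionMap:
    """Toy routing table: sorted region start keys -> region server name."""

    def __init__(self, regions):
        # regions: list of (start_key, server); the first start key must be "".
        self.regions = sorted(regions)
        self.start_keys = [start for start, _ in self.regions]

    def locate(self, row_key):
        # Rightmost region whose start key is <= row_key. Its end key is the
        # next region's start key, so each range is [startkey, endkey).
        index = bisect.bisect_right(self.start_keys, row_key) - 1
        return self.regions[index]


region_map = RegionMap([("", "rs1"), ("g", "rs2"), ("n", "rs3"), ("t", "rs1")])
for key in ["apple", "g", "mzz", "n", "zebra"]:
    start, server = region_map.locate(key)
    print(f"{key!r} -> region starting at {start!r} on {server}")
```

Because the start key is inclusive and the end key exclusive, the key "g" belongs to the region that starts at "g", not to the one before it.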

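IV. HRegionServer

The HRegionServer is mainly responsible for serving user I/O requests and for reading and writing data in the HDFS file system. It is the core module of HBase.

Internally, an HRegionServer manages a set of HRegion objects. Each HRegion corresponds to one region of a table and is made up of multiple HStores, one per column family.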

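The HStore is the core of HBase storage. It has two parts: the MemStore and the StoreFiles. Data written by a user first goes into the MemStore. When the MemStore is full (its size is set by hbase.hregion.memstore.flush.size, default 64 MB), it is flushed into a StoreFile (implemented underneath as an HFile). When the number of StoreFiles reaches a threshold, a compaction is triggered that merges several StoreFiles into one, merging versions and removing deleted data in the process. The setting that triggers compaction is hbase.hstore.compactionThreshold (default 3); hbase.hstore.blockingStoreFiles (default 7) is the related limit at which further updates are blocked until compaction catches up.

This shows that HBase really only ever adds data: all updates and deletes are resolved later, during compaction. A user's write can therefore return as soon as it is in memory, which is what keeps HBase I/O fast.

As StoreFiles are compacted, larger and larger StoreFiles form. When a single StoreFile exceeds a size threshold (hbase.hregion.max.filesize, default 256 MB), a split is triggered: the current region is split into two, the parent region goes offline, and the HMaster assigns the two child regions to HRegionServers, so that the load of the original region is spread over two regions.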
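
The write, flush, and compact cycle can be sketched with a small in-memory model. This is a toy under loud assumptions: thresholds count cells and files rather than bytes, each key keeps a single version, None stands in for a delete marker, and region splitting is left out. The class ToyStore and its parameters are invented for this example.

```python
class ToyStore:
    """Toy HStore: one in-memory MemStore plus a list of immutable StoreFiles."""

    def __init__(self, flush_size=3, compaction_threshold=3):
        self.flush_size = flush_size  # cells in the MemStore that trigger a flush
        self.compaction_threshold = compaction_threshold  # files that trigger a compaction
        self.memstore = {}
        self.store_files = []  # oldest first

    def put(self, key, value):
        # Writes only touch memory; None stands in for a delete marker.
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_size:
            self.flush()

    def delete(self, key):
        self.put(key, None)

    def flush(self):
        # Freeze the MemStore into a sorted, immutable StoreFile.
        self.store_files.append(dict(sorted(self.memstore.items())))
        self.memstore = {}
        if len(self.store_files) >= self.compaction_threshold:
            self.compact()

    def compact(self):
        # Merge all files so the newest value wins, then drop deleted keys for good.
        merged = {}
        for store_file in self.store_files:
            merged.update(store_file)
        live = {k: v for k, v in sorted(merged.items()) if v is not None}
        self.store_files = [live]

    def get(self, key):
        # Check the MemStore first, then StoreFiles from newest to oldest.
        for source in [self.memstore] + self.store_files[::-1]:
            if key in source:
                return source[key]
        return None


store = ToyStore(flush_size=2, compaction_threshold=3)
store.put("a", 1)
store.put("b", 2)    # MemStore full: first flush
store.put("a", 10)
store.delete("b")    # second flush; "b" is now hidden by a marker
files_before_compaction = len(store.store_files)
store.put("c", 3)
store.put("d", 4)    # third flush reaches the threshold: compaction runs
```

Until the compaction runs, the old value of "b" still sits in the first StoreFile and is merely hidden by the marker in a newer one; only the merge removes it physically.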

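V. Data Storage Format

All HBase data files are stored on the Hadoop HDFS file system. They fall into two main file types:

1. HFile: the storage format for KeyValue data in HBase. An HFile is a Hadoop binary-format file. A StoreFile is just a lightweight wrapper around an HFile, i.e. underneath every StoreFile is an HFile.

2. HLog File: the storage format for the WAL (Write Ahead Log) in HBase. Physically it is a Hadoop Sequence File.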
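
The idea behind a write-ahead log is that every change is recorded in the log before it is applied in memory, so the in-memory state can be rebuilt after a crash. The sketch below illustrates only that ordering; it uses a plain Python list in place of an HLog file on HDFS, and the class ToyRegionServer and the sample keys are invented for this example.

```python
class ToyRegionServer:
    """Toy write path: append to the log first, then apply to memory."""

    def __init__(self, log=None):
        # The list stands in for the HLog file on HDFS.
        self.log = log if log is not None else []
        self.memstore = {}
        # Replaying the log rebuilds whatever was in memory before a crash.
        for key, value in self.log:
            self.memstore[key] = value

    def put(self, key, value):
        self.log.append((key, value))  # 1. record the change first
        self.memstore[key] = value     # 2. then update memory


server = ToyRegionServer()
server.put("row1/f:st", b"1")
server.put("row1/f:st", b"2")

# Simulate a crash: memory is lost, but the log survives and is replayed.
recovered = ToyRegionServer(log=list(server.log))
```

Replaying the two logged writes in order leaves the recovered MemStore with the latest value for the key.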
