HBase principles, concepts, and the read and write paths

One, HBase overview

1, What is HBase

HBase is modeled on Google's BigTable paper and takes its ideas from it. It was developed and maintained as a Hadoop subproject (it is now a top-level Apache project) and supports storage of structured data.

HBase is a highly reliable (data is stored on HDFS, which keeps replicas), high-performance, column-oriented, scalable, non-relational (NoSQL, like Redis) distributed storage system (distributed because it is stored on HDFS). With HBase you can build large structured-storage clusters on inexpensive PC servers.

The goal of HBase is to store and process large data sets; more specifically, to handle large tables made up of huge numbers of rows and columns using only commodity hardware.

HBase's distributed storage is built on HDFS, but HBase adds random reads and writes on top of HDFS, which HDFS alone does not support.

 

2, HBase features

A, mass storage

B, column storage

Column storage here really means storage by column family: HBase stores data per column family, and a column family can contain many columns. Column families must be specified when the table is created.

An HBase column is not the same thing as a MySQL column: in HBase a column is part of the data itself rather than part of a fixed table schema.

C, easy to scale

HBase scales in two ways: at the processing layer (RegionServers, which are comparable to DataNodes and handle read and write requests) and at the storage layer (HDFS).

Adding RegionServer machines scales the processing layer horizontally, so that HBase can serve more Regions.

Note: a RegionServer manages Regions (a Region is a contiguous range of a table's rows, roughly a horizontal partition of a MySQL table) and serves client read and write requests; this is described in detail later. Adding DataNode machines scales the storage layer horizontally, which increases HBase's storage capacity and the read and write capacity of the back-end storage.

D, sparse

Sparseness refers to the flexibility of HBase columns: within a column family you can have any number of columns, and a column with no data takes up no storage space. This differs from MySQL and similar databases, where a field without a value is still recorded as NULL and takes up storage space.
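One way to picture sparseness is the small Python sketch below. It is only a toy model, not HBase's on-disk format: each row keeps just the columns that were actually written, so a missing column costs nothing. The rows and column names are made up for illustration.

```python
# Toy model of sparse rows: each row maps "family:qualifier" -> bytes.
# This illustrates the idea only; it is not how HBase lays out data on disk.
rows = {
    b"1001": {"info:name": b"laowang", "info:age": b"18", "info:sex": b"male"},
    b"1002": {"info:name": b"laoluo"},  # no age or sex stored at all
}

def stored_cells(table):
    """Count the cells that actually occupy storage."""
    return sum(len(columns) for columns in table.values())

def get(table, row, column):
    """Return the value, or None when the column was never written."""
    return table.get(row, {}).get(column)

print(stored_cells(rows))              # 4 cells, not 2 rows x 3 columns = 6
print(get(rows, b"1002", "info:age"))  # None: nothing is stored for it
```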

 

3, HBase architecture

The HBase architecture is as follows.

HBase consists of an HMaster and HRegionServers. The HMaster plays a role similar to the HDFS NameNode, and its high availability depends on ZooKeeper (zk);

each HRegionServer is comparable to an HDFS DataNode and is the node that actually handles read and write requests.

 

A, ZooKeeper

HBase uses zk for HMaster high availability, RegionServer monitoring, and maintenance of the entry point to the cluster's metadata and configuration. Specifically:

zk ensures that only one master is running in the cluster at a time. If the master fails, a new master is chosen through a competition mechanism and takes over the service.

zk monitors the state of each RegionServer. When a RegionServer fails, zk notifies the master through a callback that the RegionServer has gone offline.

zk stores the single entry address for the metadata.

 

B, HMaster

Assigns Regions to RegionServers

Keeps the cluster load balanced by reassigning Regions

Maintains the cluster's metadata

Detects failed Regions and reassigns them to a healthy RegionServer

When a RegionServer fails, coordinates splitting of its HLog on HDFS and the corresponding data recovery

 

C, HRegionServer

HRegionServers handle user read and write requests directly and are the nodes that do the real work. Their functions are summarized below:

Manage the Regions assigned by the master

Handle read and write requests from clients

Interact with the underlying HDFS and store data in HDFS

Split Regions that have grown too large

Merge StoreFiles

 

D, HDFS

HDFS provides the underlying data storage service for HBase:

It provides the distributed storage underneath both the metadata and the table data

It keeps multiple replicas of the data to ensure high reliability and availability

 

E, HLog

Each HRegionServer has only one HLog. The HLog is comparable to the edits file in HDFS: it holds HBase's change log. When HBase writes data, the data is not written directly to disk; it stays in memory for a while (the time and data-size thresholds are configurable). Data held only in memory is more likely to be lost, so to solve this problem the data is first written to a file called the HLog, which is stored on disk in HDFS, and only then written to memory. If the system fails or the in-memory data is lost, the data can be rebuilt from the log file.
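The log-first ordering described above can be sketched in Python as follows. This is a simplified illustration, not HBase's implementation: the `TinyStore` class, the JSON-lines log format, and the local temporary file standing in for HDFS are all assumptions made for the example.

```python
import json
import os
import tempfile

class TinyStore:
    """Toy write path: append to a log file first, then update memory."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memstore = {}

    def put(self, row, column, value):
        # 1. Durably record the change before touching memory.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps([row, column, value]) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then apply it to the in-memory store.
        self.memstore[(row, column)] = value

    @classmethod
    def recover(cls, log_path):
        """Rebuild the in-memory state by replaying the log."""
        store = cls(log_path)
        with open(log_path, encoding="utf-8") as log:
            for line in log:
                row, column, value = json.loads(line)
                store.memstore[(row, column)] = value
        return store

log_file = os.path.join(tempfile.mkdtemp(), "hlog.jsonl")
store = TinyStore(log_file)
store.put("1001", "info:name", "laowang")
store.put("1001", "info:age", "18")

# Simulate a crash: the in-memory data is gone, the log survives.
del store
recovered = TinyStore.recover(log_file)
print(recovered.memstore[("1001", "info:name")])  # laowang
```

Because the log entry is written before the memory update, anything a client was told had succeeded can be replayed after a failure.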

 

F, Region

A Region holds a contiguous range of a table's rows. An HRegionServer can host multiple Regions. If a table grows too large it is split, roughly evenly by data volume. Every HBase table corresponds to one or more Regions: while the table is small it is a single Region; when the table grows large, that Region is split, and splitting a Region also splits every Store in it.
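As a rough sketch of what "split by data volume" means, the Python below cuts a Region's sorted row keys at the middle key. It assumes all rows are about the same size; how HBase actually chooses a split point is more involved, and the function name is hypothetical.

```python
def split_region(sorted_row_keys):
    """Split one region's sorted row keys into two halves of similar size."""
    mid = len(sorted_row_keys) // 2
    split_key = sorted_row_keys[mid]
    left = sorted_row_keys[:mid]    # keys <  split_key
    right = sorted_row_keys[mid:]   # keys >= split_key
    return split_key, left, right

region = sorted([b"1001", b"1002", b"1003", b"1004", b"1005", b"1006"])
split_key, left, right = split_region(region)
print(split_key)  # b'1004'
print(left)       # [b'1001', b'1002', b'1003']
print(right)      # [b'1004', b'1005', b'1006']
```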

 

G, Store

A Store corresponds to a column family. In HBase, to create a column you must specify its column family; that is, every column belongs to a column family. A table can have multiple column families, and each Store corresponds to one column family. The HBase documentation recommends keeping the number of column families low; a single column family can hold hundreds of columns, which is enough. Note that when an HRegion is split, it is split by row, so even a table with only one column family ends up with Stores in several Regions, and those Stores are spread across several HRegionServer nodes.

H, MemStore

A MemStore holds the data of one column family in memory. Writes go to memory, and as soon as the in-memory write succeeds, the call returns.

I, StoreFile

Data in memory is not safe, and memory is limited in size, so in-memory data has to be written to disk, where it is stored on HDFS in HFile format. Each MemStore flush produces one StoreFile, so there end up being many StoreFiles, each of them small because the memory itself is not large. StoreFiles are merged later, but only within a single column family; StoreFiles are never merged across column families.

J, HFile

The HFile is the actual physical file on disk, the file in which the raw data is really stored. A StoreFile is stored on HDFS in the form of an HFile.

 

Two, HBase installation

1, first install zk

2, then install HDFS

3, finally install HBase

4, extract the archive and modify the configuration files

The focus here is on modifying the configuration files. The earlier steps are not covered because in practice I installed with the Ambari tool.

First modify hbase-env.sh

Configure the Java environment variable

 

 

 

export JAVA_HOME=/usr/lib/jvm/java

  

Configure zk. HBase depends heavily on ZooKeeper; this setting controls whether HBase manages its own ZooKeeper. Set it to true to use the bundled ZooKeeper, or false to use an external ZooKeeper

 

 

 

export HBASE_MANAGES_ZK=false

  

Configure hbase-site.xml

Set the root directory where HBase stores its data

<property>
  <name>hbase.rootdir</name>
  <value>/apps/hbase/data</value>
</property>

  

Configure whether HBase runs in cluster (distributed) mode

<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>

  

Set the HBase ports. hbase.master.info.port is the web UI port (16010); hbase.master.port is the HBase service port (16000), not the web port

<property>
  <name>hbase.master.info.port</name>
  <value>16010</value>
</property>

<property>
  <name>hbase.master.port</name>
  <value>16000</value>
</property>

  

Configure the zk quorum to connect to

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>abdi1,abdi2,abdi3</value>
</property>

  

Set the parent znode under which HBase stores its data in zk, mainly to tell multiple HBase clusters apart

<property>
  <name>zookeeper.znode.parent</name>
  <value>/hbase-unsecure</value>
</property>

  

Configure the regionservers file

List the RegionServer nodes

 

 

 

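Because HBase depends heavily on HDFS, the HDFS configuration files need to be copied into HBase's conf directory

In practice we usually create symbolic links to HDFS's core-site.xml and hdfs-site.xml instead, so that HBase knows which Hadoop cluster to connect to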

 

 

 

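However, no such configuration could be found in the configuration files of the HBase installed by Ambari; instead, the HDFS environment variables are loaded when HBase starts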

 

 

 

Start HBase; you can see the HMaster and HRegionServer Java processes

 

 

The Ambari web page looks like this

 

 

 

 

 

 

Note: the HBase Master and RegionServer are installed the same way; the only difference is whether we start the master

The HBase web page uses port 16010

 

 

 

Three, simple HBase shell operations

1, enter the hbase shell

[root@abdi2 bin]# /usr/hdp/current/hbase-client/bin/hbase shell

  

2, list the existing tables: list

hbase(main):003:0> list
TABLE                                                                                                                                                                                                                                                                         
0 row(s)
Took 0.2713 seconds                                                                                                                                                                                                                                                           
=> []
hbase(main):004:0> 

  

3, create a table. The column family must be specified here, much as columns must be declared in MySQL: create "student","info"

hbase(main):004:0> create "student","info"
Created table student
Took 1.3445 seconds                                                                                                                                                                                                                                                           
=> Hbase::Table - student
hbase(main):005:0> 
hbase(main):006:0> list
TABLE                                                                                                                                                                                                                                                                         
student                                                                                                                                                                                                                                                                       
1 row(s)
Took 0.0055 seconds                                                                                                                                                                                                                                                           
=> ["student"]

  

 

 

 

 

 

 

4, insert data. Data in HBase has no types (string, hash, and so on); everything is bytes: put "student","1001","info:name","laowang"

hbase(main):007:0> put "student","1001","info:name","laowang"
Took 0.1217 seconds                                                                                                                                                                                                                                                           
hbase(main):008:0> put "student","1001","info:age","18"
Took 0.0038 seconds                                                                                                                                                                                                                                                           
hbase(main):009:0> put "student","1001","info:sex","male"
Took 0.0049 seconds                                                                                                                                                                                                                                                           
hbase(main):010:0> put "student","1002","info:name","laoluo"
Took 0.0036 seconds                                                                                                                                                                                                                                                           
hbase(main):011:0> put "student","1002","info:age","20"
Took 0.0035 seconds  

  

 

 

 

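5, scan to view the data: scan "student"

6, scan with a start and stop Rowkey; the range is closed on the left and open on the right (the start row is included, the stop row is not)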
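The half-open range can be illustrated with a toy Python scan over sorted row keys. In the HBase shell the corresponding scan options are STARTROW and STOPROW; the in-memory table below, including row 1003, is made up for the example.

```python
from bisect import bisect_left

# Hypothetical table contents; row 1003 is invented for this sketch.
table = {
    b"1001": {"info:name": b"laowang"},
    b"1002": {"info:name": b"laoluo"},
    b"1003": {"info:name": b"laoli"},
}

def scan(table, start_row=None, stop_row=None):
    """Yield (row, columns) for start_row <= row < stop_row, in key order."""
    keys = sorted(table)
    lo = 0 if start_row is None else bisect_left(keys, start_row)
    hi = len(keys) if stop_row is None else bisect_left(keys, stop_row)
    for key in keys[lo:hi]:
        yield key, table[key]

result = [row for row, _ in scan(table, b"1001", b"1003")]
print(result)  # [b'1001', b'1002']; the stop row b'1003' is excluded
```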

 

 

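7, view a specific Rowkey

8, view a specific column of a specific row

9, update data

10, view the table structure

Focus on the column family and the version setting. The version setting is a count: how many versions of each piece of data are kept

11, change the version setting of a column family

Update the data a few more times

Multiple versions are now visible. Here 3 versions are requested, so three entries are shown; the next command requests 2 versions, so two are shown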

 

 

 

 

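12, delete operations

Delete a specific column of a Rowkey; the data in the other columns is still there. A delete can also specify a timestamp, in which case all data before that timestamp is deleted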

 

 

 

 

 

 

Delete all data for a Rowkey

 

 

 

 

 

13, count rows

The count equals the number of Rowkeys

 

 

 

14, truncate the table

15, drop the table

 

 

 

 

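16, namespace operations

A namespace is comparable to a database in a relational database system

Every table is a member of a namespace; if none is specified, the table goes into the default namespace

Permissions can be set on a namespace, for example access control lists covering table creation, reads, deletes, and updates; in practice permissions are rarely used

Shell commands to list namespaces and create a namespace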

 

 

 

 

 

The hbase namespace stores metadata; it is used by the system itself and is not meant for user tables

Create a table in a specific namespace
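HBase writes a table in a namespace as namespace:table, and a bare table name belongs to the default namespace. The Python helper below sketches that naming rule only; the function name and the "school" namespace are hypothetical.

```python
def split_table_name(name, default_namespace="default"):
    """Split 'namespace:table' into its parts, falling back to 'default'."""
    namespace, sep, table = name.partition(":")
    if not sep:
        # No colon: the whole string is the table name.
        return default_namespace, name
    return namespace, table

print(split_table_name("student"))         # ('default', 'student')
print(split_table_name("school:student"))  # ('school', 'student')
```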

 

 

 

 

 

 

 

 

 

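Four, HBase data structures

1, Rowkey

The Rowkey is the primary key used to retrieve records. There are only three ways to access rows in an HBase table:

A, access by a single Rowkey

B, access by a range of Rowkeys

C, full table scan

Rowkey design is very important and is the most important skill in HBase. Data is stored sorted in lexicographic order of the Rowkey, so the design should exploit this and keep rows that are often read together stored together. When learning HBase, Rowkey design is the key topic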
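The effect of lexicographic ordering is easy to see with plain Python byte strings, which sort the same way. The key layouts below (zero-padded numbers, a "user|date" prefix) are made-up examples rather than recommendations from the original text.

```python
# Row keys are compared as raw bytes, so numeric strings do not sort numerically.
keys = [b"1", b"2", b"10", b"100"]
print(sorted(keys))  # [b'1', b'10', b'100', b'2']

# Zero-padding to a fixed width makes byte order match numeric order.
padded = [str(n).zfill(4).encode() for n in (1, 2, 10, 100)]
print(sorted(padded))  # [b'0001', b'0002', b'0010', b'0100']

# A shared prefix keeps related rows adjacent, so one range read covers them.
events = sorted([b"u42|2020-01-03", b"u07|2020-01-01", b"u42|2020-01-01"])
user_42 = [k for k in events if k.startswith(b"u42|")]
print(user_42)  # [b'u42|2020-01-01', b'u42|2020-01-03']
```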

 

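2, Column Family

Every column in an HBase table belongs to a column family. Column families are part of the table's structure and must be specified when the table is created. Column names are prefixed with the column family name.

Column families are specified when the table is created, and more than one can be specified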

 

3, Cell

A cell is the unit uniquely identified by {Rowkey, column family, column, version}. Data in a cell has no type; it is all stored as bytes

 

 

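4, Time Stamp

Each cell can hold multiple versions of the same data, and versions are indexed by timestamp. The timestamp can be generated by the system or specified by the user. Within a cell, versions are sorted in reverse chronological order, so the newest data comes first

Versions are told apart by their timestamps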
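A toy Python model of a versioned cell may make the ordering concrete. It is only an illustration: the `Cell` class is hypothetical, the timestamps are small integers chosen for readability, and the sketch does not handle two writes with the same timestamp.

```python
import time

class Cell:
    """Toy cell keeping up to max_versions values, newest first."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self.versions = []  # list of (timestamp, value), newest first

    def put(self, value, timestamp=None):
        # Use the caller's timestamp, or the current time in milliseconds.
        ts = int(time.time() * 1000) if timestamp is None else timestamp
        self.versions.append((ts, value))
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        del self.versions[self.max_versions:]  # drop the oldest extras

    def get(self, versions=1):
        """Return up to `versions` entries, newest first."""
        return self.versions[:versions]

cell = Cell(max_versions=3)
for ts, name in [(1, b"laowang"), (2, b"laowang2"), (3, b"laowang3"), (4, b"laowang4")]:
    cell.put(name, timestamp=ts)

print(cell.get())            # [(4, b'laowang4')]
print(cell.get(versions=3))  # [(4, b'laowang4'), (3, b'laowang3'), (2, b'laowang2')]
```

With three versions retained, the fourth write pushes out the oldest one, and a plain read returns only the newest value.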

 

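Five, how HBase works

In HBase, writes are even faster than reads, because a write returns as soon as it has reached the HLog and the MemStore, while a read may have to go to disk

1, the read path. The HMaster is not involved; if the HMaster goes down, reads are not affected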

 

 

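a, first get the location of the meta table, that is, where the metadata table is stored

b, go to that location and read the meta table, whose contents look roughly like this

table     start key   end key   regionserver
Student   0           10000     rs1
Student   10001       20000     rs2
Stff      0           10000     rs3
Stff      10000       200000    rs4

c, then go to the corresponding RegionServer to fetch the data

d, when fetching the data, look in memory (the MemStore) first; if it is not there, look in the BlockCache; if it is not in the BlockCache either, read it from disk. Memory is checked first because the most recent writes live in the MemStore and may not have reached disk yet

e, when returning the data, first write it into the BlockCache, then return it to the client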
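Steps b through e can be sketched in Python as below. This is a simplification under several assumptions: row keys are integers rather than bytes, the region boundaries are loosely based on the sample meta rows, the three tiers are plain dictionaries, and the lookup order simply follows the description above.

```python
from bisect import bisect_right

# Region start keys for one table and the server hosting each region.
# Illustrative values loosely modeled on the sample meta rows.
REGION_STARTS = [0, 10001]
REGION_SERVERS = ["rs1", "rs2"]

def locate(row_key):
    """Find the server whose region start key is the largest one <= row_key."""
    index = bisect_right(REGION_STARTS, row_key) - 1
    return REGION_SERVERS[index]

def read(row_key, memstore, block_cache, disk):
    """Check memstore, then block cache, then disk; cache what disk returns."""
    if row_key in memstore:
        return memstore[row_key], "memstore"
    if row_key in block_cache:
        return block_cache[row_key], "blockcache"
    value = disk[row_key]
    block_cache[row_key] = value  # later reads of this key hit the cache
    return value, "disk"

memstore, block_cache, disk = {}, {}, {15000: b"laoluo"}
server = locate(15000)
first = read(15000, memstore, block_cache, disk)
second = read(15000, memstore, block_cache, disk)
print(server)  # rs2
print(first)   # (b'laoluo', 'disk')
print(second)  # (b'laoluo', 'blockcache')
```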

 

 

Location of the meta table

View the storage location of the meta table in zk

View the contents of the meta table

 

 

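2, the write path. The HMaster is not involved here either

a, the client asks zk for the location of the meta table

b, zk returns the location of the meta table

c, the client reads the contents of the meta table from the RegionServer that hosts it

d, that RegionServer returns the contents of the meta table

e, the client goes to the corresponding RegionServer and performs the write: the HLog file is written first, then the MemStore; once that succeeds the call returns immediately and the write is complete

Because data is written to memory first, when does it get flushed to disk?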

 

 

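a, the total memory used by the RegionServer reaches 40% of the heap

b, one hour has passed; the MemStore is then flushed to disk

c, all the MemStores in a single Region add up to 128 MB; the MemStores are then flushed to disk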
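The three conditions can be written as one decision function. The Python below is a sketch that hard-codes the thresholds from the text; in a real cluster they are configuration values (likely hbase.regionserver.global.memstore.size, hbase.regionserver.optionalcacheflushinterval, and hbase.hregion.memstore.flush.size, though that mapping is my assumption), and the function name and check order are hypothetical.

```python
HEAP_FRACTION_LIMIT = 0.40           # all memstores vs. region server heap
FLUSH_INTERVAL_SECONDS = 60 * 60     # one hour
REGION_FLUSH_BYTES = 128 * 1024**2   # 128 MB per region

def flush_reason(global_memstore_bytes, heap_bytes,
                 seconds_since_last_flush, region_memstore_bytes):
    """Return which of the three conditions triggers a flush, or None."""
    if global_memstore_bytes >= HEAP_FRACTION_LIMIT * heap_bytes:
        return "global memstore limit"
    if seconds_since_last_flush >= FLUSH_INTERVAL_SECONDS:
        return "periodic flush"
    if region_memstore_bytes >= REGION_FLUSH_BYTES:
        return "region memstore size"
    return None

GB = 1024**3
MB = 1024**2
print(flush_reason(1 * GB, 10 * GB, 120, 64 * MB))    # None
print(flush_reason(5 * GB, 10 * GB, 120, 64 * MB))    # global memstore limit
print(flush_reason(1 * GB, 10 * GB, 3600, 64 * MB))   # periodic flush
print(flush_reason(1 * GB, 10 * GB, 120, 128 * MB))   # region memstore size
```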

 

 

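As a result, many small files get flushed to HDFS, but HDFS is not suited to storing large numbers of small files, so StoreFiles are merged

By default a merge runs once every 7 days

A merge is also performed when there are more than 3 StoreFiles

A merge combines the StoreFiles of a single column family; StoreFiles of different column families are never merged together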
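Merging the StoreFiles of one column family amounts to a merge of sorted files. The Python below is a minimal sketch under assumptions of my own: each StoreFile is a list of (row key, timestamp, value) tuples already sorted by row key, and only the newest version of each row is kept.

```python
import heapq

def compact(store_files):
    """Merge sorted store files of ONE column family into a single file.

    Each store file is a list of (row_key, timestamp, value) sorted by
    row_key, then newest timestamp first. The newest version of a row wins.
    """
    merged = heapq.merge(*store_files, key=lambda cell: (cell[0], -cell[1]))
    result, last_key = [], None
    for row_key, timestamp, value in merged:
        if row_key != last_key:  # the first hit per key is the newest
            result.append((row_key, timestamp, value))
            last_key = row_key
    return result

older = [(b"1001", 1, b"laowang"), (b"1002", 1, b"laoluo")]
newer = [(b"1001", 5, b"laowang-v2"), (b"1003", 5, b"laoli")]
compacted = compact([older, newer])
print(compacted)
# [(b'1001', 5, b'laowang-v2'), (b'1002', 1, b'laoluo'), (b'1003', 5, b'laoli')]
```

Two small files become one larger sorted file, which is the property that helps HDFS; files from a different column family would go through a separate call.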

 

3, high availability

The HMaster runs in active/standby mode
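The active/standby behaviour can be sketched with a fake ZooKeeper in Python: whichever master creates the ephemeral node first is active, and when its session ends the node disappears and a standby can take over. Everything here is a stand-in for illustration, including the `FakeZooKeeper` class, the master names, and the node path.

```python
class FakeZooKeeper:
    """Stand-in for zk: a path can be held by only one owner at a time."""

    def __init__(self):
        self.nodes = {}

    def create_ephemeral(self, path, owner):
        """Create the node if it is free; report whether this owner won."""
        if path in self.nodes:
            return False
        self.nodes[path] = owner
        return True

    def expire_session(self, owner):
        """Ephemeral nodes vanish when their owner's session ends."""
        self.nodes = {p: o for p, o in self.nodes.items() if o != owner}

MASTER_PATH = "/hbase/master"  # hypothetical path used only in this sketch

def active_master(zk, candidates):
    """Every candidate races to create the node; return whoever holds it."""
    for name in candidates:
        zk.create_ephemeral(MASTER_PATH, name)
    return zk.nodes[MASTER_PATH]

zk = FakeZooKeeper()
first = active_master(zk, ["master-1", "master-2"])
print(first)   # master-1: it created the node first

zk.expire_session("master-1")  # the active master dies
second = active_master(zk, ["master-2"])
print(second)  # master-2: the standby takes over
```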

 

 

 

High availability configuration

 

 

 

Scan to view the data


Origin www.cnblogs.com/bainianminguo/p/12110077.html