Big Data Database HBase (1): Architecture and Principles

1. HBase Overview

1.1. Hadoop ecosystem

1.2. Background: non-relational databases

  • Cassandra, HBase, MongoDB
  • CouchDB, a document-oriented database
  • Neo4j, a non-relational graph database

1.3. A first look at HBase

  • HBase (the "Hadoop Database") is a highly reliable, high-performance, column-oriented, scalable, distributed database that supports real-time reads and writes.
  • It uses Hadoop HDFS as its file storage system, Hadoop MapReduce to process the massive data stored in HBase, and ZooKeeper as its distributed coordination service.
  • It is mainly used to store large volumes of unstructured and semi-structured data (it is a column-oriented NoSQL database).

2. HBase Data Model

                                                               Figure 2-1 Logical view of a table

In a table like this, locating a single value requires a row key, a timestamp, a column family, and a column key (qualifier).

2.1.1. Row key

- The row key identifies a row of data.
- Rows are sorted by row key in lexicographic (dictionary) order.
- A row key can hold up to 64 KB of bytes.
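
The effect of lexicographic ordering can be surprising for keys that end in numbers. The following is a minimal Python sketch that does not touch HBase at all: the row keys are made-up examples, and `padded_key` is a hypothetical helper showing one common way to make numeric suffixes sort in numeric order.

```python
# Row keys are compared byte by byte, not as numbers.
row_keys = [b"row2", b"row10", b"row1"]

# Python compares bytes objects lexicographically, which mirrors the
# dictionary ordering described above.
sorted_keys = sorted(row_keys)
print(sorted_keys)  # [b'row1', b'row10', b'row2']


def padded_key(prefix: str, n: int, width: int = 4) -> bytes:
    """Hypothetical helper: zero-pad the numeric part so that byte order
    matches numeric order."""
    return f"{prefix}{n:0{width}d}".encode("utf-8")


padded = sorted(padded_key("row", n) for n in (2, 10, 1))
print(padded)  # [b'row0001', b'row0002', b'row0010']
```

Because `row10` sorts before `row2`, fixed-width keys are usually easier to scan in a predictable order.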

2.1.2. Column family and column qualifier

- Every column in an HBase table belongs to a column family. Column families are part of the table schema and must be defined in advance, for example: create 'test', 'course'
- Column names are prefixed with their column family, and each column family can contain multiple columns, for example course:math and course:english. New columns can be added to a family dynamically later, as needed.
- Access control, storage, and tuning are all performed at the column-family level.
- Data in the same column family is stored in the same directory, as several files.
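
The split between fixed column families and dynamic columns can be sketched in a few lines of Python. This is a toy in-memory model, not the HBase client API: the class `TableSchema` and the `info:age` column are assumptions introduced only for illustration.

```python
class TableSchema:
    """Toy model: column families are fixed when the table is created,
    while qualifiers inside a family can be added freely."""

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)
        self.rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row, column, value):
        # The part before the first ':' is the column family.
        family, _, qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row, {})[f"{family}:{qualifier}"] = value


# Mirrors the shell example: create 'test', 'course'
table = TableSchema("test", ["course"])
table.put("student1", "course:math", b"90")
table.put("student1", "course:english", b"85")  # new column, no schema change

try:
    table.put("student1", "info:age", b"20")  # family 'info' was never declared
    rejected = False
except KeyError:
    rejected = True
```

In this sketch a write to an undeclared family is rejected, while a new qualifier inside `course` is accepted without any schema change.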

2.1.3. Timestamp

- Each cell in HBase can store multiple versions of the same data. Versions are distinguished by timestamp and sorted in reverse timestamp order, so the latest version comes first.
- The timestamp type is a 64-bit integer.
- The timestamp can be assigned by HBase automatically when the data is written. In that case it is the current system time in milliseconds.
- The timestamp can also be assigned explicitly by the client. An application that wants to avoid version conflicts must generate unique timestamps itself.
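
As a rough illustration of the versioning rules above, here is a small Python sketch of a single cell that keeps several timestamped versions. The class `VersionedCell` and its methods are hypothetical names for this toy model and are not part of any HBase API.

```python
import time


class VersionedCell:
    """Toy model of one cell that keeps several timestamped versions."""

    def __init__(self):
        self._versions = {}  # timestamp in ms (int) -> value (bytes)

    def put(self, value, timestamp=None):
        # If the writer gives no timestamp, use the current time in ms.
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        self._versions[timestamp] = value
        return timestamp

    def versions(self):
        # Newest first: sort by timestamp in descending order.
        return sorted(self._versions.items(), reverse=True)

    def latest(self):
        return self.versions()[0][1]


cell = VersionedCell()
cell.put(b"v1", timestamp=1000)
cell.put(b"v3", timestamp=3000)
cell.put(b"v2", timestamp=2000)
print(cell.versions())  # [(3000, b'v3'), (2000, b'v2'), (1000, b'v1')]
```

In this toy model two writes with the same timestamp overwrite each other, which is the kind of conflict the last point warns about when clients choose their own timestamps.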
 

2.1.4. Cell

- A cell is the unit located at the intersection of a row and a column.
- Cells are versioned.
- The content of a cell is an uninterpreted byte array.
▪ A cell is uniquely identified by {row key, column (= <family> + <qualifier>), version}.
▪ Cell data has no type. Everything is stored as a byte array.
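
Since cells hold only bytes, the writer and the reader have to agree on an encoding. The sketch below models cells as a plain Python dictionary keyed by the coordinates listed above. The `put` function, the sample rows, and the choice of an 8-byte big-endian integer are assumptions made for illustration only.

```python
import struct

# Toy store: (row key, "family:qualifier", version) -> raw bytes
cells = {}


def put(row: bytes, column: bytes, version: int, value: bytes) -> None:
    cells[(row, column, version)] = value


# The store only ever sees bytes; the caller chooses the encoding.
put(b"student1", b"course:math", 1, struct.pack(">q", 90))       # 8-byte big-endian integer
put(b"student1", b"course:name", 1, "Algebra".encode("utf-8"))   # UTF-8 text

# Reading back requires knowing which encoding was used for each column.
raw = cells[(b"student1", b"course:math", 1)]
score = struct.unpack(">q", raw)[0]
name = cells[(b"student1", b"course:name", 1)].decode("utf-8")
```

Nothing in the stored value says whether it is a number or a string, so decoding with the wrong format would silently produce garbage or fail.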
 

                                                                                 Figure 2-2 HBase architecture

2.2.1. Client

▪ Provides the interfaces for accessing HBase and maintains a cache to speed up that access.

2.2.2. ZooKeeper

▪ Guarantees that the cluster has only one active Master at any time.
▪ Stores the addressing entry point for all Regions.
▪ Monitors RegionServers coming online and going offline in real time, and notifies the Master.
▪ Stores the HBase schema and table metadata.

2.2.3. Master

- Assigns Regions to RegionServers.
- Balances load across RegionServers.
- Detects failed RegionServers and reassigns the Regions they were serving.
- Manages users' create, delete, and modify operations on tables.
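
To make the reassignment duty concrete, here is a short Python sketch of one possible policy: every Region of a failed server is moved to whichever live server currently holds the fewest Regions. This is a toy policy chosen for illustration and is not HBase's actual balancer. The function `reassign_regions` and the server and region names are hypothetical.

```python
def reassign_regions(assignments, failed_server):
    """Move every region of `failed_server` to the least-loaded live server.

    `assignments` maps a server name to the list of regions it serves.
    The dictionary is updated in place and also returned.
    """
    orphaned = assignments.pop(failed_server, [])
    for region in orphaned:
        # Pick the live server with the fewest regions (ties broken by name).
        target = min(assignments, key=lambda s: (len(assignments[s]), s))
        assignments[target].append(region)
    return assignments


cluster = {
    "rs1": ["r1", "r2"],
    "rs2": ["r3"],
    "rs3": ["r4", "r5", "r6"],
}
reassign_regions(cluster, "rs3")
print(cluster)  # {'rs1': ['r1', 'r2', 'r5'], 'rs2': ['r3', 'r4', 'r6']}
```

The sketch assumes at least one live server remains; a real system would also have to handle the case where none does.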

2.2.4. RegionServer

- A RegionServer maintains its Regions and handles the I/O requests for them.
- A RegionServer is responsible for splitting Regions that grow too large at runtime.

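2.2.5. MemStore and StoreFile

- A region consists of multiple stores, and each store corresponds to one column family (CF).
- A store consists of a memstore in memory and storefiles on disk. Writes go to the memstore first. When the data in the memstore reaches a threshold, the HRegionServer starts a flush process that writes it to a storefile. Each flush produces a separate storefile.
- When the number of storefiles grows past a threshold, the system merges them (minor and major compaction). Version merging and deletion are carried out during the merge (in major compaction), producing larger storefiles.
- When the size and number of all storefiles in a region exceed a threshold, the region is split in two, and the HMaster assigns the two halves to RegionServers to balance the load.
- When a client reads data, it looks in the memstore first, then in the block cache, and finally in the storefiles.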
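
The write, flush, compaction, and read steps above can be sketched as a small in-memory Python model. This is a simplification under stated assumptions: thresholds are counted in entries and files rather than bytes, deletes and versions are not modeled, the block cache holds single values rather than file blocks, and the class `Store` with its parameters is a hypothetical name, not HBase code.

```python
class Store:
    """Toy model of one store: a memstore, a block cache, and storefiles."""

    def __init__(self, flush_threshold=2, compact_threshold=3):
        self.memstore = {}      # in-memory writes: key -> value
        self.block_cache = {}   # values recently read from storefiles
        self.storefiles = []    # flushed dicts, oldest first
        self.flush_threshold = flush_threshold      # assumed: number of entries
        self.compact_threshold = compact_threshold  # assumed: number of files

    def put(self, key, value):
        self.block_cache.pop(key, None)   # drop any stale cached value
        self.memstore[key] = value        # writes land in memory first
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        self.storefiles.append(dict(self.memstore))  # one new storefile per flush
        self.memstore.clear()
        if len(self.storefiles) >= self.compact_threshold:
            self.compact()

    def compact(self):
        merged = {}
        for storefile in self.storefiles:  # oldest to newest, so newer values win
            merged.update(storefile)
        self.storefiles = [merged]         # one larger storefile replaces the rest

    def get(self, key):
        if key in self.memstore:                     # 1. memstore
            return self.memstore[key]
        if key in self.block_cache:                  # 2. block cache
            return self.block_cache[key]
        for storefile in reversed(self.storefiles):  # 3. storefiles, newest first
            if key in storefile:
                self.block_cache[key] = storefile[key]
                return storefile[key]
        return None


store = Store()
for k, v in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("d", 5)]:
    store.put(k, v)
files_before = len(store.storefiles)  # two flushes so far, "d" still in memory
value_a = store.get("a")              # newest storefile wins
store.put("e", 6)                     # third flush triggers a compaction
files_after = len(store.storefiles)
```

After the sixth write the three small storefiles are merged into one, and reads for `a` keep returning the most recently written value.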
 
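▪ The HRegion is the smallest unit of distributed storage and load balancing in HBase. Being the smallest unit means that different HRegions can be placed on different HRegionServers.
▪ An HRegion consists of one or more Stores, and each Store holds one column family.
▪ Each Store in turn consists of one MemStore and zero or more StoreFiles, as shown in the figure. StoreFiles are saved on HDFS in HFile format.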
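
The containment hierarchy described above (Region, then one Store per column family, then one MemStore plus zero or more StoreFiles) can be written down as plain Python data classes. The names `ToyStore` and `ToyRegion`, the key range, and the sample columns are assumptions for illustration, not HBase classes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    memstore: dict = field(default_factory=dict)    # exactly one per store
    storefiles: list = field(default_factory=list)  # zero or more


@dataclass
class ToyRegion:
    start_key: bytes
    end_key: bytes
    stores: dict = field(default_factory=dict)      # column family -> ToyStore

    def put(self, row: bytes, column: str, value: bytes) -> None:
        family = column.split(":", 1)[0]
        # Each column family has its own store inside the region.
        self.stores[family].memstore[(row, column)] = value


region = ToyRegion(b"a", b"m", stores={"course": ToyStore(), "info": ToyStore()})
region.put(b"bob", "course:math", b"90")
region.put(b"bob", "info:age", b"20")
```

Two writes to the same row end up in two different stores because they belong to different column families.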
 
                                                             Figure 2-3 Relationship between Region and Store
 

                                                            Figure 2-4 Relationship diagram (2)

Origin www.cnblogs.com/littlepage/p/11273098.html