HBase architecture principles explained


https://developer.51cto.com/art/201904/595698.htm

HBase architecture

 

 

HBase uses a master-slave architecture, somewhat like HDFS: HMaster manages the cluster, and HRegionServer is where the data is actually stored.

Unlike HDFS, however, HBase data reads and writes do not actually go through HMaster.

In HBase, every table has meta-information. That information is itself stored as an HBase table, called the meta-information table or Meta table, which is a system table.

The Meta table is an HBase table like any other: it is stored as Rowkey/Value pairs on an HRegionServer, while the location of the Meta table is kept in the distributed coordination service Zookeeper. So the Meta table is found by reading from a fixed place, and the Meta table then tells the client which HRegionServer holds the data.

HMaster's workload is in fact relatively light, but its role is important: it mainly balances load across HRegionServers by adjusting and managing the distribution of Regions.

HRegionServer architecture

A Region is a slice of an HBase table by Rowkey. The Rowkey range of each Region is determined by its StartKey and EndKey, and one HRegionServer can host multiple Regions.

Data is placed into different Regions according to its Rowkey and a certain rule (such as hashing), and each Region belongs to exactly one HRegionServer.

 

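Store: one Store corresponds to one column family; the data of a column family is saved in one Store.

MemStore: the first layer of the LSM structure, holding data in memory;

StoreFile: the on-disk file formed when a MemStore is flushed;

HFile: the actual storage file underneath a StoreFile.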

Reading and writing processes

 

 

① The HBase Client wants to write data. It first gets the Meta table information from Zookeeper, and uses the Rowkey to find out which RegionServer the data should be written to.

② The data is then written to the MemStore in memory on the corresponding RegionServer, and the operation is recorded in the write-ahead log (WAL).

③ When the MemStore exceeds a certain threshold, its in-memory data is flushed to disk, forming a StoreFile.

④ When certain conditions are triggered, small StoreFiles are merged into larger StoreFiles, which suits HDFS storage.
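The four write steps can be sketched in a few lines of Python. This is a toy model, not HBase code: the class name `RegionStore`, the `flush_threshold` parameter (counted in cells rather than bytes) and the rule that later files win during a merge are all assumptions made for illustration.

```python
class RegionStore:
    """Toy model of one Store: a WAL, a MemStore and a list of StoreFiles."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical: max cells held in memory
        self.wal = []          # append-only log of every write
        self.memstore = {}     # in-memory buffer: rowkey -> value
        self.storefiles = []   # each flush produces one immutable sorted file

    def put(self, rowkey, value):
        self.wal.append((rowkey, value))   # step 2: record the operation in the log
        self.memstore[rowkey] = value      # step 2: write to memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                   # step 3: threshold reached

    def flush(self):
        if self.memstore:
            self.storefiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def compact(self):
        # step 4: merge small StoreFiles into one; later files win on conflicts
        merged = {}
        for storefile in self.storefiles:
            merged.update(storefile)
        self.storefiles = [sorted(merged.items())] if merged else []


store = RegionStore(flush_threshold=2)
for key, value in [("r2", "a"), ("r1", "b"), ("r3", "c"), ("r1", "d")]:
    store.put(key, value)
files_before_compaction = len(store.storefiles)
store.compact()
```

With a threshold of two cells, the four writes produce two StoreFiles, and the merge leaves a single sorted file in which the newer value of `r1` has replaced the older one.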

 

HMaster's role: when a large amount of data with similar Rowkeys is assigned to one Region and the Region grows too large, the Region is split. After the split, HMaster reassigns the resulting Regions to RegionServers. This is HMaster's load-balancing strategy.

 

 

① The HBase Client wants to read data. It first gets the Meta table information from Zookeeper, and uses the Rowkey to find which RegionServer holds the data.

② On that RegionServer, it looks up the MemStore and the StoreFiles of the relevant column families, obtaining many key-value data structures.

③ It returns the latest data found, based on the version of the data.

 

Data Model

 

RowKey

The RowKey is the primary key that uniquely identifies a row. HBase data is globally sorted by RowKey in lexicographic order, and all queries can rely only on this one sort dimension.

The following example illustrates how "dictionary sort" works:

The RowKeys { "abc", "a", "bdf", "cdf", "defg"} sorted in dictionary order give { "a", "abc", "bdf", "cdf", "defg"}

That is, when two RowKeys are compared, their first bytes are compared first; if those are equal, the second bytes are compared, and so on. If the comparison reaches the M-th byte and that is beyond the length of one of the RowKeys, the shorter RowKey sorts before the longer one.
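The comparison rule is easy to check in Python, whose built-in ordering of `bytes` values follows the same byte-by-byte rule. The function name `compare_rowkeys` is made up for this sketch; it only spells out the rule described above.

```python
import functools


def compare_rowkeys(a: bytes, b: bytes) -> int:
    """Return -1, 0 or 1, comparing byte by byte as described above."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    # All shared bytes are equal: the shorter key sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


rowkeys = [b"abc", b"a", b"bdf", b"cdf", b"defg"]
ordered = sorted(rowkeys)  # built-in bytes ordering
ordered_by_rule = sorted(rowkeys, key=functools.cmp_to_key(compare_rowkeys))
```

Both orderings give the result from the example. One practical consequence: numbers stored as text sort by their bytes, so `b"10"` comes before `b"9"` unless the keys are zero-padded.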

 

Sparse matrix

Following Bigtable, the data in an HBase table is organized as a sparse matrix. The opening part gave an abstract diagram of table data in HBase; the table below reinforces the "sparse matrix" impression:

 

As can be seen, the set of columns in each row is flexible, and rows do not need to follow the same column definitions. This is the "schema-less" nature of HBase tables.

 

Region

Unlike the "hash partition" design of Cassandra / DynamoDB, HBase uses "range partitioning": the full key range is cut into "Key Ranges", and each "Key Range" is called a Region.

Put another way: a large HBase table with hundreds of millions of rows is cut horizontally into "sub-tables", and each of these "sub-tables" is a Region:

 

The Region is the basic unit of load balancing in HBase. When a Region grows to a certain size, it is automatically split into two.
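Range partitioning and splitting can be illustrated with a sorted list of Region start keys. This is a sketch under stated assumptions: `RegionMap`, its `max_rows` limit (counted in rows, whereas HBase measures size on disk) and the choice of the middle key as split point are all hypothetical simplifications.

```python
import bisect


class RegionMap:
    """Toy range partitioning: each Region owns [start_key, next start_key)."""

    def __init__(self):
        self.start_keys = [b""]   # one Region covering the whole key space
        self.rows = [[]]          # sorted row keys held by each Region

    def locate(self, rowkey: bytes) -> int:
        # The owning Region is the last one whose start key is <= rowkey.
        return bisect.bisect_right(self.start_keys, rowkey) - 1

    def put(self, rowkey: bytes, max_rows: int = 4):
        idx = self.locate(rowkey)
        region = self.rows[idx]
        if rowkey not in region:
            bisect.insort(region, rowkey)
        if len(region) > max_rows:    # hypothetical size limit
            self.split(idx)

    def split(self, idx: int):
        region = self.rows[idx]
        mid = len(region) // 2
        # The middle key becomes the start key of the new Region.
        self.start_keys.insert(idx + 1, region[mid])
        self.rows[idx:idx + 1] = [region[:mid], region[mid:]]


region_map = RegionMap()
for key in [b"a", b"b", b"c", b"d", b"e"]:
    region_map.put(key)
```

After the fifth insert the single Region exceeds the limit and is cut at `b"c"`, so keys below `b"c"` stay in Region 0 and everything from `b"c"` upward lands in Region 1.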

 

Column Family

If a Region is a horizontal cut of a table, then a vertical cut of the columns within a Region is called a Column Family. Every column must belong to a Column Family; this membership is specified when the data is written, rather than predefined when the table is created.

 

KeyValue

KeyValue does not come from Bigtable's design; it traces back to the paper "The log-structured merge-tree (LSM-Tree)". Each column of data in each row is packaged as a separate KeyValue with a specific structure, and a KeyValue carries rich self-describing information:

As can be seen, KeyValue is a key point in supporting the "sparse matrix" design: any number of independent KeyValues with the same Key can make up a row of data. The obvious drawback of this design is that the self-describing information carried by every KeyValue brings significant data expansion.

 

1. HBase Introduction

1.1 What is HBase

HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system. With HBase, a large-scale storage cluster can be built on inexpensive PC servers.

HBase's goal is to store and process big data; more specifically, to handle large data sets made up of thousands of rows and columns using only ordinary hardware.

HBase is an open-source implementation of Google Bigtable, but there are also many differences. For example: Google Bigtable uses GFS as its file storage system, while HBase uses Hadoop HDFS; Google runs MapReduce to process the massive data in Bigtable, and HBase likewise uses Hadoop MapReduce to process the massive data in HBase; Google Bigtable uses Chubby as its coordination service, while HBase uses Zookeeper.

1.2 Comparison with traditional databases

1. Problems encountered with traditional databases:

  1) large amounts of data cannot be stored;
  2) there is no good backup mechanism;
  3) once the data reaches a certain volume the database slows down, and with very large volumes it essentially cannot cope;

2. Advantages of HBase:

  1) linear scaling: as the data volume grows, capacity can be expanded by adding nodes;
  2) data is stored on HDFS, which has a sound backup mechanism;
  3) data is located through Zookeeper coordination, so access is fast.

1.3 HBase cluster roles

1. One or more master nodes, HMaster;

2. Multiple slave nodes, HRegionServer;

3. The service HBase depends on, Zookeeper;

2. HBase data model

 

 

2.1 HBase storage mechanism

HBase is a column-oriented database whose tables are sorted by row. The table schema defines only column families, which are key-value pairs. A table has multiple column families, and each column family can have any number of columns. Column values are stored contiguously on disk. Each cell value in the table has a timestamp. In short, in HBase:

  • A table is a collection of rows.
  • A row is a collection of column families.
  • A column family is a collection of columns.
  • A column is a collection of key-value pairs.

The "column" in column-oriented storage here actually refers to the column family: HBase stores data by column family. A column family can contain many columns, and column families must be specified when the table is created.
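The nesting of table, row, column family, column and versioned value can be modeled with plain dictionaries. This is only a mental model, not how HBase lays data out on disk; the class `Table` and its `put`/`get` methods are hypothetical names for this sketch.

```python
class Table:
    """Toy model: rowkey -> family -> qualifier -> {timestamp: value}."""

    def __init__(self, families):
        self.families = frozenset(families)   # fixed when the table is created
        self.rows = {}

    def put(self, rowkey, column, value, ts):
        family, qualifier = column.split(":", 1)
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        cell = (
            self.rows.setdefault(rowkey, {})
            .setdefault(family, {})
            .setdefault(qualifier, {})
        )
        cell[ts] = value                      # qualifiers are created on write

    def get(self, rowkey, column):
        family, qualifier = column.split(":", 1)
        versions = self.rows.get(rowkey, {}).get(family, {}).get(qualifier, {})
        return versions[max(versions)] if versions else None   # newest version


table = Table(["courses", "info"])
table.put("student1", "courses:history", b"A", ts=1)
table.put("student1", "courses:history", b"B", ts=2)
table.put("student2", "info:age", b"20", ts=1)
```

The two rows hold different columns, and a column that was never written simply has no entry, which is the sparse, schema-less behavior described earlier. Writing to a family that was not declared at creation is rejected.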

HBase compared with an RDBMS

 

 

 

RDBMS tables:

HBase table:

2.2 Row Key

As in other NoSQL databases, the row key is the primary key that uniquely identifies a row of records. HBase data is globally sorted by RowKey in dictionary order, and all queries can rely only on this sort dimension. There are only three ways to access a row in an HBase table:

1. By a single row key;

2. By a range of row keys (regular expression);

3. By a full table scan.

A row key can be any string (maximum length 64KB; in practice typically 10-1000 bytes). Inside HBase, the row key is stored as a byte array. Data is stored sorted by row key in lexicographic (byte) order. When designing keys, make full use of this sorted storage and keep rows that are often read together stored together (locality).

2.3 Column Family

Column family: every column in an HBase table belongs to some column family. Column families are part of the table schema (columns are not) and must be defined before the table is used. Column names are prefixed with the column family. For example, courses:history and courses:math both belong to the courses column family.

2.4 Cell

A cell is the unit uniquely determined by {row key, column (family + qualifier), version}. The data in a cell has no type; it is all stored as raw bytes.

Keywords: untyped, raw bytes

2.5 Time Stamp time stamp

The storage unit determined by a row key and a column in HBase is called a cell. Each cell holds multiple versions of the same data, indexed by timestamp. The timestamp is a 64-bit integer. It can be assigned by HBase automatically when data is written, in which case it is the current system time accurate to the millisecond, or it can be assigned explicitly by the client. If an application needs to avoid version conflicts, it must generate unique timestamps itself. Within each cell, the versions are sorted in reverse chronological order, so the latest data comes first.

To avoid the management burden (storage and indexing) of too many versions, HBase offers two ways of reclaiming old versions. One keeps the last n versions of the data; the other keeps the versions from a recent period (for example, the last seven days). Users can configure this per column family.
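The two retention rules can be sketched as a single pruning function. The function `prune_versions` and its parameters `max_versions`, `ttl_ms` and `now_ms` are names invented for this illustration; they are not HBase configuration keys.

```python
def prune_versions(versions, max_versions=None, ttl_ms=None, now_ms=0):
    """versions: dict of timestamp (ms) -> value for one cell.
    Returns the surviving versions as a list, newest first."""
    kept = sorted(versions.items(), reverse=True)   # reverse chronological order
    if ttl_ms is not None:
        # Rule 2: keep only versions written within the recent period.
        kept = [(ts, v) for ts, v in kept if now_ms - ts <= ttl_ms]
    if max_versions is not None:
        # Rule 1: keep only the last n versions.
        kept = kept[:max_versions]
    return kept


cell = {1000: b"v1", 2000: b"v2", 3000: b"v3", 4000: b"v4"}
last_two = prune_versions(cell, max_versions=2)
recent = prune_versions(cell, ttl_ms=2500, now_ms=4000)
```

Here `last_two` keeps the versions at 4000 and 3000, while `recent` keeps every version no older than 2500 ms at time 4000, that is, the versions at 4000, 3000 and 2000.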

3. HBase principle

 

 

Component descriptions:

Client:

Uses HBase's RPC mechanism to communicate with HMaster and HRegionServer
For management operations, the Client talks to HMaster
For reading and writing data, the Client talks to HRegionServer

Zookeeper:

The Zookeeper Quorum stores the address of the -ROOT- table and the address of HMaster
Each HRegionServer registers itself in Zookeeper as an ephemeral node, so HMaster can sense the health of every HRegionServer at any time
Zookeeper prevents HMaster from being a single point of failure

The main role of Zookeeper: the client first contacts the ZooKeeper quorum (a separate cluster of ZooKeeper nodes) to look up a row key. This is done by obtaining from ZooKeeper the name (host name) of the server that hosts the -ROOT- region. With that server, the client can query for the name of the server hosting the .META. table region that contains the requested row key. Both of these results are cached and are queried only once. Finally, the client queries the .META. server to get the name of the server hosting the region where the row key's data lives. Once it knows the actual location of the data, that is, the region's location, HBase caches this lookup and contacts the HRegionServer managing the data directly. From then on, the client can locate the required data through its cache, without looking up the .META. table again.

HMaster:

HMaster is not a single point of failure: HBase can start multiple HMasters, and Zookeeper's Master Election mechanism ensures there is always one Master running.
HMaster is mainly responsible for managing Tables and Regions:
1. Managing users' create, delete, update and query operations on tables
2. Managing HRegionServer load balancing and adjusting the distribution of Regions
3. After a Region Split, assigning the new Regions
4. After an HRegionServer goes down, migrating the Regions from the failed HRegionServer

HRegionServer:

The core module of HBase. It responds to user I/O requests and reads and writes data to the HDFS file system.

 

 

 

An HRegionServer manages a series of HRegion objects;
each HRegion corresponds to one Region of a Table and is composed of multiple HStores;
each HStore corresponds to the storage of one Column Family of the Table;
a Column Family is a unit of centralized storage, so putting Columns with the same I/O characteristics into one Column Family is more efficient.

As can be seen, the client does not need the master to take part when accessing data in HBase (addressing goes through Zookeeper and the region server; data reads and writes go to the region server). The master only maintains the metadata of tables and regions (table metadata is stored in Zookeeper), so its load is low. When an HRegionServer opens a sub-table, it creates an HRegion object and then a Store instance for each column family of the table. Each Store has one MemStore and zero or more StoreFiles; each StoreFile corresponds to one HFile, which is the actual storage file. Therefore an HRegion (table) has as many Stores as it has column families. One HRegionServer has multiple HRegions and one HLog.

HRegion:

A table is partitioned by row into multiple Regions. The Region is the smallest unit of distributed storage and load balancing in HBase: different Regions can be on different Region Servers, but a single Region is never split across multiple servers.

Regions are divided by size, and a table generally starts with only one region. As data keeps being inserted into the table, the region grows; when a column family in the region reaches a threshold (default 256M), the region is split into two new regions.

Each region identified by the following information:

  1. <table name, startRowKey, creation time>
  2. The endRowKey of the region, recorded in the catalog tables (-ROOT- and .META.)

HRegion location: which RegionServer a Region is assigned to is completely dynamic, so a mechanism is needed to find out which region server a given Region is on.

HBase uses a three-level structure to locate a region:

  1. Get the location of the -ROOT- table from the Zookeeper file /hbase/rs. The -ROOT- table has only one region.
  2. Look up the -ROOT- table to find the location of the corresponding region of the .META. table. In fact the -ROOT- table is the first region of the .META. table; each region of the .META. table is one row in the -ROOT- table.
  3. Look up the .META. table to find the location of the user table's region. Each region of a user table is one row in the .META. table.

Note:

The -ROOT- table is never split into multiple regions, which guarantees that at most three hops are needed to locate any region. The client caches the location information it has looked up, and the cache does not expire on its own. So if the client's cache is entirely stale, it takes six network round trips to locate the correct region: three to discover that the cache is stale, and three to fetch the location information.
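The three-level lookup and the client-side cache can be sketched as follows. Everything here is a stand-in: the classes `Cluster` and `Client`, the server names `rs1` to `rs3` and the region names are hypothetical, and each dictionary lookup merely counts as one network hop. Handling of a stale cache (the six-round-trip case) is not modeled.

```python
class Cluster:
    """Toy catalog: Zookeeper -> -ROOT- -> .META. -> user region."""

    def __init__(self):
        self.zookeeper = {"root_location": "rs1"}   # where -ROOT- lives
        self.root = {".META.,region1": "rs2"}       # .META. region -> server
        self.meta = {"users,region1": "rs3"}        # user region -> server
        self.round_trips = 0

    def ask(self, table, key):
        self.round_trips += 1       # every catalog lookup costs one network hop
        return table[key]


class Client:
    def __init__(self, cluster):
        self.cluster = cluster
        self.cache = {}             # user region -> server

    def locate(self, user_region):
        if user_region in self.cache:
            return self.cache[user_region]      # no catalog hop needed
        c = self.cluster
        c.ask(c.zookeeper, "root_location")     # hop 1: find -ROOT-
        c.ask(c.root, ".META.,region1")         # hop 2: find the .META. region
        server = c.ask(c.meta, user_region)     # hop 3: find the user region
        self.cache[user_region] = server
        return server


cluster = Cluster()
client = Client(cluster)
first = client.locate("users,region1")
hops_after_first = cluster.round_trips
second = client.locate("users,region1")
hops_after_second = cluster.round_trips
```

The first lookup costs three hops; the second is answered from the cache and costs none.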

The relationship between the table and the region

By default a table initially has only one region. As the number of records grows, the initial region gradually splits into multiple regions. A region is described by [startKey, endKey], and different regions are assigned by the Master to the appropriate regionserver for management.

The region is the smallest unit of distributed storage and load balancing in hbase; different regions are distributed to different regionServers.

Note: although the region is the smallest unit of distributed storage, it is not the smallest unit of storage. A region is composed of one or more stores, and each store holds one column family. Each store consists of one memStore and zero or more store files (when the memstore reaches a threshold it is flushed and written to a storefile; the hlog guarantees the safety of the data, and a regionServer has only one hlog).

HStore:

The core of HBase storage, composed of a MemStore and StoreFiles. The MemStore is a Sorted Memory Buffer.

HLog:

Why HLog is introduced: in a distributed environment, system errors and crashes cannot be avoided. If an HRegionServer exits unexpectedly, the in-memory data in its MemStore is lost; HLog is introduced to guard against this.

Working mechanism:
Each HRegionServer has one HLog object. HLog is a class that implements a Write Ahead Log: every time a user operation is written to the MemStore, a copy of the data is also written to the HLog file. The HLog file rolls periodically, and old files (whose data has already been persisted to StoreFiles) are deleted. When an HRegionServer terminates unexpectedly, HMaster learns of it through Zookeeper. HMaster first processes the leftover HLog files, splitting the log data by region and placing it in the corresponding region directories, and then reassigns the failed regions. While loading such a Region, the HRegionServer that receives it finds that there is a historical HLog to process, so it replays the HLog data into the MemStore and then flushes to StoreFiles, completing the data recovery.
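The recovery mechanism comes down to two operations: splitting one server-wide log by region, and replaying each region's share in order. A minimal sketch, assuming log entries are simple `(region, rowkey, value)` tuples; the function names `split_log` and `replay` are made up for this example.

```python
from collections import defaultdict


def split_log(hlog):
    """Group the entries of one server-wide log by region, preserving order."""
    per_region = defaultdict(list)
    for region, rowkey, value in hlog:
        per_region[region].append((rowkey, value))
    return dict(per_region)


def replay(edits):
    """Rebuild a region's MemStore by re-applying its log entries in order."""
    memstore = {}
    for rowkey, value in edits:
        memstore[rowkey] = value    # later entries overwrite earlier ones
    return memstore


hlog = [
    ("region-a", "row1", "v1"),
    ("region-b", "row9", "x1"),
    ("region-a", "row1", "v2"),
    ("region-a", "row2", "v3"),
]
recovered = {region: replay(edits) for region, edits in split_log(hlog).items()}
```

Because the log is shared by every region on the server, the split step is what lets each region be reassigned and recovered independently on a different server.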

3.1 HBase storage format

All HBase data files are stored on the Hadoop HDFS file system, in two main formats:

1. HFile: the storage format for KeyValue data in HBase. HFile is a Hadoop binary file format; a StoreFile is in fact a lightweight wrapper around an HFile, that is, underneath a StoreFile is an HFile.

2. HLog File: the storage format of the WAL (Write Ahead Log) in HBase, physically a Hadoop Sequence File.

HFile

Notes on the figure:

An HFile is of variable length; only two of its blocks have a fixed length: Trailer and FileInfo.

The Trailer holds pointers to the starting points of the other data blocks.

File Info records some meta information about the file, for example: AVG_KEY_LEN, AVG_VALUE_LEN, LAST_KEY, COMPARATOR, MAX_SEQ_ID_KEY etc.

The Data Index and Meta Index blocks record the starting point of each Data block and each Meta block.

The Data Block is the basic unit of HBase I/O. To improve efficiency, HRegionServer has an LRU-based Block Cache mechanism.

The size of each Data block can be specified by a parameter when a Table is created: large Blocks favor sequential Scans, and small Blocks favor random queries.

Apart from the Magic at its beginning, each Data block is a concatenation of KeyValue pairs. The Magic content is some random numbers, intended to detect data corruption.

Each KeyValue pair inside an HFile is a simple byte array. This byte array contains many items and has a fixed structure.

KeyLength and ValueLength: two fixed-length values, giving the length of the Key and of the Value.

Key part: Row Length is a fixed-length value giving the length of the RowKey; Row is the RowKey.

Column Family Length is a fixed-length value giving the length of the Family.

Then comes the Column Family, followed by the Qualifier, and then two fixed-length values: the Time Stamp and the Key Type (Put / Delete).

The Value part has no such complex structure; it is pure binary data.
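The KeyValue layout just described can be tried out with Python's `struct` module. The field order follows the text; the byte widths (4-byte key and value lengths, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type), the big-endian encoding and the numeric codes `PUT` and `DELETE` are assumptions of this sketch, since the text only says the fields are fixed-length.

```python
import struct

PUT, DELETE = 4, 8   # assumed numeric codes for the Key Type field


def pack_keyvalue(row, family, qualifier, timestamp, key_type, value):
    key = (
        struct.pack(">H", len(row)) + row            # Row Length, Row
        + struct.pack(">B", len(family)) + family    # Column Family Length, Family
        + qualifier                                  # Qualifier (length is implied)
        + struct.pack(">QB", timestamp, key_type)    # Time Stamp, Key Type
    )
    return struct.pack(">II", len(key), len(value)) + key + value


def unpack_keyvalue(buf):
    key_len, value_len = struct.unpack_from(">II", buf, 0)
    pos = 8
    key_end = pos + key_len
    (row_len,) = struct.unpack_from(">H", buf, pos)
    pos += 2
    row = buf[pos:pos + row_len]
    pos += row_len
    (family_len,) = struct.unpack_from(">B", buf, pos)
    pos += 1
    family = buf[pos:pos + family_len]
    pos += family_len
    qualifier = buf[pos:key_end - 9]    # whatever precedes timestamp + type
    timestamp, key_type = struct.unpack_from(">QB", buf, key_end - 9)
    value = buf[key_end:key_end + value_len]
    return row, family, qualifier, timestamp, key_type, value


encoded = pack_keyvalue(b"row1", b"cf", b"name", 1700000000000, PUT, b"Alice")
decoded = unpack_keyvalue(encoded)
```

In this example the 5-byte value `b"Alice"` becomes a 35-byte record, which shows the data expansion that comes from every KeyValue carrying its own row key, family, qualifier and timestamp.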

HLog File

An HLog file is an ordinary Hadoop Sequence File. The Key of the Sequence File is an HLogKey object, which records where the written data belongs: besides the table and region names, it also includes a sequence number and a timestamp. The timestamp is the "write time"; the sequence number starts at 0, or at the last sequence number stored in the file system.

The Value of the HLog Sequence File is HBase's KeyValue object, that is, it corresponds to the KeyValue in an HFile.

3.2 Write process

1) The Client, directed by Zookeeper, sends a write request to a RegionServer, and the data is written into a Region;

2) Data is written to the Region's MemStore until the MemStore reaches a preset threshold (i.e. the MemStore is full);

3) The data in the MemStore is flushed into a StoreFile;

4) As the number of StoreFile files grows and reaches a certain threshold, a Compact merge operation is triggered, combining multiple StoreFiles into one StoreFile and performing version merging and data deletion at the same time;

5) Through successive Compact operations, StoreFiles gradually become larger and larger;

6) When a single StoreFile exceeds a certain size threshold, a Split operation is triggered, splitting the current Region into two new Regions. The parent Region goes offline, and the two new child Regions are assigned by HMaster to the appropriate RegionServers, so that the load of the original one Region is divided between two Regions.

As can be seen, HBase only ever appends data: updates and deletes are all carried out in the later Compact process, so a user write can return as soon as it reaches memory, which is what gives HBase its high I/O performance.
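The append-only idea can be made concrete with a small compaction routine: a delete is just another appended entry, and it only takes effect when files are merged. This is a simplified sketch, assuming entries of the form `(rowkey, timestamp, key_type, value)` and a delete marker that hides every older version of the row; the names `compact`, `PUT` and `DELETE` are chosen for this example.

```python
PUT, DELETE = "Put", "Delete"


def compact(storefiles, max_versions=1):
    """Merge StoreFiles into one sorted list of surviving Put entries."""
    by_row = {}
    for storefile in storefiles:
        for rowkey, ts, key_type, value in storefile:
            by_row.setdefault(rowkey, []).append((ts, key_type, value))

    merged = []
    for rowkey in sorted(by_row):
        kept = 0
        newest_first = sorted(by_row[rowkey], key=lambda e: e[0], reverse=True)
        for ts, key_type, value in newest_first:
            if key_type == DELETE:
                break                       # the marker hides every older version
            if kept < max_versions:
                merged.append((rowkey, ts, PUT, value))
                kept += 1
    return merged


files = [
    [("r1", 1, PUT, "a"), ("r2", 1, PUT, "x")],
    [("r1", 2, PUT, "b"), ("r2", 2, DELETE, None)],
    [("r3", 3, PUT, "c")],
]
compacted = compact(files)
```

After the merge, `r1` keeps only its newest value, `r2` disappears because its delete marker is newer than its only put, and `r3` is carried over unchanged.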

3.3 Read process

1) The Client accesses Zookeeper, looks up the -ROOT- table, and gets the .META. table information;

2) It looks in the .META. table for the Region that stores the target data, and from that finds the corresponding RegionServer;

3) It obtains the required data through that RegionServer;

4) The RegionServer's memory is divided into two parts, MemStore and BlockCache: MemStore is mainly for writing data, BlockCache mainly for reading data. A read request first looks in the MemStore; if the data is not found there, it checks the BlockCache; if it is still not found, it reads from the StoreFile and puts the result into the BlockCache.

Addressing procedure: client -> Zookeeper -> -ROOT- table -> .META. table -> RegionServer -> Region -> client
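The lookup order on the RegionServer (MemStore, then BlockCache, then StoreFile) can be sketched with an `OrderedDict` acting as an LRU cache. This is a toy model: the class `ReadPath` and its `cache_size` parameter are hypothetical, the StoreFile is a plain dictionary, and the cache holds single rows rather than whole data blocks.

```python
from collections import OrderedDict


class ReadPath:
    def __init__(self, memstore, storefile, cache_size=2):
        self.memstore = memstore            # recent writes, in memory
        self.storefile = storefile          # stands in for data on HDFS
        self.block_cache = OrderedDict()    # LRU cache of recently read data
        self.cache_size = cache_size
        self.disk_reads = 0

    def get(self, rowkey):
        if rowkey in self.memstore:                 # 1. MemStore
            return self.memstore[rowkey]
        if rowkey in self.block_cache:              # 2. BlockCache
            self.block_cache.move_to_end(rowkey)    # mark as most recently used
            return self.block_cache[rowkey]
        self.disk_reads += 1                        # 3. StoreFile
        value = self.storefile.get(rowkey)
        if value is not None:
            self.block_cache[rowkey] = value
            if len(self.block_cache) > self.cache_size:
                self.block_cache.popitem(last=False)    # evict least recently used
        return value


reader = ReadPath(
    memstore={"r1": "new"},
    storefile={"r1": "old", "r2": "b", "r3": "c", "r4": "d"},
)
results = [reader.get(key) for key in ["r1", "r2", "r2", "r3", "r4", "r2"]]
```

The read of `r1` is served from the MemStore and never sees the older value on disk; the second read of `r2` is a cache hit; after `r3` and `r4` are read, `r2` has been evicted and the last read goes to disk again.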

 


Origin www.cnblogs.com/a1714419456/p/12571627.html