HBase underlying principles

 

1, System architecture

Client

    • 1 Provides the interface for accessing HBase. The client maintains some caches, such as region location information, to speed up access to HBase.

Zookeeper

    • 1 Ensures that at any time there is only one active master in the cluster

    • 2 Stores the addressing entry point for all Regions

    • 3 Monitors the status of Region Servers in real time and notifies the Master when a Region Server comes online or goes offline

    • 4 Stores the HBase schema, including which tables exist and which column families each table has

Master responsibilities

    • 1 Assigns regions to Region Servers

    • 2 Is responsible for load balancing across Region Servers

    • 3 Detects failed Region Servers and reassigns their regions

    • 4 Garbage-collects files on HDFS

    • 5 Handles schema update requests

Region Server role

    • 1 A Region Server maintains the regions the Master assigns to it and handles the IO requests for those regions

    • 2 A Region Server is responsible for splitting regions that become too large during operation

As can be seen, the client does not need the master to access data in HBase: addressing goes through zookeeper and the Region Servers, and data reads and writes go to the Region Servers. The master only maintains metadata about tables and regions, so its load is low.

2, Overall structure (physical storage)

    • 1 All rows in a Table are sorted in lexicographic order of row key;

    • 2 A Table is divided into multiple HRegions along the row direction;

    • 3 HRegions are split by size (10G by default). Each table starts with only one HRegion. As data is continuously inserted into the table, the HRegion grows, and when it reaches a threshold it is split into two new HRegions of equal size. As the number of rows in the table grows, there are more and more HRegions;

    • 4 The HRegion is the smallest unit of distributed storage and load balancing in HBase. Smallest unit means that different HRegions can be distributed across different HRegion Servers, but a single HRegion is never split across multiple servers;

    • 5 Although the HRegion is the smallest unit of load balancing, it is not the smallest unit of physical storage.

      In fact, an HRegion consists of one or more Stores, and each Store holds one column family. Each Store in turn consists of one memStore and zero or more StoreFiles.
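The row key ordering in point 1 can be illustrated with a minimal sketch. It assumes row keys are compared as raw bytes; the key names are made up, and zero-padding is shown only as one common way to make numeric suffixes sort in numeric order.

```python
# Row keys compared as raw bytes sort lexicographically, not numerically.
row_keys = [b"row2", b"row10", b"row1"]
sorted_keys = sorted(row_keys)
# b"row10" lands before b"row2" because b"1" < b"2" at the fourth byte.

# Zero-padding the numeric part (an illustrative convention, not an HBase
# requirement) makes lexicographic order match numeric order.
padded_keys = sorted(b"row%04d" % n for n in (2, 10, 1))

print(sorted_keys)
print(padded_keys)
```

Because regions are contiguous ranges of this sorted key space, the choice of row key format decides which rows end up next to each other.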

(1) StoreFile & HFile structure

StoreFiles are stored on HDFS in HFile format.

(2) Memstore and StoreFile

A region consists of multiple Stores, and each Store holds all the data of one column family. A Store comprises a memstore, located in memory, and StoreFiles, located on disk;

When a client retrieves data, the memstore is searched first, and if the data is not found there the StoreFiles are searched.
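The lookup order described above can be sketched as follows. This is a toy model, not HBase code: the `Store` class, its methods, and the use of plain dicts for the memstore and StoreFiles are all assumptions made for illustration.

```python
class Store:
    """Toy model of one Store: an in-memory memstore plus on-disk StoreFiles."""

    def __init__(self):
        self.memstore = {}      # newest writes, held in memory
        self.storefiles = []    # flushed snapshots, oldest first

    def put(self, row_key, value):
        self.memstore[row_key] = value

    def flush(self):
        """Freeze the memstore into a new StoreFile and start a fresh memstore."""
        if self.memstore:
            self.storefiles.append(dict(self.memstore))
            self.memstore = {}

    def get(self, row_key):
        """Check the memstore first, then the StoreFiles from newest to oldest."""
        if row_key in self.memstore:
            return self.memstore[row_key]
        for storefile in reversed(self.storefiles):
            if row_key in storefile:
                return storefile[row_key]
        return None


store = Store()
store.put("r1", "old")
store.put("r2", "kept")
store.flush()
store.put("r1", "new")   # shadows the flushed value of r1
```

Here `store.get("r1")` is answered from the memstore, while `store.get("r2")` falls through to the flushed StoreFile.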

(3) HLog (WAL log)

Each Region Server maintains one HLog, rather than one per Region. As a result, the logs of different regions (from different tables) are mixed together. The aim is that continuously appending to a single file, compared with writing to many files at the same time, reduces the number of disk seeks and so can improve write performance on tables.

The drawback is that if a Region Server goes offline, then to recover its regions the log on that Region Server has to be split by region and distributed to other Region Servers for recovery. An HLog file is an ordinary Hadoop Sequence File.
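Splitting a shared log by region, as described above, might look like this in outline. The tuple layout `(sequence id, region, edit)` and the region names are hypothetical; the real HLog format is not modelled here.

```python
from collections import defaultdict

# Hypothetical shared log of one Region Server: (sequence id, region, edit).
hlog = [
    (1, "t1,region-a", "put r1"),
    (2, "t2,region-b", "put r9"),
    (3, "t1,region-a", "delete r1"),
]


def split_log(entries):
    """Group the mixed log entries by region, keeping sequence order."""
    per_region = defaultdict(list)
    for seq_id, region, edit in sorted(entries):
        per_region[region].append((seq_id, edit))
    return dict(per_region)


split = split_log(hlog)
```

Each per-region list can then be handed to whichever Region Server takes over that region and replayed in order.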

3, Read and write process

(1) Read request process

1, HRegionServers hold the meta table as well as table data. To access table data, the Client first visits zookeeper and obtains from it the location of the meta table, that is, which HRegionServer the meta table is stored on.

2, The Client then uses the address it just obtained to access the HRegionServer where the Meta table is located, reads the Meta table, and obtains the metadata stored in it.

3, Using the information in the metadata, the Client accesses the corresponding HRegionServer, which scans its Memstore and Storefiles to query the data.

4, Finally, the HRegionServer returns the queried data to the Client.
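The four steps above can be sketched as a three-hop lookup. Everything here is a stand-in: the dicts playing zookeeper, the meta table, and the region servers, the server names `rs1` to `rs3`, and the `read` function are assumptions for illustration, and the client-side location cache is left out.

```python
import bisect

# Hypothetical cluster state; all names are made up for this sketch.
zookeeper = {"meta-location": "rs1"}
meta_table = {"start_keys": ["", "m"], "servers": ["rs2", "rs3"]}
region_servers = {
    "rs2": {"apple": "red"},      # region covering keys before "m"
    "rs3": {"plum": "purple"},    # region covering keys from "m" onward
}


def read(row_key):
    """Return (value, trace), where trace lists the hops the client made."""
    trace = ["zookeeper"]
    meta_host = zookeeper["meta-location"]           # step 1: where is meta?
    trace.append(meta_host)
    idx = bisect.bisect_right(meta_table["start_keys"], row_key) - 1
    data_server = meta_table["servers"][idx]         # step 2: read meta
    trace.append(data_server)
    value = region_servers[data_server].get(row_key) # steps 3-4: query the data
    return value, trace


value, trace = read("plum")
```

Note that the master never appears in the trace, which matches the earlier point that data access does not involve it.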

(2) Write request process

1, The Client first visits zookeeper, finds the Meta table, and gets the Meta table metadata. From it the Client determines the HRegion and HRegionServer that the data to be written corresponds to.

2, The Client sends a write request to that HRegionServer, which receives the request and responds.

3, The data is first written to the HLog, to prevent data loss, and then written to the Memstore.

4, If both the HLog and the Memstore are written successfully, the data is written successfully. If the Memstore reaches a threshold, its data is flushed into a Storefile. As Storefiles accumulate, a Compact (merge) operation is triggered, which merges the excess Storefiles into one large Storefile. As Storefiles grow, the Region grows too; once it reaches a threshold, a Split operation is triggered and the Region is divided in two.
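Steps 3 and 4 can be sketched as a small simulation. The class name `ToyRegion` and both thresholds are assumptions chosen so the example stays small; thresholds here count entries and files rather than bytes, and the split step is left out.

```python
class ToyRegion:
    """Toy write path: log first, then memstore, then flush and compact."""

    def __init__(self, flush_threshold=3, compact_threshold=3):
        self.flush_threshold = flush_threshold      # entries per memstore
        self.compact_threshold = compact_threshold  # StoreFiles before a merge
        self.hlog = []
        self.memstore = {}
        self.storefiles = []

    def put(self, row_key, value):
        self.hlog.append((row_key, value))   # write-ahead log comes first
        self.memstore[row_key] = value       # then the in-memory store
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        self.storefiles.append(dict(sorted(self.memstore.items())))
        self.memstore = {}
        if len(self.storefiles) >= self.compact_threshold:
            self.compact()

    def compact(self):
        merged = {}
        for storefile in self.storefiles:    # oldest first, so newer values win
            merged.update(storefile)
        self.storefiles = [dict(sorted(merged.items()))]


region = ToyRegion()
for i in range(9):
    region.put(f"row{i}", i)
```

After nine writes the memstore has been flushed three times and the three resulting files have been merged into one.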

(3) Details

1, On an update, data is first written to the Log (WAL log) and to memory (MemStore). Data in the MemStore is sorted. When the MemStore accumulates to a certain threshold, a new MemStore is created and the old MemStore is added to the flush queue, where a separate thread flushes it to disk as a StoreFile. At the same time, the system records a redo point in zookeeper, indicating that changes before this point have been persisted.

2, When the system fails unexpectedly, data in memory (MemStore) may be lost. In that case the Log (WAL log) is used to recover the data written after the checkpoint.

3, A StoreFile is read-only and cannot be modified once created, so an HBase update is in fact a continuous append operation. When the number of StoreFiles in a Store reaches a certain threshold, a merge is performed (minor_compact, major_compact) that merges the modifications to the same key together into one large StoreFile. When the size of a StoreFile reaches a certain threshold, the StoreFile is split into two equal StoreFiles.

4, Because updates to a table are continuous appends, a compact needs to access all the StoreFiles and the MemStore in a Store and merge them by row key. Because the StoreFiles and the MemStore are both sorted, and StoreFiles carry in-memory indexes, the merge is relatively fast.
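Point 4 says the merge is fast because every input is already sorted. A minimal sketch of such a merge, using Python's `heapq.merge`, is shown below. The entry layout `(row_key, sequence_id, value)`, the use of `None` as a delete marker, and the decision to drop deleted keys (as a major compaction would) are assumptions for illustration.

```python
import heapq


def compact(runs):
    """Merge sorted runs of (row_key, sequence_id, value) into one sorted run.

    Sequence ids are assumed unique. For each row key only the entry with the
    highest sequence id survives; a value of None is a delete marker and the
    key is dropped from the output.
    """
    result = []
    for key, seq, value in heapq.merge(*runs):   # inputs are already sorted
        if result and result[-1][0] == key:
            result[-1] = (key, seq, value)       # later sequence id replaces
        else:
            result.append((key, seq, value))
    return [entry for entry in result if entry[2] is not None]


older_file = [("a", 1, "1"), ("b", 2, "2")]
newer_file = [("a", 3, "3"), ("b", 4, None), ("c", 5, "5")]
memstore = [("d", 6, "6")]

merged = compact([older_file, newer_file, memstore])
```

The merge never re-sorts anything; it only walks the inputs in step, which is why sorted StoreFiles and a sorted MemStore keep compaction cheap.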

4, Region management

(1) Region assignment

At any time, a region can be assigned to only one region server. The master records which region servers are currently available, which regions are currently assigned to which region server, and which regions are not yet assigned. When there is a new region to assign and a region server has space available, the master sends that region server a load request and assigns the region to it. Once the region server receives the request, it begins serving the region.
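The bookkeeping described above can be sketched as follows. The text only says a server must have space available; picking the server with the fewest regions is an assumption of this sketch, as are the function and variable names.

```python
def assign(region, assignments, servers):
    """Assign an unassigned region to the live server holding the fewest regions."""
    if region in assignments:
        raise ValueError(f"{region} is already assigned to {assignments[region]}")
    load = {server: 0 for server in servers}
    for server in assignments.values():
        if server in load:
            load[server] += 1
    target = min(servers, key=lambda server: load[server])
    assignments[region] = target   # a region maps to exactly one server
    return target


servers = ["rs1", "rs2"]
assignments = {"region-1": "rs1"}
first = assign("region-2", assignments, servers)
second = assign("region-3", assignments, servers)
```

Storing assignments as a region-to-server mapping makes the rule that a region lives on only one server hold by construction.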

(2) Region server online

The master uses zookeeper to track region server status.

When a region server starts, it first creates a znode representing itself under the server directory in zookeeper. Because the master subscribes to change notifications on the server directory, it gets real-time notification from zookeeper whenever files are added to or deleted from that directory. So as soon as a region server comes online, the master gets the message immediately.

(3) Region server offline

When a region server goes offline, its session with zookeeper is disconnected, and zookeeper automatically releases the exclusive lock on the file representing this server. The master can detect this. It then deletes the znode representing this region server under the server directory and assigns the server's regions to other region servers that are still alive.
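The online and offline notifications can be sketched with a toy watch mechanism. `ServerDirectory` is a stand-in for the zookeeper server directory, not a ZooKeeper client API, and handing orphaned regions to the first surviving server is a simplification chosen for this sketch.

```python
class ServerDirectory:
    """Toy stand-in for the zookeeper server directory and its watches."""

    def __init__(self):
        self.znodes = set()
        self.watchers = []

    def watch(self, callback):
        self.watchers.append(callback)

    def _notify(self, event, server):
        for callback in self.watchers:
            callback(event, server)

    def session_started(self, server):
        self.znodes.add(server)          # server creates its own znode
        self._notify("added", server)

    def session_lost(self, server):
        self.znodes.discard(server)      # znode disappears with the session
        self._notify("removed", server)


class Master:
    def __init__(self, directory):
        self.live = set()
        self.assignments = {}            # region -> server
        directory.watch(self.on_change)

    def on_change(self, event, server):
        if event == "added":
            self.live.add(server)
            return
        self.live.discard(server)
        survivors = sorted(self.live)
        for region, owner in list(self.assignments.items()):
            if owner == server and survivors:
                self.assignments[region] = survivors[0]


directory = ServerDirectory()
master = Master(directory)
directory.session_started("rs1")
directory.session_started("rs2")
master.assignments = {"region-a": "rs1", "region-b": "rs2"}
directory.session_lost("rs1")
```

After `rs1` loses its session, the master's callback moves `region-a` to `rs2` without any polling.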

(4) Master mechanism

Master online

1, Acquires from ZooKeeper the unique lock that represents the active master, which prevents any other master from becoming the active master.

2, Scans the server parent node in ZooKeeper to obtain the list of currently available region servers.

3, Communicates with each region server to obtain the current mapping between regions and region servers.

4, Scans the set of .META. regions, works out which regions are currently unassigned, and puts them in the list of regions to be assigned.
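Steps 1 and 4 of the startup sequence can be sketched as below. A plain dict stands in for ZooKeeper, and the key `"active-master"` and both function names are hypothetical.

```python
def try_become_active(zk, master_id):
    """Step 1: only the first master to create the lock entry becomes active."""
    return zk.setdefault("active-master", master_id) == master_id


def unassigned_regions(meta_regions, reported):
    """Step 4: regions listed in meta that no region server reports serving."""
    assigned = set()
    for regions in reported.values():
        assigned.update(regions)
    return sorted(set(meta_regions) - assigned)


zk = {}
first = try_become_active(zk, "master-1")
second = try_become_active(zk, "master-2")

pending = unassigned_regions(
    ["r1", "r2", "r3"],
    {"rs1": ["r1"], "rs2": ["r3"]},
)
```

The second master fails to take the lock and would stay on standby, while the active master ends up with `r2` on its to-assign list.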

Master offline

1, Because the master only maintains table and region metadata and does not take part in table data IO, the master going offline only freezes all metadata changes: tables cannot be created or deleted, table schemas cannot be modified, regions cannot be load balanced, region server failures cannot be handled, and regions cannot be merged. The only exception is that region splits still proceed normally, because only region servers are involved. Reads and writes of table data also continue as normal. So a master that is offline for a short time has no effect on the hbase cluster as a whole.

5, Three important HBase mechanisms

(1) Flush mechanism

1, (hbase.regionserver.global.memstore.size) Default: 40% of the heap size. When the total size of all memstores on a regionServer exceeds this size, a flush to disk is triggered;

2, (hbase.hregion.memstore.flush.size) Default: 128M. This is the memstore cache size within a single region; when it is exceeded, the entire HRegion is flushed;

3, (hbase.regionserver.optionalcacheflushinterval) Default: 1h. The longest time an in-memory file can live before it is automatically flushed.
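The three triggers can be written down as simple comparisons. The defaults below are the ones quoted above; the `FlushConfig` class, the `flush_reasons` function, and their parameter names are hypothetical and only serve to show the arithmetic.

```python
from dataclasses import dataclass

MIB = 1024 ** 2
GIB = 1024 ** 3


@dataclass
class FlushConfig:
    global_memstore_fraction: float = 0.4   # hbase.regionserver.global.memstore.size
    region_flush_size: int = 128 * MIB      # hbase.hregion.memstore.flush.size
    max_age_seconds: int = 3600             # hbase.regionserver.optionalcacheflushinterval


def flush_reasons(cfg, heap_bytes, total_memstore_bytes,
                  region_memstore_bytes, oldest_edit_age_seconds):
    """Return which of the three flush triggers currently fire."""
    reasons = []
    if total_memstore_bytes > cfg.global_memstore_fraction * heap_bytes:
        reasons.append("global")
    if region_memstore_bytes > cfg.region_flush_size:
        reasons.append("region")
    if oldest_edit_age_seconds > cfg.max_age_seconds:
        reasons.append("age")
    return reasons


cfg = FlushConfig()
# With a 10 GiB heap the global limit is 4 GiB, so 5 GiB of memstores trips it.
reasons = flush_reasons(cfg, 10 * GIB, 5 * GIB, 64 * MIB, 10)
```

With a 10 GiB heap, for example, the global trigger fires once memstores together exceed 4 GiB, independently of the per-region and age checks.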

(2) Compact mechanism

Small storeFile files are merged into large Storefile files, and stale data, including deleted data, is cleaned up. The number of data versions kept is three.

(3) Split mechanism

When a Region reaches the threshold, the oversized Region is split in two. By default the split happens when an HFile reaches 10 GB.
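A split decision of this kind can be sketched as follows. The 10 GB threshold is the default quoted above; choosing the middle row key as the split point and the `maybe_split` function itself are assumptions made for this sketch.

```python
SPLIT_THRESHOLD = 10 * 1024 ** 3   # 10 GB, the default quoted above


def maybe_split(start_key, end_key, row_keys, size_bytes):
    """Return the region's key range, or two ranges if it is over the threshold.

    Ranges are (start_key, end_key) with the end key exclusive.
    """
    if size_bytes <= SPLIT_THRESHOLD or len(row_keys) < 2:
        return [(start_key, end_key)]
    ordered = sorted(row_keys)
    split_key = ordered[len(ordered) // 2]   # middle row key as split point
    return [(start_key, split_key), (split_key, end_key)]


small = maybe_split("a", "z", ["b", "m", "t", "x"], 2 * 1024 ** 3)
large = maybe_split("a", "z", ["b", "m", "t", "x"], 11 * 1024 ** 3)
```

The two daughter ranges share the split key as a boundary, so together they cover exactly the key range of the parent.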

5, The coprocessor

Taking HBase version 0.92 as an example, it provides three observer interfaces:

● RegionObserver: provides hooks for client data-manipulation events: Get, Put, Delete, Scan and so on.

● WALObserver: provides hooks for WAL-related operations.

● MasterObserver: provides hooks for DDL-type operations, such as creating, deleting, and modifying tables.
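The hook idea behind the observer interfaces can be sketched in a few lines. This is a Python analogue for illustration only: the real interfaces are Java, and the class names `UppercaseObserver`, `AuditObserver`, and `HookedRegion` and the method names `pre_put` and `post_put` are hypothetical.

```python
class RegionObserver:
    """Hypothetical base class; the default hooks do nothing."""

    def pre_put(self, row_key, value):
        return value            # may return a modified value

    def post_put(self, row_key, value):
        pass


class UppercaseObserver(RegionObserver):
    def pre_put(self, row_key, value):
        return value.upper()    # rewrite the value before it is stored


class AuditObserver(RegionObserver):
    def __init__(self):
        self.log = []

    def post_put(self, row_key, value):
        self.log.append((row_key, value))   # record what was actually stored


class HookedRegion:
    """Toy region that calls observer hooks around each put."""

    def __init__(self, observers):
        self.data = {}
        self.observers = observers

    def put(self, row_key, value):
        for observer in self.observers:
            value = observer.pre_put(row_key, value)
        self.data[row_key] = value
        for observer in self.observers:
            observer.post_put(row_key, value)


audit = AuditObserver()
hooked = HookedRegion([UppercaseObserver(), audit])
hooked.put("r1", "hello")
```

The client only calls `put`; the transformation and the audit record both happen on the server side of the call, which is the point of a coprocessor hook.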

6, Secondary indexes in HBase

Because HBase's query capability is relatively weak, complex statistical requirements such as select name, salary, count(1), max(salary) from user group by name, salary order by salary are basically impossible, or at least very difficult, to satisfy. So when we use HBase, we usually rely on a secondary index scheme;

The index in HBase is the rowkey, and data can only be retrieved by rowkey. To run combined queries on the columns within a column family in HBase, we need a secondary index scheme to support multi-condition queries.

Secondary indexes are commonly implemented with other tools, such as solr or ES.
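The idea of a secondary index can be sketched with two in-memory maps. This stands in for an external index such as solr or ES and is not how either of them works internally; the table layout, the `salary` column, and the function names are assumptions for illustration.

```python
from collections import defaultdict

users = {}                        # main table: rowkey -> columns
salary_index = defaultdict(set)   # index: salary value -> rowkeys


def put_user(rowkey, name, salary):
    """Write a row and keep the salary index in step with it."""
    old = users.get(rowkey)
    if old is not None:
        salary_index[old["salary"]].discard(rowkey)   # drop the stale entry
    users[rowkey] = {"name": name, "salary": salary}
    salary_index[salary].add(rowkey)


def find_by_salary(salary):
    """Look up rowkeys in the index, then fetch the rows by rowkey."""
    return [users[rowkey] for rowkey in sorted(salary_index.get(salary, ()))]


put_user("u1", "ann", 100)
put_user("u2", "bob", 100)
put_user("u1", "ann", 200)   # update moves u1 to a different index entry
```

A query on salary now costs one index lookup plus rowkey reads, instead of a scan over the whole table; the price is that every write has to maintain the index as well.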

Origin www.cnblogs.com/FrankWongWong/p/11420710.html