Fast Learning: HBase Overview

Chapter 1: Introduction to HBase

1.1 What is HBase

HBase is modeled on Google's BigTable paper; inspired by the ideas in that paper, it is now developed and maintained as a Hadoop subproject and provides storage for structured data.
Official website: http://hbase.apache.org
- 2006: Google publishes the BigTable white paper
- 2006: development of HBase begins
- 2008: while the Beijing Olympic Games open successfully, programmers quietly make HBase a Hadoop subproject
- 2010: HBase becomes a top-level Apache project
- today: many companies have built their own derivative distributions, and you may already be using them
HBase is a highly reliable, high-performance, column-oriented, scalable, distributed storage system; with HBase, massive-scale storage clusters can be built on inexpensive PC servers.
The goal of HBase is to store and process large data sets; more specifically, using only commodity hardware, it can handle large tables made up of thousands upon thousands of rows and columns.
HBase is an open-source implementation of Google's Bigtable, but there are also many differences. For example: Google Bigtable uses GFS as its file storage system, while HBase uses Hadoop HDFS; Google Bigtable uses MapReduce to process the massive data stored in Bigtable, while HBase likewise uses Hadoop MapReduce to process the massive data stored in HBase; Google Bigtable uses Chubby as its coordination service, while HBase uses ZooKeeper for the same purpose.

1.2 HBase Features

1) Mass storage
HBase is suited to storing massive amounts of data at the PB level and, even with PB-scale data stored on inexpensive PCs, it can return data within tens to hundreds of milliseconds. This is closely tied to HBase's high scalability: precisely because HBase scales so well, it provides a convenient way to store massive amounts of data.
2) Column storage
Column storage here actually means column-family storage: HBase stores data according to column families. A column family can contain many columns, and the column families must be specified when the table is created (the sketch after this list shows a table being created with a single column family).
3) Easy to extend
HBase's scalability shows up mainly in two dimensions: scaling of the upper processing layer (RegionServer) and scaling of storage (HDFS).
By adding RegionServer machines horizontally, HBase's upper-layer processing capacity is expanded, increasing its ability to serve more Regions.
Note: a RegionServer's role is to manage Regions and handle business access; this is described in detail later. By adding DataNode machines horizontally, the storage layer is expanded, increasing HBase's data storage capacity and the read/write capability of the back-end storage.
4) High concurrency
Because most HBase deployments today are built on inexpensive PCs, the latency of a single I/O is actually not small, generally between tens and hundreds of milliseconds. High concurrency here mainly means that, under concurrent I/O, the latency of a single HBase operation does not increase much, so HBase can provide a high-concurrency, low-latency service.
5) Sparse
Sparseness refers mainly to the flexibility of HBase's columns: within a column family you can specify any number of columns, and when a column's value is empty it consumes no storage space (the sketch below also illustrates this by writing different columns for different rows).
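To make the column-family and sparsity points concrete, here is a minimal sketch using the HBase Java client (2.x-style builder API). The table name `user`, the column family `info`, the column qualifiers, and the ZooKeeper quorum address are illustrative assumptions, not values taken from the text above.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

import java.util.Arrays;

public class ColumnFamilyExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3"); // assumed ZooKeeper hosts

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {

            // Column families must be declared when the table is created;
            // the columns inside a family do not.
            TableName name = TableName.valueOf("user");
            TableDescriptor desc = TableDescriptorBuilder.newBuilder(name)
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("info"))
                    .build();
            if (!admin.tableExists(name)) {
                admin.createTable(desc);
            }

            // Sparsity: each row may carry a different set of columns in the
            // "info" family, and the columns a row omits take up no space.
            try (Table table = conn.getTable(name)) {
                Put p1 = new Put(Bytes.toBytes("row1"));
                p1.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));

                Put p2 = new Put(Bytes.toBytes("row2"));
                p2.addColumn(Bytes.toBytes("info"), Bytes.toBytes("city"), Bytes.toBytes("Beijing"));

                table.put(Arrays.asList(p1, p2));
            }
        }
    }
}
```

Each row carries only the columns it actually has, so there is no storage cost for the columns a row omits, which is exactly the sparsity described above.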

1.3 HBase Architecture

[Figure: HBase architecture diagram]
As the figure shows, HBase is composed of several components: Client, ZooKeeper, HMaster, HRegionServer, HDFS, and so on. The related functions of these components are introduced below:
1) Client
The Client provides the interfaces for accessing HBase; in addition, the Client maintains a cache to speed up HBase access, for example a cache of .META. metadata information (a minimal read sketch using the client appears after this component list).
2) ZooKeeper
HBase uses ZooKeeper to provide high availability of the Master, monitoring of RegionServers, the entry point for metadata, and maintenance of cluster configuration. Specifically:
- ZooKeeper ensures that only one Master is active in the cluster; if the Master fails, a new Master is elected through a competition mechanism to take over service
- ZooKeeper monitors the state of each RegionServer; when a RegionServer becomes abnormal, the Master is notified of the RegionServer going online or offline through callbacks
- ZooKeeper stores the unified entry address for the metadata
3) HMaster
The main responsibilities of the Master node are as follows:
- assigning Regions to RegionServers
- maintaining load balance across the whole cluster
- maintaining the cluster's metadata
- discovering failed Regions and reassigning them to healthy RegionServers
- coordinating the splitting of the corresponding HLog when a RegionServer fails
4) HRegionServer
The HRegionServer directly serves users' read and write requests; it is the node that does the real "work". Its functions are summarized as follows:
- managing the Regions assigned to it by the Master
- handling read and write requests from clients
- interacting with the underlying HDFS and storing data in HDFS
- splitting Regions that have grown too large
- merging StoreFiles (compaction)
5) HDFS
HDFS provides the underlying final data storage service for HBase and also supports HBase's high availability (the HLog is stored on HDFS). Its functions are summarized as follows:
- providing the underlying storage for metadata and table data
- keeping multiple replicas of the data to guarantee high reliability and high availability
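To make the access path concrete: for ordinary reads and writes, a client does not talk to the HMaster at all; it only needs the ZooKeeper quorum address, from which the metadata entry point and the responsible RegionServer are located. The following minimal read sketch with the Java client reuses the assumed `user`/`info` schema and ZooKeeper hosts from the earlier example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ClientReadExample {
    public static void main(String[] args) throws Exception {
        // The only address the client needs is the ZooKeeper quorum:
        // ZooKeeper holds the entry point to the metadata (.META.) table.
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3"); // assumed hosts

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("user"))) {

            // The connection caches Region locations looked up from the metadata
            // table, so repeated reads can go straight to the right RegionServer.
            Get get = new Get(Bytes.toBytes("row1"));
            Result result = table.get(get);
            byte[] value = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println(value == null ? "(no value)" : Bytes.toString(value));
        }
    }
}
```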

1.4 HBase Roles

1.4.1 HMaster

Functions
1. Monitoring RegionServers
2. Handling RegionServer failover
3. Handling metadata changes
4. Handling Region assignment or movement
5. Balancing the data load across the cluster during idle time
6. Publishing its own location to clients via ZooKeeper

1.4.2 RegionServer

Functions
1. Storing the actual data of HBase
2. Handling the Regions assigned to it
3. Flushing the cache to HDFS
4. Maintaining the HLog
5. Performing compactions (a sketch that triggers a flush and a major compaction follows this list)
6. Handling Region splits
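Flushes and compactions are normally triggered automatically by the RegionServer, but they can also be requested explicitly through the Admin API, with the RegionServers hosting the table's Regions doing the actual work. A small sketch, assuming the `user` table and an already-open `Connection` from the earlier examples:

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;

public class MaintenanceExample {
    // Assumes an existing Connection, e.g. from ConnectionFactory.createConnection(conf).
    static void flushAndCompact(Connection conn) throws Exception {
        try (Admin admin = conn.getAdmin()) {
            TableName name = TableName.valueOf("user");

            // Ask the RegionServers hosting this table to flush their MemStores
            // to HFiles on HDFS.
            admin.flush(name);

            // Ask the RegionServers to merge the table's StoreFiles into fewer,
            // larger files (major compaction).
            admin.majorCompact(name);
        }
    }
}
```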

1.4.3 Other components

  1. Write-Ahead Log (WAL)
    The WAL is HBase's change log. When HBase writes data, the data is not written to disk immediately; it is kept in memory for a period of time (the time and data-volume thresholds are configurable). However, data kept only in memory has a higher probability of being lost, so to solve this problem the data is first written to a file called the write-ahead log and only afterwards written to memory. When the system fails, the data can then be reconstructed from the log file (the write-path sketch after this list illustrates this order).
  2. Region
    A Region is a shard of an HBase table. An HBase table is split into different Regions by RowKey value and stored on RegionServers; a single RegionServer can host many different Regions.
  3. Store
    HFiles are stored in Stores; one Store corresponds to one column family of an HBase table.
  4. MemStore
    As the name suggests, the MemStore is in-memory storage; it holds the data of current operations in memory. After the data has been written to the WAL, the RegionServer stores the key-value pairs in the MemStore.
  5. HFile
    This is the file that actually holds the original data on disk; it is the physical storage file. StoreFiles are stored on HDFS in the HFile format.
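The write path described above (WAL first, then MemStore, eventually flushed to HFiles) happens inside the RegionServer and is mostly invisible to client code; the one knob a client commonly touches is the durability level of a mutation. A brief sketch, again assuming the `user`/`info` schema and an open `Connection` from the earlier examples:

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WritePathExample {
    // Assumes an existing Connection, e.g. from ConnectionFactory.createConnection(conf).
    static void durablePut(Connection conn) throws Exception {
        try (Table table = conn.getTable(TableName.valueOf("user"))) {
            Put put = new Put(Bytes.toBytes("row3"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Carol"));

            // SYNC_WAL: the edit is synced to the write-ahead log before the call
            // returns; only then does the RegionServer keep it in the MemStore,
            // from which it is later flushed to an HFile on HDFS.
            put.setDurability(Durability.SYNC_WAL);
            table.put(put);
        }
    }
}
```

With `Durability.SKIP_WAL` the edit would go only to the MemStore, trading the crash-recovery guarantee described above for a faster write.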


Origin blog.csdn.net/weixin_42528266/article/details/104353398