Getting Started Learning Series [Hadoop HBase] Part 6: Basic Architecture, Programming Model and Application Cases

Reprinted: https://blog.csdn.net/shengmingqijiquan/article/details/52922009

HBase is a column-oriented distributed storage system built on HDFS;

HBase is an important member of the Apache Hadoop ecosystem, used mainly for structured storage of massive data;

Logically, HBase stores data by table, row, and column.

HBase is an integral part of the Hadoop ecosystem.


HBase vs. HDFS

Things in common:
both have good fault tolerance and scalability, and can scale out to hundreds of nodes;
Differences:
HDFS:
suitable for batch-processing scenarios
does not support random lookup of data
not suitable for processing incremental data
does not support data updates
HBase:
large: a table can have billions of rows and millions of columns;
schema-free: each row has a sortable row key and an arbitrary number of columns; columns can be added dynamically as needed, and different rows in the same table can have completely different columns;
column-oriented: storage and access control are done per column (family), and column (family) data is retrieved independently;
sparse: empty (null) columns take up no storage space, so tables can be designed to be very sparse;
multi-version data: each cell can hold multiple versions of the data; by default the version number is assigned automatically, being the timestamp at which the cell was inserted;
single data type: all data in HBase is stored as uninterpreted byte strings, with no types.
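Because everything is bytes, row keys sort lexicographically by unsigned byte comparison, not numerically. The sketch below is plain Java (not the HBase client API; the comparator imitates what HBase's `Bytes.compareTo` does) and shows the classic gotcha with numeric keys stored as strings:

```java
import java.nio.charset.StandardCharsets;
import java.util.TreeMap;

// Plain-Java illustration (not the HBase client API): HBase stores all
// values as untyped byte arrays and orders row keys lexicographically
// by unsigned byte comparison.
public class RowKeyOrder {
    // Unsigned lexicographic comparison over byte arrays.
    static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> table = new TreeMap<>(RowKeyOrder::compare);
        for (String key : new String[] {"row2", "row10", "row1"}) {
            table.put(key.getBytes(StandardCharsets.UTF_8), "value-" + key);
        }
        // Gotcha: "row10" sorts before "row2" because comparison is
        // byte-wise, not numeric -- zero-pad numeric keys if order matters.
        StringBuilder order = new StringBuilder();
        for (byte[] k : table.keySet()) {
            order.append(new String(k, StandardCharsets.UTF_8)).append(' ');
        }
        System.out.println(order.toString().trim()); // row1 row10 row2
    }
}
```

This is why row keys that must sort numerically are usually zero-padded (e.g. `row0002`, `row0010`) or encoded as fixed-width binary.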
Row-oriented storage vs. column-oriented storage


II. HBase Data Model and Basic Architecture
2.1 HBase data model
HBase was developed after the model of Google's BigTable and is a typical key/value system;

Logical view of HBase

Rowkey and Column Family

Operations supported by HBase
all operations are based on the rowkey;
CRUD (Create, Read, Update, Delete) and Scan are supported;
single-row operations
 Put
 Get
 Scan
multi-row operations
 Scan
 MultiPut
there is no built-in join operation; joins can be implemented with MapReduce.
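The operation set above can be sketched with a toy in-memory table. This is plain Java, not the HBase client API: Put covers both create and update, Get reads one row, Delete removes it, and Scan iterates rows in row-key order over a half-open range:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy in-memory sketch of HBase's operation set (not the HBase client
// API): rows are kept sorted by row key, as HBase does.
public class ToyTable {
    private final NavigableMap<String, String> rows = new TreeMap<>();

    public void put(String rowKey, String value) { rows.put(rowKey, value); } // create or update
    public String get(String rowKey)             { return rows.get(rowKey); }
    public void delete(String rowKey)            { rows.remove(rowKey); }

    // Scan a half-open range [startKey, endKey), like an HBase Scan.
    public NavigableMap<String, String> scan(String startKey, String endKey) {
        return rows.subMap(startKey, true, endKey, false);
    }

    public static void main(String[] args) {
        ToyTable t = new ToyTable();
        t.put("r1", "a");
        t.put("r2", "b");
        t.put("r1", "a2");                // a second Put updates the row
        t.delete("r2");
        System.out.println(t.get("r1"));  // a2
        System.out.println(t.get("r2"));  // null
    }
}
```

Note there is no join method anywhere in this interface; combining two tables is left to an external framework such as MapReduce.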

2.2 HBase physical model
each column family is stored in a separate file on HDFS;
within each column family, every value is identified by its key and a version number;
null values are not stored.
All rows in a table are sorted lexicographically by row key; a table is divided horizontally, by row, into multiple Regions;

Regions are split by size: each table starts with a single region; as data grows, the region grows, and when it reaches a threshold it splits into two new regions, so over time there are more and more regions;
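This grow-and-split behavior can be sketched with a sorted map that splits at its middle key. The sketch is a simplification (the row-count threshold is hypothetical; real HBase splits on a StoreFile's midpoint key and a configurable byte-size threshold), but the shape is the same:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Simplified sketch of region splitting (not HBase internals): a table
// starts as one region; when a region exceeds a row-count threshold it
// splits at its middle key into two regions.
public class RegionSplitSketch {
    static final int THRESHOLD = 4; // hypothetical split threshold, in rows

    static List<TreeMap<String, String>> insert(List<TreeMap<String, String>> regions,
                                                String rowKey, String value) {
        // Locate the region responsible for this row key: the last region
        // whose first key is <= rowKey (trivial while there is one region).
        TreeMap<String, String> target = regions.get(0);
        for (TreeMap<String, String> r : regions) {
            if (!r.isEmpty() && r.firstKey().compareTo(rowKey) <= 0) target = r;
        }
        target.put(rowKey, value);
        if (target.size() > THRESHOLD) { // split at the middle key
            String mid = new ArrayList<>(target.keySet()).get(target.size() / 2);
            TreeMap<String, String> upper = new TreeMap<>(target.tailMap(mid));
            target.tailMap(mid).clear();
            regions.add(regions.indexOf(target) + 1, upper);
        }
        return regions;
    }

    public static void main(String[] args) {
        List<TreeMap<String, String>> regions = new ArrayList<>();
        regions.add(new TreeMap<>()); // a table starts with a single region
        for (String k : new String[] {"a", "b", "c", "d", "e"}) {
            insert(regions, k, "v");
        }
        System.out.println(regions.size()); // 2 after the first split
    }
}
```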

The Region is the smallest unit of distributed storage and load balancing in HBase; different Regions are distributed to different RegionServers;

Although the Region is the smallest unit of distributed storage, it is not the smallest unit of storage. A Region consists of one or more Stores, each Store holding one column family. Each Store in turn consists of one MemStore and zero or more StoreFiles; the MemStore is kept in memory, while StoreFiles are stored on HDFS.
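The MemStore/StoreFile relationship can be sketched as follows. This is a toy model (the flush threshold is hypothetical, and real StoreFiles live on HDFS, not in a Java list), but it shows the write path and why a read must consult both the MemStore and the flushed files:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy sketch of a Store (not HBase internals): writes go to an in-memory,
// sorted MemStore; when the MemStore exceeds a threshold it is flushed as
// an immutable "StoreFile" (on HDFS in real HBase) and a fresh MemStore
// is started.
public class StoreSketch {
    static final int FLUSH_THRESHOLD = 3;         // hypothetical size, in cells

    private TreeMap<String, String> memStore = new TreeMap<>();
    private final List<SortedMap<String, String>> storeFiles = new ArrayList<>();

    void write(String key, String value) {
        memStore.put(key, value);
        if (memStore.size() >= FLUSH_THRESHOLD) { // flush: memory -> "disk"
            storeFiles.add(memStore);
            memStore = new TreeMap<>();
        }
    }

    // A read must consult the MemStore and every StoreFile, newest first.
    String read(String key) {
        if (memStore.containsKey(key)) return memStore.get(key);
        for (int i = storeFiles.size() - 1; i >= 0; i--) {
            if (storeFiles.get(i).containsKey(key)) return storeFiles.get(i).get(key);
        }
        return null;
    }

    int storeFileCount() { return storeFiles.size(); }

    public static void main(String[] args) {
        StoreSketch store = new StoreSketch();
        for (String k : new String[] {"a", "b", "c", "d"}) store.write(k, "v-" + k);
        System.out.println(store.storeFileCount()); // 1: a,b,c flushed; d still in MemStore
        System.out.println(store.read("a"));        // v-a, served from the StoreFile
    }
}
```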


2.3 The basic architecture


2.3.1 HBase basic components
Client
 contains interfaces for accessing HBase and maintains a cache to speed up HBase access
Zookeeper
 guarantees that at any time there is only one active Master in the cluster
 stores the addressing entry for all Regions
 monitors RegionServer online/offline status in real time and notifies the Master
 stores the HBase schema and table metadata
Master
 assigns Regions to RegionServers
 is responsible for RegionServer load balancing
 detects failed RegionServers and reassigns their Regions
 manages users' CRUD operations on tables
RegionServer
 maintains Regions and handles IO requests to those Regions
 is responsible for splitting Regions that become too large during operation
2.3.2 The role of Zookeeper
 HBase relies on ZooKeeper;
 by default, HBase manages the ZooKeeper instance itself, e.g. starting and stopping ZooKeeper;
 the Master and RegionServers register with ZooKeeper on startup;
 the introduction of ZooKeeper means the Master is no longer a single point of failure.

2.4 HBase fault tolerance
Master fault tolerance: ZooKeeper elects a new Master
 while there is no Master, data reads proceed as usual;
 while there is no Master, region splitting and load balancing cannot be performed;
RegionServer fault tolerance: each RegionServer reports heartbeats to ZooKeeper periodically; if a heartbeat fails to arrive within the timeout, the Master reassigns the Regions on that RegionServer to other RegionServers, and the "write-ahead" log on the failed server is split by the Master and sent to the new RegionServers;
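The log-splitting step can be sketched as follows. This is a heavy simplification of HBase's failover (hypothetical names throughout, and in real HBase replay rebuilds MemStores from HDFS-resident log files), but it shows the core idea: the failed server's single write-ahead log interleaves edits for all of its regions, so it must be split per region before each new hosting server replays its piece:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of RegionServer failover (simplified, not HBase internals):
// split the failed server's write-ahead log into one edit list per
// region, then replay each list to rebuild that region's MemStore.
public class WalSplitSketch {
    record Edit(String region, String rowKey, String value) {}

    static Map<String, List<Edit>> splitLog(List<Edit> wal) {
        Map<String, List<Edit>> perRegion = new HashMap<>();
        for (Edit e : wal) {
            perRegion.computeIfAbsent(e.region(), r -> new ArrayList<>()).add(e);
        }
        return perRegion;
    }

    static TreeMap<String, String> replay(List<Edit> edits) {
        TreeMap<String, String> memStore = new TreeMap<>();
        for (Edit e : edits) memStore.put(e.rowKey(), e.value()); // later edits win
        return memStore;
    }

    public static void main(String[] args) {
        List<Edit> wal = List.of(
                new Edit("regionA", "r1", "x"),
                new Edit("regionB", "r9", "y"),
                new Edit("regionA", "r1", "x2")); // a later edit to the same row
        Map<String, List<Edit>> pieces = splitLog(wal);
        System.out.println(pieces.get("regionA").size());            // 2
        System.out.println(replay(pieces.get("regionA")).get("r1")); // x2
    }
}
```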
ZooKeeper fault tolerance: ZooKeeper itself is a reliable service
 typically 3 or 5 ZooKeeper instances are configured.
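Why odd counts like 3 or 5? A ZooKeeper ensemble stays available only while a strict majority of its instances is up, so an ensemble of n tolerates floor((n - 1) / 2) failures; an even count tolerates no more failures than the odd count below it. A tiny calculation makes this concrete:

```java
// Majority-quorum arithmetic behind the "3 or 5 instances" advice: an
// ensemble of n needs n/2 + 1 instances up, so 4 instances tolerate no
// more failures than 3 do.
public class QuorumMath {
    static int quorum(int n)            { return n / 2 + 1; }
    static int toleratedFailures(int n) { return (n - 1) / 2; }

    public static void main(String[] args) {
        for (int n = 3; n <= 5; n++) {
            System.out.println(n + " instances: quorum " + quorum(n)
                    + ", tolerates " + toleratedFailures(n) + " failure(s)");
        }
    }
}
```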
III. HBase Application Cases
3.1 When to use HBase?
 you need random reads or random writes of data;
 you need high-concurrency operations on big data, e.g. thousands of operations per second on PB-scale data;
 read and write access patterns are simple.
3.2 HBase enterprise applications

 

IV. HBase Programming in Practice
4.1 HBase access methods
Native Java API
 the most common and most efficient access method;
HBase Shell
 HBase's command-line tool, the simplest interface, suitable for HBase administration;
Thrift Gateway
 uses Thrift serialization and supports C++, PHP, Python and other languages, allowing heterogeneous systems to access HBase table data online;
REST Gateway
 supports REST-style HTTP API access to HBase, removing language restrictions;
MapReduce
 process HBase data directly with MapReduce jobs;
 process HBase data with Pig/Hive.
4.2 HBase Java programming
4.2.1 HBase Java API overview
 HBase is written in Java, so Java programming support comes naturally;
 CRUD operations are supported: Create, Read, Update, Delete;
 the Java API covers all the features of the HBase shell, and more;
 the Java API is the fastest way to access HBase.
4.2.2 Java API programming steps
Step 1: Create a Configuration object containing the various configuration settings

Configuration conf = HBaseConfiguration.create();

Step 2: Construct an HTable handle
 provide the Configuration object
 provide the name of the table to access

HTable table = new HTable(conf, tableName);

Step 3: Perform the corresponding operations
 perform put, get, delete, scan and other operations

table.getTableName();

Step 4: Close the HTable handle
 flushes in-memory data to disk
 releases resources

table.close();

Example:


4.2.3 Writing data to HBase
Step 1: Create a Put object;

Put put = new Put(Bytes.toBytes("rowkey"));

Step 2: Set the cell value(s); Put offers several overloads:

put.add(family, column, value)
put.add(family, column, timestamp, value)
put.add(KeyValue kv)

Step 3: Call HTable's put method to write the data;
Step 4: Close the HTable handle.


4.2.4 Reading data from HBase
Supported API types
 get a single row of data by rowkey
 get multiple rows by a set of rowkeys
 scan an entire table or part of a table
Scanning a table
 a scan range [startKey, endKey) can be specified
 table data is sorted by rowkey
API characteristics
 a small number of calls, easy to use

Considerations when reading data:
 read only the data you need
 add constraints on the data wherever possible
 add constraints such as column family, column(s), time range, and max versions

Interface examples
 get.setTimeRange(minStamp, maxStamp)
 get.setMaxVersions(maxVersions)
 get.addFamily(family)
 get.addColumn(family, column)
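What setTimeRange and setMaxVersions constrain can be sketched with a toy multi-version cell (plain Java, not the HBase client API): each cell keeps timestamp-to-value entries, and a read returns the newest versions within [minStamp, maxStamp), up to maxVersions of them:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of a multi-version cell (not the HBase client API): versions
// are kept newest-first, and a read applies a half-open time range plus
// a cap on the number of versions returned.
public class VersionedCell {
    private final NavigableMap<Long, String> versions =
            new TreeMap<>(Collections.reverseOrder()); // newest first

    void put(long timestamp, String value) { versions.put(timestamp, value); }

    List<String> get(long minStamp, long maxStamp, int maxVersions) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : versions.entrySet()) { // newest first
            long ts = e.getKey();
            if (ts >= minStamp && ts < maxStamp) {
                out.add(e.getValue());
                if (out.size() == maxVersions) break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell();
        cell.put(100L, "v1");
        cell.put(200L, "v2");
        cell.put(300L, "v3");
        System.out.println(cell.get(0L, Long.MAX_VALUE, 1)); // [v3]
        System.out.println(cell.get(0L, 250L, 2));           // [v2, v1]
    }
}
```

Narrowing both the time range and the version count this way is exactly the "read only what you need" advice above: the fewer versions the server must inspect, the cheaper the read.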
4.2.5 Deleting data from HBase


4.2.6 Scanning data from HBase


V. Summary
This post records the basics of HBase: master HBase's core design ideas and Java programming, and figure out when to use HBase. Three criteria: a need for random access, high-concurrency operations on large volumes of data, and simple read/write access patterns.
----------------
Disclaimer: this article is an original article by CSDN blogger "Data Circle" and follows the CC 4.0 BY-SA copyright agreement; for reprints, please attach the original source link and this statement.
Original link: https://blog.csdn.net/shengmingqijiquan/article/details/52922009

Origin www.cnblogs.com/ceshi2016/p/12122941.html