Getting Started with Hadoop: What Is HBase?

1. What is HBase?

HBase is a NoSQL database. NoSQL databases fall into the following categories:

  • Key-value databases

    This kind of database is built mainly on a hash table, in which each specific key holds a pointer to a specific piece of data. For IT systems, the key/value model is simple and easy to deploy. However, when a DBA needs to query or update only part of a value, the key/value model becomes inefficient. Examples: Tokyo Cabinet, Redis, Voldemort, Oracle BDB.

  • Column-oriented databases

    These databases are typically used for distributed storage of massive data sets. Keys still exist, but each key points to multiple columns, and those columns are grouped into column families. Examples: Cassandra, HBase. Riak is often listed here as well, although it is usually classified as a key-value store.

  • Document databases

    Document databases were inspired by the Lotus Notes office software and are similar to the first category, key-value stores. The data model is the versioned document: semi-structured documents are stored in a specific format such as JSON. A document database can be seen as an upgraded key-value database that allows key-value pairs to be nested, and it can be queried more efficiently than a plain key-value database. Examples: CouchDB, MongoDB. SequoiaDB is another document database, and it has been open-sourced.

  • Graph databases

    Graph databases differ from row/column stores and rigidly structured SQL databases: they use a flexible graph model and can scale out across multiple servers. NoSQL databases have no standard query language like SQL, so queries have to be written against each database's own data model. Many NoSQL databases offer a REST-style data interface or a query API. Examples: Neo4J, InfoGrid, Infinite Graph.
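
To make the four categories more concrete, here is a minimal sketch that models one hypothetical user record in each style, using plain Python containers in place of real stores. The record, its field names, and the helper friends_of are illustrative assumptions and are not taken from any of the products listed above.

```python
import json

# One hypothetical user record expressed in each NoSQL style.
# Plain Python containers stand in for the real stores.

# Key-value: an opaque value behind a single key.
key_value = {"user:1001": json.dumps({"name": "Ann", "city": "Oslo"})}

# Column-oriented: row key -> column family -> column -> value.
column_family = {
    "1001": {
        "info": {"name": "Ann", "city": "Oslo"},
        "stats": {"logins": "42"},
    }
}

# Document: a nested, semi-structured (JSON-like) document.
document = {
    "_id": "1001",
    "name": "Ann",
    "address": {"city": "Oslo", "zip": "0150"},
}

# Graph: nodes plus edges that describe relationships.
nodes = {"1001": {"name": "Ann"}, "1002": {"name": "Bob"}}
edges = [("1001", "FOLLOWS", "1002")]


def friends_of(node_id):
    """Return the names of the nodes that node_id points to."""
    return [nodes[dst]["name"] for src, _, dst in edges if src == node_id]
```

In the key-value form, reading the city means fetching and parsing the whole value, which is the inefficiency mentioned for partial queries and updates. The column-oriented and document forms let a single field be addressed directly.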

2. HBase architecture

HBase relies on HDFS for storage and can use MapReduce for batch computation, as shown below:

As the figure shows, a Region is made up of multiple Stores, and each Store corresponds to one column family. A Store consists of a MemStore in memory and StoreFiles on disk. Writes go to the MemStore first; when the MemStore exceeds a certain threshold, it is flushed to a StoreFile. When the number of StoreFiles reaches a certain count, they are compacted: versions are consolidated, deleted data is dropped, and the files are merged into a larger StoreFile. When the total size of the StoreFiles in a Region exceeds a certain threshold, the Region is split into two, and the HMaster assigns the new Regions to the appropriate HRegionServers for load balancing.
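
The write path described above can be sketched as a toy model. This is not HBase code: the class Store, the function split_region, and the entry-count thresholds are assumptions made for illustration (real HBase thresholds are byte sizes), and deletes and multiple versions are not modeled.

```python
class Store:
    """Toy model of one Store (one column family inside one Region)."""

    def __init__(self, flush_threshold=3, compact_threshold=3):
        self.flush_threshold = flush_threshold      # max entries held in memory
        self.compact_threshold = compact_threshold  # max number of storefiles
        self.memstore = {}     # in-memory buffer: row key -> value
        self.storefiles = []   # immutable "on-disk" files, oldest first

    def put(self, row_key, value):
        # Every write lands in the memstore first.
        self.memstore[row_key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memstore out as a new, sorted storefile and clear it.
        self.storefiles.append(dict(sorted(self.memstore.items())))
        self.memstore = {}
        if len(self.storefiles) >= self.compact_threshold:
            self.compact()

    def compact(self):
        # Merge all storefiles into one; newer files overwrite older values.
        merged = {}
        for storefile in self.storefiles:
            merged.update(storefile)
        self.storefiles = [dict(sorted(merged.items()))]


def split_region(store):
    """Return the stored row keys as two halves, split at the midpoint."""
    keys = sorted(set(store.memstore) | {k for f in store.storefiles for k in f})
    mid = len(keys) // 2
    return keys[:mid], keys[mid:]
```

With flush_threshold=2 and compact_threshold=2, four puts are enough to trigger two flushes followed by one compaction, after which the store holds a single merged storefile.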

A read looks in the MemStore first; if the data is not found there, it falls back to the StoreFiles.
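
A minimal sketch of that lookup order follows. It is a simplification: the function get and the dictionaries are hypothetical stand-ins, and a real region server also uses a block cache and compares timestamps across the MemStore and the StoreFiles.

```python
def get(row_key, memstore, storefiles):
    """Look up row_key in the memstore first, then in storefiles, newest first."""
    if row_key in memstore:
        return memstore[row_key]
    # storefiles are ordered oldest first, so scan them in reverse.
    for storefile in reversed(storefiles):
        if row_key in storefile:
            return storefile[row_key]
    return None


# Example data: the memstore holds the most recent writes.
memstore = {"r3": "new"}
storefiles = [{"r1": "a", "r2": "old"}, {"r2": "b"}]
```

Here get("r2", memstore, storefiles) returns "b", because the newer storefile shadows the older one, and a key that appears nowhere yields None.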

An HRegion is the smallest unit of load balancing: different HRegions can be assigned to different HRegionServers.
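
As a rough illustration of region-level balancing, the sketch below hands each region to the server that currently holds the fewest regions. The function assign_regions and this least-loaded policy are assumptions for illustration only and do not reproduce HBase's actual balancer.

```python
def assign_regions(regions, servers):
    """Assign each region to the server currently holding the fewest regions."""
    assignment = {server: [] for server in servers}
    for region in regions:
        # On a tie, min() keeps the first server in the list.
        target = min(servers, key=lambda s: len(assignment[s]))
        assignment[target].append(region)
    return assignment
```

Because whole regions are the unit being moved, the servers can differ by at most one region under this policy.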

StoreFiles are stored on HDFS in the HFile format.

The RowKey is the most important part of HBase design: a RowKey uniquely identifies a row.

When inserting data, the table name and the column family must be specified. Column names do not need to be defined in advance, because they are supplied along with the inserted data.

The timestamp is used to identify the version of a value.
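
The data model described above (row key, column family, column name, timestamp) can be sketched as nested dictionaries. The class Table and its methods are hypothetical and are not the HBase client API; they only illustrate that column families are fixed when the table is created, while column names arrive with the data and every value is stored under a timestamp.

```python
import time


class Table:
    """Toy table: row key -> family -> column name -> {timestamp: value}."""

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)  # column families are fixed at creation
        self.rows = {}

    def put(self, row_key, family, column, value, timestamp=None):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if timestamp is None:
            timestamp = int(time.time() * 1000)  # current time in milliseconds
        row = self.rows.setdefault(row_key, {})
        cell = row.setdefault(family, {}).setdefault(column, {})
        cell[timestamp] = value  # each timestamp is a separate version

    def get(self, row_key, family, column, timestamp=None):
        versions = self.rows[row_key][family][column]
        if timestamp is None:
            timestamp = max(versions)  # default to the latest version
        return versions[timestamp]
```

Writing the same column twice with different timestamps keeps both versions, and a new column name can be used in a put without any schema change, whereas an unknown column family is rejected.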

MapReduce provides batch computation over HBase data. High availability and the bookkeeping of where regions are stored are coordinated through ZooKeeper.


Origin www.cnblogs.com/shun7man/p/11880396.html