Getting Started with HBase (1): A First Look at HBase

This article introduces the basic concepts of big data and HBase. As an important part of the big data ecosystem, HBase makes up for what Hadoop's offline batch processing lacks: it supports storing small files and random retrieval. This makes HBase a natural fit for event storage in real-time computing systems, and gives it an important role in real-time stream computing as well.

1, Big Data and HBase

Big data has developed rapidly in recent years, and real-time computing is an important trend. Enterprise logs, sensors, smart devices, and other sources produce enormous volumes of data.

Only a small part of this data is structured; most of it is unstructured. Data such as images and video cannot easily be stored in a relational database, whereas big data systems can process many different types of data.

Relational databases have several drawbacks:

They cannot cope with high concurrency, they are hard to scale out, and transactional consistency comes at a cost in performance.

NoSQL databases (NoSQL is short for Not Only SQL) offer good scalability, good performance under concurrency, and a flexible data model.

HBase, short for Hadoop Database, is a highly reliable, high-performance, scalable, distributed database. HBase is modeled on Google's BigTable, uses HDFS as its underlying storage, and uses ZooKeeper as its coordination service.

HBase is written in Java and is a NoSQL database. These characteristics determine the scenarios HBase is suited to.

2, Concepts and characteristics

HBase is a real-time database: it provides random reads and writes of data.

Unlike relational databases such as MySQL, Oracle, DB2, and SQL Server, HBase is a NoSQL (non-relational) database.

The HBase table model differs from the relational table model:

An HBase table has no fixed field definitions;

Each row of an HBase table stores a set of key-value pairs;

An HBase table is partitioned into column families, and the user specifies which column family each key-value pair is inserted into;

In physical storage, an HBase table is divided by column family, and data in different column families is always stored in different files;

Each row in an HBase table has a fixed row key, and row keys cannot be repeated within a table;

All data in HBase, including the row key and the key and value of each key-value pair, is of type byte[]; HBase does not maintain data types, which is left to the user;

HBase has weak support for transactions;
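
The table model above can be sketched with a small in-memory stand-in. This is only an illustration of the ideas (no fixed fields, column families stored separately, unique row keys, everything as bytes); the class ToyTable and all names in it are hypothetical and are not part of any HBase client API.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for an HBase-style table (illustrative only)."""

    def __init__(self, column_families):
        # Column families are fixed when the table is created.
        self.column_families = set(column_families)
        # One separate store per column family, mirroring "different files".
        # Layout: family -> row key -> qualifier -> value (all bytes).
        self.stores = {cf: defaultdict(dict) for cf in self.column_families}

    def put(self, row_key: bytes, family: str, qualifier: bytes, value: bytes):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        # Writing to an existing row key updates that row; keys never repeat.
        self.stores[family][row_key][qualifier] = value

    def get(self, row_key: bytes):
        # Collect the row's key-value pairs from every column family store.
        row = {}
        for family, store in self.stores.items():
            for qualifier, value in store.get(row_key, {}).items():
                row[(family, qualifier)] = value
        return row


table = ToyTable(["info", "stats"])
# Rows do not share a fixed set of fields: each row has its own qualifiers.
table.put(b"user001", "info", b"name", b"alice")
table.put(b"user001", "stats", b"logins", (42).to_bytes(4, "big"))
table.put(b"user002", "info", b"email", b"bob@example.com")
# The caller, not the table, decides how to decode the bytes.
logins = int.from_bytes(table.get(b"user001")[("stats", b"logins")], "big")
```

Note that the integer 42 has to be encoded and decoded by the caller, which is what "HBase does not maintain data types" means in practice.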

Compared with other NoSQL databases (MongoDB, Redis, Cassandra, Hazelcast), HBase has one distinguishing feature:

HBase table data is stored in the HDFS file system.

As a result, HBase has the following properties: storage capacity scales linearly, and stored data is highly safe and reliable.

3, Core modules

Client

The client is the entry point to HBase: users operate HBase through it. The client communicates with both HMaster and RegionServer. Management operations go to HMaster, while data read and write operations go to RegionServer.
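
The split between the two kinds of communication can be sketched as a small routing function. This is a toy, not HBase client code: the operation names, server names, and region boundaries are hypothetical, and it assumes (beyond what this article states) that each region covers a contiguous range of row keys starting at a known start key.

```python
import bisect

# Hypothetical region layout: each region starts at a row key and is
# hosted by one RegionServer. b"" marks the start of the first region.
REGION_STARTS = [b"", b"g", b"p"]
REGION_HOSTS = ["regionserver-1", "regionserver-2", "regionserver-3"]


def route(operation: str, row_key: bytes = b"") -> str:
    """Return the name of the server a request would be sent to."""
    if operation in {"create_table", "delete_table"}:
        # Management operations are handled by the master.
        return "hmaster"
    if operation in {"get", "put"}:
        # Find the last region whose start key is <= row_key.
        index = bisect.bisect_right(REGION_STARTS, row_key) - 1
        return REGION_HOSTS[index]
    raise ValueError(f"unsupported operation: {operation}")
```

For example, route("create_table") goes to the master, while route("put", b"hbase") goes to the server hosting the region that starts at b"g".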

Coordination service: ZooKeeper

In HBase, ZooKeeper is responsible for electing the active HMaster among multiple candidates and for state synchronization between servers.

Master node: HMaster

Multiple HMasters can be started. ZooKeeper ensures that one of them is always running normally as the active master, with the others acting as standbys.

HMaster is responsible for managing Tables and Regions.
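
The active/standby arrangement can be illustrated with a toy coordinator. This article does not describe the election mechanism itself, so the sketch simply assumes that the earliest-registered surviving master is the active one; ToyCoordinator and its methods are hypothetical and are not the ZooKeeper API.

```python
class ToyCoordinator:
    """Toy stand-in for the coordination service (not the ZooKeeper API)."""

    def __init__(self):
        self.candidates = []  # masters in registration order

    def register(self, master: str):
        self.candidates.append(master)

    def fail(self, master: str):
        # A failed master drops out; the next candidate takes over.
        self.candidates.remove(master)

    @property
    def active(self):
        # Assumption: the earliest-registered surviving master is active.
        return self.candidates[0] if self.candidates else None

    @property
    def standbys(self):
        return self.candidates[1:]


coordinator = ToyCoordinator()
coordinator.register("master-a")
coordinator.register("master-b")
first_active = coordinator.active
coordinator.fail("master-a")
second_active = coordinator.active
```

After master-a fails, master-b becomes the active master without any change on the client side.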

Region node: HRegionServer

HRegionServer is mainly responsible for responding to user I/O requests and for reading and writing data on HDFS. An HRegionServer manages a series of HRegion objects. Each HRegion corresponds to one Region of a Table. An HRegion is composed of multiple HStores, and each HStore corresponds to one Column Family of the Table.

Each HRegionServer also has an HLog object, which is used for data recovery.
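
How a log can support data recovery is easiest to see in a toy model. The sketch below assumes a write-ahead pattern (record the write in the log first, then apply it in memory) and replays the log after a simulated crash. It leaves out regions and HDFS entirely, and ToyRegionServer is a hypothetical name, not an HBase class.

```python
class ToyRegionServer:
    """Toy model: one log per server, one in-memory store per column family."""

    def __init__(self):
        self.hlog = []     # append-only record of every write (stand-in for HLog)
        self.stores = {}   # column family -> {(row_key, qualifier): value}

    def put(self, family, row_key, qualifier, value):
        # Log first, then apply, so the write can be replayed after a crash.
        self.hlog.append((family, row_key, qualifier, value))
        self._apply(family, row_key, qualifier, value)

    def _apply(self, family, row_key, qualifier, value):
        self.stores.setdefault(family, {})[(row_key, qualifier)] = value

    def crash(self):
        # Simulate losing everything that was only held in memory.
        self.stores = {}

    def recover(self):
        # Rebuild the in-memory stores by replaying the log in order.
        for entry in self.hlog:
            self._apply(*entry)


server = ToyRegionServer()
server.put("info", b"user001", b"name", b"alice")
server.put("info", b"user001", b"name", b"alicia")  # later write wins
server.put("stats", b"user001", b"logins", b"42")
before = {cf: dict(kv) for cf, kv in server.stores.items()}
server.crash()
server.recover()
```

Because the log is replayed in write order, the recovered state matches the state before the crash, including overwritten values.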

4, Usage scenarios

Search engine

An index is built in advance; at query time, the query conditions are spliced together so that the data being queried can be found quickly.
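
One way to read this is as a prefix scan over sorted row keys. The sketch below assumes index rows are kept sorted by row key and that the row key is the keyword and a document id joined by "|"; that key layout, the sample data, and the function names are all hypothetical.

```python
import bisect

# Hypothetical index rows: row key = "<keyword>|<doc id>", kept sorted.
index_rows = sorted([
    b"hadoop|doc3",
    b"hbase|doc1",
    b"hbase|doc7",
    b"kafka|doc2",
])


def prefix_scan(rows, prefix: bytes):
    """Return all row keys in sorted `rows` that start with `prefix`."""
    start = bisect.bisect_left(rows, prefix)
    matches = []
    for key in rows[start:]:
        if not key.startswith(prefix):
            break  # sorted order: no later key can match
        matches.append(key)
    return matches


def search(keyword: str):
    # Splice the query condition into a row key prefix, then scan.
    prefix = keyword.encode() + b"|"
    return [key.split(b"|", 1)[1].decode()
            for key in prefix_scan(index_rows, prefix)]
```

Because the keys are sorted, the scan jumps straight to the first candidate and stops at the first non-matching key instead of reading the whole index.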

Real-time stream computing

Real-time recommendation systems and incremental log storage are both cases of real-time stream computing.

Incremental data is stored in HBase, and the streaming job queries HBase in real time, combining each new event with the stored history to produce the final result.
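
The pattern of combining each event with stored history can be sketched as follows. A plain dictionary stands in for the HBase table, and the event shape (a user id and an amount) and the running-total logic are hypothetical examples rather than anything prescribed here.

```python
history = {}  # stand-in for an HBase table: row key -> stored running total


def process_event(user_id: str, amount: int) -> int:
    """Combine one incoming event with stored history; return the new total."""
    previous = history.get(user_id, 0)   # real-time lookup of history
    total = previous + amount            # combine history with the new event
    history[user_id] = total             # store the incremental result
    return total


events = [("u1", 5), ("u2", 3), ("u1", 7)]
results = [process_event(user_id, amount) for user_id, amount in events]
```

Each event needs one random read and one random write keyed by user id, which is the access pattern the article says HBase is suited to.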

Origin www.cnblogs.com/tree1123/p/11576372.html