FusionInsight Big Data Development: HBase

HBase Application Development

HBase definition

HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system.

  • Suitable for storing very large tables, with real-time read and write access.
  • Uses Hadoop HDFS as its file storage system and provides real-time, database-style reads and writes on top of it.
  • Uses ZooKeeper as its coordination service.

HBase architecture

HBase applicable scenarios

  • Massive data volumes
  • High throughput
  • Efficient random access within massive data sets
  • Good scalability
  • Simultaneous processing of structured and unstructured data
  • The full ACID properties of a traditional relational database are not required

HBase application development process

  • Define the business objectives
  • Prepare the development environment
  • Download and import the sample project
  • Design the HBase tables

    Design principles:

      • Queried data is uniquely identifiable
      • Data is evenly distributed
      • Query performance is tuned
      • Other factors (pre-splitting regions, using hot and cold column families)

  • Develop the project according to the scenario
  • Compile and run
  • View the results and debug

HBase table design - general principles

Design goal: improve throughput
Design principle: pre-split regions so that regions are evenly distributed and concurrency increases
Design method: when the range and distribution of the RowKeys are known, pre-splitting regions is recommended
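As a minimal sketch of how split points might be computed when the RowKey distribution is known, the class below assumes (hypothetically) that every RowKey starts with a uniformly distributed two-hex-digit prefix, `00` to `ff`. The class and method names are illustrative only and are not part of HBase.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitPoints {
    /**
     * Returns regionCount - 1 split keys that divide the two-hex-digit
     * prefix space 00..ff into regionCount roughly equal ranges.
     * Assumption: RowKeys start with a uniformly distributed hex prefix.
     */
    static List<String> hexSplits(int regionCount) {
        if (regionCount < 2 || regionCount > 256) {
            throw new IllegalArgumentException("regionCount must be in [2, 256]");
        }
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < regionCount; i++) {
            // Boundary between range i-1 and range i, as a value in 0..255
            int boundary = i * 256 / regionCount;
            splits.add(String.format("%02x", boundary));
        }
        return splits;
    }
}
```

For four regions this yields the boundaries `40`, `80` and `c0`. In the HBase Java client, such keys, converted to bytes, would typically be passed to the `createTable` overload that accepts a `byte[][]` of split keys; check the exact signature against the client version in use.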

Design goal: improve write performance
Design principle: avoid creating too many hot-spot regions
Design method: depending on the application scenario, consider introducing a time factor into the RowKey
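One possible way to combine a time factor with hot-spot avoidance is to put a hashed bucket prefix in front of the key, so that writes arriving in time order are spread over several regions. The sketch below assumes a hypothetical key layout of `<bucket>|<deviceId>|<timestamp>` and a bucket count of 16; neither comes from the original material.

```java
final class SaltedRowKey {
    // Hypothetical number of salt buckets; in practice this would be
    // aligned with the number of pre-split regions.
    static final int BUCKETS = 16;

    /**
     * Builds "<bucket>|<deviceId>|<timestamp>".
     * The bucket is derived from the device id, so all rows of one device
     * share a prefix and stay sorted by time, while different devices are
     * spread across buckets. epochMillis is assumed to be non-negative.
     */
    static String rowKey(String deviceId, long epochMillis) {
        int bucket = Math.floorMod(deviceId.hashCode(), BUCKETS);
        // Zero-padding keeps lexicographic order equal to numeric order.
        return String.format("%02d|%s|%013d", bucket, deviceId, epochMillis);
    }
}
```

The trade-off of this layout is that a query over a time range across all devices has to scan every bucket, which is why the choice depends on the application scenario.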

Design goal: improve query performance
Design principle: store data contiguously, keep frequently accessed data together in one place, and apply dispersion and information redundancy
Design method: store data that is read together in the same row or cell; use a secondary index

HBase table design - design elements

Depending on the design dimension, the design elements can be divided into:

Table Design (design at the table level)

  • Table creation method
  • Pre-split regions
  • Column family properties
  • System concurrency, data cleanup capability

RowKey Design

  • Principle: data that is accessed together should have RowKeys that are as contiguous as possible
  • Access efficiency: scattered writes, sequential reads
  • Attribute content: attributes used in common query scenarios
  • Attribute value order: enumerability, access weight
  • Time attributes: periodic key + TTL, periodically created tables
  • Secondary index
  • Compromise method
  • Redundancy method
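Contiguous RowKeys pay off at read time because a single range scan can cover all rows that share a prefix. As a rough sketch, the helper below computes the exclusive stop key for such a prefix scan, relying on the fact that HBase orders RowKeys as unsigned bytes in lexicographic order. The class name and the `user42|` prefix are hypothetical.

```java
import java.util.Arrays;

final class PrefixRange {
    /**
     * Returns the smallest key that is greater than every key starting
     * with the given prefix, for use as an exclusive scan stop key.
     * An empty array means "no upper bound" (prefix is empty or all 0xFF).
     */
    static byte[] stopKeyFor(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                // Drop the trailing 0xFF bytes and increment the last remaining byte.
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++;
                return stop;
            }
        }
        return new byte[0];
    }
}
```

With the prefix `user42|` as the start key, the stop key becomes `user42}`, so the scan covers exactly the rows of that user.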

Family Design

Attributes that are enumerable, few in number, and rarely extended can be used as column families

Qualifier Design

Attributes that are not enumerable, numerous, and highly extensible should be used as Qualifiers

Principle: store data that is accessed at the same time in the same cell, and keep column names as short as possible

HBase Common Interface

create()

put()

get()

getScanner(Scan scan)

...

Create a Configuration instance and perform Kerberos security authentication

Using HBaseConfiguration

Create a table

createTable method

Write data

put method

Read one row of data

get method

Read multiple rows of data

scan method
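The difference between reading one row and reading several rows can be illustrated without a cluster. The toy class below is not the HBase API: it only models a table as a sorted map from RowKey to columns, where a point read looks up one key and a range read walks a contiguous slice of the sorted keys. All names are hypothetical, and Java `String` ordering only matches HBase's byte ordering for ASCII keys.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class ToyTable {
    // rowKey -> (column -> value); the TreeMap keeps rows sorted by key.
    private final NavigableMap<String, Map<String, String>> rows = new TreeMap<>();

    /** Writes one cell, creating the row if it does not exist yet. */
    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    /** Point read: returns the columns of one row, or an empty map. */
    Map<String, String> get(String rowKey) {
        return rows.getOrDefault(rowKey, Collections.emptyMap());
    }

    /** Range read: rows with startRow <= key < stopRow, in key order. */
    NavigableMap<String, Map<String, String>> scan(String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false);
    }
}
```

In this model, `scan("user1|", "user1}")` returns only the rows whose keys start with `user1|`, which is why RowKeys that are read together should sort next to each other.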

 


Origin www.cnblogs.com/cainiao-chuanqi/p/11010227.html