HBase Application Development
HBase definition
HBase is a highly reliable, high-performance, column-oriented, scalable, distributed storage system.
- Suitable for storing very large tables while serving reads and writes in real time.
- Uses Hadoop HDFS as its underlying file storage while providing real-time, database-style reads and writes.
- Uses ZooKeeper as its coordination service.
HBase architecture
HBase application scenarios
- Massive data volumes
- High-throughput workloads
- Efficient random access within massive datasets
- Good performance that scales out as data grows
- Processing of both structured and unstructured data
- No need for the full ACID guarantees of a traditional relational database
HBase application development process
- Define business objectives
- Prepare the development environment
- Download and import the sample project
- Design the HBase table
  Design principles:
  Data can be located uniquely by a query
  Data is evenly distributed
  Query performance is tuned
  Other factors (region pre-splitting, separating hot and cold data into different column families)
- Develop the project according to the scenario
- Compile and run
- View results and debug
HBase table design - general principles
Design objective: improve throughput
Design principles: pre-split regions so that data is distributed evenly across them, increasing concurrency
Design methods: when the distribution of RowKey ranges is known in advance, pre-splitting regions is recommended
Design objective: improve write performance
Design principles: avoid hotspot regions that receive a disproportionate share of requests
Design methods: depending on the application scenario, consider introducing a time factor into the RowKey
Design objective: improve query performance
Design principles: store data contiguously, keep frequently accessed data together and contiguous, disperse data where necessary, use information redundancy
Design methods: store data that is read at the same time in the same row or cell; use secondary indexes
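As an illustration of pre-splitting, the sketch below computes evenly spaced split keys. It assumes, hypothetically, that every RowKey starts with a two-character lowercase hex prefix ("00" to "ff"), for example a hash-derived prefix; the class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Computes evenly spaced split keys for pre-splitting a table whose RowKeys
// are assumed (hypothetically) to begin with a two-character lowercase hex
// prefix "00".."ff". N regions need N - 1 split keys.
final class SplitKeys {
    static byte[][] hexPrefixSplits(int regionCount) {
        if (regionCount < 2 || regionCount > 256) {
            throw new IllegalArgumentException("regionCount must be between 2 and 256");
        }
        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < regionCount; i++) {
            // i-th boundary when the 256 possible prefixes are divided into regionCount ranges
            int boundary = i * 256 / regionCount;
            splits.add(String.format("%02x", boundary).getBytes(StandardCharsets.UTF_8));
        }
        return splits.toArray(new byte[0][]);
    }
}
```

For example, `hexPrefixSplits(4)` yields the split keys "40", "80" and "c0", giving four regions. The array can be passed as the `splitKeys` argument of `Admin.createTable(TableDescriptor, byte[][])`. This only balances load if the prefixes really are uniformly distributed.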
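One common way to introduce a time factor without creating a hotspot is a salted RowKey. The sketch below assumes a hypothetical layout `<bucket>_<deviceId>_<reversed timestamp>` with 16 buckets; the layout, bucket count and names are illustrative assumptions, not a prescribed format.

```java
// Sketch of a "salted" RowKey that spreads time-ordered writes across regions.
// Hypothetical layout: <2-digit bucket>_<deviceId>_<19-digit reversed timestamp>
final class SaltedRowKey {
    static final int BUCKETS = 16; // assumption: roughly one bucket per pre-split region

    static String build(String deviceId, long timestampMillis) {
        // Deterministic bucket derived from the device id, so all rows of one
        // device share a bucket and can still be read with a single range scan.
        int bucket = Math.floorMod(deviceId.hashCode(), BUCKETS);
        // Reversing the (non-negative) timestamp makes newer rows sort first.
        long reversed = Long.MAX_VALUE - timestampMillis;
        // Zero padding keeps lexicographic byte order equal to numeric order.
        return String.format("%02d_%s_%019d", bucket, deviceId, reversed);
    }
}
```

The trade-off: writes from many devices are spread over all buckets, but a query for a time range across all devices must issue one scan per bucket.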
HBase table design - design elements
By design dimension, table design can be divided into:
Table design (design at the granularity of the whole table)
- How the table is created
- Region pre-splitting
- Column family properties
- System concurrency and data-cleanup capability
RowKey design
- Principle: data that is accessed together should have RowKeys that are as contiguous as possible
- Access efficiency: disperse writes, read sequentially
- Attribute content: the attributes used in common query scenarios
- Attribute order: arranged by enumerability and by access weight
- Time attributes: periodic key + TTL, or periodically created tables
- Secondary index
- Compromise approach
- Redundancy approach
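A secondary index can be kept in a separate index table whose RowKey starts with the indexed value. The sketch below assumes a hypothetical layout in which the data table is keyed by `userId` and the index table by `phone + 0x00 + userId`; it also assumes the indexed value never contains a 0x00 byte. Names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of a manually maintained secondary index (hypothetical layout):
//   data table RowKey:  <userId>
//   index table RowKey: <phone> 0x00 <userId>
// All users with the same phone number are stored contiguously in the index table.
final class IndexKeys {
    static final byte SEPARATOR = 0x00;

    static byte[] indexRowKey(String phone, String userId) {
        byte[] p = phone.getBytes(StandardCharsets.UTF_8);
        byte[] u = userId.getBytes(StandardCharsets.UTF_8);
        byte[] key = new byte[p.length + 1 + u.length];
        System.arraycopy(p, 0, key, 0, p.length);
        key[p.length] = SEPARATOR;
        System.arraycopy(u, 0, key, p.length + 1, u.length);
        return key;
    }

    // Prefix covering every index row for one phone number (phone + separator).
    static byte[] prefixFor(String phone) {
        return indexRowKey(phone, "");
    }

    // Recovers the data-table RowKey (userId) from an index RowKey.
    static String userIdFrom(byte[] indexKey) {
        for (int i = 0; i < indexKey.length; i++) {
            if (indexKey[i] == SEPARATOR) {
                byte[] u = Arrays.copyOfRange(indexKey, i + 1, indexKey.length);
                return new String(u, StandardCharsets.UTF_8);
            }
        }
        throw new IllegalArgumentException("no separator in index key");
    }
}
```

A lookup by phone number then becomes a prefix scan over the index table followed by `get` calls on the data table. Writes to the two tables are not atomic, so the application is responsible for keeping the index consistent.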
Family design
Attributes that are enumerable, few in number, and weakly extensible are used as column families (Family)
Qualifier design
Attributes that are not enumerable, numerous, and need to be extensible are used as qualifiers
Principles: store data that is accessed at the same time in the same cell; keep column names as short as possible
HBase common interfaces
create()
put()
get()
getScanner(Scan scan)
...
Create a Configuration instance and perform Kerberos security authentication
Using HBaseConfiguration
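A minimal sketch of client setup, assuming the standard Hadoop `UserGroupInformation` keytab login and the HBase 2.x `ConnectionFactory`. The principal and keytab path are placeholders to be supplied by the cluster administrator; cluster-specific settings (server principals, `krb5.conf` location) are assumed to come from `hbase-site.xml` on the classpath or JVM system properties.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.security.UserGroupInformation;

// Sketch: build an HBase client Configuration and log in with Kerberos.
final class HBaseClientSetup {
    static Configuration createConf() {
        // Loads hbase-default.xml and hbase-site.xml found on the classpath.
        Configuration conf = HBaseConfiguration.create();
        conf.set("hadoop.security.authentication", "kerberos");
        conf.set("hbase.security.authentication", "kerberos");
        return conf;
    }

    // principal and keytabPath are placeholders (assumptions), e.g. supplied by the admin.
    static Connection login(Configuration conf, String principal, String keytabPath)
            throws IOException {
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab(principal, keytabPath);
        // A Connection is heavyweight and thread-safe: create it once, share it, close it at shutdown.
        return ConnectionFactory.createConnection(conf);
    }
}
```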
Create a table
createTable method
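A sketch of table creation with the HBase 2.x builder API. The family name "info", the 30-day TTL and the single-version setting are illustrative assumptions that correspond to the "Family properties" and "TTL" design elements; split keys may come from a helper such as the pre-splitting sketch earlier.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

final class CreateTableExample {
    static final byte[] FAMILY = Bytes.toBytes("info");
    static final int TTL_SECONDS = 30 * 24 * 3600; // assumption: keep data for 30 days

    // Builds the table description: one column family with a TTL and a single version.
    static TableDescriptor describe(String tableName) {
        return TableDescriptorBuilder.newBuilder(TableName.valueOf(tableName))
            .setColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(FAMILY)
                .setTimeToLive(TTL_SECONDS) // expired cells are hidden from reads and purged at compaction
                .setMaxVersions(1)          // keep only the latest version of each cell
                .build())
            .build();
    }

    // Creates the table if it does not exist; pre-splits it when split keys are given.
    static void createIfAbsent(Connection connection, TableDescriptor desc, byte[][] splitKeys)
            throws IOException {
        try (Admin admin = connection.getAdmin()) {
            if (admin.tableExists(desc.getTableName())) {
                return;
            }
            if (splitKeys == null || splitKeys.length == 0) {
                admin.createTable(desc);
            } else {
                admin.createTable(desc, splitKeys);
            }
        }
    }
}
```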
Write data
put method
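A sketch of writing with `Put`, assuming a hypothetical family "info" with qualifiers "name" and "age". Short qualifiers follow the design principle above, since the qualifier is stored with every cell.

```java
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

final class PutExample {
    static final byte[] FAMILY = Bytes.toBytes("info"); // hypothetical column family

    // Builds one row: info:name as a UTF-8 string, info:age as a 4-byte int.
    static Put buildPut(String rowKey, String name, int age) {
        Put put = new Put(Bytes.toBytes(rowKey));
        put.addColumn(FAMILY, Bytes.toBytes("name"), Bytes.toBytes(name));
        put.addColumn(FAMILY, Bytes.toBytes("age"), Bytes.toBytes(age));
        return put;
    }

    // Sends a batch of Puts in one client call instead of one call per row.
    static void writeBatch(Table table, List<Put> puts) throws IOException {
        table.put(puts);
    }
}
```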
Read one row of data
get method
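A sketch of reading a single row with `Get`, again assuming the hypothetical family "info" and qualifier "name". Restricting the `Get` to the needed column avoids transferring the whole row.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

final class GetExample {
    static final byte[] FAMILY = Bytes.toBytes("info"); // hypothetical family
    static final byte[] NAME = Bytes.toBytes("name");   // hypothetical qualifier

    // Requests only info:name for the given row.
    static Get buildGet(String rowKey) {
        Get get = new Get(Bytes.toBytes(rowKey));
        get.addColumn(FAMILY, NAME);
        return get;
    }

    // Returns the stored name, or null if the row or column does not exist.
    static String readName(Table table, String rowKey) throws IOException {
        Result result = table.get(buildGet(rowKey));
        if (result.isEmpty()) {
            return null;
        }
        return Bytes.toString(result.getValue(FAMILY, NAME));
    }
}
```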
Read multiple rows of data
scan method
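A sketch of a range read with `Scan` using the HBase 2.x `withStartRow`/`withStopRow` methods; the start row is inclusive and the stop row exclusive. The family, qualifier and caching value of 100 are illustrative assumptions. Contiguous RowKeys, as recommended in the RowKey design principles, are what make such range scans efficient.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

final class ScanExample {
    static final byte[] FAMILY = Bytes.toBytes("info"); // hypothetical family
    static final byte[] NAME = Bytes.toBytes("name");   // hypothetical qualifier

    // Scans the RowKey range [startRow, stopRow), fetching only info:name.
    static Scan buildScan(String startRow, String stopRow) {
        return new Scan()
            .withStartRow(Bytes.toBytes(startRow)) // inclusive
            .withStopRow(Bytes.toBytes(stopRow))   // exclusive
            .addColumn(FAMILY, NAME)
            .setCaching(100);                      // rows fetched per RPC
    }

    static List<String> readNames(Table table, String startRow, String stopRow) throws IOException {
        List<String> names = new ArrayList<>();
        // A ResultScanner holds server-side resources, so close it promptly.
        try (ResultScanner scanner = table.getScanner(buildScan(startRow, stopRow))) {
            for (Result r : scanner) {
                names.add(Bytes.toString(r.getValue(FAMILY, NAME)));
            }
        }
        return names;
    }
}
```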