Scenarios and features of HBase

First, what is HBase good for?
1. Mass data storage:
tables with billions of rows x millions of columns
no limit on the number of columns
HBase only pays off when the table is very large; a table with at most a few million rows does not need to be put in HBase
2. Near-real-time queries:
on the order of ten billion rows x one million columns, with results returned within hundreds of milliseconds
Second, HBase in real-world applications:
1. Transportation:
ship GPS data; GPS records for ships along the whole Yangtze River add up to about 10 million stored records a day.
2. Finance:
consumer information, credit information, credit card and other payment information
3. E-commerce:
transaction information, logistics information, and browsing information, for example at Taobao
4. Mobile telecom:
call records are stored in HBase.
HBase characteristics:
1. Capacity:
in a traditional relational database, a single table usually holds no more than about five million rows; beyond that you have to split tables and databases (sharding), and a table usually has no more than 30 columns
a single HBase table can have ten billion rows and one million columns; capacity is elastic in both the horizontal and vertical dimensions of the data
2. Column-oriented:
storage and access control are organized by column (family), columns can be retrieved independently, and columns can be added dynamically; that is, each column can be operated on individually
with columnar storage, the data in a table is stored by column, so a query that needs only a few fields reads far less data
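To make the column-oriented point concrete, here is a small Python sketch. It is not HBase code and does not reflect HBase's on-disk format: it stores the same rows once row by row and once grouped by column, then counts how many values each layout touches when a query needs a single field. The sample data and the names `to_columns`, `read_field_row_layout`, and `read_field_column_layout` are hypothetical and exist only for this illustration.

```python
# Toy comparison of row-wise and column-wise layouts
# (illustration only, not HBase's storage format).

rows = [
    {"id": 1, "name": "ann", "city": "Wuhan", "phone": "111"},
    {"id": 2, "name": "bob", "city": "Nanjing", "phone": "222"},
    {"id": 3, "name": "cat", "city": "Shanghai", "phone": "333"},
]


def to_columns(rows):
    """Regroup row dictionaries into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def read_field_row_layout(rows, field):
    """Row layout: every whole row is read to extract one field."""
    values_touched = sum(len(row) for row in rows)
    return [row[field] for row in rows], values_touched


def read_field_column_layout(columns, field):
    """Column layout: only the requested column is read."""
    values = columns[field]
    return list(values), len(values)


columns = to_columns(rows)
row_result, row_cost = read_field_row_layout(rows, "city")
col_result, col_cost = read_field_column_layout(columns, "city")
print(row_result == col_result)  # both layouts give the same answer
print(row_cost, col_cost)        # values touched: 12 vs 3
```

With four fields per row, the row layout touches four times as many values as the column layout for a single-field query; the gap grows with the number of columns in the table.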
3. Multi-version:
each column in HBase can store multiple versions of its data. An address column, for example, may change several times, so that column can hold multiple versions
4. Sparsity:
empty columns take up no storage space, so a table can be designed to be very sparse.
unlike a relational database, there is no need to know all the column names in advance and pad the missing values with null
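The multi-version and sparsity properties can be pictured with a small in-memory model. The Python sketch below is an assumption-laden toy, not the HBase client API: the class `SparseVersionedTable`, its methods, and the `max_versions` limit are hypothetical names chosen for this example. It keeps only the cells that were actually written, and each cell holds a short list of timestamped versions.

```python
# Toy model of a sparse, multi-versioned table
# (illustration only, not the HBase client API).
from collections import defaultdict


class SparseVersionedTable:
    def __init__(self, max_versions=3):
        self.max_versions = max_versions  # hypothetical retention limit
        # (row_key, column) -> list of (timestamp, value), newest first
        self.cells = defaultdict(list)

    def put(self, row_key, column, value, timestamp):
        versions = self.cells[(row_key, column)]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]  # drop versions beyond the limit

    def get(self, row_key, column, versions=1):
        """Return up to `versions` newest values, or [] if never written."""
        stored = self.cells.get((row_key, column), [])
        return [value for _, value in stored[:versions]]

    def stored_cell_count(self):
        """Only written cells exist; absent columns cost nothing."""
        return len(self.cells)


table = SparseVersionedTable()
table.put("user1", "info:address", "Beijing", timestamp=1)
table.put("user1", "info:address", "Shanghai", timestamp=2)
table.put("user2", "info:phone", "555", timestamp=1)

print(table.get("user1", "info:address"))              # ['Shanghai']
print(table.get("user1", "info:address", versions=2))  # ['Shanghai', 'Beijing']
print(table.get("user2", "info:address"))              # []
print(table.stored_cell_count())                       # 2
```

Note that `user2` has no address and `user1` has no phone, yet nothing is stored for those cells: there is no null padding, which is the sparsity point made above.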
5. Scalability:
the underlying storage relies on HDFS; when disk space runs short, you only need to add DataNode machines dynamically
6. High reliability:
the WAL (write-ahead log) mechanism ensures that data being written is not lost when the cluster fails during a write
the Replication mechanism ensures that no data is lost or corrupted when the cluster runs into a serious problem
HBase stores its data on HDFS, which itself also keeps replicas.
7. High performance:
the underlying LSM data structure, the ordered arrangement of RowKeys, and other architectural choices give HBase very high write performance.
Region splitting, the primary key index, and caching give HBase reasonable random read performance on massive data; a query by RowKey can return in milliseconds
LSM tree: a tree structure whose lowest-level child nodes are kept in memory. The in-memory tree is flushed to disk (when a child node reaches a threshold it is written to disk, and while being stored it may be merged into a main node in real time); the trees on disk are then merged periodically into a single tree to optimize read performance.
Introduction to LSM trees: https://www.cnblogs.com/yanghuahui/p/3483754.html
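The LSM idea described above (buffer writes in memory, flush them to disk as sorted runs once a threshold is reached, and periodically merge the runs) can be sketched in a few lines of Python. This is a minimal toy under stated assumptions, not how HBase implements it: the class `TinyLSM`, the `flush_threshold` parameter, and the use of in-memory lists to stand in for disk files are all hypothetical choices for illustration.

```python
# Toy LSM-style store (illustration only; plain lists stand in for disk files).
import bisect


class TinyLSM:
    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold  # hypothetical threshold, in keys
        self.memtable = {}   # recent writes, held in memory
        self.segments = []   # flushed sorted runs, oldest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the memtable out as one sorted, immutable run."""
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):  # newest run first
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None

    def compact(self):
        """Merge all runs into one; newer values overwrite older ones."""
        merged = {}
        for segment in self.segments:  # oldest first, so later runs win
            merged.update(segment)
        self.segments = [sorted(merged.items())] if merged else []


db = TinyLSM(flush_threshold=2)
db.put("row1", "a")
db.put("row2", "b")   # threshold reached: first run flushed
db.put("row1", "c")
db.put("row3", "d")   # threshold reached: second run flushed
print(len(db.segments))  # 2
print(db.get("row1"))    # c (the newer run wins)
db.compact()
print(len(db.segments))  # 1
print(db.get("row2"))    # b
```

Writes only touch the in-memory dictionary and an occasional sequential flush, which is why this design favors write throughput; reads may have to check several runs until a merge brings them back down to one.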

Summary:
HBase is column-oriented and high-capacity; it writes faster than MySQL, but its reads are not faster. Once a table holds more than about five million rows of read/write data, HBase is worth considering.

 

Reposted from: https://www.jianshu.com/p/fe63e9786146


Origin www.cnblogs.com/linyouyi/p/11462465.html