HBase and Hive: features and application scenarios

Reposted from: http://www.imooc.com/article/271342

What is HBase?

HBase is an open-source, distributed, column-oriented database built on top of the HDFS file system. It is well suited to storing unstructured data.

HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system that can be used to build large-scale structured storage clusters on inexpensive PC servers.

1, HBase is the structured storage layer of the Hadoop ecosystem.

2, HDFS provides the underlying file storage.

3, MapReduce provides high-performance computation for HBase.

4, ZooKeeper provides HBase with stable coordination services and failover.

What is Hive?

Hive is a Hadoop-based data warehouse tool. It maps structured data files to database tables and provides SQL query capability; the SQL statements are translated into MapReduce jobs for execution.
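To make the SQL-to-MapReduce translation more concrete, here is a minimal Python sketch of how a query such as SELECT status, COUNT(*) FROM access_log GROUP BY status could be broken into map, shuffle, and reduce phases. The table name, column names, and sample rows are hypothetical, and this only illustrates the idea; it is not the plan Hive actually generates.

```python
from collections import defaultdict

# Hypothetical rows of a table access_log(url, status); sample data only.
ROWS = [
    ("/index", 200),
    ("/login", 200),
    ("/missing", 404),
    ("/index", 200),
    ("/admin", 403),
]


def map_phase(rows):
    """Map: emit (group key, 1) for every input row."""
    for _url, status in rows:
        yield status, 1


def shuffle_phase(pairs):
    """Shuffle: collect all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: aggregate the values of each key, here COUNT(*)."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_status(rows):
    """Chain the three phases, as a single-machine stand-in for a MapReduce job."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_status(ROWS))
```

On a real cluster the map and reduce phases run in parallel across many machines, which is why this style of execution favors throughput over latency.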

Let's look at the Hive architecture:

1, User interfaces. Hive has three main interfaces: the CLI (starting the CLI also starts a Hive instance), the Client (a Hive client that connects to the Hive server), and the web UI (accessed through a browser).

2, Metadata store. Hive stores its metadata in a database such as MySQL.

3, Driver (interpreter, compiler, optimizer, executor): performs lexical analysis, syntax analysis, compilation, optimization, and query plan generation; the generated plan is then executed by MapReduce.

4, Hadoop. Hive data is stored in HDFS, and most queries are executed by MapReduce.

HBase

We have looked at the features of HBase and Hive above. So what is the difference between them, and which scenarios is each suited to?

Both Hive and HBase store their data as files on HDFS.

HBase supports adding columns dynamically and modifying individual cells. It uses a key-value (KV) design, so lookups are relatively efficient. It is generally used in scenarios with a low tolerance for latency, and in scenarios where attributes often need to be added or modified.
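The key-value design described here can be pictured with a small in-memory model. The sketch below is plain Python, not the HBase client API; the class name SparseTable, its methods, and the sample rows and columns are all hypothetical. It shows cells addressed by row key plus column name, a new column appearing on a single row without any schema change, and one cell being overwritten in place.

```python
from collections import defaultdict


class SparseTable:
    """Toy model of a sparse key-value table: row key -> {column -> value}."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        """Write one cell; a column that did not exist before is simply created."""
        self._rows[row_key][column] = value

    def get(self, row_key, column=None):
        """Read a whole row (as a dict copy) or a single cell (None if absent)."""
        row = self._rows.get(row_key, {})
        if column is None:
            return dict(row)
        return row.get(column)

    def cell_count(self):
        """Number of cells actually stored; absent cells take no space."""
        return sum(len(row) for row in self._rows.values())


def demo():
    """Build a small table with hypothetical user rows."""
    table = SparseTable()
    table.put("user1", "info:name", "Alice")
    table.put("user1", "info:city", "Hangzhou")
    table.put("user2", "info:name", "Bob")
    table.put("user2", "tag:vip", "yes")         # new column, only on this row
    table.put("user1", "info:city", "Beijing")   # overwrite a single cell
    return table


if __name__ == "__main__":
    t = demo()
    print(t.get("user1"), t.get("user2"), t.cell_count())
```

Because a lookup goes straight from row key and column to a value, reads and writes of individual cells stay cheap, which matches the low-latency use described above.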

HBase queries are usually issued through the command-line shell, and the statements are fairly complex, whereas Hive uses a SQL-like syntax with a low barrier to entry that is easy to pick up. That said, Phoenix adds SQL syntax support on top of HBase.

Let's look at specific application scenarios for HBase:

Its capabilities include ten-million-level concurrency, PB-scale storage, underlying KV storage, dynamic columns, strong synchronization, sparse tables, secondary indexes, and SQL.

Object storage: many headline and news services store their news articles, web pages, and images in HBase, and some antivirus companies store their virus databases in HBase as well.

Time series data: OpenTSDB is built on top of HBase and meets the needs of time-series scenarios.
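One reason a sorted key-value store suits time-series workloads is that a row key combining the metric name and the timestamp turns a time-window query into a contiguous range scan. The sketch below illustrates that idea in plain Python, with a sorted list standing in for the store. The key layout, class and function names, and sample data are assumptions made for illustration; this is not OpenTSDB's actual schema.

```python
import bisect


def make_row_key(metric, timestamp):
    """Build a key that sorts by metric first, then by time.

    The timestamp is zero-padded to 10 digits so that string order matches
    numeric order (assumes 0 <= timestamp < 10**10).
    """
    return f"{metric}#{timestamp:010d}"


class SortedStore:
    """Toy stand-in for a store that keeps row keys in sorted order."""

    def __init__(self):
        self._keys = []     # row keys, kept sorted
        self._values = {}   # row key -> value

    def put(self, metric, timestamp, value):
        key = make_row_key(metric, timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, metric, start_ts, end_ts):
        """Return (timestamp, value) pairs with start_ts <= timestamp < end_ts."""
        lo = bisect.bisect_left(self._keys, make_row_key(metric, start_ts))
        hi = bisect.bisect_left(self._keys, make_row_key(metric, end_ts))
        return [
            (int(key.rsplit("#", 1)[1]), self._values[key])
            for key in self._keys[lo:hi]
        ]


def demo_store():
    """Populate the store with hypothetical data points."""
    store = SortedStore()
    store.put("cpu", 100, 0.5)
    store.put("cpu", 220, 0.9)
    store.put("mem", 150, 0.3)
    store.put("cpu", 160, 0.7)
    return store


if __name__ == "__main__":
    print(demo_store().scan("cpu", 100, 200))
```

The scan touches only the keys inside the requested window, regardless of how many other metrics or time ranges the store holds.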

Recommendation and user profiles: a user profile is a relatively large sparse matrix. Ant Financial's risk control is built on top of HBase.

Spatio-temporal data: mainly trajectories, meteorological grid data, and the like. Didi's ride trajectory data is stored mainly in HBase, and connected-car companies with fairly large data volumes also keep their data in HBase.

Cube/OLAP: Kylin, a cube analysis tool, stores its underlying data in HBase. Many customers also store their own offline-computed cubes in HBase to serve online report queries.

Messages/orders: in telecommunications and banking, much of the underlying storage for order queries, as well as many communication and message-synchronization applications, is built on top of HBase.

Hive does not support column extension. It supports appending data, and newer versions appear to support modification, but with relatively low efficiency. Hive is suited to high-throughput data processing: the larger the files, the more pronounced its advantage. It is generally used in scenarios with a high tolerance for latency.

Now let's look at specific usage scenarios for Hive:

1, analysis of network logs.

2, ETL data cleaning.

3, building a data warehouse.

4, data mining.
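The first two scenarios, log analysis and ETL cleaning, usually come down to parsing raw lines, discarding malformed records, and aggregating what remains in one batch pass. The sketch below shows that shape in plain Python over a few in-memory lines; in Hive the same logic would be written as SQL over files on HDFS. The log format, field names, and sample lines are hypothetical.

```python
from collections import Counter

# Hypothetical raw log lines: date, time, method, URL, status.
RAW_LINES = [
    "2019-12-30 10:00:01 GET /index 200",
    "2019-12-30 10:00:02 GET /login 500",
    "corrupted line",
    "2019-12-30 10:00:05 POST /login 200",
    "2019-12-30 10:00:09 GET /index abc",  # non-numeric status
]


def parse_line(line):
    """Return a record dict, or None if the line is malformed."""
    parts = line.split()
    if len(parts) != 5 or not parts[4].isdigit():
        return None
    date, clock, method, url, status = parts
    return {
        "date": date,
        "time": clock,
        "method": method,
        "url": url,
        "status": int(status),
    }


def clean(lines):
    """ETL cleaning step: keep only the lines that parse."""
    return [rec for rec in map(parse_line, lines) if rec is not None]


def hits_per_url(records):
    """Analysis step: count requests per URL."""
    return Counter(rec["url"] for rec in records)


if __name__ == "__main__":
    records = clean(RAW_LINES)
    print(len(records), dict(hits_per_url(records)))
```

Every line is read once and nothing is looked up by key, which is the access pattern where batch processing over large files pays off.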

Finally, in summary: neither HBase nor Hive stores data itself. Both are ways of organizing files on HDFS, each adapted to different scenarios. HBase has the advantage in query and dynamic-column scenarios, but it is not suited to data analysis and mining. Hive, for its part, cannot be used in low-latency scenarios, but it can handle ETL cleaning of large volumes of data and can be used to build a unified, standardized data warehouse that supplies base data for upper-layer data analysis. Hive is therefore geared more toward data analysis.


Author: Mu sister 8265434
Link: http://www.imooc.com/article/271342
Source: imooc (www.imooc.com)

Origin www.cnblogs.com/ceshi2016/p/12123629.html