Hive and HBase

Author: Duowen
Link: https://www.zhihu.com/question/21677041/answer/185664626
Source: Zhihu
The copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please cite the source.

Conclusion: HBase and Hive occupy different positions in the big data architecture. HBase mainly solves the problem of real-time data querying, while Hive mainly solves the problem of data processing and computation. The two are generally used together.

1. Differences

  1. HBase: short for Hadoop Database, a NoSQL database built on Hadoop that stores its data in HDFS. It is mainly suited to random, real-time queries over massive volumes of data.
  2. Hive: a data warehouse built on Hadoop. Strictly speaking, it is not a database. It mainly lets developers process and compute structured data stored on HDFS using SQL, and it is suited to offline batch computation.
     • It uses metadata to describe structured text data on HDFS. Put simply, you define a table that describes the structured text on HDFS, including each column's name, its data type, and so on, which makes the data convenient to process. Many SQL-on-Hadoop engines, such as Spark SQL and Impala, currently use Hive's metadata;
     • Building on the first point, Hive processes and computes HDFS data through SQL by translating SQL statements into MapReduce jobs.
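To make the two Hive points concrete, here is a toy Python sketch. It is not Hive's actual implementation. A hypothetical table definition (`ORDERS_SCHEMA`) plays the role of Hive metadata and is applied to tab-delimited text lines as they are read. A `GROUP BY` query is then evaluated as explicit map, shuffle, and reduce phases, which is the general shape of the MapReduce job Hive would generate for such a query. The table name, columns, and sample rows are all assumptions.

```python
# Toy illustration only, NOT Hive's real code. All names are hypothetical.
from collections import defaultdict

# Hypothetical metadata, roughly what this HiveQL DDL would declare:
#   CREATE TABLE orders (order_id INT, city STRING, amount DOUBLE)
#   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
ORDERS_SCHEMA = [("order_id", int), ("city", str), ("amount", float)]

# Hypothetical raw text lines, as they might sit in a file on HDFS.
RAW_LINES = [
    "1\tBeijing\t30.0",
    "2\tShanghai\t12.5",
    "3\tBeijing\t20.0",
]


def parse_row(line, schema, sep="\t"):
    """Apply the table definition at read time: split the line, cast each field."""
    fields = line.split(sep)
    return {name: cast(value) for (name, cast), value in zip(schema, fields)}


def map_phase(lines):
    """Map: parse each line and emit a (city, amount) pair."""
    for line in lines:
        row = parse_row(line, ORDERS_SCHEMA)
        yield row["city"], row["amount"]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def group_by_sum(lines):
    """Evaluate: SELECT city, SUM(amount) FROM orders GROUP BY city"""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(group_by_sum(RAW_LINES))  # prints {'Beijing': 50.0, 'Shanghai': 12.5}
```

The raw text is never rewritten into a special format. The schema is only a description that gets applied when the data is read, which is why several engines can share the same metadata.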

2. Relationship

In a big data architecture, Hive and HBase complement each other, and data generally flows as follows:

  1. ETL tools extract data from the source systems into HDFS;
  2. Hive cleans, processes, and computes on the raw data;
  3. If the results will be queried randomly at massive scale, the data cleaned by Hive is stored in HBase;
  4. Data applications query the data from HBase.
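The four-step flow above can be sketched as a toy, in-memory Python model. Nothing here uses the real Hive or HBase APIs. The function names, the `ToyKeyValueTable` class, the `user_id#day` row-key layout, and the `stats:total` column are all hypothetical. The sketch shows the division of labor. The batch step scans all records once to produce aggregates. The key-value step then answers point lookups and prefix scans over sorted row keys, which is roughly how HBase serves random reads.

```python
# Toy model of the ETL -> Hive -> HBase -> application flow.
# Every name here is hypothetical; no real Hive/HBase client is used.
import bisect
from collections import defaultdict


def extract(source_rows):
    """Step 1 (ETL stand-in): copy source records into a list playing 'HDFS'."""
    return list(source_rows)


def batch_process(hdfs_rows):
    """Step 2 (Hive stand-in): drop incomplete rows, total amount per (user, day)."""
    totals = defaultdict(float)
    for row in hdfs_rows:
        if row.get("user_id") is None or row.get("amount") is None:
            continue  # cleaning: skip incomplete records
        totals[(row["user_id"], row["day"])] += row["amount"]
    return totals


class ToyKeyValueTable:
    """Step 3 (HBase stand-in): rows kept sorted by row key, with
    'family:qualifier' -> value cells in each row."""

    def __init__(self):
        self._keys = []  # row keys in sorted order
        self._rows = {}  # row key -> {column: value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Random read of a single row by key (None if absent)."""
        return self._rows.get(row_key)

    def scan_prefix(self, prefix):
        """Scan the contiguous run of sorted row keys that share a prefix."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._rows[key]


def load(totals, table):
    """Write batch results into the table; the row-key layout is an assumption."""
    for (user_id, day), amount in totals.items():
        table.put(f"{user_id}#{day}", "stats:total", amount)


if __name__ == "__main__":
    source = [
        {"user_id": "u1", "day": "2017-06-01", "amount": 10.0},
        {"user_id": "u1", "day": "2017-06-01", "amount": 5.0},
        {"user_id": "u2", "day": "2017-06-01", "amount": 7.0},
        {"user_id": None, "day": "2017-06-01", "amount": 3.0},
        {"user_id": "u1", "day": "2017-06-02", "amount": 1.0},
    ]
    table = ToyKeyValueTable()
    load(batch_process(extract(source)), table)
    # Step 4: a data application reads the results.
    print(table.get("u1#2017-06-01"))  # {'stats:total': 15.0}
    print(list(table.scan_prefix("u1#")))
    # [('u1#2017-06-01', {'stats:total': 15.0}), ('u1#2017-06-02', {'stats:total': 1.0})]
```

Putting the user ID first in the row key keeps all of one user's rows next to each other, so a single prefix scan returns them together. Real HBase schemas also have to weigh other factors, such as write hotspots, when choosing a row key.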
