Differences Between HBase and Hive and Their Practical Applications

1. Understanding HBase

HBase (short for Hadoop Database) is a NoSQL database built on top of Hadoop. It is mainly suited to random, real-time queries, for example looking up transaction records or user behavior trajectories.

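HBase tables are physical tables: the data is actually stored in HBase. They are suitable for storing unstructured and semi-structured data. HBase behaves like a very large, distributed key-value map sorted by row key (it is modeled on Google's Bigtable), which makes it a good fit for workloads such as storing search-engine indexes for fast lookup.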

HBase stores its data on Hadoop's HDFS and relies on ZooKeeper for cluster coordination. It organizes data by column family rather than by row, which makes it suitable for random access to massive datasets.

HBase is a near-real-time system that supports real-time reads, inserts, updates, and deletes.
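To make the data model above more concrete, here is a toy, in-memory Python model of an HBase-style table. It is not the HBase client API: the class `ToyHBaseTable`, its methods, and row keys such as `user#001` are hypothetical. It shows rows kept in row-key order, cells addressed as `family:qualifier`, newer versions shadowing older ones, point reads, range scans, and deletes. A real cluster spreads rows across region servers and persists them on HDFS, which this sketch does not attempt.

```python
import bisect
from collections import defaultdict


class ToyHBaseTable:
    """Toy in-memory model of an HBase-style table (NOT the HBase client API).

    Layout: row key -> "family:qualifier" -> [(version, value), ...],
    newest version first. Row keys are kept sorted, as HBase sorts rows.
    """

    def __init__(self, families):
        self.families = set(families)  # column families are declared up front
        self.rows = {}                 # row key -> {column -> [(version, value)]}
        self.sorted_keys = []          # row keys in lexicographic order
        self._clock = 0                # logical clock standing in for timestamps

    def put(self, row, column, value):
        """Insert or update a cell by writing a new, newer version."""
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row not in self.rows:
            self.rows[row] = defaultdict(list)
            bisect.insort(self.sorted_keys, row)
        self._clock += 1
        self.rows[row][column].insert(0, (self._clock, value))  # newest first

    def get(self, row, column):
        """Random point read: newest value of one cell, or None."""
        cells = self.rows.get(row, {}).get(column)
        return cells[0][1] if cells else None

    def delete(self, row, column=None):
        """Delete one column of a row, or the whole row if no column is given."""
        if row not in self.rows:
            return
        if column is None:
            del self.rows[row]
            self.sorted_keys.remove(row)
        else:
            self.rows[row].pop(column, None)

    def scan(self, start, stop):
        """Yield (row key, {column: newest value}) for start <= key < stop."""
        lo = bisect.bisect_left(self.sorted_keys, start)
        hi = bisect.bisect_left(self.sorted_keys, stop)
        for key in self.sorted_keys[lo:hi]:
            yield key, {col: cells[0][1] for col, cells in self.rows[key].items()}


table = ToyHBaseTable(["info"])
table.put("user#001", "info:city", "Beijing")
table.put("user#001", "info:city", "Shanghai")  # newer version shadows the old one
table.put("user#002", "info:city", "Shenzhen")
print(table.get("user#001", "info:city"))        # Shanghai
print(list(table.scan("user#000", "user#002")))  # [('user#001', {'info:city': 'Shanghai'})]
```

Because rows are kept in row-key order, a scan over a key range only touches the matching keys; this is why row-key design matters so much in HBase.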

2. Understanding Hive

Hive is a data warehouse tool. Strictly speaking, it is not a database. It mainly lets developers compute over and process structured data on HDFS using SQL, and it is suited to offline batch computation. It maps a structured data file to a database table and provides simple SQL query functionality.
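Mapping a structured file to a table can be sketched as "schema-on-read": the file stays as plain delimited text, and a schema is applied only when the data is read. The Python below is a minimal illustration under stated assumptions (a tab-separated file and a hypothetical three-column schema); it is not how Hive is implemented internally. Values that cannot be parsed as the declared type come back as `None`, loosely similar to how Hive typically yields NULL for such values in text tables. The `select` call corresponds roughly to `SELECT user_id, amount FROM t WHERE city = 'Beijing'`.

```python
import io

# Hypothetical table schema: (column name, type converter).
SCHEMA = [("user_id", int), ("city", str), ("amount", float)]


def read_table(lines, schema, sep="\t"):
    """Schema-on-read: parse each delimited line into a row only when read.

    Missing, empty, or unparseable fields become None (a stand-in for NULL);
    the underlying file is never rewritten.
    """
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        row = {}
        for i, (name, cast) in enumerate(schema):
            raw = fields[i] if i < len(fields) else ""
            try:
                row[name] = cast(raw) if raw != "" else None
            except ValueError:
                row[name] = None
        yield row


def select(rows, columns, where=lambda row: True):
    """Project `columns` from the rows that satisfy `where`."""
    return [{c: row[c] for c in columns} for row in rows if where(row)]


# An in-memory stand-in for a tab-separated file stored on HDFS.
raw_file = io.StringIO("1\tBeijing\t30.5\n2\tShanghai\tn/a\n3\tBeijing\t12.0\n")
rows = list(read_table(raw_file, SCHEMA))
print(rows[1])  # {'user_id': 2, 'city': 'Shanghai', 'amount': None}
print(select(rows, ["user_id", "amount"], where=lambda r: r["city"] == "Beijing"))
# [{'user_id': 1, 'amount': 30.5}, {'user_id': 3, 'amount': 12.0}]
```

Because the schema lives outside the data, changing or dropping the table definition in this model leaves the text file untouched, which mirrors the idea that a Hive table is a logical view over files.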

Tables in Hive are purely logical: a Hive table is metadata (a schema plus a location) describing files that live in HDFS.

Hive itself neither stores nor computes data; it relies entirely on HDFS and MapReduce. Hive uses HDFS to store its files and the MapReduce framework to run its computations (in classic deployments; later Hive versions can also run on Tez or Spark), and MapReduce processes data row by row.
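Conceptually, a Hive aggregation such as `SELECT city, SUM(amount) FROM orders GROUP BY city` is carried out as map, shuffle, and reduce phases. The single-process Python sketch below shows only that shape: the map phase handles one row at a time, the shuffle groups values by key, and the reduce phase aggregates each group. The table name `orders` and its columns are hypothetical, and a real job runs these phases in parallel across a cluster.

```python
from collections import defaultdict


def map_phase(rows):
    """Map: look at one row at a time and emit a (key, value) pair."""
    for row in rows:
        yield row["city"], row["amount"]


def shuffle(pairs):
    """Shuffle: bring together all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: aggregate each key's values (here, SUM)."""
    return {key: sum(values) for key, values in sorted(groups.items())}


def group_by_sum(rows):
    # Rough analogue of: SELECT city, SUM(amount) FROM orders GROUP BY city
    return reduce_phase(shuffle(map_phase(rows)))


orders = [
    {"city": "Beijing", "amount": 30.5},
    {"city": "Shanghai", "amount": 8.0},
    {"city": "Beijing", "amount": 12.0},
]
print(group_by_sum(orders))  # {'Beijing': 42.5, 'Shanghai': 8.0}
```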

Hive uses Hadoop to analyze and process data, and Hadoop is a batch-processing system, so Hive cannot guarantee low latency. It is geared toward loading and querying data rather than row-level updates.

3. Practical use

HBase and Hive occupy different positions in the big data architecture. HBase mainly solves real-time data querying, while Hive mainly solves data processing and computation. The two complement each other and are generally used together.

  • Extract data from the sources into HDFS using ETL tools;
  • Clean, process, and compute over the raw data with Hive;
  • If the scenario calls for random queries over massive data, store the results of Hive's cleaning and processing in HBase;
  • Data applications query the data from HBase.
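The four steps above can be sketched end to end in plain Python. Everything here is hypothetical and in-memory: `RAW_LINES` stands in for files an ETL tool has landed on HDFS, `clean_and_aggregate` plays Hive's role, a dict keyed by row key stands in for an HBase table, and `lookup` plays the data application. It illustrates only the division of labor, not the real Hive or HBase APIs.

```python
from collections import defaultdict

# Step 1: hypothetical raw records that an ETL tool has already landed on HDFS.
RAW_LINES = [
    "u001,2022-09-01,30.5",
    "u002,2022-09-01,8.0",
    "u001,2022-09-02,12.0",
    "corrupted record",  # malformed input that the cleaning step drops
]


def clean_and_aggregate(lines):
    """Step 2 (Hive's role): drop malformed rows, total the amount per user."""
    totals = defaultdict(float)
    for line in lines:
        parts = line.split(",")
        if len(parts) != 3:
            continue
        user, _day, amount = parts
        try:
            value = float(amount)
        except ValueError:
            continue
        totals[user] += value
    return dict(totals)


def load_into_store(totals):
    """Step 3 (HBase's role): index each result by a row key for point reads."""
    return {f"user#{user}": {"stats:total": total} for user, total in totals.items()}


def lookup(store, user):
    """Step 4 (the application): a random read by row key."""
    row = store.get(f"user#{user}")
    return None if row is None else row["stats:total"]


store = load_into_store(clean_and_aggregate(RAW_LINES))
print(lookup(store, "u001"))  # 42.5
print(lookup(store, "u003"))  # None
```

The design point is that the expensive scan-and-aggregate work runs once, offline, while the online path is a single lookup by row key.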

Origin blog.csdn.net/a6661314/article/details/127129565