Hive: a Hadoop-based data warehouse management tool (with JDBC/ODBC access)

Apache Hive is a data warehouse tool built on top of Hadoop that provides data summarization, querying, and analysis. It maps structured data files to database tables and converts simple SQL-like (HQL) statements into MapReduce jobs for execution. Its learning curve is low: simple MapReduce-style statistics can be produced quickly without writing MapReduce code by hand. This makes it well suited to statistical analysis in a data warehouse.
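The following is a toy, single-process Python sketch that makes "SQL converted into MapReduce" more concrete. It is not Hive's actual planner. It only shows how a query such as SELECT word, COUNT(*) FROM docs GROUP BY word can be split into map, shuffle, and reduce phases. The table name docs and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Toy simulation (not Hive's planner) of how
#   SELECT word, COUNT(*) FROM docs GROUP BY word;
# can be expressed as map -> shuffle -> reduce.


def map_phase(rows: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (key, 1) for every row, keyed on the GROUP BY column."""
    for word in rows:
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the MapReduce shuffle does."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Aggregate each key's values; COUNT(*) is a sum of ones."""
    return {key: sum(values) for key, values in groups.items()}


def group_by_count(rows: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in order on an in-memory 'table'."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    docs = ["hive", "hadoop", "hive", "sql"]
    print(group_by_count(docs))  # {'hive': 2, 'hadoop': 1, 'sql': 1}
```

On a real cluster the map and reduce phases run as distributed tasks, and the shuffle moves data between machines. The sketch keeps everything in memory only to show the data flow.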

Architecture

Basic components:

  • User interfaces: CLI, JDBC/ODBC, and WebGUI. The CLI (command line interface) is the shell command line. JDBC/ODBC are the client interfaces through which Java and other applications connect to Hive, much like JDBC for a traditional database. WebGUI lets users access Hive through a browser.
  • Metadata storage: Hive stores its metadata in a relational database, usually MySQL or Derby. The metadata includes each table's name, its columns and partitions and their attributes, and the table's properties (for example, whether it is an external table). It also records the directory where the table's data is stored.
  • Interpreter, compiler, optimizer, and executor: together they take an HQL query through lexical analysis, syntax analysis, compilation, and optimization to generate a query plan. The generated query plan is stored in HDFS and then executed by MapReduce.
  • File formats: the four main file formats supported by Hive are plain text (TextFile), SequenceFile, Optimized Row Columnar (ORC), and RCFile.
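The following Python sketch roughly illustrates how metastore metadata and a plain-text data file combine into a queryable table. It models a simplified metastore entry: table name, typed columns, partition keys, external flag, and data directory. It then applies that schema to one line of a TextFile data file at read time. The TableMeta class, the page_views table, and its location are hypothetical. The Ctrl-A ('\x01') field delimiter and the \N null marker are Hive's TextFile defaults. Turning missing or unparsable fields into NULL here only approximates Hive's behavior.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hive's default field delimiter for TextFile tables is Ctrl-A ('\x01'),
# and NULL is written as the two characters \N by default.
FIELD_DELIM = "\x01"
NULL_TEXT = "\\N"

# Only a few primitive types are handled in this sketch.
CASTS: Dict[str, Callable[[str], object]] = {"string": str, "int": int, "double": float}


@dataclass
class TableMeta:
    """Hypothetical, simplified view of what the metastore keeps for one table."""
    name: str
    columns: List[Tuple[str, str]]  # (column name, column type)
    location: str                   # directory holding the table's data files
    external: bool = False
    # Partition values come from directory names (e.g. dt=...), not file rows.
    partition_keys: List[str] = field(default_factory=list)
    file_format: str = "TEXTFILE"


def read_text_row(meta: TableMeta, line: str) -> Dict[str, Optional[object]]:
    """Apply the table schema to one line of a plain-text data file (schema on read)."""
    raw = line.rstrip("\n").split(FIELD_DELIM)
    row: Dict[str, Optional[object]] = {}
    for i, (col, typ) in enumerate(meta.columns):
        value = raw[i] if i < len(raw) else None  # missing trailing field -> NULL
        if value is None or value == NULL_TEXT:
            row[col] = None
            continue
        try:
            row[col] = CASTS[typ](value)
        except ValueError:
            row[col] = None  # unparsable value -> NULL, approximating Hive
    return row


if __name__ == "__main__":
    page_views = TableMeta(
        name="page_views",
        columns=[("url", "string"), ("views", "int"), ("ratio", "double")],
        location="/user/hive/warehouse/page_views",  # illustrative path
        partition_keys=["dt"],
    )
    print(read_text_row(page_views, "/index\x0142\x01\\N\n"))
```

The schema is applied only when a line is read, and the file itself is never rewritten. This is why the same directory of text files can be mapped to a table just by registering metadata. Columnar formats such as ORC and RCFile store data differently, but they rely on the same metadata.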


Source: blog.csdn.net/weixin_29403917/article/details/128113453