How Hive Works


Components and their functions:

User interfaces: Client CLI (the hive shell), JDBC/ODBC (programmatic access to Hive, e.g. from Java), and WebUI (browser access to Hive)

Metadata: Metastore

Metadata includes: table names, the database each table belongs to (default by default), the table owner, column and partition fields, the table type (e.g. whether it is an external table), the directory where the table data is stored, and so on.

By default the Metastore is kept in the embedded Derby database; MySQL is recommended for storing the Metastore in practice.
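To make the metadata list above concrete, here is a minimal sketch of the kind of per-table record the Metastore keeps. The class and field names are illustrative only, not Hive's actual metastore schema.

```python
from dataclasses import dataclass, field

# Illustrative model of a Metastore table entry (not Hive's real schema).
@dataclass
class TableMetadata:
    name: str                     # table name
    database: str = "default"     # database the table belongs to
    owner: str = ""               # table owner
    columns: list = field(default_factory=list)         # (name, type) pairs
    partition_keys: list = field(default_factory=list)  # partition fields
    is_external: bool = False     # table type: managed vs. external
    location: str = ""            # HDFS directory holding the table data

meta = TableMetadata(
    name="logs",
    owner="alice",
    columns=[("ts", "bigint"), ("msg", "string")],
    partition_keys=[("dt", "string")],
    is_external=True,
    location="hdfs:///warehouse/logs",
)
print(meta.database, meta.is_external)  # default True
```

Note that the Metastore holds only this descriptive information; the table data itself lives in the HDFS directory named by `location`.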

Hadoop: Hive uses HDFS for storage and MapReduce for computation.
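The MapReduce model Hive compiles queries into can be sketched with a toy word count. A real job runs distributed over HDFS blocks; here both phases run locally just to show the map, shuffle, and reduce flow. The input lines are made up for illustration.

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit (word, 1) for every word in a line
    for word in line.split():
        yield word, 1

def reducer(word, counts):
    # Reduce phase: sum the counts collected for one key
    return word, sum(counts)

lines = ["hive on hadoop", "hadoop stores data"]

# Map: produce intermediate key-value pairs
intermediate = [kv for line in lines for kv in mapper(line)]

# Shuffle: group values by key, as the framework would between phases
groups = defaultdict(list)
for word, count in intermediate:
    groups[word].append(count)

# Reduce: aggregate each group
result = dict(reducer(w, c) for w, c in groups.items())
print(result["hadoop"])  # 2
```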

Driver:
(1) Parser (SQL Parser): converts the SQL string into an abstract syntax tree (AST); this step is generally done with a third-party parser library such as ANTLR. The AST is then checked semantically: whether the table exists, whether the fields exist, and whether the SQL is semantically valid.

(2) Compiler: compiles the AST to generate a logical execution plan.

(3) Optimizer (Query Optimizer): optimizes the logical execution plan.

(4) Executor: converts the logical execution plan into a physical plan that can actually run; for Hive, that means MapReduce or Spark jobs.
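The four driver stages above can be sketched end to end for a toy query. The function names and plan shapes are invented for illustration; Hive's real driver uses ANTLR-generated parsers and far richer plan representations.

```python
def parse(sql):
    # Parser: SQL string -> abstract syntax tree (a flat tuple here)
    _, col, _, table = sql.split()
    return ("select", col, table)

def compile_logical(ast):
    # Compiler: AST -> logical execution plan (scan, filter, project)
    _, col, table = ast
    return [("TableScan", table), ("Filter", "true"), ("Project", col)]

def optimize(plan):
    # Optimizer: drop the always-true filter from the logical plan
    return [op for op in plan if op != ("Filter", "true")]

def to_physical(plan):
    # Executor: logical operators -> runnable (MapReduce-style) stages
    return [f"MRStage({op} {arg})" for op, arg in plan]

physical = to_physical(optimize(compile_logical(parse("SELECT msg FROM logs"))))
print(physical[-1])  # MRStage(Project msg)
```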

Working principle:

The user creates database and table definitions, which are stored in Hive's metadata database (the Metastore).

When data is loaded into a table, the Metastore records the mapping between the HDFS file path and the table.

When a query statement is executed, it passes through the parser, compiler, optimizer, and executor in turn; the statement is translated into MapReduce jobs, submitted to YARN for execution, and the results are finally returned to the user interface.
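The load-then-query flow can be sketched as follows: the Metastore records only the table-to-path mapping, and query planning looks that path up before generating jobs. The table names and HDFS paths are made up for illustration.

```python
# Toy metastore: table name -> HDFS path (the data itself stays on HDFS)
metastore = {}

def load_data(table, hdfs_path):
    # "LOAD DATA" step: record the table -> path mapping in the metastore
    metastore[table] = hdfs_path

def plan_query(table):
    # Query planning resolves the table's HDFS location, then the
    # generated MapReduce jobs read the files at that location
    return f"read {metastore[table]} -> MapReduce -> results"

load_data("logs", "hdfs:///warehouse/logs")
print(plan_query("logs"))  # read hdfs:///warehouse/logs -> MapReduce -> results
```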


Origin blog.csdn.net/qq_42706464/article/details/108256373