Composition and function:
User interfaces: Client — CLI (hive shell), JDBC/ODBC (Java access to Hive), WebUI (browser access to Hive)
Metadata: Metastore
Metadata includes: table name, the database the table belongs to (default is default), table owner, column/partition fields, table type (whether the table is managed or external), the directory where the table data is located, etc.;
Stored in the built-in Derby database by default; MySQL is recommended for storing the Metastore
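A minimal sketch of the kind of per-table record the Metastore keeps. The dict keys and values here are illustrative only, not the actual Metastore schema (which lives in relational tables such as TBLS and COLUMNS_V2):

```python
# Illustrative only: a toy dict modeling the per-table metadata the
# Metastore tracks; field names are invented for this sketch.
table_metadata = {
    "table_name": "orders",
    "database": "default",            # database the table belongs to
    "owner": "alice",
    "columns": [("order_id", "bigint"), ("amount", "double")],
    "partition_fields": [("dt", "string")],
    "table_type": "EXTERNAL_TABLE",   # managed vs. external
    "location": "hdfs://namenode:8020/warehouse/orders",
}

def is_external(meta):
    """Return True if the table's data is not managed by Hive."""
    return meta["table_type"] == "EXTERNAL_TABLE"
```

Note that the metadata stores only the location of the data, not the data itself — dropping an external table removes this record but leaves the HDFS files in place.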
Hadoop: uses HDFS for storage and MapReduce for computation
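As a rough illustration of this storage/compute split: a HiveQL GROUP BY count boils down to a map phase (emit key/value pairs) and a reduce phase (aggregate per key). A toy Python sketch of that MapReduce pattern follows — it mimics the shape of the job, not Hive's actual generated code:

```python
from collections import defaultdict

# Toy MapReduce for: SELECT word, count(*) FROM lines GROUP BY word
def map_phase(records):
    # Mapper: emit (key, 1) for every word in every line.
    for line in records:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle + reducer: sum the values for each key.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

result = reduce_phase(map_phase(["hive on hadoop", "hadoop stores data"]))
# result counts each word, e.g. "hadoop" appears twice
```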
Driver:
(1) Parser (SQL Parser): converts the SQL string into an abstract syntax tree (AST); this step is usually done with a third-party library such as ANTLR. The AST is then analyzed: whether the table exists, whether the fields exist, and whether the SQL is semantically valid.
(2) Compiler (Physical Plan): compiles the AST to generate a logical execution plan.
(3) Optimizer (Query Optimizer): optimizes the logical execution plan.
(4) Executor (Execution): converts the logical execution plan into a physical plan that can actually run; for Hive, that means MR/Spark jobs.
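The four driver stages can be sketched as a toy pipeline. Everything below — the parser, the plan operators, the "MR stage" strings — is invented for illustration; Hive's real driver uses ANTLR and far richer operator trees:

```python
# Toy sketch of the Driver stages for a query like: SELECT name FROM users
# All function and operator names here are illustrative, not Hive internals.

def parse(sql):
    # (1) Parser: SQL string -> tiny "AST" (a nested tuple).
    tokens = sql.replace(",", " ").split()
    cols = tokens[1:tokens.index("FROM")]
    table = tokens[tokens.index("FROM") + 1]
    return ("select", cols, ("table", table))

def compile_plan(ast):
    # (2) Compiler: AST -> logical plan (a list of operators).
    _, cols, (_, table) = ast
    # Deliberately emit a redundant operator for the optimizer to remove.
    return [("scan", table), ("project", cols), ("project", cols)]

def optimize(plan):
    # (3) Optimizer: drop redundant adjacent operators.
    out = []
    for op in plan:
        if not out or out[-1] != op:
            out.append(op)
    return out

def to_physical(plan):
    # (4) Executor: logical plan -> "physical" MR-stage descriptions.
    return [f"MR stage: {op[0]} {op[1]}" for op in plan]

stages = to_physical(optimize(compile_plan(parse("SELECT name FROM users"))))
```

The point of the sketch is the data flow: each stage consumes the previous stage's output, and only the last stage speaks in terms of runnable jobs.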
Working principle:
The user creates database and table information, which is stored in Hive's metadata database;
When data is loaded into a table, the metadata records the mapping between the HDFS file path and the table;
When a query statement is executed, it first passes through the parser, compiler, optimizer, and executor, which translate it into MapReduce jobs; these are submitted to YARN for execution, and the results are finally returned to the user interface.
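Putting the steps above together: the metadata mapping from table name to HDFS path is what lets the query layer find the data at all. A toy end-to-end sketch, where the metastore dict and the fake HDFS contents are entirely made up:

```python
# Toy end-to-end flow: the metastore maps (db, table) -> HDFS path;
# the "query" reads the mapped file and returns rows. Illustrative only.
metastore = {
    ("default", "users"): "hdfs://nn:8020/user/hive/warehouse/users",
}

# Pretend HDFS: path -> file contents (one row per line, comma-separated).
fake_hdfs = {
    "hdfs://nn:8020/user/hive/warehouse/users": "1,alice\n2,bob\n",
}

def run_query(db, table):
    # Look up the table's storage path in the metastore, then "scan" it,
    # as if the driver had compiled a job that reads that path.
    path = metastore[(db, table)]
    return [line.split(",") for line in fake_hdfs[path].strip().split("\n")]

rows = run_query("default", "users")
```

This is why the Metastore matters: the files on HDFS are plain data with no schema of their own, and it is the metadata lookup that turns a table name in SQL into concrete paths and columns.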