Hive Learning Series (1) What is Hive and Hive Architecture

       I have been busy with interviews recently, and several companies asked about Hive. However, I have spent the past three years working on overseas data warehouses, so I know big data only in theory and have no hands-on experience with it. To prepare better for interviews, I have summarized the relevant Hive knowledge here.

(1) What is Hive

        1.1 Hive is an important member of the Hadoop ecosystem. It maps structured data files stored in HDFS onto database tables.

        1.2 Hive defines a simple SQL-like query language called HQL (HiveQL), which makes querying data convenient and efficient.

        1.3 In essence, Hive translates HQL into MapReduce jobs that carry out ETL over the whole data set, which greatly reduces the complexity of writing MapReduce programs by hand.
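To illustrate point 1.3, here is a toy, single-process Python sketch of how a query such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept` maps onto the map, shuffle, and reduce phases. The table and column names are hypothetical, real Hive generates Hadoop jobs rather than Python functions, and the sketch ignores optimizations such as map-side partial aggregation; it only shows the shape of the computation.

```python
# Toy sketch: SELECT dept, COUNT(*) FROM employees GROUP BY dept as MapReduce phases.
# Table/column names are hypothetical; this runs in one process, not on Hadoop.
from collections import defaultdict

def map_phase(rows):
    # Map: emit (group key, 1) for every input row.
    for row in rows:
        yield row["dept"], 1

def shuffle_phase(pairs):
    # Shuffle: the framework groups map output by key and sorts the keys.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(groups):
    # Reduce: aggregate each key's values; COUNT(*) is the sum of the emitted 1s.
    return {key: sum(values) for key, values in groups.items()}

employees = [
    {"name": "a", "dept": "sales"},
    {"name": "b", "dept": "eng"},
    {"name": "c", "dept": "sales"},
]
result = reduce_phase(shuffle_phase(map_phase(employees)))
print(result)
```

The WHERE clause of a query would typically become a filter inside the map step, while GROUP BY keys decide how rows are shuffled to reducers.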

(2) Architecture of Hive

        The Hive architecture consists of the following components: the CLI (command-line interface), JDBC/ODBC, the Thrift Server, the Hive Web Interface (HWI), the Metastore, and the Driver (Compiler, Optimizer, and Executor).

       Driver component: The core of Hive. It consists of the Compiler, Optimizer, and Executor. It parses, compiles, and optimizes the HQL statements we write, generates an execution plan, and then invokes the underlying MapReduce computing framework to run it.

       Metastore component: The metadata service component. It stores Hive's metadata (such as table names, columns, and the HDFS locations of table data) in a relational database. Databases Hive supports for this include Derby (the default embedded database) and MySQL.

       CLI: the command-line interface.

       Thrift Server: Provides JDBC and ODBC access. Thrift is a framework for building scalable, cross-language services; because Hive integrates it, programs written in different languages can call Hive's interfaces.

       Hive Web Interface (HWI): Lets clients access Hive's services through a web page. It corresponds to Hive's hwi (Hive Web Interface) component.
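To make the Metastore's role (and the table mapping from 1.1) more concrete, here is a minimal Python sketch, assuming an uncompressed text table. `sqlite3` stands in for Derby/MySQL, and the `tbls`/`cols` tables, the `describe` and `read_rows` helpers, and the sample data are all hypothetical simplifications; the real metastore schema has many more tables and columns. The point is that the schema lives in a relational database, while the data files are plain delimited text that only become rows when that schema is applied at read time.

```python
# Sketch: metadata in a relational DB, data as plain text, schema applied on read.
# sqlite3 stands in for Derby/MySQL; the two tables are a hypothetical, simplified schema.
import sqlite3

FIELD_DELIM = "\x01"  # Hive's default field delimiter for text tables (Ctrl-A)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbls (tbl_id INTEGER PRIMARY KEY, db_name TEXT, tbl_name TEXT, location TEXT);
CREATE TABLE cols (tbl_id INTEGER, idx INTEGER, col_name TEXT, col_type TEXT);
""")

# Roughly what CREATE TABLE users (id INT, name STRING, age INT) would record.
# /user/hive/warehouse is Hive's default warehouse directory.
conn.execute("INSERT INTO tbls VALUES (1, 'default', 'users', '/user/hive/warehouse/users')")
conn.executemany(
    "INSERT INTO cols VALUES (1, ?, ?, ?)",
    [(0, "id", "int"), (1, "name", "string"), (2, "age", "int")],
)

def describe(db_name, tbl_name):
    """Metastore lookup: return the table's data location and its ordered columns."""
    row = conn.execute(
        "SELECT tbl_id, location FROM tbls WHERE db_name = ? AND tbl_name = ?",
        (db_name, tbl_name),
    ).fetchone()
    if row is None:
        raise KeyError(f"table {db_name}.{tbl_name} does not exist")
    tbl_id, location = row
    columns = conn.execute(
        "SELECT col_name, col_type FROM cols WHERE tbl_id = ? ORDER BY idx", (tbl_id,)
    ).fetchall()
    return location, columns

CASTS = {"int": int, "string": str}

def read_rows(lines, columns):
    """Schema on read: split each line and cast fields using the stored column types."""
    for line in lines:
        fields = line.rstrip("\n").split(FIELD_DELIM)
        row = {}
        for i, (name, col_type) in enumerate(columns):
            raw = fields[i] if i < len(fields) else None
            try:
                # Missing or unconvertible values become NULL (None), as Hive does for text tables.
                row[name] = None if raw is None else CASTS[col_type](raw)
            except ValueError:
                row[name] = None
        yield row

location, columns = describe("default", "users")
# Stand-in for the text file(s) stored under `location` on HDFS.
data_file = ["1\x01alice\x0130\n", "2\x01bob\x01n/a\n", "3\x01carol\n"]
rows = list(read_rows(data_file, columns))
print(location, rows)
```

Because the schema is only applied when reading, a malformed value does not stop the query; it simply shows up as NULL.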

       

(3) Simplified diagram of Hive's execution process

Hive receives queries through the CLI, JDBC/ODBC, or HWI. The Driver (Compiler, Optimizer, and Executor) then compiles, analyzes, and optimizes each query, which finally becomes an executable MapReduce job.
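The Driver's parse, compile, optimize, and execute stages can be illustrated with a toy Python sketch. It handles only queries of the form `SELECT a, b FROM t WHERE c > n`, runs the plan over in-memory rows instead of launching MapReduce, and every function name and plan shape here is hypothetical. Its single "optimization" is column pruning (reading only the columns the query uses), which is one of the rewrites Hive's optimizer does perform, though in a far more general way.

```python
# Toy sketch of the Driver pipeline: parse -> compile -> optimize -> execute.
# Supports only "SELECT a, b FROM t [WHERE c > n]"; all names and plan shapes are hypothetical.
import re

QUERY_RE = re.compile(
    r"SELECT\s+(?P<cols>[\w\s,]+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<wcol>\w+)\s*>\s*(?P<wval>\d+))?\s*$",
    re.IGNORECASE,
)

def parse(hql):
    """Parser: turn query text into a small syntax tree (a dict)."""
    m = QUERY_RE.match(hql.strip())
    if m is None:
        raise ValueError("unsupported query: " + hql)
    ast = {"select": [c.strip() for c in m.group("cols").split(",")],
           "table": m.group("table"), "where": None}
    if m.group("wcol"):
        ast["where"] = (m.group("wcol"), int(m.group("wval")))
    return ast

def compile_plan(ast):
    """Compiler: build a logical plan as an ordered list of operators."""
    plan = [{"op": "TableScan", "table": ast["table"], "columns": None}]  # None = all columns
    if ast["where"]:
        plan.append({"op": "Filter", "column": ast["where"][0], "gt": ast["where"][1]})
    plan.append({"op": "Select", "columns": ast["select"]})
    return plan

def optimize(plan):
    """Optimizer: column pruning, so the scan reads only columns used later in the plan."""
    needed = set()
    for op in plan:
        if op["op"] == "Filter":
            needed.add(op["column"])
        elif op["op"] == "Select":
            needed.update(op["columns"])
    plan[0] = dict(plan[0], columns=sorted(needed))
    return plan

def execute(plan, tables):
    """Executor: run the operators in order over in-memory rows (a stand-in for MapReduce)."""
    rows = []
    for op in plan:
        if op["op"] == "TableScan":
            rows = [{c: r[c] for c in op["columns"]} for r in tables[op["table"]]]
        elif op["op"] == "Filter":
            rows = [r for r in rows if r[op["column"]] > op["gt"]]
        elif op["op"] == "Select":
            rows = [tuple(r[c] for c in op["columns"]) for r in rows]
    return rows

tables = {"users": [
    {"id": 1, "name": "alice", "age": 30, "city": "x"},
    {"id": 2, "name": "bob", "age": 17, "city": "y"},
]}
plan = optimize(compile_plan(parse("SELECT name, age FROM users WHERE age > 18")))
result = execute(plan, tables)
print(plan[0], result)
```

After optimization the scan reads only `age` and `name`, so `id` and `city` never leave storage; in real Hive, this kind of pruning reduces the data the MapReduce jobs have to read.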

Does this look familiar? It is very similar to the architecture of a traditional database. For comparison, here is the MySQL architecture.

      Hive's functionality is roughly a combination of what a traditional database provides through its service protocol layer, parser, preprocessor, optimizer, and query execution plan.

      The difference is that Hive translates HQL into MapReduce jobs, whereas a traditional database translates SQL into operations that its own execution engine can carry out.

 

(4) Detailed diagram of Hive's execution flow

This diagram shows Hive's execution process in detail. I am still learning much of what it contains, so I am including it here for now.

(5) Summary of this chapter

   This chapter introduced what Hive is and how its architecture is organized. More articles in this Hive series will follow.

 

 
