Data warehouse components: Hive environment construction and basic usage

Source code of this article: GitHub || GitEE

1. Introduction to Hive Basics

1. Basic description

Hive is a data warehouse tool built on Hadoop and used for data extraction, transformation, and loading (ETL). It can query, analyze, and manage large-scale datasets stored in Hadoop. Hive maps structured data files to database tables and provides SQL query capability, converting SQL statements into MapReduce tasks for execution. The cost of use is low: fast MapReduce statistics can be obtained through SQL-like statements, with no need to develop a dedicated MapReduce application. This makes Hive well suited to statistical analysis in a data warehouse.

2. Composition and structure

User interface (Client) : the CLI for command-line access, JDBC for programmatic access, and a Web UI for browser access to Hive.

Metadata : Hive stores metadata in a relational database, such as MySQL or Derby. The metadata includes table names, the columns and partitions of each table and their attributes, table-level attributes (whether it is an external table, etc.), and the HDFS directory where the table's data is located.

Driver : using the interpreter, compiler, and optimizer, an HQL statement goes through lexical analysis, syntax analysis, compilation, optimization, and query-plan generation.

Execution engine : converts the logical execution plan produced by the driver into a physical plan that can be run.

Hadoop layer : storage is based on HDFS, computation uses MapReduce, and scheduling is handled by YARN.

Hive receives an interactive request from the client, translates the SQL instruction into MapReduce jobs, submits them to Hadoop for execution, and finally returns the results to the client.
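To see this pipeline in action, Hive's EXPLAIN statement prints the plan the driver generates for a query. A minimal sketch, assuming a table like the hv_user table created later in this article:

```sql
-- EXPLAIN prints the query plan Hive generates for a statement,
-- including the MapReduce stages the execution engine will run.
EXPLAIN
SELECT age, COUNT(*) AS cnt
FROM hv_user
GROUP BY age;

-- DESCRIBE FORMATTED shows the metadata Hive keeps for a table:
-- columns, whether it is external, and its HDFS data directory.
DESCRIBE FORMATTED hv_user;
```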

2. Hive environment installation

1. Prepare the installation package

Hive 1.2 depends on an existing Hadoop cluster environment and is installed on the hop01 node.

2. Unzip and rename

tar -zxvf apache-hive-1.2.1-bin.tar.gz
mv apache-hive-1.2.1-bin/ hive1.2

3. Modify the configuration file

Create configuration file

[root@hop01 conf]# pwd
/opt/hive1.2/conf
[root@hop01 conf]# mv hive-env.sh.template hive-env.sh

Add content

[root@hop01 conf]# vim hive-env.sh
export HADOOP_HOME=/opt/hadoop2.7
export HIVE_CONF_DIR=/opt/hive1.2/conf

These two settings point Hive at the Hadoop installation path and the Hive configuration directory.

4. Hadoop configuration

First start HDFS and YARN; then create the /tmp and /user/hive/warehouse directories on HDFS and grant group write permission.

bin/hadoop fs -mkdir /tmp
bin/hadoop fs -mkdir -p /user/hive/warehouse
bin/hadoop fs -chmod g+w /tmp
bin/hadoop fs -chmod g+w /user/hive/warehouse

5. Start Hive

[root@hop01 hive1.2]# bin/hive

6. Basic operations

View databases

hive> show databases ;

Select database

hive> use default;

View tables

hive> show tables;

Create and use a database

hive> create database mytestdb;
hive> show databases ;
default
mytestdb
hive> use mytestdb;

Create table

create table hv_user (id int, name string, age int);
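The table above uses Hive's default field delimiters. When loading plain-text files from HDFS, it is common to declare the delimiter explicitly; a sketch, in which the comma delimiter, table name, and file path are assumptions:

```sql
-- Table whose data files are comma-separated text (delimiter is an assumption)
CREATE TABLE hv_user_csv (id INT, name STRING, age INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Load a local file into the table (the path is a placeholder)
-- LOAD DATA LOCAL INPATH '/tmp/hv_user.csv' INTO TABLE hv_user_csv;
```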

View table structure

hive> desc hv_user;
id                  	int                 	                    
name                	string              	                    
age                 	int 

Add table data

insert into hv_user values (1, "test-user", 23);

Query table data

hive> select * from hv_user ;

Note: by observing the query log, you can clearly see the MapReduce process that Hive executes.

Delete table

hive> drop table hv_user ;

Exit Hive

hive> quit;

View the Hadoop directory

# hadoop fs -ls /user/hive/warehouse       
/user/hive/warehouse/mytestdb.db

The database and data created by Hive are stored on HDFS.

3. Integrate the MySQL 5.7 environment

This assumes MySQL 5.7 is already installed, a login account has been configured, and the root user's Host is set to % (allowing connections from any host).

1. Upload the MySQL driver package

Upload the MySQL driver dependency package to the lib directory of the hive installation directory.

[root@hop01 lib]# pwd
/opt/hive1.2/lib
[root@hop01 lib]# ll
mysql-connector-java-5.1.27-bin.jar

2. Create hive-site configuration

[root@hop01 conf]# pwd
/opt/hive1.2/conf
[root@hop01 conf]# touch hive-site.xml
[root@hop01 conf]# vim hive-site.xml

3. Configure MySQL storage

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <property>
          <name>javax.jdo.option.ConnectionURL</name>
          <value>jdbc:mysql://hop01:3306/metastore?createDatabaseIfNotExist=true</value>
          <description>JDBC connect string for a JDBC metastore</description>
        </property>

        <property>
          <name>javax.jdo.option.ConnectionDriverName</name>
          <value>com.mysql.jdbc.Driver</value>
          <description>Driver class name for a JDBC metastore</description>
        </property>

        <property>
          <name>javax.jdo.option.ConnectionUserName</name>
          <value>root</value>
          <description>username to use against metastore database</description>
        </property>

        <property>
          <name>javax.jdo.option.ConnectionPassword</name>
          <value>123456</value>
          <description>password to use against metastore database</description>
        </property>
</configuration>

After the configuration is complete, restart the MySQL, Hadoop, and Hive environments in turn. Looking at the MySQL instance, you will find a new metastore database and its related tables.
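You can verify the metastore by querying the new database directly in the MySQL client. A sketch, assuming the database name metastore from the configuration above; DBS and TBLS are standard Hive metastore tables:

```sql
-- Run in the MySQL client: list the Hive databases recorded in the metastore,
-- including the HDFS directory each one maps to
SELECT NAME, DB_LOCATION_URI FROM metastore.DBS;

-- List the tables Hive knows about and their type (managed vs. external)
SELECT TBL_NAME, TBL_TYPE FROM metastore.TBLS;
```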

4. Start hiveserver2 in the background

[root@hop01 hive1.2]# bin/hiveserver2 &

5. Jdbc connection test

[root@hop01 hive1.2]# bin/beeline
Beeline version 1.2.1 by Apache Hive
beeline> !connect jdbc:hive2://hop01:10000
Connecting to jdbc:hive2://hop01:10000
Enter username for jdbc:hive2://hop01:10000: hiveroot (enter the username and press Enter)
Enter password for jdbc:hive2://hop01:10000: ******   (enter the password 123456 and press Enter)
Connected to: Apache Hive (version 1.2.1)
Driver: Hive JDBC (version 1.2.1)
0: jdbc:hive2://hop01:10000> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
+----------------+--+

4. Advanced query syntax

1. Basic functions

select count(*) count_user from hv_user;
select sum(age) sum_age from hv_user;
select min(age) min_age,max(age) max_age from hv_user;
+----------+----------+--+
| min_age  | max_age  |
+----------+----------+--+
| 23       | 25       |
+----------+----------+--+

2. Conditional query statement

select * from hv_user where name='test-user' limit 1;
+-------------+---------------+--------------+--+
| hv_user.id  | hv_user.name  | hv_user.age  |
+-------------+---------------+--------------+--+
| 1           | test-user     | 23           |
+-------------+---------------+--------------+--+

select * from hv_user where id>1 AND name like 'dev%';
+-------------+---------------+--------------+--+
| hv_user.id  | hv_user.name  | hv_user.age  |
+-------------+---------------+--------------+--+
| 2           | dev-user      | 25           |
+-------------+---------------+--------------+--+

select count(*) count_name,name from hv_user group by name;
+-------------+------------+--+
| count_name  |    name    |
+-------------+------------+--+
| 1           | dev-user   |
| 1           | test-user  |
+-------------+------------+--+
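Filtering on an aggregate value requires HAVING rather than WHERE. A minimal sketch against the same table (the result depends on the data loaded):

```sql
-- Keep only names that appear more than once;
-- WHERE cannot reference the COUNT(*) aggregate, HAVING can
SELECT name, COUNT(*) AS count_name
FROM hv_user
GROUP BY name
HAVING COUNT(*) > 1;
```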

3. Connect query

select t1.*,t2.* from hv_user t1 join hv_dept t2 on t1.id=t2.dp_id;
+--------+------------+---------+-----------+-------------+--+
| t1.id  |  t1.name   | t1.age  | t2.dp_id  | t2.dp_name  |
+--------+------------+---------+-----------+-------------+--+
| 1      | test-user  | 23      | 1         | Tech Dept   |
+--------+------------+---------+-----------+-------------+--+
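The query above is an inner join, which drops users without a matching department. A LEFT OUTER JOIN keeps them, with NULL in the department columns; a sketch against the same two tables:

```sql
-- LEFT OUTER JOIN keeps every row of hv_user;
-- unmatched rows get NULL for t2.dp_name
SELECT t1.id, t1.name, t2.dp_name
FROM hv_user t1
LEFT OUTER JOIN hv_dept t2 ON t1.id = t2.dp_id;
```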

5. Source code address

GitHub address
https://github.com/cicadasmile/big-data-parent
GitEE address
https://gitee.com/cicadasmile/big-data-parent


Origin blog.csdn.net/cicada_smile/article/details/112162362