Hive basics and detailed explanation

1. Start hive

1. Preconditions for hive startup

1. Make sure HDFS and YARN are started
2. Make sure MySQL, which hosts Hive's metastore database, is started


2. Startup method 1: the hive command

--switch to the bin directory under the Hive directory
 cd /opt/softs/hive3.1.2/bin/

--run the hive command
hive

3. Startup method 2: connect to Hive via JDBC

(1) Configure hive-site.xml in the Hive conf directory

cd /opt/softs/hive3.1.2/conf/
ll
vim hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- JDBC connection URL -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://bigdata03:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <!-- JDBC connection driver -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <!-- JDBC connection username -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <!-- JDBC connection password -->
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>111111</value>
  </property>
  <!-- Hive's default working directory on HDFS -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <!-- host that hiveserver2 binds to -->
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>bigdata03</value>
  </property>
  <!-- port that hiveserver2 listens on -->
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>

  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>

</configuration>


(2) Start hiveserver2

-- command to start hiveserver2
hive --service hiveserver2

-- Note: the hiveserver2 service takes some time to finish starting,
-- and it does not return an interactive prompt; open another terminal tab to run subsequent commands

A more robust way to start it:
(1) Create a logs directory under /opt/softs/hive3.1.2
cd /opt/softs/hive3.1.2
mkdir logs

(2) Run the following commands
cd /opt/softs/hive3.1.2/bin/
nohup hive --service hiveserver2 1>/opt/softs/hive3.1.2/logs/hive.log 2>/opt/softs/hive3.1.2/logs/hive_err.log &
-- nohup: placed at the start of the command; "no hangup" means the process keeps running even if the terminal is closed
-- 1: standard output
-- 2: standard error output
-- &: run in the background
So the whole command reads: run the hiveserver2 service in the background with standard output written to hive.log and error output written to hive_err.log, and keep it running even after the terminal (window) is closed.
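The redirection in that command can be checked with a harmless stand-in: the echo commands below are only placeholders for hiveserver2's output, but the 1> and 2> redirections behave exactly as in the nohup command above (nohup and & are left out here just to keep the demo in the foreground):

```shell
mkdir -p logs

# stand-in for "hive --service hiveserver2": one line to stdout, one to stderr
sh -c 'echo "service started"; echo "sample error" 1>&2' \
  1>logs/hive.log 2>logs/hive_err.log

cat logs/hive.log       # -> service started
cat logs/hive_err.log   # -> sample error
```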

(3) Execute the beeline command

beeline -u jdbc:hive2://bigdata03:10000 -n root


Note: the beeline command may report an error at first, because hiveserver2 takes two to three minutes to start. Only after hiveserver2 has finished starting can the JDBC connection via beeline succeed.

2. Commonly used Hive interactive commands

1. hive -help command

 hive -help


2. hive -e command

hive -e "show databases"

Executes a SQL statement without entering Hive's interactive shell.

3. hive -f command

Executes the SQL statements in a file.

-- create the file
cd /opt/file/
touch hive_sql.txt
vim hive_sql.txt

-- add the SQL statement "show databases"
show databases
cat hive_sql.txt

-- write the execution result to a new file
hive -f /opt/file/hive_sql.txt > /opt/file/query_result.txt
-- view the result
cat query_result.txt
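The pattern above — read statements from a file, redirect the results into another file — can be tried locally with a stand-in interpreter (the tr command here is only a placeholder for hive -f; the filenames match the ones used above):

```shell
# hive_sql.txt holds one statement per line; tr (uppercasing each line)
# plays the role of hive -f as a harmless stand-in interpreter
printf 'show databases\n' > hive_sql.txt

# redirect the interpreter's stdout into a result file, as done with hive -f
tr 'a-z' 'A-Z' < hive_sql.txt > query_result.txt
cat query_result.txt    # -> SHOW DATABASES
```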


4. Exit the hive window

(1) exit;
(2) quit;

5. Run dfs -ls /; in the Hive shell

Running dfs -ls /; inside the Hive shell lists the HDFS file system.


3. Hive syntax

1. DDL statements


1.1 Create a database

create database if not exists bigdata;


1.2 Two ways to list databases


show databases;
show databases like "big*";


1.3 Display database information


desc database bigdata;

desc database extended bigdata;


1.4 Switch database

use bigdata;

1.5 Modify database configuration information


alter database bigdata set dbproperties('createtime'='20230423');

desc database extended bigdata;



At the same time, you can also see the database's storage path (Location) on HDFS:

hdfs://bigdata03:8020/user/hive/warehouse/bigdata.db


1.6 Delete database

Note: of the two syntaxes, one is for an empty database and the other for a non-empty database: drop database bigdata; only works when the database contains no tables, while drop database bigdata cascade; drops the database together with its tables.


1.7 Create a Hive table (important)


1.7.1 Full Hive CREATE TABLE syntax
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT DELIMITED FIELDS TERMINATED BY ',']
[STORED AS file_format]
[LOCATION hdfs_path]

Explanation of the clauses

1 CREATE TABLE creates a table with the given name; if a table with the same name already exists, an exception is thrown, which the user can suppress with IF NOT EXISTS.

2 The EXTERNAL keyword creates an external table and lets you specify a path (LOCATION) to the actual data at creation time. When a table is dropped, an internal table's metadata and data are deleted together, while for an external table only the metadata is deleted and the data is kept.

3 COMMENT adds comments to the table and its columns.

4 PARTITIONED BY creates a partitioned table.

5 CLUSTERED BY creates buckets (rarely used).

6 SORTED BY sorts within buckets by the given columns (rarely used).

7 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' means the columns in each row are separated by ",";
the default separator is "\001".

8 STORED AS specifies the storage file format; for plain-text data files, STORED AS TEXTFILE can be used.

9 LOCATION specifies the table's storage path on HDFS. Do not specify it for an internal table,
but for an external table a path must be given explicitly.

For the data in sale_detail:

1, Xiao Ming, male, iphone14, 5999, 1
2, Xiao Hua, male, Feitian Moutai, 2338, 2
3, Xiao Hong, female, Lancome Black Bottle Essence, 1080, 1
4, Xiao Wei, unknown, Mijia Walking Machine, 1499, 1
5, Xiao Hua, male, Great Wall Red Wine, 158, 10
6, Xiao Hong, female, Proya Mask, 79, 2
7, Xiao Hua, male, Pearl River Beer, 11, 3
8, Xiao Ming, male, Apple Watch 8, 2999, 1

1.7.2 Create a Hive internal table:
CREATE TABLE IF NOT EXISTS bigdata.ods_sale_detail
( 
  sale_id    INT      COMMENT  "sale id"
 ,user_name  STRING   COMMENT  "user name"
 ,user_sex   STRING   COMMENT  "user sex"
 ,goods_name STRING   COMMENT  "goods name"
 ,price      INT      COMMENT  "unit price"
 ,sale_count INT      COMMENT  "sale count"
)
COMMENT "sales internal table"
ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
STORED AS TEXTFILE;
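The effect of ROW FORMAT DELIMITED FIELDS TERMINATED BY "," can be previewed locally: awk splits on the same delimiter Hive will use. The sample file below is a hypothetical two-row slice of sale_detail:

```shell
# two sample rows in the table's comma-delimited layout
cat > sale_detail_sample.txt <<'EOF'
1,Xiao Ming,male,iphone14,5999,1
2,Xiao Hua,male,Feitian Moutai,2338,2
EOF

# Hive splits each line on "," into the six declared columns;
# awk -F',' does the same, here printing goods_name (column 4)
awk -F',' '{print $4}' sale_detail_sample.txt
```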

View the table creation results


1.7.3 Create a Hive external table:
CREATE EXTERNAL TABLE IF NOT EXISTS bigdata.ods_sale_detail_external
( 
  sale_id    INT      COMMENT  "sale id"
 ,user_name  STRING   COMMENT  "user name"
 ,user_sex   STRING   COMMENT  "user sex"
 ,goods_name STRING   COMMENT  "goods name"
 ,price      INT      COMMENT  "unit price"
 ,sale_count INT      COMMENT  "sale count"
)
COMMENT "sales external table"
ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
STORED AS TEXTFILE
LOCATION "/bigdata/hive/external_table/ods/ods_sale_detail_external";


2. DML statements

2.1 Load data into the table (Load)

Create the sale_detail.txt file in the /opt/file directory of the virtual machine bigdata03, and add:

1, Xiao Ming, male, iphone14, 5999, 1
2, Xiao Hua, male, Feitian Moutai, 2338, 2
3, Xiao Hong, female, Lancome Black Bottle Essence, 1080, 1
4, Xiao Wei, unknown, Mijia Walking Machine, 1499, 1
5, Xiao Hua, male, Great Wall Red Wine, 158, 10
6, Xiao Hong, female, Proya Mask, 79, 2
7, Xiao Hua, male, Pearl River Beer, 11, 3
8, Xiao Ming, male, Apple Watch 8, 2999, 1

-- load data [local] inpath 'path to the data' [overwrite] into table dbname.tablename [partition (partcol1=val1,…)];

-- without overwrite, repeated Loads append rather than deduplicate
load data local inpath '/opt/file/sale_detail.txt' into table bigdata.ods_sale_detail;


Without overwrite, executing the Load command multiple times appends the data each time instead of deduplicating it.


Execute the Load command again:

Query the table again: the data has been appended a second time, not deduplicated.


2.2 Load with overwrite to replace the existing data

With overwrite, repeated Loads replace the table's existing data instead of appending duplicates.

load data local inpath '/opt/file/sale_detail.txt' overwrite into table bigdata.ods_sale_detail;
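The difference between the two Load variants can be sketched with plain directories. This is a simplification: the local warehouse/ directory stands in for the table's HDFS path, and cp stands in for the load itself:

```shell
mkdir -p warehouse/ods_sale_detail
printf '1,Xiao Ming,male,iphone14,5999,1\n' > sale_detail.txt

# load WITHOUT overwrite, twice: each load adds another copy of the file
# to the table directory, so the rows are duplicated
cp sale_detail.txt warehouse/ods_sale_detail/sale_detail.txt
cp sale_detail.txt warehouse/ods_sale_detail/sale_detail_copy_1.txt
cat warehouse/ods_sale_detail/* | wc -l    # 2 rows: appended, not deduplicated

# load WITH overwrite: the table directory is cleared first, then written
rm -f warehouse/ods_sale_detail/*
cp sale_detail.txt warehouse/ods_sale_detail/sale_detail.txt
cat warehouse/ods_sale_detail/* | wc -l    # back to 1 row
```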


Add data to the external table by putting the txt file containing the data into its HDFS directory:

hadoop fs -put /opt/file/sale_detail.txt /bigdata/hive/external_table/ods/ods_sale_detail_external


Query the external table's data from the command line:


View table creation information


In the output you can find information such as the storage path of the table bigdata.ods_sale_detail:

LOCATION
| 'hdfs://bigdata03:8020/user/hive/warehouse/bigdata.db/ods_sale_detail' |

2.3 Delete the internal table

Drop internal table ods_sale_detail


drop table bigdata.ods_sale_detail;

After dropping it, check whether the table's data directory still exists on HDFS:

hadoop fs -ls /user/hive/warehouse/bigdata.db/ods_sale_detail

The deletion is successful (the table and data are deleted together):


2.4 Delete external table

Drop the external table ods_sale_detail_external:

drop table bigdata.ods_sale_detail_external;


After dropping the external table, only the table's metadata is deleted, so the table can no longer be queried from the command line. The files on HDFS, however, are still there and the data is retained.
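The difference between the two drops can be sketched the same way. This is a simplification: metastore.txt stands in for the metastore's table list, and the data/ subdirectories for the tables' storage paths:

```shell
# one metadata entry and one data directory per table
mkdir -p data/ods_sale_detail data/ods_sale_detail_external
printf 'ods_sale_detail\nods_sale_detail_external\n' > metastore.txt

# DROP of the internal table removes the metadata AND the data
grep -v '^ods_sale_detail$' metastore.txt > tmp.txt; mv tmp.txt metastore.txt
rm -rf data/ods_sale_detail

# DROP of the external table removes only the metadata entry
grep -v '^ods_sale_detail_external$' metastore.txt > tmp.txt; mv tmp.txt metastore.txt

ls data/    # -> ods_sale_detail_external : the data directory survives
```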

Create the Hive external table again:

CREATE EXTERNAL TABLE IF NOT EXISTS bigdata.ods_sale_detail_external
( 
  sale_id    INT      COMMENT  "sale id"
 ,user_name  STRING   COMMENT  "user name"
 ,user_sex   STRING   COMMENT  "user sex"
 ,goods_name STRING   COMMENT  "goods name"
 ,price      INT      COMMENT  "unit price"
 ,sale_count INT      COMMENT  "sale count"
)
COMMENT "sales external table"
ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
STORED AS TEXTFILE
LOCATION "/bigdata/hive/external_table/ods/ods_sale_detail_external";


Data can still be queried on the command line interface:

select * from bigdata.ods_sale_detail_external;


4. Other reference materials of Hive



Origin blog.csdn.net/m0_48170265/article/details/130332080