Hive 常用的DDL语句大全 建库建表 详细

参考书籍:《Hive编程指南》

1.创建库

(1)语法结构

CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
  [COMMENT database_comment]       //关于数据块的描述
  [LOCATION hdfs_path]          //指定数据库在HDFS上的存储位置
  [WITH DBPROPERTIES (property_name=property_value, ...)];    //指定数据块属性

默认地址:/user/hive/warehouse/db_name.db/table_name/partition_name/…

(2)创建库的方式

# 创建普通的数据库

create database t1;

# 创建库的时候检查存与否

create database if not exists t1;

# 创建库的时候带注释

create database if not exists t2 comment 'learning hive';

2.查看库

(1)查看库的方式

# 查看有哪些数据库

show databases;

+-----------------------+
| database_name         |
+-----------------------+
| default               |
| information_schema    |
| sys                   |
+-----------------------+

--使用正则表达式匹配来筛选需要的数据库名
show databases like 't*';
+-----------------------+
| database_name         |
+-----------------------+
| t1                    |
| t2                    |
+-----------------------+

Hive会为每个数据库创建一个目录,数据库中的表将会以这个数据库目录的子目录形式存储。
default数据库中的表是个例外,因为这个数据库本身没有自己的目录。
数据库所在的目录位于参数hive.metastore.warehouse.dir所指定的顶层目录之后。

# 显示数据库的详细属性信息

desc database extended t2;

+----------+----------------+----------------------------------------------------+-------------+-------------+-------------+
| db_name | comment | location | owner_name | owner_type | parameters |
+----------+----------------+----------------------------------------------------+-------------+-------------+-------------+
| t2 | learning hive | hdfs://server1:8020/warehouse/tablespace/managed/hive/t2.db | hdfs | USER | |
+----------+----------------+----------------------------------------------------+-------------+-------------+-------------+


# 查看正在使用哪个库

select current_database();

+----------+
| _c0      |
+----------+
| default  |
+----------+

# 查看创建库的详细语句

show create database t2;

+----------------------------------------------------+
| createdb_stmt                                      |
+----------------------------------------------------+
| CREATE DATABASE `t2`                               |
| COMMENT                                            |
| 'learning hive'                                    |
| LOCATION                                           |
| 'hdfs://server1:8020/warehouse/tablespace/managed/hive/t2.db' |
+----------------------------------------------------+

3.删除库

(1)删除语法

drop database t1;
drop database if exists t1;

# 默认情况下hive不允许删除包含表的数据库。用户要么先删除数据库中的表,然后再删除数据库,要么在删除命令的最后面加上关键字cascade,这样就能直接删除库。

drop database if exists t1 cascade;
drop database  t1 cascade;

4.切换库

use database_name;

5.创建表

(1)语法

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...)
  [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [ROW FORMAT row_format]
  [STORED AS file_format]
  [LOCATION hdfs_path]

(2)关键字说明

•EXTERNAL 创建一个外部表,在建表的同时指定指向实际数据的路径(LOCATION)

•LIKE 创建表时copy已有的表结构,不复制数据

•COMMENT 注释

•PARTITIONED BY 指定分区字段

•ROW FORMAT
  DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]
    MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
    | SERDE serde_name [WITH SERDEPROPERTIES
    (property_name=property_value, property_name=property_value, ...)]
行设置

•STORED AS       //数据存储格式设置
  SEQUENCEFILE //序列化文件
  | TEXTFILE   //普通的文本文件格式
  | RCFILE   //行列存储相结合的文件
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname //自定义文件格式
如果文件数据是纯文本,可以使用 STORED AS TEXTFILE。如果数据需要压缩,使用 STORED AS SEQUENCE 。

•LOCATION 指定表在HDFS的存储路径

建议:
如果一份数据已经存储在HDFS上,并且要被多个用户或者客户端使用,最好创建外部表。

 

(3)创建表以及操作

# 创建内部表

create table student_test
(
id int, 
name string, 
sex string, 
age int) 
row format delimited fields terminated by ",";

默认创建的表都是内部表(也叫管理表),Hive会控制着数据的生命周期,删除管理表时,也会删除表的数据。
管理表不方便和其他工作共享数据。

# 创建外部表(external)

create external table student_ext
(
id int, 
name string, 
sex string, 
age int) 
row format delimited fields terminated by "," 
location "/hive/student_ext";

外部表在删除该表时并不会删除掉这份数据,不过描述表的元数据信息会被删除掉

# 创建分区表

create external table student_part
(
id int, 
name string, 
sex string, 
age int)
partitioned by (city string)
row format delimited fields terminated by ","
location "/hive/student_part";

--添加新的分区
alter table student_part add partition (city="beijing");

# 分桶表

create external table student_bck
(
id int, 
name string, 
sex string, 
age int,
department string)
clustered by (id) 
sorted by (id asc, name desc) into 4 buckets
row format delimited fields terminated by ","
location "/hive/student_bck";

# like 拷贝一张已经存在的表的表模式

create table student_like like student;

--外部表
create external table student_like_ext like student
location '//hive/student_like_ext'

注意:
如果语句中省略掉external关键字而且原始表是外部表的话,那么生成的新表也将是外部表。
如果语句中省略掉external关键字而且原始表是管理表的话,那么生成的新表也将是管理表。
如果语句中有external关键字而且原始表是管理表的话,那么生成的新表将是外部表。

6.查看表

(1)查看表列表

# 查看当前使用的数据库中有哪些表

show tables;

+-------------------------+
| tab_name                |
+-------------------------+
| call_center             |
+-------------------------+

# 查看非当前使用的数据库中有哪些表

show tables in t2;

+-------------------------+
| tab_name                |
+-------------------------+
| catalog_page            |
+-------------------------+

# 查看数据库中以xxx开头的表

show tables like 'call_*';

+-------------------------+
| tab_name                |
+-------------------------+
| call_center             |
+-------------------------+

 

(2)查看表信息

desc call_center;

+--------------------+------------+----------+
|      col_name      | data_type  | comment  |
+--------------------+------------+----------+
| cc_call_center_sk  | int        |          |
| cc_call_center_id  | string     |          |
| cc_rec_start_date  | string     |          |
| cc_rec_end_date    | string     |          |
| cc_closed_date_sk  | int        |          |
| cc_open_date_sk    | int        |          |
| cc_name            | string     |          |
| cc_class           | string     |          |
| cc_employees       | int        |          |
| cc_sq_ft           | int        |          |
| cc_hours           | string     |          |
| cc_manager         | string     |          |
| cc_mkt_id          | int        |          |
| cc_mkt_class       | string     |          |
| cc_mkt_desc        | string     |          |
| cc_market_manager  | string     |          |
| cc_division        | int        |          |
| cc_division_name   | string     |          |
| cc_company         | int        |          |
| cc_company_name    | string     |          |
| cc_street_number   | string     |          |
| cc_street_name     | string     |          |
| cc_street_type     | string     |          |
| cc_suite_number    | string     |          |
| cc_city            | string     |          |
| cc_county          | string     |          |
| cc_state           | string     |          |
| cc_zip             | string     |          |
| cc_country         | string     |          |
| cc_gmt_offset      | float      |          |
| cc_tax_percentage  | float      |          |
+--------------------+------------+----------+

# 查看表的详细信息(格式不友好)

desc extended call_center;

# 查看表的详细信息(格式友好)包含文件数 分区数等等,查看表的文件个数(用于分析小文件)

desc formatted call_center;

+-------------------------------+----------------------------------------------------+----------------------------------------------------+
|           col_name            |                     data_type                      |                      comment                       |
+-------------------------------+----------------------------------------------------+----------------------------------------------------+
| # col_name                    | data_type                                          | comment                                            |
| cc_call_center_sk             | int                                                |                                                    |
| cc_call_center_id             | string                                             |                                                    |
| cc_rec_start_date             | string                                             |                                                    |
| cc_rec_end_date               | string                                             |                                                    |
| cc_closed_date_sk             | int                                                |                                                    |
| cc_open_date_sk               | int                                                |                                                    |
| cc_name                       | string                                             |                                                    |
| cc_class                      | string                                             |                                                    |
| cc_employees                  | int                                                |                                                    |
| cc_sq_ft                      | int                                                |                                                    |
| cc_hours                      | string                                             |                                                    |
| cc_manager                    | string                                             |                                                    |
| cc_mkt_id                     | int                                                |                                                    |
| cc_mkt_class                  | string                                             |                                                    |
| cc_mkt_desc                   | string                                             |                                                    |
| cc_market_manager             | string                                             |                                                    |
| cc_division                   | int                                                |                                                    |
| cc_division_name              | string                                             |                                                    |
| cc_company                    | int                                                |                                                    |
| cc_company_name               | string                                             |                                                    |
| cc_street_number              | string                                             |                                                    |
| cc_street_name                | string                                             |                                                    |
| cc_street_type                | string                                             |                                                    |
| cc_suite_number               | string                                             |                                                    |
| cc_city                       | string                                             |                                                    |
| cc_county                     | string                                             |                                                    |
| cc_state                      | string                                             |                                                    |
| cc_zip                        | string                                             |                                                    |
| cc_country                    | string                                             |                                                    |
| cc_gmt_offset                 | float                                              |                                                    |
| cc_tax_percentage             | float                                              |                                                    |
|                               | NULL                                               | NULL                                               |
| # Detailed Table Information  | NULL                                               | NULL                                               |
| Database:                     | tpcds_orc_hive_8000                                | NULL                                               |
| OwnerType:                    | USER                                               | NULL                                               |
| Owner:                        | hdfs                                               | NULL                                               |
| CreateTime:                   | Wed Dec 09 03:17:21 CST 2020                       | NULL                                               |
| LastAccessTime:               | UNKNOWN                                            | NULL                                               |
| Retention:                    | 0                                                  | NULL                                               |
| Location:                     | hdfs://server1:8020/warehouse/tablespace/managed/hive/tpcds_orc_hive_8000.db/call_center | NULL                                               |
| Table Type:                   | MANAGED_TABLE                                      | NULL                                               |
| Table Parameters:             | NULL                                               | NULL                                               |
|                               | COLUMN_STATS_ACCURATE                              | {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"cc_call_center_id\":\"true\",\"cc_call_center_sk\":\"true\",\"cc_city\":\"true\",\"cc_class\":\"true\",\"cc_closed_date_sk\":\"true\",\"cc_company\":\"true\",\"cc_company_name\":\"true\",\"cc_country\":\"true\",\"cc_county\":\"true\",\"cc_division\":\"true\",\"cc_division_name\":\"true\",\"cc_employees\":\"true\",\"cc_gmt_offset\":\"true\",\"cc_hours\":\"true\",\"cc_manager\":\"true\",\"cc_market_manager\":\"true\",\"cc_mkt_class\":\"true\",\"cc_mkt_desc\":\"true\",\"cc_mkt_id\":\"true\",\"cc_name\":\"true\",\"cc_open_date_sk\":\"true\",\"cc_rec_end_date\":\"true\",\"cc_rec_start_date\":\"true\",\"cc_sq_ft\":\"true\",\"cc_state\":\"true\",\"cc_street_name\":\"true\",\"cc_street_number\":\"true\",\"cc_street_type\":\"true\",\"cc_suite_number\":\"true\",\"cc_tax_percentage\":\"true\",\"cc_zip\":\"true\"}} |
|                               | bucketing_version                                  | 2                                                  |
|                               | numFiles                                           | 1                                                  |
|                               | numRows                                            | 10                                                 |
|                               | rawDataSize                                        | 0                                                  |
|                               | totalSize                                          | 5899                                               |
|                               | transactional                                      | true                                               |
|                               | transactional_properties                           | default                                            |
|                               | transient_lastDdlTime                              | 1607455918                                         |
|                               | NULL                                               | NULL                                               |
| # Storage Information         | NULL                                               | NULL                                               |
| SerDe Library:                | org.apache.hadoop.hive.ql.io.orc.OrcSerde          | NULL                                               |
| InputFormat:                  | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat    | NULL                                               |
| OutputFormat:                 | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat   | NULL                                               |
| Compressed:                   | No                                                 | NULL                                               |
| Num Buckets:                  | -1                                                 | NULL                                               |
| Bucket Columns:               | []                                                 | NULL                                               |
| Sort Columns:                 | []                                                 | NULL                                               |
| Storage Desc Params:          | NULL                                               | NULL                                               |
|                               | serialization.format                               | 1                                                  |
+-------------------------------+----------------------------------------------------+----------------------------------------------------+

# 查看分区信息

show partitions time_dim;

(3)查看建表语句

# 可以看到建表语句的详细信息

show create table call_center;

+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE TABLE `call_center`(                        |
|   `cc_call_center_sk` int,                         |
|   `cc_call_center_id` string,                      |
|   `cc_rec_start_date` string,                      |
|   `cc_rec_end_date` string,                        |
|   `cc_closed_date_sk` int,                         |
|   `cc_open_date_sk` int,                           |
|   `cc_name` string,                                |
|   `cc_class` string,                               |
|   `cc_employees` int,                              |
|   `cc_sq_ft` int,                                  |
|   `cc_hours` string,                               |
|   `cc_manager` string,                             |
|   `cc_mkt_id` int,                                 |
|   `cc_mkt_class` string,                           |
|   `cc_mkt_desc` string,                            |
|   `cc_market_manager` string,                      |
|   `cc_division` int,                               |
|   `cc_division_name` string,                       |
|   `cc_company` int,                                |
|   `cc_company_name` string,                        |
|   `cc_street_number` string,                       |
|   `cc_street_name` string,                         |
|   `cc_street_type` string,                         |
|   `cc_suite_number` string,                        |
|   `cc_city` string,                                |
|   `cc_county` string,                              |
|   `cc_state` string,                               |
|   `cc_zip` string,                                 |
|   `cc_country` string,                             |
|   `cc_gmt_offset` float,                           |
|   `cc_tax_percentage` float)                       |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'      |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' |
| LOCATION                                           |
|   'hdfs://server1:8020/warehouse/tablespace/managed/hive/tpcds_orc_hive_8000.db/call_center' |
| TBLPROPERTIES (                                    |
|   'bucketing_version'='2',                         |
|   'transactional'='true',                          |
|   'transactional_properties'='default',            |
|   'transient_lastDdlTime'='1607455918')            |
+----------------------------------------------------+

 

7.设置参数

(1)查看当前参数

set hive.auto.convert.join;

+------------------------------+
| set                          |
+------------------------------+
| hive.auto.convert.join=true  |
+------------------------------+

(2)设置参数

set hive.auto.convert.join = false;

8.查看sql的执行计划

explain select * from test_table;

# 查看详细的执行计划(信息会很多)

explain extended select * from test_table;

猜你喜欢

转载自blog.csdn.net/qq_35260875/article/details/110957395