Hive (4): Hive data definition language (DDL)

4. DDL: data definition

4.1 Create a database

  • Create a database. By default, the database is stored on HDFS under /opt/hive/warehouse/*.db
create database hivetest;
  • To avoid an error when the database already exists, add an if not exists check (the standard form).
create database if not exists hivetest;

  • Create a database and specify where it is stored on HDFS
create database if not exists hivetest location 'hdfs_path';
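
Other optional clauses can be set in the same statement. A minimal sketch, in which the database name, comment, path and property are all made up for illustration:

```sql
-- hivetest_demo, the HDFS path and the creator property are hypothetical values
create database if not exists hivetest_demo
comment 'scratch database for DDL experiments'
location '/tmp/hive_demo/hivetest_demo.db'
with dbproperties ('creator' = 'demo');
```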

4.2 Query the database

  • List databases
show databases;

  • Filter the list of databases by pattern
show databases like 'hivetest*';

  • View database details
desc database hivetest;
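
The plain form does not print database properties. A small sketch, assuming the hivetest database created in 4.1; the creator property is a hypothetical value set only so that there is something to display:

```sql
-- attach a property to the database (the key and value are illustrative)
alter database hivetest set dbproperties ('creator' = 'demo');

-- the extended keyword also prints the database properties
desc database extended hivetest;
```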

  • Switch the current database
use database_name;

4.3 Delete the database

  • Delete an empty database
drop database database_name;
  • If the database might not exist, use if exists so that the statement does not fail
drop database if exists database_name;
  • If the database is not empty, add the cascade keyword to force deletion
drop database database_name cascade;
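
The two options can be combined in one statement. A minimal sketch, assuming a throwaway database named hivetest_tmp (a hypothetical name, not part of the examples above):

```sql
-- hivetest_tmp and t1 are hypothetical objects used only for this illustration
create database if not exists hivetest_tmp;
create table if not exists hivetest_tmp.t1 (id int);

-- a plain "drop database hivetest_tmp" would fail here because the database still contains t1;
-- if exists avoids an error when the database is missing,
-- cascade drops the contained tables together with the database
drop database if exists hivetest_tmp cascade;
```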

4.4 Create Table

  • Table creation syntax
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name 
[(col_name data_type [COMMENT col_comment], ...)] 
[COMMENT table_comment] 
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] 
[CLUSTERED BY (col_name, col_name, ...) 
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] 
[ROW FORMAT row_format] 
[STORED AS file_format] 
[LOCATION hdfs_path]
  • Clause explanations

(1) CREATE TABLE creates a table with a specified name. If a table with the same name already exists, an exception is thrown; the user can use the IF NOT EXISTS option to ignore this exception.

(2) The EXTERNAL keyword creates an external table and lets the user specify a path to the actual data (LOCATION) when creating the table. For an internal table, Hive moves the data into the path that the data warehouse points to; for an external table, Hive only records the path where the data is located and does not move the data. When a table is dropped, both the metadata and the data of an internal table are deleted, whereas for an external table only the metadata is deleted and the data is kept.

(3) COMMENT: Add notes to tables and columns.

(4) PARTITIONED BY creates a partitioned table.

(5) CLUSTERED BY creates a bucketed table.

(6) SORTED BY sorts the data within each bucket; it is not commonly used.

(7) ROW FORMAT

DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

Users can specify a custom SerDe or use the built-in SerDe when creating the table. If ROW FORMAT is not specified, or ROW FORMAT DELIMITED is specified, the built-in SerDe is used. When creating a table, the user specifies the table's columns and may also specify a custom SerDe; Hive uses the SerDe to determine the data of each column.

SerDe is short for Serializer/Deserializer; it is used for serialization and deserialization.

(8) STORED AS specifies the storage file format.

Commonly used storage file types: SEQUENCEFILE (binary sequence file), TEXTFILE (text), RCFILE (column storage format file)

If the file data is plain text, you can use STORED AS TEXTFILE. If the data needs to be compressed, use STORED AS SEQUENCEFILE.

(9) LOCATION: specifies where the table is stored on HDFS.

(10) LIKE allows users to copy the existing table structure, but does not copy data.
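
Most of the clauses above can appear in a single statement. The following is only a sketch: the table name page_view_demo, its columns, the bucket count and the HDFS path are all hypothetical.

```sql
-- page_view_demo, its columns and the HDFS path are hypothetical, for illustration only
create external table if not exists page_view_demo (
  user_id  bigint        comment 'visitor id',
  url      string        comment 'requested page',
  tags     array<string> comment 'labels attached to the request'
)
comment 'demo table exercising most CREATE TABLE clauses'
partitioned by (dt string comment 'log date, e.g. 20170702')
clustered by (user_id) sorted by (user_id asc) into 4 buckets
row format delimited
  fields terminated by '\t'
  collection items terminated by ','
stored as textfile
location '/tmp/hive_demo/page_view_demo';
```

Because the table is declared EXTERNAL, dropping it would remove only the metadata and leave the files under the LOCATION path in place.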

4.4.1 Internal table

Tables created by default are so-called managed tables, sometimes called internal tables. For this kind of table, Hive controls (more or less) the life cycle of the data. By default, Hive stores the data of these tables in a subdirectory of the directory defined by the configuration item hive.metastore.warehouse.dir (for example, /opt/hive/warehouse). When a managed table is dropped, Hive also deletes the data in the table. Managed tables are not suitable for sharing data with other tools.

  • Create a table in the ordinary way
create table if not exists student2(
id int, name string
)
row format delimited fields terminated by '\t';
  • Create a table based on the query result (the query result will be added to the newly created table)
create table if not exists student3 as select id, name from student;
  • Create a table based on an existing table structure
create table if not exists student4 like student;
  • Check the table type
desc formatted student2;

4.4.2 External table

Because the table is an external table, Hive does not consider itself the full owner of the data. Dropping the table does not delete the data, but the metadata describing the table is deleted.

  • Use cases for managed tables and external tables

Website logs are collected and streamed into text files on HDFS every day. Much of the statistical analysis is done on top of an external table (the raw log table). The intermediate tables and result tables are stored as internal tables, and data enters them through SELECT + INSERT.

Detailed example

Create an external employee table and load data into it. The sample data:

Michael|Montreal,Toronto|Male,30|DB:80|Product:DeveloperLead
Will|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
Shelley|New York|Female,27|Python:80|Test:Lead,COE:Architect
Lucy|Vancouver|Female,57|Sales:89|Sales:Lead
  • Table creation statement

Create the employee table

create external table if not exists employee(
name string,
address array<string>,
personalInfo array<string>,
technol map<string,int>,
jobs map<string,string>)
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';

Load data into the external table

load data local inpath '/root/employee.txt' into table employee;

Query the result

select * from employee;
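
Individual elements of the array and map columns can be read with index and key syntax. A minimal sketch against the employee table above (the column aliases are only illustrative):

```sql
select
  name,
  address[0]       as first_city,   -- arrays are indexed from 0
  size(address)    as city_count,   -- number of elements in the array
  personalInfo[1]  as age,          -- second element of the personalInfo array
  technol['DB']    as db_score,     -- map lookup by key; NULL when the key is absent
  map_keys(jobs)   as departments   -- all keys of the jobs map
from employee;
```

For the first sample row (Michael), first_city would be Montreal and db_score would be 80; rows whose technol map has no DB key get NULL in that column.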

4.4.3 Converting between managed tables and external tables

  • Convert the internal table student2 to an external table
alter table student2 set tblproperties('EXTERNAL'='TRUE');
  • Convert the external table student2 to an internal table
alter table student2 set tblproperties('EXTERNAL'='FALSE');

Note: ('EXTERNAL'='TRUE') and ('EXTERNAL'='FALSE') are fixed forms and are case sensitive!
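
The effect of either statement can be checked in the table metadata. A small sketch, reusing student2 from 4.4.1:

```sql
alter table student2 set tblproperties('EXTERNAL'='TRUE');

-- the Table Type row of the output should now read EXTERNAL_TABLE
-- (it reads MANAGED_TABLE for an internal table)
desc formatted student2;
```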

4.5 Partitioned tables

Each partition of a partitioned table corresponds to a separate folder on the HDFS file system, and all data files of that partition are stored in that folder. A partition in Hive is simply a subdirectory: a large data set is divided into smaller data sets according to business needs. At query time, an expression in the WHERE clause selects only the partitions the query needs, which makes the query much more efficient.

4.5.1 Basic operations on partitioned tables

Sample data

10,ACCOUNTING,NEW YORK
10,ACCOUNTING,NEW YORK
10,ACCOUNTING,NEW YORK
20,RESEARCH,DALLAS
20,RESEARCH,DALLAS
20,RESEARCH,DALLAS
30,SALES,CHICAGO
30,SALES,CHICAGO

1. Why partition a table (for example, logs need to be managed by date)

/opt/hive/warehouse/log_partition/20170702/20170702.log
/opt/hive/warehouse/log_partition/20170703/20170703.log
/opt/hive/warehouse/log_partition/20170704/20170704.log

2. Create a partitioned table

create table dept_partition(
deptno int, dname string, loc string
)
partitioned by (month string)
row format delimited fields terminated by ',';

3. Load data into the partitioned table

load data local inpath '/opt/dept.txt' into table default.dept_partition partition(month='201707');
load data local inpath '/opt/dept.txt' into table default.dept_partition partition(month='201708');
load data local inpath '/opt/dept.txt' into table default.dept_partition partition(month='201709');

4. Query the data in the partition table

Single partition query

select * from dept_partition where month='201709';

Query across multiple partitions

select * from dept_partition where month='201709'
union
select * from dept_partition where month='201708'
union
select * from dept_partition where month='201707';

Note

Versions before Hive 1.2.0 support only UNION ALL, which does not remove duplicate rows.

In Hive 1.2.0 and later, UNION removes duplicate rows from the result by default.
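
When the only goal is to read several partitions of the same table, a single query with a filter on the partition column is a common alternative to chaining UNION. A sketch on the same table; unlike UNION, it keeps duplicate rows:

```sql
-- the filter on the partition column month limits the read to these three partitions
select * from dept_partition
where month in ('201707', '201708', '201709');
```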

5. Add partitions

alter table dept_partition add partition(month='201706');
alter table dept_partition add partition(month='201705') partition(month='201704');

6. Delete partitions

alter table dept_partition drop partition (month='201704');
alter table dept_partition drop partition (month='201705'), partition (month='201706');

7. List the partitions of the partitioned table

show partitions dept_partition;

8. View the structure of the partitioned table

desc formatted dept_partition;

4.6 Modify table

4.6.1 Rename table

  • Syntax
ALTER TABLE table_name RENAME TO new_table_name
  • Example
alter table dept_partition2 rename to dept_partition3;

4.6.2 Add/modify/replace column information

  • Syntax

Update a column

ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]

Add or replace columns

ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)

Note: ADD appends new columns after all existing columns (but before the partition columns); REPLACE replaces all columns in the table.

  • Examples

Add a column

alter table dept_partition add columns(deptdesc string);

Update a column

alter table dept_partition change column deptdesc desc int;

Replace columns

alter table dept_partition replace columns(deptno string, dname string, loc string);
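
The CHANGE syntax also accepts a COMMENT and a FIRST|AFTER position, which the examples above do not use. A sketch, assuming dept_partition has the three string columns left by the replace example; deptcode is a hypothetical column added only for this illustration:

```sql
-- deptcode is a hypothetical column, not part of the original table
alter table dept_partition add columns (deptcode string comment 'department code');

-- keep the name and type, replace the comment, and move the column right after dname
alter table dept_partition change column deptcode deptcode string comment 'short department code' after dname;
```

These statements change only the table metadata; the data files already stored on HDFS are not rewritten, so the file layout has to match the new column order for queries to return sensible values.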

4.6.3 Delete table

drop table dept_partition;

Note: For an external table, this command deletes only the table's metadata; the data on HDFS is left in place. To delete an external table completely, use one of the following methods:

  • Option 1: Convert it to an internal table, then drop it
ALTER TABLE xxx SET TBLPROPERTIES('EXTERNAL'='FALSE');

drop table xxx;
  • Option 2: Drop the table to delete the metadata, then delete the data with an hdfs command
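
Option 2 could look like the following when run from the Hive CLI, which passes dfs commands through to HDFS. The table name xxx is the placeholder used above, and /path/to/xxx is a hypothetical stand-in for the table's real location:

```sql
-- note the Location row in the output before dropping the table
desc formatted xxx;

-- removes only the metadata, because xxx is an external table
drop table xxx;

-- delete the data directory; replace /path/to/xxx with the Location noted above
dfs -rm -r /path/to/xxx;
```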

Origin blog.csdn.net/zmzdmx/article/details/108651190