Hive DDL operations

1. Create a database

CREATE DATABASE [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value,...)];

(1) Create a database. By default, the database is stored on HDFS at /user/hive/warehouse/*.db

hive(default)> create database db_hive;

(2) To avoid an error when the database already exists, add the IF NOT EXISTS check. (This is the standard form.)

hive(default)> create database if not exists db_hive;

(3) Create a database and specify where it is stored on HDFS

create database db location '/db_hive.db';
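
The optional clauses in the syntax above can be combined in one statement. The following is a minimal sketch; the database name db_hive_demo, its path, and the 'creator' property are hypothetical values chosen only for illustration.

```sql
-- Hypothetical database name, path, and property, for illustration only.
create database if not exists db_hive_demo
comment 'demo database'
location '/db_hive_demo.db'
with dbproperties ('creator'='demo');
```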

2. Display databases

(1) Show all databases

show databases;

(2) Filter the databases shown

show databases like 'db_*';

(3) Show database information

desc database db_hive;

(4) Show detailed database information with extended

desc database extended db_hive;

3. Modify the database

Use the ALTER DATABASE command to set key-value pairs in a database's DBPROPERTIES, which describe the attributes of that database.

Other database metadata, including the database name and the directory where the database is located, cannot be changed this way.

alter database db_hive set dbproperties('createtime'='2019'); 
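
Several properties can be set in one statement, and the result can be checked with the extended description shown earlier. This is a sketch only; the 'owner_team' property is a hypothetical key added for illustration.

```sql
-- 'owner_team' is a hypothetical property used only for this example.
alter database db_hive set dbproperties('createtime'='2019', 'owner_team'='demo');
-- The properties set above should appear in the extended description.
desc database extended db_hive;
```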

4. Delete the database

(1) Drop an empty database

drop database if exists db_hive;

(2) If the database is not empty, add CASCADE to force the drop

drop database db_hive cascade;

5. Create a table

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)] 
[COMMENT table_comment] 
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format] 
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
[AS select_statement]

1) Field Description

(1) CREATE TABLE creates a table with the specified name. If a table with the same name already exists, an exception is thrown; use the IF NOT EXISTS option to ignore this exception.

(2) The EXTERNAL keyword creates an external table; when creating the table you can also specify a path that points to the actual data (LOCATION).

When a table is dropped, both the metadata and the data of an internal table are deleted, whereas for an external table only the metadata is deleted and the data is kept.

(3) COMMENT: add comments to tables and columns.

(4) PARTITIONED BY: creates a partitioned table.

(5) CLUSTERED BY: creates a bucketed table.

(6) SORTED BY: rarely used; additionally sorts the rows in each bucket by one or more columns.

(7) ROW FORMAT

  DELIMITED [FIELDS TERMINATED BY char]
    [COLLECTION ITEMS TERMINATED BY char]
    [MAP KEYS TERMINATED BY char]
    [LINES TERMINATED BY char]
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

    When creating a table, users can specify a custom SerDe or use a built-in SerDe.

    If ROW FORMAT is not specified, or ROW FORMAT DELIMITED is specified, Hive uses its built-in SerDe.

    Hive uses the SerDe to determine the data of the table's individual columns.

    SerDe is short for Serialize/Deserialize. Hive uses a SerDe to serialize and deserialize row objects.

(8) STORED AS: specifies the storage file format.

Common file formats: SEQUENCEFILE (binary sequence file), TEXTFILE (text), RCFILE (columnar storage format).

If the data file is plain text, use STORED AS TEXTFILE. If the data needs to be compressed, use STORED AS SEQUENCEFILE.

(9) LOCATION: specifies where the table is stored on HDFS.

(10) AS: followed by a query; creates the table from the query results.

(11) LIKE: copies the structure of an existing table without copying its data.
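
To see how the clauses described above fit together, here is a minimal sketch of one statement that uses most of them. The table name demo_logs, its columns, the delimiters, the bucket count, and the path '/demo/demo_logs' are all hypothetical and chosen only for illustration.

```sql
-- Hypothetical table, columns, and path, for illustration only.
create external table if not exists demo_logs (
  id int comment 'record id',
  name string comment 'user name',
  tags array<string>,
  attrs map<string,string>
)
comment 'example table combining the clauses above'
partitioned by (dt string)                        -- partition column, not repeated in the column list
clustered by (id) sorted by (id asc) into 4 buckets
row format delimited
  fields terminated by '\t'                       -- separator between columns
  collection items terminated by ','              -- separator between array elements and map entries
  map keys terminated by ':'                      -- separator between a map key and its value
  lines terminated by '\n'
stored as textfile
location '/demo/demo_logs';
```

Because the table is declared EXTERNAL, dropping it would remove only the metadata and leave the files under the given location in place.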

2) Managed table (internal table)

Tables created by default are so-called managed tables, sometimes also called internal tables. For these tables, Hive controls (more or less) the life cycle of the data. By default, Hive stores the data of these tables in a subdirectory of the directory defined by the configuration item hive.metastore.warehouse.dir (e.g., /user/hive/warehouse).

When a managed table is dropped, Hive also deletes the data in the table. Managed tables are not suitable for sharing data with other tools.

(1) Create an ordinary table

create table if not exists student1(
id int, name string
)
row format delimited fields terminated by '\t'
location '/user/hive/warehouse/student1';

(2) Create a table from query results (the query results are inserted into the new table)

create table if not exists student2 as select id, name from student1;

(3) Create a table from an existing table's structure (data is not copied)

create table if not exists student3 like student;

(4) View the table structure

desc formatted student;

3) External table

Because the table is external, Hive does not consider itself to fully own the data.

Dropping the table does not delete the data, but the metadata describing the table is deleted.

(1) Usage scenarios for managed and external tables

Website logs are collected every day and flow regularly into text files on HDFS. Heavy statistical analysis is done on top of an external table (the raw log table). The intermediate tables and result tables used along the way are stored as internal tables, and data enters the internal tables through SELECT + INSERT.
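
The scenario above can be sketched roughly as follows. The table names raw_log and log_stats, the single-column layout, the path '/logs/raw', and the statistic computed are all hypothetical; a real log table would have its own schema.

```sql
-- External table over the raw log files; dropping it leaves the files on HDFS.
-- Table names, columns, and path are hypothetical.
create external table if not exists raw_log (
  line string
)
location '/logs/raw';

-- Internal (managed) table that holds a computed result.
create table if not exists log_stats (
  line_length int,
  cnt bigint
);

-- SELECT + INSERT moves the analysis result into the internal table.
insert into table log_stats
select length(line), count(*)
from raw_log
group by length(line);
```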

4) Converting between managed and external tables

alter table student set tblproperties('EXTERNAL'='TRUE');
alter table student set tblproperties('EXTERNAL'='FALSE');

Note:

('EXTERNAL'='TRUE') and ('EXTERNAL'='FALSE') are fixed forms and are case-sensitive.
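
One way to confirm that a conversion took effect is to look at the "Table Type" field in the formatted description. A short sketch, reusing the student table from the statements above:

```sql
alter table student set tblproperties('EXTERNAL'='TRUE');
desc formatted student;   -- "Table Type:" is expected to show EXTERNAL_TABLE

alter table student set tblproperties('EXTERNAL'='FALSE');
desc formatted student;   -- "Table Type:" is expected to show MANAGED_TABLE
```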

6. Partitioned tables

A partitioned table corresponds to separate folders on the HDFS file system; each folder holds all the data files of one partition.

A partition in Hive is simply a subdirectory.

A large data set is split into smaller data sets according to business needs. When a query uses an expression in the WHERE clause to select only the partitions it needs, query efficiency improves considerably.

1) Basic operations on partitioned tables

(1) Create a partitioned table

create table hive_partition(
id int,name string
)
partitioned by (class string)
row format delimited fields terminated by '\t';

A partition column cannot be a column that already exists in the table; the partition column can be regarded as a pseudo-column of the table.

(2) Load data into a partition

load data local inpath '/opt/xxx' into table hive_partition partition(class='1');
load data local inpath '/opt/xxx' into table hive_partition partition(class='2');
load data local inpath '/opt/xxx' into table hive_partition partition(class='3');

When loading data into a partitioned table, the partition must be specified.

(3) Query the data in a partitioned table

select * from hive_partition where class='1';
-- no partition keyword is used in the query

(4) Query across multiple partitions

select * from hive_partition where class='1'
union
select * from hive_partition where class='2';
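
As an alternative sketch, the same two partitions can be read with a single WHERE clause on the partition column. Note that the results are not strictly identical: UNION removes duplicate rows, while the IN form returns every row from both partitions.

```sql
-- Reads partitions class='1' and class='2' in one query; duplicates are kept.
select * from hive_partition where class in ('1', '2');
```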

(5) Add partitions

alter table hive_partition add partition(class='4');
alter table hive_partition add partition(class='4') partition(class='5');

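When adding multiple partitions, the partition clauses are not separated by commas.

(6) Drop partitions

alter table hive_partition drop partition(class='4');
alter table hive_partition drop partition(class='4'), partition(class='5');

When dropping multiple partitions, the partition clauses must be separated by commas.

(7) List the partitions of a partitioned table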

show partitions hive_partition;

(8) View the structure of a partitioned table

desc formatted hive_partition;

2) Notes on partitioned tables

(1) Create a table with two-level partitions

create table hive_partition(
id int, name string
)
partitioned by (year string,month string)
row format delimited fields terminated by '\t';

(2) Load data into a two-level partition

load data local inpath '/datas/xxx' into table hive_partition partition(year='2019',month='2');

(3) Query the partitioned data

select * from hive_partition where year='2019' and month='2';

3) Upload data directly into the partition directory and associate it with the partitioned table, in three ways

Upload the data:

dfs -mkdir -p /hive_partition/year=2019/month=2 ;
dfs -put  /opt/xxx.txt  /hive_partition/year=2019/month=2 ;

(1) Method 1: repair the table after uploading the data

msck repair table hive_partition;

(2) Method 2: add the partition after uploading the data

alter table hive_partition add partition(year='2019',month='2');

(3) Method 3: load the data into the partition after creating the folder

load data local inpath '/opt/xxx.txt' into table hive_partition partition(year='2019',month='2');
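
Whichever method is used, a quick check is to list the partitions and then query the one that was just associated. A short sketch using the same table and partition values as above:

```sql
-- The new partition should be listed once it is registered in the metastore.
show partitions hive_partition;
-- Rows from the uploaded file should be returned if the association worked.
select * from hive_partition where year='2019' and month='2';
```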

7. Modify a table

(1) Rename a table

alter table hive_partition rename to test; 

(2) Add, change, and replace columns

Change a column

ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]

Add and replace columns

ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)

Note:

ADD appends a new column, placed after all existing columns (and before the partition columns).

REPLACE replaces all columns in the table.
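
The column operations above can be tried on a small scratch table. This is a sketch only; the table demo_cols and the column names are hypothetical.

```sql
-- Hypothetical scratch table used only to illustrate the column operations.
create table if not exists demo_cols (id int, name string);

-- CHANGE: rename column name to user_name, keeping the string type.
alter table demo_cols change column name user_name string;

-- ADD: append a new column after the existing ones.
alter table demo_cols add columns (age int comment 'added column');

-- REPLACE: redefine the full column list; here the age column is dropped from the schema.
alter table demo_cols replace columns (id int, user_name string);

desc demo_cols;
```

These statements change only the table metadata; the underlying data files are not rewritten.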

Origin www.cnblogs.com/hyunbar/p/11730564.html