Hive create/drop/truncate table (translated from the Hive wiki)

Common operations are listed here. For the full reference, see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create%2FDrop%2FTruncateTable

 

Simple table creation

create table table_name (
  id                int,
  dtDontQuery       string,
  name              string
);

 

 

Create a partitioned table

create table table_name (
  id                int,
  dtDontQuery       string,
  name              string
)
-- "date" is a reserved keyword in recent Hive versions, so the partition column is named dt here
partitioned by (dt string);

A table can have one or more partitions. Each partition is stored as a subdirectory under the table's directory.

Partition columns appear in the table schema as ordinary columns, and you can see them with the describe command. However, their values are not stored in the data files; they only identify the partition.

Without partitions, a Hive SELECT query generally scans the entire table, which wastes time when only part of the data is of interest. Partitioning is introduced at table creation to avoid this: each partition corresponds to a directory under the table, so a query that filters on the partition column reads only the matching directories. Partitions therefore narrow the query scope, speed up data retrieval, and let data be organized according to defined rules and conditions.
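The effect of partitioning on a query can be sketched as follows. This is only an illustration: the table name events_by_day, its columns, and the partition value are hypothetical and do not come from the Hive wiki.

```sql
-- Hypothetical table, used only to illustrate partitions.
CREATE TABLE events_by_day (
  id    INT,
  name  STRING
)
PARTITIONED BY (dt STRING);

-- Register one partition; Hive creates a subdirectory such as
-- .../events_by_day/dt=20080808/ under the table directory.
ALTER TABLE events_by_day ADD PARTITION (dt='20080808');

-- dt is listed like a column, even though it is not stored in the data files.
DESCRIBE events_by_day;

-- Lists the partitions that the metastore knows about.
SHOW PARTITIONS events_by_day;

-- The filter on dt lets Hive read only the matching partition directory.
SELECT id, name
FROM events_by_day
WHERE dt = '20080808';
```

Queries that do not filter on dt still work, but they have to read every partition.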

 

Typical table creation with default settings

CREATE TABLE page_view(
     viewTime INT,
     userid BIGINT,
     page_url STRING,
     referrer_url STRING,
     ip STRING COMMENT 'IP Address of the User')
 COMMENT 'This is the page view table'
 PARTITIONED BY(dt STRING, country STRING)
 ROW FORMAT DELIMITED
   FIELDS TERMINATED BY '\001'
   COLLECTION ITEMS TERMINATED BY '\002'
   MAP KEYS TERMINATED BY '\003'
 STORED AS TEXTFILE;

 

This creates the table page_view with a table comment, a comment on the ip column, and two partition columns, dt and country.

The [ROW FORMAT DELIMITED] clause sets the delimiters that Hive expects in the table's data files. Here, columns are separated by '\001', elements of collections (such as array and map) are separated by '\002', and the key and value within a map entry are separated by '\003'.

 

The [STORED AS file_format] clause sets the file format of the table's data. The default is TEXTFILE. If the data is plain text, use [STORED AS TEXTFILE]; the file can then be copied directly from the local file system to HDFS, and Hive can read it as is.
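As a rough sketch of loading a plain text file into the page_view table defined above, something like the following could be used. The local file path and the partition values are assumptions made for illustration, and the file is assumed to already use '\001' as its field separator.

```sql
-- Hypothetical local path; Hive copies the file into the partition directory as is.
LOAD DATA LOCAL INPATH '/tmp/page_view_20080808_us.txt'
INTO TABLE page_view
PARTITION (dt='20080808', country='US');

-- Read back a few rows from the partition that was just loaded.
SELECT userid, page_url
FROM page_view
WHERE dt = '20080808' AND country = 'US'
LIMIT 10;
```

LOAD DATA does not parse or validate the file; if the delimiters in the file do not match the table definition, the columns simply come back as NULL or misaligned values at query time.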

 

Commonly used table creation

CREATE TABLE login(
     userid BIGINT,
     ip STRING,
     time BIGINT)
 PARTITIONED BY(dt STRING)
 ROW FORMAT DELIMITED
   FIELDS TERMINATED BY '\t'
 STORED AS TEXTFILE;

 

Create external table

If the data already exists in HDFS at '/user/hadoop/warehouse/page_view' and you want to create a table that points to this path, create an external table:

CREATE EXTERNAL TABLE page_view(
     viewTime INT,
     userid BIGINT,
     page_url STRING,
     referrer_url STRING,
     ip STRING COMMENT 'IP Address of the User',
     country STRING COMMENT 'country of origination')
 COMMENT 'This is the staging page view table'
 ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
 STORED AS TEXTFILE
 LOCATION '/user/hadoop/warehouse/page_view';

If EXTERNAL is specified when creating a table, it is an external table; otherwise it is an internal (managed) table. Dropping an internal table deletes its data from HDFS, whereas dropping an external table leaves the data in place.

Like internal tables, external tables can have partitions. If an external table is partitioned, then after it is created you must alter the table to add its partitions before their data can be queried.

A partitioned external table also supports loading data into a partition and overwriting partition data. However, when a partition is dropped from an external table, the corresponding data is not deleted from HDFS, whereas dropping a partition from an internal table deletes the partition data.
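A minimal sketch of a partitioned external table might look like this. The table name page_view_ext, its reduced column list, and the HDFS paths are hypothetical and chosen only for illustration.

```sql
-- Hypothetical external table over data that already exists in HDFS.
CREATE EXTERNAL TABLE page_view_ext(
     viewTime INT,
     userid BIGINT,
     page_url STRING)
 PARTITIONED BY(dt STRING)
 ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
 STORED AS TEXTFILE
 LOCATION '/user/hadoop/warehouse/page_view_ext';

-- Register an existing directory as a partition of the table.
ALTER TABLE page_view_ext ADD IF NOT EXISTS PARTITION (dt='20080808')
 LOCATION '/user/hadoop/warehouse/page_view_ext/dt=20080808';

-- Dropping the partition removes only its metadata for an external table;
-- the files under the partition directory stay in HDFS.
ALTER TABLE page_view_ext DROP IF EXISTS PARTITION (dt='20080808');
```

Until the ADD PARTITION statement runs, queries against page_view_ext return no rows for that day, even if the files are already in place.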

 

Specify the database to create a table

If you do not specify a database, Hive creates the table in the default database. Assuming a Hive database named mydb exists, you can create a table in mydb as follows:

CREATE TABLE mydb.pokes(foo INT,bar STRING);

or

use mydb; -- point the current database to mydb
CREATE TABLE pokes(foo INT,bar STRING);

 

Copy table structure

CREATE TABLE empty_table_name LIKE table_name;

This creates an empty table empty_table_name with the same structure as table_name; no data is copied.

 

Create Table As Select (CTAS)

A table created by CTAS is atomic: other users do not see the table until all the query results have been populated. They either see the table with the complete results of the query or do not see it at all.

CTAS has restrictions: the target table cannot be a partitioned table, nor can it be an external table.

Simple form

CREATE TABLE new_key_value_store
  AS 
SELECT (key % 1024) new_key, concat(key, value) key_value_pair FROM key_value_store;

With an explicit SerDe and file format

CREATE TABLE new_key_value_store
   ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
   STORED AS RCFile AS
SELECT (key % 1024) new_key, concat(key, value) key_value_pair
FROM key_value_store
SORT BY new_key, key_value_pair;

 

Drop table

DROP TABLE table_name;
DROP TABLE IF EXISTS table_name;

Dropping a table removes both its metadata and its data. If Trash is configured, the data in HDFS is moved to the .Trash/Current directory instead of being deleted immediately.

When an external table is dropped, the data in the table is not deleted.

 

Truncate table

TRUNCATE TABLE table_name;
TRUNCATE TABLE table_name PARTITION (dt='20080808');

TRUNCATE deletes all rows from a table or from a table partition. If no partition is specified, all partitions in the table are truncated. You can also give a partial partition specification to truncate several partitions at once.
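For a table with more than one partition column, the difference between a full and a partial partition specification can be sketched as below. The example assumes the internal (managed) page_view table from earlier, partitioned by (dt, country); the partition values are made up for illustration.

```sql
-- Full specification: truncates exactly one partition.
TRUNCATE TABLE page_view PARTITION (dt='20080808', country='US');

-- Partial specification: truncates every country partition under dt='20080808'.
TRUNCATE TABLE page_view PARTITION (dt='20080808');
```

In both cases the partitions themselves remain registered in the table; only their rows are removed.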
