Hive: creating a table

1. CREATE TABLE syntax

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name 
[(col_name data_type [COMMENT col_comment], ...)] 
[COMMENT table_comment] 
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] 
[CLUSTERED BY (col_name, col_name, ...) 
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] 
[ROW FORMAT row_format] 
[STORED AS file_format] 
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
[AS select_statement]

2. Field explanations

(1) CREATE TABLE creates a table with the specified name. If a table with the same name already exists, an exception is thrown; the IF NOT EXISTS option lets the user ignore this exception.

(2) The EXTERNAL keyword lets the user create an external table; when creating the table, a path to the actual data (LOCATION) can be specified. When a managed (internal) table is dropped, its metadata and data are deleted together; when an external table is dropped, only the metadata is deleted and the data is kept.

(3) COMMENT adds comments to tables and columns.

(4) PARTITIONED BY creates a partitioned table.

(5) CLUSTERED BY creates a bucketed table.

(6) SORTED BY is rarely used; it additionally sorts the rows within each bucket by one or more columns.

(7) ROW FORMAT

DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] 
   | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

When creating a table, the user can supply a custom SerDe or use a built-in one. If ROW FORMAT is not specified, or ROW FORMAT DELIMITED is specified, Hive uses its built-in SerDe. The user also specifies the table's columns when creating the table; a custom SerDe can be specified along with the columns, and Hive uses the SerDe to determine the data for each column of the table.

SerDe is short for Serialize/Deserialize. Hive uses a SerDe to serialize and deserialize row objects.

(8) STORED AS specifies the storage file format.

Common storage file formats: SEQUENCEFILE (binary sequence file), TEXTFILE (text), RCFILE (columnar storage format)

If the data file is plain text, use STORED AS TEXTFILE. If the data needs to be compressed, use STORED AS SEQUENCEFILE.

(9) LOCATION specifies where the table is stored on HDFS.

(10) AS is followed by a query statement; the table is created from the query result.

(11) LIKE lets the user copy the structure of an existing table without copying its data.
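To show how several of the clauses above combine, here is a minimal sketch of a partitioned, bucketed text table. The table name page_views, its columns, the bucket count and the delimiters are all hypothetical and chosen only for illustration.

```sql
-- Hypothetical table: names, delimiters and bucket count are illustrative only.
CREATE TABLE IF NOT EXISTS page_views (
  user_id INT           COMMENT 'visitor id',
  url     STRING        COMMENT 'requested page',
  tags    ARRAY<STRING> COMMENT 'labels attached to the request'
)
COMMENT 'one row per page request'
-- the partition column is declared here, not in the column list
PARTITIONED BY (dt STRING COMMENT 'date of the request')
-- rows are hashed on user_id into 4 buckets and sorted within each bucket
CLUSTERED BY (user_id) SORTED BY (user_id ASC) INTO 4 BUCKETS
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
```

Because neither EXTERNAL nor LOCATION is given, this sketch would create a managed table under the warehouse directory.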

Managed tables

1. Theory

Tables are created as managed tables by default, sometimes also called internal tables. For these tables, Hive controls (more or less) the life cycle of the data. By default Hive stores their data in a subdirectory of the directory defined by the configuration item hive.metastore.warehouse.dir (for example, /user/hive/warehouse). When a managed table is dropped, Hive also deletes the data in the table. Managed tables are not well suited to sharing data with other tools.

2. Hands-on example

(1) Create an ordinary table

create table if not exists student2(
id int, name string
)
row format delimited fields terminated by '\t'
stored as textfile
location '/user/hive/warehouse/student2';

(2) Create a table from a query result (the rows returned by the query are added to the newly created table)

create table if not exists student3 as select id, name from student;

(3) Create a table based on an existing table structure

create table if not exists student4 like student;

(4) Check the table type

hive (default)> desc formatted student2;
Table Type:             MANAGED_TABLE  
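The life-cycle behaviour of a managed table can be checked from the Hive CLI. The following is only a sketch: student_tmp is a hypothetical throwaway table, and the paths assume the default database and that hive.metastore.warehouse.dir is /user/hive/warehouse, as in the example above.

```sql
-- Print the configured warehouse directory.
set hive.metastore.warehouse.dir;

-- Hypothetical throwaway managed table (no EXTERNAL, no LOCATION).
create table if not exists student_tmp (id int, name string);

-- Assuming the default warehouse path, a student_tmp directory should now be listed.
dfs -ls /user/hive/warehouse;

-- Dropping a managed table removes both its metadata and its data directory.
drop table student_tmp;
dfs -ls /user/hive/warehouse;
```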

External tables

1. Theory

Because the table is external, Hive does not consider itself the sole owner of the data. Dropping the table does not delete the data, but the metadata describing the table is deleted.

2. When to use managed and external tables

Website logs collected each day flow regularly into text files on HDFS. An external table (the raw log table) is built over these files and used as the basis for extensive statistical analysis. The intermediate tables and result tables are stored as managed tables, and data is loaded into them with SELECT + INSERT.
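The scenario above might look roughly like the following sketch. The table names raw_logs and url_hits, their columns, and the path /logs/raw are hypothetical and stand in for whatever the real log layout is.

```sql
-- External table over the raw log files; dropping it leaves the files in place.
-- Table name, columns and path are hypothetical.
CREATE EXTERNAL TABLE IF NOT EXISTS raw_logs (
  ip  STRING,
  url STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/logs/raw';

-- Managed result table; Hive owns this data.
CREATE TABLE IF NOT EXISTS url_hits (
  url  STRING,
  hits BIGINT
);

-- SELECT + INSERT: aggregate the raw logs into the managed table.
INSERT INTO TABLE url_hits
SELECT url, COUNT(*) AS hits
FROM raw_logs
GROUP BY url;
```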

3. Hands-on example

Create an external student table and load data into it.

(1) Upload the data to HDFS

hive (default)> dfs -mkdir /student;
hive (default)> dfs -put /opt/module/datas/student.txt /student;

(2) Create the table

hive (default)> create external table stu_external (
id int, 
name string) 
row format delimited fields terminated by '\t' 
location '/student';

(3) Query the created table

hive (default)> select * from stu_external;
OK
stu_external.id stu_external.name
1001    lisi
1002    wangwu
1003    zhaoliu

(4) View the formatted table information

hive (default)> desc formatted stu_external;
Table Type:             EXTERNAL_TABLE

(5) Drop the external table

hive (default)> drop table stu_external;

After the external table is dropped, the data is still on HDFS, but the metadata for stu_external has been deleted.
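One way to confirm this from the Hive CLI, as a sketch that reuses the /student path and the stu_external name from the steps above:

```sql
-- The data file uploaded earlier should still be listed under /student.
dfs -ls /student;

-- The table itself should no longer appear in the metastore.
show tables 'stu_external';
```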

Converting between managed and external tables

(1) Check the table type

hive (default)> desc formatted student2;
Table Type:             MANAGED_TABLE

(2) Change the managed table student2 to an external table

alter table student2 set tblproperties('EXTERNAL'='TRUE');

(3) Check the table type

hive (default)> desc formatted student2;
Table Type:             EXTERNAL_TABLE

(4) Change the external table student2 back to a managed table

alter table student2 set tblproperties('EXTERNAL'='FALSE');

(5) Check the table type

hive (default)> desc formatted student2;
Table Type:             MANAGED_TABLE

Note: ('EXTERNAL'='TRUE') and ('EXTERNAL'='FALSE') are fixed forms and are case sensitive!

Origin www.cnblogs.com/Tunan-Ki/p/11795772.html