1. Table creation syntax
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
[AS select_statement]
2. Field explanations
(1) CREATE TABLE creates a table with the specified name. If a table with the same name already exists, an exception is thrown; use the IF NOT EXISTS option to suppress it.
(2) The EXTERNAL keyword creates an external table; when creating the table you can specify a path to the actual data (LOCATION). When a managed (internal) table is dropped, its metadata and data are deleted together; when an external table is dropped, only the metadata is deleted and the data is kept.
(3) COMMENT adds comments to tables and columns.
(4) PARTITIONED BY creates a partitioned table.
(5) CLUSTERED BY creates a bucketed table.
(6) SORTED BY is rarely used; it additionally sorts the rows within each bucket on one or more columns.
(7) ROW FORMAT
DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char] [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
When creating a table, users can specify a custom SerDe or use the built-in one. If neither ROW FORMAT nor ROW FORMAT DELIMITED is specified, the built-in SerDe is used. The user also specifies the table's columns when creating the table; when a custom SerDe is specified along with the columns, Hive uses the SerDe to determine the data for each column.
SerDe is short for Serialize/Deserialize. Hive uses a SerDe to serialize and deserialize row objects.
(8) STORED AS specifies the storage file type.
Common storage file types: SEQUENCEFILE (binary sequence file), TEXTFILE (plain text), RCFILE (columnar storage format).
If the data file is plain text, you can use STORED AS TEXTFILE. If the data needs to be compressed, use STORED AS SEQUENCEFILE.
(9) LOCATION specifies where the table is stored in HDFS.
(10) AS is followed by a query; the table is created from the query results.
(11) LIKE copies the structure of an existing table but does not copy its data.
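To show how the clauses above fit together, here is a minimal sketch of a single statement that uses most of them. The table name page_view, its columns, the bucket count, the HDFS path and the table property are all hypothetical and chosen only for illustration.

```sql
-- Hypothetical table: every name, the path and the bucket count are illustrative.
create external table if not exists page_view (
    user_id bigint              comment 'visitor id',
    url     string              comment 'requested page',
    tags    array<string>       comment 'free-form labels',
    props   map<string, string> comment 'extra key/value attributes'
)
comment 'raw page views'
-- One directory per load date.
partitioned by (dt string comment 'load date')
-- Rows are hashed on user_id into 4 buckets and sorted inside each bucket.
clustered by (user_id) sorted by (user_id asc) into 4 buckets
row format delimited
    fields terminated by '\t'
    collection items terminated by ','
    map keys terminated by ':'
    lines terminated by '\n'
stored as textfile
location '/data/page_view'
tblproperties ('creator' = 'example');
```

The clauses appear in the same order as in the syntax in section 1. Because the table is declared EXTERNAL, dropping it would leave the files under the hypothetical /data/page_view path in place.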
Managed tables
1. Theory
Tables are created as so-called managed tables by default; these are sometimes also called internal tables. For this kind of table, Hive (more or less) controls the life cycle of the data. By default, Hive stores the data for these tables in a subdirectory of the directory defined by the configuration item hive.metastore.warehouse.dir (e.g., /user/hive/warehouse). When a managed table is dropped, Hive also deletes the data in the table. Managed tables are not suitable for sharing data with other tools.
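The life cycle described above can be observed from the Hive CLI. This is a sketch only: managed_demo is a hypothetical table, and the path assumes the default database and the example warehouse directory /user/hive/warehouse mentioned in the text; your installation may use a different value.

```sql
-- Print the configured warehouse root (the value depends on the installation).
set hive.metastore.warehouse.dir;

-- Hypothetical managed table, created without a LOCATION clause.
create table if not exists managed_demo (id int, name string);

-- The table directory should now exist under the warehouse root
-- (path assumes the default database and the example root from the text).
dfs -ls /user/hive/warehouse;

-- Dropping a managed table removes the metadata and the table directory.
drop table managed_demo;
```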
2. Hands-on examples
(1) Create an ordinary table
create table if not exists student2( id int, name string ) row format delimited fields terminated by '\t' stored as textfile location '/user/hive/warehouse/student2';
(2) Create a table from query results (the query results are inserted into the newly created table)
create table if not exists student3 as select id, name from student;
(3) Create a table based on an existing table structure
create table if not exists student4 like student;
(4) Check the table type
hive (default)> desc formatted student2;
Table Type: MANAGED_TABLE
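The difference between steps (2) and (3) can be checked with a quick row count. This sketch assumes the student3 and student4 tables were just created by the statements above and that nothing has been loaded into them since.

```sql
-- CREATE TABLE ... AS SELECT copies the selected rows,
-- so this count matches the source table student.
select count(*) from student3;

-- CREATE TABLE ... LIKE copies only the structure,
-- so the table is empty immediately after creation.
select count(*) from student4;
```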
External tables
1. Theory
Because the table is external, Hive does not consider itself the sole owner of the data. Dropping the table does not delete the data, but the metadata describing the table is deleted.
2. Usage scenarios for managed and external tables
Website logs are collected every day and periodically written to text files in HDFS. A large amount of statistical analysis is done on top of an external table (the raw log table); the intermediate and result tables used along the way are stored as internal tables, and data enters the internal tables through SELECT + INSERT.
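A minimal sketch of this pattern follows. The table names web_log_raw and url_hits, their columns, the /logs/web path and the date value are hypothetical; only the split between an external raw table and a managed result table comes from the text.

```sql
-- Hypothetical raw log table: external, so dropping it leaves the collected files in place.
create external table if not exists web_log_raw (
    ip     string,
    url    string,
    status int
)
partitioned by (dt string)
row format delimited fields terminated by '\t'
location '/logs/web';

-- Hypothetical result table: managed, so Hive owns its data.
create table if not exists url_hits (
    url  string,
    hits bigint
)
partitioned by (dt string);

-- Register one day of collected logs as a partition of the raw table.
alter table web_log_raw add if not exists
    partition (dt = '2020-01-01') location '/logs/web/2020-01-01';

-- SELECT + INSERT: aggregate from the external table into the managed table.
insert overwrite table url_hits partition (dt = '2020-01-01')
select url, count(*) as hits
from web_log_raw
where dt = '2020-01-01'
group by url;
```

Keeping the raw logs external means other tools can keep reading and writing those files, while the derived tables can be dropped and rebuilt freely.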
3. Hands-on example
Create an external student table and load data into it.
(1) Upload the data to HDFS
hive (default)> dfs -mkdir /student;
hive (default)> dfs -put /opt/module/datas/student.txt /student;
(2) Table creation statement
hive (default)> create external table stu_external( id int, name string) row format delimited fields terminated by '\t' location '/student';
(3) Query the created table
hive (default)> select * from stu_external;
OK
stu_external.id    stu_external.name
1001    lisi
1002    wangwu
1003    zhaoliu
(4) View the formatted table description
hive (default)> desc formatted stu_external;
Table Type: EXTERNAL_TABLE
(5) Drop the external table
hive (default)> drop table stu_external;
After the external table is dropped, the data is still in HDFS, but the metadata for stu_external has been deleted.
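One way to confirm this, sketched here with the same path and table definition used in the steps above, is to list the directory and then re-create the table over it.

```sql
-- The file uploaded in step (1) should still be listed after the drop.
dfs -ls /student;

-- Re-creating the external table over the same directory
-- restores only the metadata; no data is copied or moved.
create external table if not exists stu_external (
    id   int,
    name string
)
row format delimited fields terminated by '\t'
location '/student';

-- The original rows are queryable again.
select * from stu_external;
```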
Converting between managed and external tables
(1) Check the table type
hive (default)> desc formatted student2;
Table Type: MANAGED_TABLE
(2) Change the managed table student2 to an external table
alter table student2 set tblproperties('EXTERNAL'='TRUE');
(3) Check the table type
hive (default)> desc formatted student2;
Table Type: EXTERNAL_TABLE
(4) Change the external table student2 back to a managed table
alter table student2 set tblproperties('EXTERNAL'='FALSE');
(5) Check the table type
hive (default)> desc formatted student2;
Table Type: MANAGED_TABLE
Note: ('EXTERNAL'='TRUE') and ('EXTERNAL'='FALSE') are fixed forms and are case-sensitive!