Hive - DDL Data Definition

1. Create a database

  1) Create a database. The default storage path for a database on HDFS is /user/hive/warehouse/*.db.

hive > create database db_hive;

  2) To avoid an error when the database to be created already exists, add an if not exists check. (Standard usage)

hive > create database db_hive;

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Database db_hive already exists

hive > create database if not exists db_hive;

  3) Create a database and specify its storage location on HDFS.

hive > create database db_hive2 location '/db_hive2.db';

2. Query databases

2.1 Display Database

1. Display Database

hive> show databases;

2. Filter the databases shown by the query

hive> show databases like 'db_hive*';

OK

db_hive

db_hive_1

2.2 View database details

1. Display database information

hive> desc database db_hive;

OK

db_hive  hdfs://hadoop102:9000/user/hive/warehouse/db_hive.db  atguigu  USER

2. Display extended database details

hive> desc database extended db_hive;

OK

db_hive  hdfs://hadoop102:9000/user/hive/warehouse/db_hive.db  atguigu  USER


2.3 Switch the current database

hive > use db_hive;

3. Modify a database

        Users can use the ALTER DATABASE command to set key-value pairs in a database's DBPROPERTIES, describing attribute information for that database. The database's other metadata cannot be changed, including the database name and the directory location where the database resides. In the example below, the property is modified while using the default database.

hive (default)> alter database db_hive set dbproperties('createtime'='20170830');

Check the result of the change in Hive:

hive> desc database extended db_hive;

db_name comment location owner_name owner_type parameters

db_hive hdfs://master:8020/user/hive/warehouse/db_hive.db root USER {createtime=20170830}

4. Delete a database

1. Delete an empty database

hive>drop database db_hive2;

2. If the database to be dropped might not exist, it is best to use if exists to check whether it exists.

hive> drop database db_hive;

FAILED: SemanticException [Error 10072]: Database does not exist: db_hive

hive> drop database if exists db_hive2;

3. If the database is not empty, use the cascade keyword to force deletion.

hive> drop database db_hive;

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database db_hive is not empty. One or more tables exist.)

hive> drop database db_hive cascade;

5. Create a table

1. Table creation syntax

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name

[(col_name data_type [COMMENT col_comment], ...)]

[COMMENT table_comment]

[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]

[CLUSTERED BY (col_name, col_name, ...)

[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]

[ROW FORMAT row_format]

[STORED AS file_format]

[LOCATION hdfs_path]

2. Field explanations

(1) CREATE TABLE creates a table with the specified name. If a table with the same name already exists, an exception is thrown; the user can use the IF NOT EXISTS option to ignore this exception.

(2) The EXTERNAL keyword lets the user create an external table and, while creating it, specify a path pointing to the actual data (LOCATION). When Hive creates an internal table, it moves the data into the path of the data warehouse; when it creates an external table, it only records the path where the data is located and does not move or change the data at all. When a table is dropped, an internal table's metadata and data are deleted together, while for an external table only the metadata is deleted and the data is kept.

(3) COMMENT: add comments to tables and columns.

(4) PARTITIONED BY creates a partitioned table.

(5) CLUSTERED BY creates a bucketed table.

(6) SORTED BY is not commonly used.

(7) ROW FORMAT

DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]

        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]

   | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

Users can define a custom SerDe or use the built-in SerDe when creating a table. If ROW FORMAT or ROW FORMAT DELIMITED is not specified, the built-in SerDe is used. When creating a table, the user also needs to specify the table's columns; while specifying the columns the user can also specify a custom SerDe, and Hive determines the concrete column data of the table through the SerDe.

SerDe is short for Serialize/Deserialize; it is used for serialization and deserialization.

(8) STORED AS specifies the storage file format.

Common file formats: SEQUENCEFILE (binary sequence file), TEXTFILE (plain text), RCFILE (columnar storage format).

If the data file is plain text, you can use STORED AS TEXTFILE. If the data needs to be compressed, use STORED AS SEQUENCEFILE.

(9) LOCATION: specifies the HDFS location where the table's data is stored.

(10) LIKE allows the user to copy an existing table structure, but does not copy the data.
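
To tie these clauses together, here is a small sketch (the table name, columns, comments, and HDFS path below are made up for illustration, not from the original) that uses several of the options above in one statement:

-- illustrative only: an external, partitioned, tab-delimited text table
create external table if not exists demo_log(
    user_id int comment 'id of the visiting user',
    url string comment 'page that was visited'
)
comment 'raw click log (example table)'
partitioned by (dt string)
row format delimited fields terminated by '\t'
stored as textfile
location '/user/hive/warehouse/demo_log';

Here ROW FORMAT DELIMITED covers the common plain-text case; a custom SerDe would instead be declared with ROW FORMAT SERDE 'serde.class.name' plus WITH SERDEPROPERTIES, as shown in the syntax under (7).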

5.1 Managed tables

1. Theory

        Tables created by default are so-called managed tables, sometimes also called internal tables. For these tables, Hive controls (more or less) the life cycle of the data. By default, Hive stores the data of these tables in subdirectories of the directory defined by the configuration item hive.metastore.warehouse.dir (for example, /user/hive/warehouse). When we delete a managed table, Hive also deletes the data in the table. Managed tables are not suitable for sharing data with other tools.
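
As a quick check (an aside, not part of the original walkthrough), you can print the current value of this configuration item from the Hive CLI:

hive > set hive.metastore.warehouse.dir;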

2. Practical case

(1) Create an ordinary table

create table if not exists student2(

    id int, name string

)

row format delimited fields terminated by '\t'

stored as textfile

location '/user/hive/warehouse/student2';

(2) Create a table from query results (the query results are added to the newly created table)

create table if not exists student3 as select id, name from student;

 

(3) Create a table based on an existing table structure

create table if not exists student4 like student;

(4) Check the table type

hive > desc formatted student2;

Table Type:             MANAGED_TABLE  

5.2 External tables

1. Theory

        Because a table is an external table, Hive does not consider itself to fully own the data. Dropping the table does not delete the data, although the metadata describing the table is deleted.

2. Usage scenarios for managed and external tables

        Website logs collected every day flow regularly into HDFS as text files. Large amounts of statistical analysis are done on top of an external table (the raw log table); the intermediate tables and result tables used in the analysis are stored as internal tables, and data enters the internal tables through SELECT + INSERT.
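
A minimal sketch of that SELECT + INSERT step, assuming a hypothetical external raw-log table ods_log(dt string, url string); the table and column names are illustrative, not from the original:

-- managed (internal) table that will hold the aggregated results
create table if not exists log_stat(
    dt string,
    pv bigint
)
row format delimited fields terminated by '\t';

-- fill the internal result table from the external raw-log table
insert overwrite table log_stat
select dt, count(*) as pv
from ods_log
group by dt;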

3. Practical case

        Create external tables for departments and employees, and import data into the tables.

    (1) Raw data

dept.txt
10	ACCOUNTING	1700
20	RESEARCH	1800
30	SALES	1900
40	OPERATIONS	1700

emp.txt
7369	SMITH	CLERK	7902	1980-12-17	800.00		20
7499	ALLEN	SALESMAN	7698	1981-2-20	1600.00	300.00	30
7521	WARD	SALESMAN	7698	1981-2-22	1250.00	500.00	30
7566	JONES	MANAGER	7839	1981-4-2	2975.00		20
7654	MARTIN	SALESMAN	7698	1981-9-28	1250.00	1400.00	30
7698	BLAKE	MANAGER	7839	1981-5-1	2850.00		30
7782	CLARK	MANAGER	7839	1981-6-9	2450.00		10
7788	SCOTT	ANALYST	7566	1987-4-19	3000.00		20
7839	KING	PRESIDENT		1981-11-17	5000.00		10
7844	TURNER	SALESMAN	7698	1981-9-8	1500.00	0.00	30
7876	ADAMS	CLERK	7788	1987-5-23	1100.00		20
7900	JAMES	CLERK	7698	1981-12-3	950.00		30
7902	FORD	ANALYST	7566	1981-12-3	3000.00		20
7934	MILLER	CLERK	7782	1982-1-23	1300.00		10

(2) Table creation statements

Create the department table

create external table if not exists default.dept(
    deptno int,
    dname string,
    loc int
)
row format delimited fields terminated by '\t';

Create the employee table

create external table if not exists default.emp(
    empno int, 
    ename string,
    job string,
    mgr int,
    hiredate string,
    sal double,
    comm double,
    deptno int
)
row format delimited fields terminated by '\t';

(3) View the created tables

hive > show tables;

OK

tab_name

dept

emp

(4) Import data into the external tables

Import Data

hive > load data local inpath '/opt/module/datas/dept.txt' into table default.dept;

hive > load data local inpath '/opt/module/datas/emp.txt' into table default.emp;

Query the results

hive > select * from emp;

hive > select * from dept;

(5) View formatted table information

hive > desc formatted dept;

Table Type:             EXTERNAL_TABLE

5.3 Converting between managed and external tables

Only single quotes may be used, and the wording is strictly case-sensitive; if it is not followed exactly, the key-value pair is merely added as an ordinary property and the conversion does not take effect.

(1) Check the table type

hive > desc formatted student2;

Table Type:             MANAGED_TABLE

(2) Convert the internal table student2 into an external table

alter table student2 set tblproperties('EXTERNAL'='TRUE');

(3) Check the table type

hive > desc formatted student2;

Table Type:             EXTERNAL_TABLE

(4) Convert the external table student2 back into an internal table

alter table student2 set tblproperties('EXTERNAL'='FALSE');

(5) Check the table type

hive > desc formatted student2;

Table Type:             MANAGED_TABLE

          Note: ('EXTERNAL'='TRUE') and ('EXTERNAL'='FALSE') are fixed spellings and are case-sensitive!

6. Partitioned tables

        A partitioned table actually corresponds to a separate folder on the HDFS file system, and that folder contains all of the partition's data files. A partition in Hive is a subdirectory, which splits a large data set into smaller data sets according to business needs. When a query selects the required partitions through expressions in the WHERE clause, query efficiency improves considerably.
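
As a quick way to see the pruning (an aside, not in the original text), you can run explain on a partition-filtered query against the dept_partition table created in 6.1 below; the plan should show that only the matching partition is scanned:

hive > explain select * from dept_partition where month='201709';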

6.1 Basic operations on partitioned tables

1. Introducing partitioned tables (logs need to be managed by date)

/user/hive/warehouse/log_partition/20170702/20170702.log

/user/hive/warehouse/log_partition/20170703/20170703.log

/user/hive/warehouse/log_partition/20170704/20170704.log

2. Syntax for creating a partitioned table

hive > create table dept_partition(
    deptno int, dname string, loc string
)
partitioned by (month string)
row format delimited fields terminated by '\t';

3. Load data into the partitioned table

hive > load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201709');

hive > load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201708');

hive > load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201707');

4. Query data in the partitioned table

Single partition query

hive > select * from dept_partition where month='201709';

Multi-partition joint query (using union)

hive > select * from dept_partition where month='201709'
              union
              select * from dept_partition where month='201708'
              union
              select * from dept_partition where month='201707';

_u3.deptno      _u3.dname       _u3.loc _u3.month
10      ACCOUNTING      NEW YORK        201707
10      ACCOUNTING      NEW YORK        201708
10      ACCOUNTING      NEW YORK        201709
20      RESEARCH        DALLAS  201707
20      RESEARCH        DALLAS  201708
20      RESEARCH        DALLAS  201709
30      SALES   CHICAGO 201707
30      SALES   CHICAGO 201708
30      SALES   CHICAGO 201709
40      OPERATIONS      BOSTON  201707
40      OPERATIONS      BOSTON  201708
40      OPERATIONS      BOSTON  201709

5. Add partitions

Create a single partition

hive > alter table dept_partition add partition(month='201706') ;

Create multiple partitions (separated by spaces)

hive > alter table dept_partition add partition(month='201705') partition(month='201704');

6. Delete partitions

Delete a single partition

hive > alter table dept_partition drop partition (month='201704');

Delete multiple partitions (separated by commas)

hive > alter table dept_partition drop partition (month='201705'), partition (month='201706');

7. View how many partitions the partitioned table has

hive > show partitions dept_partition;

8. View the structure of the partitioned table

hive > desc formatted dept_partition;

 

# Partition Information          

# col_name              data_type               comment             

month                   string    

6.2 Notes on partitioned tables

1. Create a two-level partitioned table

create table dept_partition2(
     deptno int, dname string, loc string
)
partitioned by (month string, day string)
row format delimited fields terminated by '\t';

2. Load data normally

(1) Load data into the two-level partitioned table

hive > load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition2 partition(month='201709', day='13');

(2) Query the partitioned data

hive > select * from dept_partition2 where month='201709' and day='13';

3. Three ways to upload data directly to a partition directory and associate it with the partitioned table

(1) Way 1: upload the data, then repair the table

upload data 

hive > dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=12;

hive > dfs -put /opt/module/datas/dept.txt /user/hive/warehouse/dept_partition2/month=201709/day=12;

Query the data (the data just uploaded cannot be queried yet)

hive > select * from dept_partition2 where month='201709' and day='12';

Run the repair command

hive > msck repair table dept_partition2;

Query data again

hive > select * from dept_partition2 where month='201709' and day='12';

(2) Way 2: upload the data, then add the partition

upload data

hive > dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=11;

hive > dfs -put /opt/module/datas/dept.txt  /user/hive/warehouse/dept_partition2/month=201709/day=11;

Execute the add-partition statement

hive > alter table dept_partition2 add partition(month='201709', day='11');

Query data

hive > select * from dept_partition2 where month='201709' and day='11';

(3) Way 3: create the directory, then load the data into the partition

Create a directory

hive > dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=10;

upload data

hive > load data local inpath '/opt/module/datas/dept.txt' into table dept_partition2 partition(month='201709',day='10');

Query data

hive > select * from dept_partition2 where month='201709' and day='10';

7. Modify tables

7.1 Rename Table

1. Syntax

ALTER TABLE table_name RENAME TO new_table_name

2. Practical case

hive > alter table dept_partition2 rename to dept_partition3;

7.2 Add, modify, and delete partitions

    See the basic operations on partitioned tables in 6.1 above.

7.3 Add/modify/replace column information

1. Syntax

Update Column

ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]

Add or replace columns

ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)

Note: ADD adds new fields, which are placed after all existing columns (but before the partition columns); REPLACE replaces all of the table's fields.

2. Practical case

(1) Check the table structure

hive> desc dept_partition;

(2) Add a column

hive > alter table dept_partition add columns(deptdesc string);

(3) Check the table structure

hive> desc dept_partition;

(4) Update Column

hive > alter table dept_partition change column deptdesc desc int;

(5) Check the table structure

hive> desc dept_partition;

(6) Replace the columns

hive > alter table dept_partition replace columns(deptno string, dname string, loc string);

(7) Check the table structure

hive> desc dept_partition;

8. Delete a table

hive > drop table dept_partition;

 

 

 

 
