Hive's four table types: internal table, external table, partition table, and bucket table

I. Overview

Overall, Hive has four kinds of tables: the external table, the internal (managed) table, the partition table, and the bucket table, each serving different requirements. This post walks through each kind of table, how to create it, and how to load data into it.

II. Specific content

1. Internal Table

Creating an internal table and loading data

```sql
create table emp_inner(
  empno int,
  ename string,
  job string,
  mgr int,
  hiredate string,
  sal double,
  comm double,
  deptno int
)
row format delimited fields terminated by '\t'
LOCATION '/user/hive/warehouse/hadoop.db/emp';
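Data can then be loaded into the internal table; a minimal sketch, assuming the same `/opt/datas/emp.txt` file that the later sections use:

```sql
-- copies the local file into the table's warehouse directory on HDFS
load data local inpath '/opt/datas/emp.txt' into table emp_inner;
```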

2. External Table

(1) What it is:

  When several teams analyze the same logs together, a table created for one analysis can be dropped once that analysis is done. But dropping an ordinary (managed) table also deletes the underlying data, which would affect the other teams' analyses, and log data cannot simply be removed. Hence the external table: dropping an external table does not delete the corresponding data on HDFS.

(2) Creating an external table and loading data

```sql
create EXTERNAL table dept_ext(
  deptno int,
  dname string,
  loc string
)
row format delimited fields terminated by '\t';

load data local inpath '/opt/datas/dept.txt' into table dept_ext;
```

(3) Differences between external and internal tables
    Dropping an external table leaves the data untouched; only the metadata in MySQL is modified. Dropping an internal (managed) table deletes the data as well.

    Summary of the differences between Hive internal and external tables:
        1) On creation: creating an internal table moves the data into the data warehouse path; creating an external table only records the path where the data lives, and the data itself is not moved at all.
        2) On deletion: dropping an internal table deletes the metadata and the data together; dropping an external table removes only the metadata and keeps the data. External tables are therefore relatively safer, allow more flexible organization of data, and make it easy to share the source data.
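The difference is easy to demonstrate; a sketch, assuming the `emp_inner` and `dept_ext` tables created above:

```sql
-- Dropping the managed table removes the metadata AND the files on HDFS:
drop table emp_inner;   -- /user/hive/warehouse/hadoop.db/emp is deleted

-- Dropping the external table removes only the metadata:
drop table dept_ext;    -- the loaded dept.txt data stays on HDFS
```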

3. Temporary table

(1) What it is

A temporary table exists only for the current session: once the Hive client is closed, the table disappears. It is mainly used to hold unimportant intermediate result sets.

(2) Creating a temporary table and loading data

```sql
create TEMPORARY table dept_tmp(
  deptno int,
  dname string,
  loc string
)
row format delimited fields terminated by '\t';

load data local inpath '/opt/datas/dept.txt' into table dept_tmp;
```

(3) Viewing the table's location

```sql
desc formatted dept_tmp;
-- Location:   hdfs://172.19.199.187:8020/tmp/hive/hadoop/68174383-f427-4629-9707-0ab1c9b07726/_tmp_space.db/d872efec-1294-48b0-9071-31cf98d46400
-- Table Type: MANAGED_TABLE
```

Note that the location is a session-scoped `_tmp_space.db` directory under `/tmp/hive`, which is why the table disappears when the session ends.
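The session-scoped behavior can be checked directly (a sketch):

```sql
show tables like 'dept_tmp';   -- listed in the session that created it
-- after closing and reopening the Hive client:
show tables like 'dept_tmp';   -- no longer listed
```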

4. Partition table [***]

(1) What it is

Ordinary table: `select * from logs where date = '20171209'` first scans the full table's data and only then applies the filter.

Partition table: the same query directly loads only the files under the matching partition's directory. This suits large volumes of data, since a partition lets the query locate the relevant data quickly; the main purpose of a partition table is to improve query efficiency.

(2) Creating a single-level partition table and loading data

```sql
create table emp_part(
  empno int,
  ename string,
  job string,
  mgr int,
  hiredate string,
  sal double,
  comm double,
  deptno int
)
partitioned by (`datetime` string)
row format delimited fields terminated by '\t';

load data local inpath '/opt/datas/emp.txt' into table emp_part partition(`datetime`='20171209');
load data local inpath '/opt/datas/emp.txt' into table emp_part partition(`datetime`='20171208');
```

Two directories are formed on HDFS, each containing a copy of emp.txt:

    /user/hive/warehouse/hadoop.db/emp_part/datetime=20171208
    /user/hive/warehouse/hadoop.db/emp_part/datetime=20171209

Query:

```sql
select * from emp_part where `datetime` = '20171209';
```
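The partitions registered in the metastore can also be listed directly; a sketch, assuming the `emp_part` table above:

```sql
show partitions emp_part;
-- datetime=20171208
-- datetime=20171209
```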

(3) Creating a two-level partition table and loading data

```sql
create table emp_part2(
  empno int,
  ename string,
  job string,
  mgr int,
  hiredate string,
  sal double,
  comm double,
  deptno int
)
partitioned by (`datetime` string, hour string)
row format delimited fields terminated by '\t';

load data local inpath '/opt/datas/emp.txt' into table emp_part2 partition(`datetime`='20171209', hour='01');
load data local inpath '/opt/datas/emp.txt' into table emp_part2 partition(`datetime`='20171209', hour='02');
```

Two nested directories are formed on HDFS:

    /user/hive/warehouse/hadoop.db/emp_part2/datetime=20171209/hour=01
    /user/hive/warehouse/hadoop.db/emp_part2/datetime=20171209/hour=02

Query results:

```sql
select * from emp_part2 where `datetime` = '20171209';
-- returns everything under .../emp_part2/datetime=20171209 (i.e. both copies of emp.txt)

select * from emp_part2 where `datetime` = '20171209' and hour = '01';
-- returns only the data under .../emp_part2/datetime=20171209/hour=01 (one copy of emp.txt)
```

(4) Creating an external partition table (dropping it removes only the metadata; the data is not deleted)

```sql
create EXTERNAL table emp_test(
  empno int,
  ename string,
  job string,
  mgr int,
  hiredate string,
  sal double,
  comm double,
  deptno int
)
PARTITIONED BY (`date` string, hour string)
row format delimited fields terminated by '\t';
```

(`date` is quoted with backticks because it is a reserved word in Hive, like `datetime` above.)

(5) Ways to load data into a partition table

(A) Use the `load` command with a partition spec to load the data directly into the partition; it is then visible to `select`:

```sql
load data local inpath '/opt/datas/emp.txt' into table emp_part2 partition(`datetime`='20171209',hour='01');
```

(B) Create the directory /user/hive/warehouse/hadoop.db/emp_part2/datetime=20171209/hour=03 by hand and put the data file into it. A `select` cannot see this data yet, because the metastore (the MySQL database) does not know about the new path; the partition must first be registered:

```sql
alter table emp_part2 add partition(`datetime`='20171209',hour='03');
```
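If the data directory sits outside the default partition path, the partition can also be registered with an explicit location; a sketch (the `hour='04'` path here is illustrative):

```sql
-- register a partition whose data lives at a given HDFS path
alter table emp_part2 add partition(`datetime`='20171209', hour='04')
location '/user/hive/warehouse/hadoop.db/emp_part2/datetime=20171209/hour=04';
```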

5. Bucket table

(1) When to use it

When the data is badly skewed and unevenly distributed, bucketing helps: rows are assigned to buckets by hashing the clustering column modulo the number of buckets, so each bucket ends up with a roughly even amount of data. Joins between bucketed tables, and other queries on them, can then be optimized.

(2) Creating and using a bucket table

First, enable bucketing enforcement:

```sql
set hive.enforce.bucketing = true;
```

Then create the table:

```sql
create table emp_bu(
  empno int,
  ename string,
  job string,
  mgr int,
  hiredate string,
  sal double,
  comm double,
  deptno int
)
CLUSTERED BY (deptno) INTO 4 BUCKETS
row format delimited fields terminated by '\t';
```

Finally, load the data with `insert` (a bucket table is populated via `insert` so that Hive can distribute the rows into the buckets):

```sql
insert overwrite table emp_bu select * from emp;
```

You can also write into a specified partition with `insert overwrite`.
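Bucketing also enables efficient sampling; a sketch, assuming the 4-bucket `emp_bu` table above:

```sql
-- TABLESAMPLE can read a single bucket's file instead of scanning the whole table:
select * from emp_bu tablesample(bucket 1 out of 4 on deptno);
```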

Origin www.cnblogs.com/Mr--zhao/p/11454582.html