Creating an external table in Hive associated with HDFS files

The author referred to this article: https://www.cnblogs.com/fefjay/p/6044474.html

On data partitioning, INSERT INTO and INSERT OVERWRITE in Hive

1. Data partitioning: the main purpose of partitioning a database table is to reduce the amount of data a given SQL operation has to read, and so shorten its response time. There are two main forms: horizontal partitioning and vertical partitioning. Horizontal partitioning splits a table by rows. Vertical partitioning splits it by columns, typically to reduce the width of the target table. Horizontal partitioning is the more commonly used form, and it is the form Hive partitions take.
2. Syntax for creating a partitioned table:
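Hive stores each partition of a table as its own subdirectory named after the partition values (for example .../year=2019/month=06). A minimal sketch, assuming a table `tablename` partitioned by `year` and `month` as in the syntax of step 2, of how a filter on the partition columns lets Hive skip the other directories (partition pruning):

```sql
-- List the partitions (one HDFS subdirectory each) the table currently has.
SHOW PARTITIONS tablename;

-- Filtering on the partition columns lets Hive read only the
-- year=2019/month=06 directory instead of scanning the whole table.
SELECT a, b
FROM tablename
WHERE year = '2019' AND month = '06';
```
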

create external table if not exists tablename(
    a string,
    b string)
partitioned by (year string, month string)
row format delimited fields terminated by ',';
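Because the table is external, Hive only records where its files are; by default the data stays in HDFS even after DROP TABLE. A sketch of tying the table and one of its partitions to existing HDFS directories, used in place of the statement above; the /data/demo paths are hypothetical placeholders:

```sql
-- Same table, but pointed at an existing HDFS directory (hypothetical path).
CREATE EXTERNAL TABLE IF NOT EXISTS tablename (
  a STRING,
  b STRING)
PARTITIONED BY (year STRING, month STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/demo/tablename';

-- Register one partition that points at a directory already on HDFS.
ALTER TABLE tablename ADD IF NOT EXISTS
  PARTITION (year = '2019', month = '06')
  LOCATION '/data/demo/tablename/year=2019/month=06';

-- Alternatively, scan the table location for year=.../month=... subdirectories
-- and add any partitions the metastore does not know about yet.
MSCK REPAIR TABLE tablename;
```
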

3. Hive offers three ways to insert data into a partitioned table:

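(1) Static partition insert: every partition column declared when the table was created is given a fixed value, e.g.:
insert overwrite table tablename partition (year='2019', month='06') select a, b from tablename2;
(2) Mixed static/dynamic insert: values are given for only some of the partition columns, and the static ones must come before the dynamic ones, e.g.:
insert overwrite table tablename partition (year='2019', month) select a, b, month from tablename2;
(3) Dynamic partition insert: the partition columns are named without values, e.g.:
insert overwrite table tablename partition (year, month) select a, b, year, month from tablename2;
In the dynamic cases, the partition values come from the last columns of the select list, in the same order as in partition (...).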

4. Parameters related to Hive dynamic partitioning:

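hive.exec.dynamic.partition: whether dynamic partitioning is enabled, false (disabled) or true (enabled). The default is false before Hive 0.9.0 and true from Hive 0.9.0 onward.

hive.exec.dynamic.partition.mode: the dynamic partition mode once dynamic partitioning is enabled, either strict or nonstrict. strict (the default) requires at least one static partition column, which guards against an insert accidentally overwriting every partition; nonstrict has no such requirement.

hive.exec.max.dynamic.partitions: the maximum total number of dynamic partitions that may be created. It can be raised manually. Default 1000.

hive.exec.max.dynamic.partitions.pernode: the maximum number of dynamic partitions that each mapper or reducer node may create. Default 100.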
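These parameters can be set per session before a dynamic-partition insert. A sketch, assuming `tablename2` has `year` and `month` columns; the raised limits (2000 and 500) are arbitrary example values, not recommendations:

```sql
-- Enable dynamic partitioning for this session.
SET hive.exec.dynamic.partition=true;

-- strict mode: at least one static partition column is required,
-- so this mixed insert is accepted...
SET hive.exec.dynamic.partition.mode=strict;
INSERT OVERWRITE TABLE tablename PARTITION (year = '2019', month)
SELECT a, b, month FROM tablename2;

-- ...while a fully dynamic insert needs nonstrict mode.
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE tablename PARTITION (year, month)
SELECT a, b, year, month FROM tablename2;

-- Raise the limits if one insert would create more partitions than allowed.
SET hive.exec.max.dynamic.partitions=2000;
SET hive.exec.max.dynamic.partitions.pernode=500;
```
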

5. Importing data: INSERT INTO and INSERT OVERWRITE
1) Definition: Hive is a Hadoop-based data warehouse tool. It maps structured data files to database tables, provides simple SQL query functionality, and converts SQL statements into MapReduce jobs to run. Hive generally supports the following four ways of importing data:

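(1) Loading data from the local file system into a Hive table;

(2) Loading data from HDFS into a Hive table;

(3) Creating a table and filling it, at creation time, with the matching records queried from another table;

(4) Querying data from another table and inserting it into an existing Hive table.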
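A sketch of the four import methods above, reusing the partitioned `tablename`. The file paths and the new table `tablename3` are hypothetical, and the files are assumed to be comma-delimited already, since LOAD DATA copies or moves files without transforming them:

```sql
-- (1) From the local file system: the file is copied into the partition directory.
LOAD DATA LOCAL INPATH '/tmp/demo/2019_06.csv'
INTO TABLE tablename PARTITION (year = '2019', month = '06');

-- (2) From HDFS: the file is moved (not copied) into the partition directory.
LOAD DATA INPATH '/data/demo/staging/2019_07.csv'
INTO TABLE tablename PARTITION (year = '2019', month = '07');

-- (3) Create a new table from a query result (CREATE TABLE ... AS SELECT).
CREATE TABLE tablename3 AS
SELECT a, b FROM tablename2;

-- (4) Insert a query result into an existing table.
INSERT INTO TABLE tablename PARTITION (year = '2019', month = '08')
SELECT a, b FROM tablename2;
```
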

INSERT INTO

Example:
insert into table tablename1 select a, b, c from tablename2;

INSERT OVERWRITE

Example:
insert overwrite table tablename1 select a, b, c from tablename2;

2) Difference: both INSERT INTO and INSERT OVERWRITE write data into a Hive table. INSERT INTO appends the new rows after the data already in the table, while INSERT OVERWRITE replaces it: the existing data is deleted first and the new data is then written. For a partitioned table, INSERT OVERWRITE rewrites only the partition(s) being written; other partitions are left untouched.
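A sketch that makes the difference observable, reusing the partitioned `tablename` and a source table `tablename2` with columns `a` and `b`; running the final query after each step shows how the per-partition row counts change:

```sql
-- Start with the 2019/06 partition loaded once.
INSERT OVERWRITE TABLE tablename PARTITION (year = '2019', month = '06')
SELECT a, b FROM tablename2;

-- INSERT INTO appends: 2019/06 now holds a second copy of the rows.
INSERT INTO TABLE tablename PARTITION (year = '2019', month = '06')
SELECT a, b FROM tablename2;

-- INSERT OVERWRITE replaces only the 2019/06 partition; any other
-- partitions (e.g. 2019/07) keep their data.
INSERT OVERWRITE TABLE tablename PARTITION (year = '2019', month = '06')
SELECT a, b FROM tablename2;

-- Row counts per partition.
SELECT year, month, count(*) FROM tablename GROUP BY year, month;
```
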

Original post: blog.csdn.net/qq_43147136/article/details/91866622