https://www.cnblogs.com/yongjian/p/6640951.html
The concept of partitioning in Hive differs from partitioning in traditional relational databases.
Traditional database partitioning: in Oracle, each partition is stored independently in its own segment, which holds the actual data, and rows are automatically assigned to the right partition when they are inserted.
Hive partitioning: because Hive is an abstraction over data stored in HDFS, a Hive partition name corresponds to a directory name and a sub-partition name corresponds to a subdirectory name. The partition is not an actual field stored in the data.
In other words, when we specify a partition while inserting data, we are really creating a new directory or subdirectory, or adding a data file to an existing directory.
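As a minimal sketch of this mapping, the Python function below builds the directory that a partition specification corresponds to. The helper partition_path is hypothetical; it assumes the default warehouse root /user/hive/warehouse and partition values that need no escaping (Hive escapes special characters such as '/' in partition values, which this sketch ignores).

```python
from typing import List, Tuple

# Assumed default warehouse root; real deployments configure it via hive.metastore.warehouse.dir.
WAREHOUSE = "/user/hive/warehouse"


def partition_path(table: str, spec: List[Tuple[str, str]]) -> str:
    """Build a partition directory path (hypothetical helper).

    spec is ordered: the first column becomes the top-level directory and
    each following column a subdirectory beneath it. Values are used
    verbatim, without the escaping Hive applies to special characters.
    """
    parts = [WAREHOUSE, table]
    parts += [f"{col}={val}" for col, val in spec]
    return "/".join(parts)


print(partition_path("par_tab", [("sex", "man")]))
print(partition_path("par_tab_muilt", [("sex", "man"), ("dt", "2017-03-29")]))
```

The order of the list matters: it mirrors the order of the columns in PARTITIONED BY, which decides which directory is the parent.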
Creating partitions in Hive
Hive defines partitions with the PARTITIONED BY keyword. Be careful when you create the table: the columns in the PARTITIONED BY clause are formal columns of the table, but the data files do not contain them, because their values are the names of the directories.
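The Python sketch below models why a partition column appears in query results even though the data file does not contain it: the value is parsed from the key=value segment of the file's directory and appended to every row. read_partitioned_rows is a hypothetical helper, not Hive code, and it assumes comma-delimited text and unescaped partition values.

```python
from typing import List


def read_partitioned_rows(file_path: str, lines: List[str]) -> List[List[str]]:
    """Append partition values taken from the directory path to each data row."""
    # Collect values from path segments of the form col=value, in directory order.
    partition_values = [
        seg.split("=", 1)[1] for seg in file_path.split("/") if "=" in seg
    ]
    rows = []
    for line in lines:
        if not line:
            continue  # skip empty lines, e.g. after a trailing newline
        data_cols = line.split(",")  # matches FIELDS TERMINATED BY ','
        rows.append(data_cols + partition_values)
    return rows


path = "/user/hive/warehouse/par_tab/sex=man/par_tab.txt"
for row in read_partitioned_rows(path, ["jan,china", "mary,america"]):
    print("\t".join(row))
```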
Static partition
Create a static partition table par_tab with a single partition column:
create table par_tab (name string,nation string) partitioned by (sex string) row format delimited fields terminated by ',';
Running desc now shows the following table structure:
hive> desc par_tab;
OK
name string
nation string
sex string

# Partition Information
# col_name data_type comment

sex string
Time taken: 0.038 seconds, Fetched: 8 row(s)
Prepare a local data file, par_tab.txt, whose contents are "name,nationality"; gender (sex) will be used as the partition:
jan,china
mary,america
lilei,china
heyong,china
yiku,japan
emoji,japan
Load the data into the table. In fact, LOAD only places the file into the table's directory on HDFS without transforming it (with LOCAL the local file is copied; without LOCAL an HDFS file is moved):
load data local inpath '/home/hadoop/files/par_tab.txt' into table par_tab partition (sex='man');
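A rough local simulation of this statement follows, using a temporary directory in place of HDFS: the file is copied unchanged into the partition directory (created if missing), and nothing is parsed or transformed at load time. load_local is a hypothetical helper, not a Hive or HDFS API.

```python
import os
import shutil
import tempfile


def load_local(src: str, table_dir: str, spec: list) -> str:
    """Copy src into table_dir/col=val/... (hypothetical stand-in for LOAD DATA LOCAL)."""
    part_dir = os.path.join(table_dir, *[f"{c}={v}" for c, v in spec])
    os.makedirs(part_dir, exist_ok=True)  # partition directory is created on demand
    dest = os.path.join(part_dir, os.path.basename(src))
    shutil.copy(src, dest)  # LOCAL copies: the source file stays where it was
    return dest


with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "par_tab.txt")
    with open(src, "w") as f:
        f.write("jan,china\nmary,america\n")
    table_dir = os.path.join(root, "warehouse", "par_tab")
    dest = load_local(src, table_dir, [("sex", "man")])
    print(os.path.relpath(dest, root))  # warehouse/par_tab/sex=man/par_tab.txt on POSIX
    print(os.path.exists(src))          # True
```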
Querying par_tab now returns three columns; note that the third column, sex, comes from the partition rather than from the file:
hive> select * from par_tab;
OK
jan china man
mary america man
lilei china man
heyong china man
yiku japan man
emoji japan man
Time taken: 0.076 seconds, Fetched: 6 row(s)
View the directory structure of par_tab:
[hadoop@hadoop001 files]$ hadoop dfs -lsr /user/hive/warehouse/par_tab
drwxr-xr-x - hadoop supergroup 0 2017-03-29 08:25 /user/hive/warehouse/par_tab/sex=man
-rwxr-xr-x 1 hadoop supergroup 71 2017-03-29 08:25 /user/hive/warehouse/par_tab/sex=man/par_tab.txt
As you can see, for the new partitioned table Hive creates a directory named after the table (par_tab) under the default warehouse path /user/hive/warehouse, then a subdirectory sex=man named after the partition, and finally stores the actual data file in the partition directory.
Now load data from another file with the following contents:
lily,china
nancy,china
hanmeimei,america
Load the data:
load data local inpath '/home/hadoop/files/par_tab_wm.txt' into table par_tab partition (sex='woman');
View the directory structure of par_tab again:
[hadoop@hadoop001 files]$ hadoop dfs -lsr /user/hive/warehouse/par_tab
drwxr-xr-x - hadoop supergroup 0 2017-03-29 08:25 /user/hive/warehouse/par_tab/sex=man
-rwxr-xr-x 1 hadoop supergroup 71 2017-03-29 08:25 /user/hive/warehouse/par_tab/sex=man/par_tab.txt
drwxr-xr-x - hadoop supergroup 0 2017-03-29 08:35 /user/hive/warehouse/par_tab/sex=woman
-rwxr-xr-x 1 hadoop supergroup 41 2017-03-29 08:35 /user/hive/warehouse/par_tab/sex=woman/par_tab_wm.txt
Query the table again: the results of both loads, man and woman, are returned:
hive> select * from par_tab;
OK
jan china man
mary america man
lilei china man
heyong china man
yiku japan man
emoji japan man
lily china woman
nancy china woman
hanmeimei america woman
Time taken: 0.136 seconds, Fetched: 9 row(s)
Because partition columns are defined as real table columns, you can filter on them to query a single partition's data:
hive> select * from par_tab where sex='woman';
OK
lily china woman
nancy china woman
hanmeimei america woman
Time taken: 0.515 seconds, Fetched: 3 row(s)
Next, create a static partition table, par_tab_muilt, with multiple partition columns (sex + date):
hive> create table par_tab_muilt (name string, nation string) partitioned by (sex string,dt string) row format delimited fields terminated by ',' ;
hive> load data local inpath '/home/hadoop/files/par_tab.txt' into table par_tab_muilt partition (sex='man',dt='2017-03-29');
[hadoop@hadoop001 files]$ hadoop dfs -lsr /user/hive/warehouse/par_tab_muilt
drwxr-xr-x - hadoop supergroup 0 2017-03-29 08:45 /user/hive/warehouse/par_tab_muilt/sex=man
drwxr-xr-x - hadoop supergroup 0 2017-03-29 08:45 /user/hive/warehouse/par_tab_muilt/sex=man/dt=2017-03-29
-rwxr-xr-x 1 hadoop supergroup 71 2017-03-29 08:45 /user/hive/warehouse/par_tab_muilt/sex=man/dt=2017-03-29/par_tab.txt
As you can see, the order in which the partition columns are defined when the table is created determines the directory hierarchy (which directory is the parent and which is the subdirectory). Because of this hierarchy, a query for sex='man' returns the data for every date under sex=man. If we filter only on the date partition, and both sex=man and sex=woman contain data for that date, Hive prunes the input paths so that only the matching date partitions are scanned, without filtering on gender (i.e., the results include all genders).
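To illustrate the pruning described above, here is a Python sketch that selects partition directories by a predicate on one column while leaving the other column unfiltered. The partition list and the helpers parse_spec and prune are hypothetical; this only models which partitions get selected, not how Hive implements pruning.

```python
from typing import Dict, List


def parse_spec(partition: str) -> Dict[str, str]:
    """Turn 'sex=man/dt=2017-03-29' into {'sex': 'man', 'dt': '2017-03-29'}."""
    return dict(seg.split("=", 1) for seg in partition.split("/"))


def prune(partitions: List[str], **filters: str) -> List[str]:
    """Keep partitions matching every filter; unfiltered columns match anything."""
    return [
        p for p in partitions
        if all(parse_spec(p).get(col) == val for col, val in filters.items())
    ]


# Hypothetical partitions of par_tab_muilt after loading data for both genders.
parts = [
    "sex=man/dt=2017-03-28",
    "sex=man/dt=2017-03-29",
    "sex=woman/dt=2017-03-29",
]
print(prune(parts, dt="2017-03-29"))  # both genders for that date
print(prune(parts, sex="man"))        # every date under sex=man
```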
Dynamic Partitioning
With static partitions, you must know the partition value before inserting and write a separate load data statement for each partition, which is tedious. Dynamic partitioning solves this problem: it assigns rows to partitions based on the data returned by the query. The difference between dynamic and static partitioning is that with dynamic partitioning you do not specify the partition directory; the system chooses it.
First, enable dynamic partitioning:
hive> set hive.exec.dynamic.partition=true;
Suppose there is a table par_tab whose first two columns are name and nation (nationality) and whose last two columns are the partition columns sex (gender) and dt (date), containing the following data:
hive> select * from par_tab;
OK
lily china man 2013-03-28
nancy china man 2013-03-28
hanmeimei america man 2013-03-28
jan china man 2013-03-29
mary america man 2013-03-29
lilei china man 2013-03-29
heyong china man 2013-03-29
yiku japan man 2013-03-29
emoji japan man 2013-03-29
Time taken: 1.141 seconds, Fetched: 9 row(s)
Now insert the contents of this table directly into another table, par_dnm, with sex as a static partition and dt as a dynamic partition (we do not specify which day; the system decides the assignment itself):
hive> insert overwrite table par_dnm partition(sex='man',dt)
    > select name, nation, dt from par_tab;
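The routing performed by this statement can be modeled in Python. The sketch assumes, as Hive requires, that the dynamic partition values are the last columns of the SELECT list, in the same order as in the PARTITION clause; assign_partitions is a hypothetical helper that only illustrates the grouping, not how Hive executes the insert.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def assign_partitions(
    rows: List[Tuple[str, ...]], static: Dict[str, str], dynamic_cols: List[str]
) -> Dict[str, List[Tuple[str, ...]]]:
    """Route rows to partition paths, mimicking
    INSERT ... PARTITION(static..., dynamic...) SELECT data..., dynamic values.

    The trailing len(dynamic_cols) fields of each row are the dynamic values,
    in the same order as dynamic_cols; they are stripped from the stored row.
    """
    n = len(dynamic_cols)
    out: Dict[str, List[Tuple[str, ...]]] = defaultdict(list)
    for row in rows:
        data, dyn_vals = row[: len(row) - n], row[len(row) - n:]
        segs = [f"{c}={v}" for c, v in static.items()]
        segs += [f"{c}={v}" for c, v in zip(dynamic_cols, dyn_vals)]
        out["/".join(segs)].append(data)
    return dict(out)


# Rows as produced by: select name, nation, dt from par_tab
rows = [
    ("lily", "china", "2013-03-28"),
    ("jan", "china", "2013-03-29"),
    ("mary", "america", "2013-03-29"),
]
for path, data in assign_partitions(rows, {"sex": "man"}, ["dt"]).items():
    print(path, data)
```

Each distinct dt value yields its own directory under sex=man, which is why no per-date load statement is needed.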
After the insert, the directory structure looks like this:
drwxr-xr-x - hadoop supergroup 0 2017-03-29 10:32 /user/hive/warehouse/par_dnm/sex=man
drwxr-xr-x - hadoop supergroup 0 2017-03-29 10:32 /user/hive/warehouse/par_dnm/sex=man/dt=2013-03-28
-rwxr-xr-x 1 hadoop supergroup 41 2017-03-29 10:32 /user/hive/warehouse/par_dnm/sex=man/dt=2013-03-28/000000_0
drwxr-xr-x - hadoop supergroup 0 2017-03-29 10:32 /user/hive/warehouse/par_dnm/sex=man/dt=2013-03-29
-rwxr-xr-x 1 hadoop supergroup 71 2017-03-29 10:32 /user/hive/warehouse/par_dnm/sex=man/dt=2013-03-29/000000_0
List the partitions:
hive> show partitions par_dnm;
OK
sex=man/dt=2013-03-28
sex=man/dt=2013-03-29
Time taken: 0.065 seconds, Fetched: 2 row(s)
The dynamic partitioning worked.
Note that dynamic partitioning does not allow a dynamic column for the main (parent) partition combined with a static column for a sub-partition, because that would force Hive to create every main partition, each containing the sub-partition defined by the static column.
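The ordering rule can be written as a small Python check. check_partition_spec is hypothetical; it lists partition columns in table order and uses None to mark a dynamic column. Hive itself rejects a statement that breaks this rule with an error.

```python
from typing import List, Optional, Tuple


def check_partition_spec(spec: List[Tuple[str, Optional[str]]]) -> None:
    """Raise if a static column (value given) follows a dynamic one (value None).

    spec lists partition columns in table order, e.g.
    [("sex", "man"), ("dt", None)] for PARTITION(sex='man', dt).
    """
    seen_dynamic = None  # name of the first dynamic column encountered
    for col, value in spec:
        if value is None:
            seen_dynamic = seen_dynamic or col
        elif seen_dynamic is not None:
            raise ValueError(
                f"static partition {col!r} cannot be a child of dynamic partition {seen_dynamic!r}"
            )


check_partition_spec([("sex", "man"), ("dt", None)])  # accepted
try:
    check_partition_spec([("sex", None), ("dt", "2013-03-29")])
except ValueError as e:
    print(e)  # static partition 'dt' cannot be a child of dynamic partition 'sex'
```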
Dynamic partitioning does allow all partition columns to be dynamic, but you must first set the parameter hive.exec.dynamic.partition.mode:
hive> set hive.exec.dynamic.partition.mode;
hive.exec.dynamic.partition.mode=strict
Its default value is strict, which does not allow all partition columns to be dynamic. This protects against a user who intends to use dynamic partitioning only for a sub-partition but accidentally forgets to specify the value of the main partition column; a single DML statement could then create a large number of new partitions (that is, a large number of new directories) in a short time and hurt system performance.
To allow all partition columns to be dynamic, set:
hive> set hive.exec.dynamic.partition.mode=nonstrict;
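The difference between the two modes amounts to the following Python check, using the same hypothetical convention that None marks a dynamic column. check_mode only models the condition described above; its error text is invented and is not Hive's message.

```python
from typing import List, Optional, Tuple


def check_mode(spec: List[Tuple[str, Optional[str]]], mode: str = "strict") -> None:
    """In strict mode, require at least one static partition column (value not None)."""
    if mode == "strict" and all(value is None for _, value in spec):
        raise ValueError("strict mode: at least one static partition column is required")


check_mode([("sex", "man"), ("dt", None)])              # accepted in strict mode
check_mode([("sex", None), ("dt", None)], "nonstrict")  # accepted in nonstrict mode
try:
    check_mode([("sex", None), ("dt", None)])           # all dynamic, strict mode
except ValueError as e:
    print(e)
```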
If you find any errors, please let me know so I can correct them.