Hive partition tables

A partition table corresponds to a separate folder on the HDFS file system, and that folder holds all of the partition's data files. Hive partitioning is really directory partitioning: a large data set is split into smaller data sets according to business needs. When a query's WHERE clause selects only the partitions it needs, query efficiency improves considerably.

Basic partition table operations

1. Why partition tables are needed (e.g., logs must be managed by date)

/user/hive/warehouse/log_partition/20170702/20170702.log
/user/hive/warehouse/log_partition/20170703/20170703.log
/user/hive/warehouse/log_partition/20170704/20170704.log

2. Syntax for creating a partition table

hive (default)> create table dept_partition(
deptno int, dname string, loc string
)
partitioned by (month string)
row format delimited fields terminated by '\t';

Note: the partition column is not a field that already exists in the table data; you can think of it as a pseudo (virtual) column of the table.
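
Because the partition column behaves like an ordinary column in queries, you can select it and filter on it even though it is never stored inside the data files. For example, once data has been loaded (step 3 below):

hive (default)> select deptno, dname, month from dept_partition where month='201709';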

3. Loading data into a partitioned table

hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201709');
hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201708');
hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201707');

Note: when loading data into a partition table, you must specify the target partition.
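
To verify that each load created its own partition directory, list the table's directory on HDFS (assuming the default warehouse location used throughout this post); there should be one month=... subdirectory per load:

hive (default)> dfs -ls /user/hive/warehouse/dept_partition;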

4. Querying data in a partition table

Single partition query

hive (default)> select * from dept_partition where month='201709';
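
To confirm that Hive prunes the scan down to the single partition, you can inspect the query plan (the exact output format varies by Hive version):

hive (default)> explain select * from dept_partition where month='201709';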

Multi-partition union query

hive (default)> select * from dept_partition where month='201709'
              union
              select * from dept_partition where month='201708'
              union
              select * from dept_partition where month='201707';

_u3.deptno      _u3.dname       _u3.loc _u3.month
10      ACCOUNTING      NEW YORK        201707
10      ACCOUNTING      NEW YORK        201708
10      ACCOUNTING      NEW YORK        201709
20      RESEARCH        DALLAS  201707
20      RESEARCH        DALLAS  201708
20      RESEARCH        DALLAS  201709
30      SALES   CHICAGO 201707
30      SALES   CHICAGO 201708
30      SALES   CHICAGO 201709
40      OPERATIONS      BOSTON  201707
40      OPERATIONS      BOSTON  201708
40      OPERATIONS      BOSTON  201709
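
As a side note not in the original post: the same rows can be fetched with a single IN filter on the partition column instead of three unioned selects. (UNION also removes duplicate rows, which the filter form does not.)

hive (default)> select * from dept_partition where month in ('201707','201708','201709');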

5. Adding partitions

Create a single partition

hive (default)> alter table dept_partition add partition(month='201706');

Create multiple partitions

hive (default)> alter table dept_partition add partition(month='201705') partition(month='201704');
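
Hive can also register a new partition that points at an arbitrary HDFS directory by adding a LOCATION clause; the path below is illustrative:

hive (default)> alter table dept_partition add partition(month='201703') location '/user/hive/warehouse/dept_partition/month=201703';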

6. Deleting partitions

Delete a single partition

hive (default)> alter table dept_partition drop partition (month='201704');

Delete multiple partitions (unlike ADD, DROP separates the partition specs with commas; for a managed table, dropping a partition also deletes its data on HDFS)

hive (default)> alter table dept_partition drop partition (month='201705'), partition (month='201706');

7. Viewing a table's partitions

hive> show partitions dept_partition;
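
Run after the add/drop steps above, the output should look roughly like this:

month=201707
month=201708
month=201709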

8. Viewing the partition table structure

hive> desc formatted dept_partition;

# Partition Information          
# col_name              data_type               comment             
month                   string    

Partition Table Notes

1. Creating a secondary (two-level) partition table

hive (default)> create table dept_partition2(
               deptno int, dname string, loc string
               )
               partitioned by (month string, day string)
               row format delimited fields terminated by '\t';

2. Loading data the normal way

(1) Load data into the secondary partition table

hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table
 default.dept_partition2 partition(month='201709', day='13');
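
On HDFS a secondary partition is just a nested directory, one level per partition column. Listing the month directory (again assuming the default warehouse location) should show the day=13 subdirectory created by the load:

hive (default)> dfs -ls /user/hive/warehouse/dept_partition2/month=201709;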

(2) Query the partitioned data

hive (default)> select * from dept_partition2 where month='201709' and day='13';

3. Three ways to associate a partition table with data uploaded directly to a partition directory

(1) Method one: upload the data, then repair the table

       Upload the data

hive (default)> dfs -mkdir -p
 /user/hive/warehouse/dept_partition2/month=201709/day=12;
hive (default)> dfs -put /opt/module/datas/dept.txt  /user/hive/warehouse/dept_partition2/month=201709/day=12;

       Query the data (the newly uploaded data cannot be queried yet)

hive (default)> select * from dept_partition2 where month='201709' and day='12';

  Run the repair command

hive> msck repair table dept_partition2;
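
msck repair table scans the table's directory tree on HDFS and registers in the metastore any partition directories it finds there that the metastore does not know about. You can confirm that the manually created partition is now registered:

hive> show partitions dept_partition2;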

  Query the data again

hive (default)> select * from dept_partition2 where month='201709' and day='12';

(2) Method two: upload the data, then add the partition

       Upload the data

hive (default)> dfs -mkdir -p
 /user/hive/warehouse/dept_partition2/month=201709/day=11;
hive (default)> dfs -put /opt/module/datas/dept.txt  /user/hive/warehouse/dept_partition2/month=201709/day=11;

       Add the partition

hive (default)> alter table dept_partition2 add partition(month='201709',day='11');

       Query the data

hive (default)> select * from dept_partition2 where month='201709' and day='11';

(3) Method three: create the partition directory, then load the data

       Create the directory

hive (default)> dfs -mkdir -p
 /user/hive/warehouse/dept_partition2/month=201709/day=10;

  Load the data

hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table
 dept_partition2 partition(month='201709',day='10');

  Query the data

hive (default)> select * from dept_partition2 where month='201709' and day='10';


Origin www.cnblogs.com/Tunan-Ki/p/11795782.html