Hive Basic Exercises, Part 2

The following are supplementary Hive basic exercises.

Hive can export data in several ways. What are they, and how is data exported?

1.insert

# Export either to the local filesystem or to HDFS; the output can also be formatted with a specified field delimiter
# Export to the local filesystem
0: jdbc:hive2://node01:10000> insert overwrite local directory '/kkb/install/hivedatas/stu3' select * from stu;
INFO  : Compiling command(queryId=hadoop_20191116221919_74a3d6f7-5995-4a1e-b072-e30d6269d394): insert overwrite local directory '/kkb/install/hivedatas/stu3' select * from stu
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:stu.id, type:int, comment:null), FieldSchema(name:stu.name, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hadoop_20191116221919_74a3d6f7-5995-4a1e-b072-e30d6269d394); Time taken: 0.107 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hadoop_20191116221919_74a3d6f7-5995-4a1e-b072-e30d6269d394): insert overwrite local directory '/kkb/install/hivedatas/stu3' select * from stu
INFO  : Query ID = hadoop_20191116221919_74a3d6f7-5995-4a1e-b072-e30d6269d394
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : Starting Job = job_1573910690864_0002, Tracking URL = http://node01:8088/proxy/application_1573910690864_0002/
INFO  : Kill Command = /kkb/install/hadoop-2.6.0-cdh5.14.2//bin/hadoop job  -kill job_1573910690864_0002
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO  : 2019-11-16 22:19:40,957 Stage-1 map = 0%,  reduce = 0%
INFO  : 2019-11-16 22:19:42,002 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.51 sec
INFO  : MapReduce Total cumulative CPU time: 1 seconds 510 msec
INFO  : Ended Job = job_1573910690864_0002
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Copying data to local directory /kkb/install/hivedatas/stu3 from hdfs://node01:8020/tmp/hive/anonymous/2d04ba8e-9799-4a31-a93d-557db4086e81/hive_2019-11-16_22-19-32_776_5008666227900564137-1/-mr-10000
INFO  : MapReduce Jobs Launched:
INFO  : Stage-Stage-1: Map: 1   Cumulative CPU: 1.51 sec   HDFS Read: 3381 HDFS Write: 285797 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 1 seconds 510 msec
INFO  : Completed executing command(queryId=hadoop_20191116221919_74a3d6f7-5995-4a1e-b072-e30d6269d394); Time taken: 10.251 seconds
INFO  : OK
No rows affected (10.383 seconds)
# View the local file
[hadoop@node01 /kkb/install/hivedatas/stu3]$ cat 000000_0
1clyang

# Export to HDFS
0: jdbc:hive2://node01:10000> insert overwrite directory '/kkb/stu' select * from stu;
INFO  : Compiling command(queryId=hadoop_20191116222424_7b753364-9268-42e7-89fb-056424bc6852): insert overwrite directory '/kkb/stu' select * from stu
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:stu.id, type:int, comment:null), FieldSchema(name:stu.name, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hadoop_20191116222424_7b753364-9268-42e7-89fb-056424bc6852); Time taken: 0.173 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hadoop_20191116222424_7b753364-9268-42e7-89fb-056424bc6852): insert overwrite directory '/kkb/stu' select * from stu
INFO  : Query ID = hadoop_20191116222424_7b753364-9268-42e7-89fb-056424bc6852
INFO  : Total jobs = 3
INFO  : Launching Job 1 out of 3
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : Starting Job = job_1573910690864_0003, Tracking URL = http://node01:8088/proxy/application_1573910690864_0003/
INFO  : Kill Command = /kkb/install/hadoop-2.6.0-cdh5.14.2//bin/hadoop job  -kill job_1573910690864_0003
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO  : 2019-11-16 22:24:13,962 Stage-1 map = 0%,  reduce = 0%
INFO  : 2019-11-16 22:24:15,018 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.46 sec
INFO  : MapReduce Total cumulative CPU time: 1 seconds 460 msec
INFO  : Ended Job = job_1573910690864_0003
INFO  : Starting task [Stage-6:CONDITIONAL] in serial mode
INFO  : Stage-3 is selected by condition resolver.
INFO  : Stage-2 is filtered out by condition resolver.
INFO  : Stage-4 is filtered out by condition resolver.
INFO  : Starting task [Stage-3:MOVE] in serial mode
INFO  : Moving data to: hdfs://node01:8020/kkb/stu/.hive-staging_hive_2019-11-16_22-24-06_937_5666063681275061436-1/-ext-10000 from hdfs://node01:8020/kkb/stu/.hive-staging_hive_2019-11-16_22-24-06_937_5666063681275061436-1/-ext-10002
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Moving data to: /kkb/stu from hdfs://node01:8020/kkb/stu/.hive-staging_hive_2019-11-16_22-24-06_937_5666063681275061436-1/-ext-10000
INFO  : MapReduce Jobs Launched:
INFO  : Stage-Stage-1: Map: 1   Cumulative CPU: 1.46 sec   HDFS Read: 3315 HDFS Write: 286719 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 1 seconds 460 msec
INFO  : Completed executing command(queryId=hadoop_20191116222424_7b753364-9268-42e7-89fb-056424bc6852); Time taken: 9.044 seconds
INFO  : OK
# View the HDFS output
[hadoop@node01 /kkb/install/hivedatas/stu3]$ hdfs dfs -cat /kkb/stu/000000_0
19/11/16 22:26:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1clyang

# A field delimiter can be specified for the export; exporting to the local filesystem is shown here
0: jdbc:hive2://node01:10000> insert overwrite local directory '/kkb/install/hivedatas/stu4' row format delimited fields terminated by '@' select * from stu;
INFO  : Compiling command(queryId=hadoop_20191116223131_ebe796bf-7dcd-4a30-bcba-c63b7366773f): insert overwrite local directory '/kkb/install/hivedatas/stu4' row format delimited fields terminated by '@' select * from stu
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:stu.id, type:int, comment:null), FieldSchema(name:stu.name, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hadoop_20191116223131_ebe796bf-7dcd-4a30-bcba-c63b7366773f); Time taken: 0.128 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hadoop_20191116223131_ebe796bf-7dcd-4a30-bcba-c63b7366773f): insert overwrite local directory '/kkb/install/hivedatas/stu4' row format delimited fields terminated by '@' select * from stu
INFO  : Query ID = hadoop_20191116223131_ebe796bf-7dcd-4a30-bcba-c63b7366773f
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : Starting Job = job_1573910690864_0005, Tracking URL = http://node01:8088/proxy/application_1573910690864_0005/
INFO  : Kill Command = /kkb/install/hadoop-2.6.0-cdh5.14.2//bin/hadoop job  -kill job_1573910690864_0005
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO  : 2019-11-16 22:31:27,083 Stage-1 map = 0%,  reduce = 0%
INFO  : 2019-11-16 22:31:28,139 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.93 sec
INFO  : MapReduce Total cumulative CPU time: 1 seconds 930 msec
INFO  : Ended Job = job_1573910690864_0005
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Copying data to local directory /kkb/install/hivedatas/stu4 from hdfs://node01:8020/tmp/hive/anonymous/2d04ba8e-9799-4a31-a93d-557db4086e81/hive_2019-11-16_22-31-20_415_1737902713220629568-1/-mr-10000
INFO  : MapReduce Jobs Launched:
INFO  : Stage-Stage-1: Map: 1   Cumulative CPU: 1.93 sec   HDFS Read: 3526 HDFS Write: 286073 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 1 seconds 930 msec
INFO  : Completed executing command(queryId=hadoop_20191116223131_ebe796bf-7dcd-4a30-bcba-c63b7366773f); Time taken: 8.707 seconds
INFO  : OK
# View the local file; the fields are separated by @
[hadoop@node01 /kkb/install/hivedatas/stu4]$ cat 000000_0
1@clyang

2. hadoop command

Since Hive data is stored on HDFS, you can pull the data directly from HDFS to the local filesystem with the get command.

hdfs dfs -get /user/hive/warehouse/student/student.txt /opt/bigdata/data

3. Shell redirection: overwrite or append export

Use bin/hive -e 'sql statement' or bin/hive -f sql_script together with shell redirection to export query results, either overwriting (>) or appending (>>). The former is demonstrated below; the latter is essentially the same, with the SQL kept in a script file.

# Overwrite
[hadoop@node01 /kkb/install/hive-1.1.0-cdh5.14.2/bin]$ ./hive -e 'select * from db_hive.stu' > /kkb/install/hivedatas/student2.txt
ls: cannot access /kkb/install/spark/lib/spark-assembly-*.jar: No such file or directory
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/kkb/install/hbase-1.2.0-cdh5.14.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/kkb/install/hadoop-2.6.0-cdh5.14.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2019-11-16 22:37:46,342 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/11/16 22:37:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Logging initialized using configuration in file:/kkb/install/hive-1.1.0-cdh5.14.2/conf/hive-log4j.properties
OK
Time taken: 6.966 seconds, Fetched: 1 row(s)
# View the result
[hadoop@node01 /kkb/install/hivedatas]$ cat student2.txt
stu.id  stu.name
1   clyang
# Append
[hadoop@node01 /kkb/install/hive-1.1.0-cdh5.14.2/bin]$ ./hive -e 'select * from db_hive.stu' >> /kkb/install/hivedatas/student2.txt
ls: cannot access /kkb/install/spark/lib/spark-assembly-*.jar: No such file or directory
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/kkb/install/hbase-1.2.0-cdh5.14.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/kkb/install/hadoop-2.6.0-cdh5.14.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2019-11-16 22:39:03,442 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/11/16 22:39:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Logging initialized using configuration in file:/kkb/install/hive-1.1.0-cdh5.14.2/conf/hive-log4j.properties
OK
Time taken: 6.056 seconds, Fetched: 1 row(s)
# View the result after appending
[hadoop@node01 /kkb/install/hivedatas]$ cat student2.txt
stu.id  stu.name
1   clyang
stu.id  stu.name
1   clyang
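
The transcript above uses -e. The -f variant, mentioned but not demonstrated, would look roughly like this (a sketch only; the script and output paths are hypothetical):

```shell
# keep the query in a script file, then run it with -f and redirect the output
echo "select * from db_hive.stu;" > /kkb/install/hivedatas/stu.sql
./hive -f /kkb/install/hivedatas/stu.sql >  /kkb/install/hivedatas/student3.txt   # overwrite
./hive -f /kkb/install/hivedatas/stu.sql >> /kkb/install/hivedatas/student3.txt   # append
```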

4. export table to HDFS

# Export the table
0: jdbc:hive2://node01:10000> export table stu to '/kkb/studentexport';
INFO  : Compiling command(queryId=hadoop_20191105094343_87d41d16-e4cd-43ac-9593-86e799d23a6a): export table stu to '/kkb/studentexport'
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hadoop_20191105094343_87d41d16-e4cd-43ac-9593-86e799d23a6a); Time taken: 0.126 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hadoop_20191105094343_87d41d16-e4cd-43ac-9593-86e799d23a6a): export table stu to '/kkb/studentexport'
INFO  : Starting task [Stage-0:COPY] in serial mode
INFO  : Copying data from file:/tmp/hadoop/e951940a-bcb6-4cd4-be17-0baf5d13615f/hive_2019-11-05_09-43-30_802_7299251851779747447-1/-local-10000/_metadata to hdfs://node01:8020/kkb/studentexport
INFO  : Copying file: file:/tmp/hadoop/e951940a-bcb6-4cd4-be17-0baf5d13615f/hive_2019-11-05_09-43-30_802_7299251851779747447-1/-local-10000/_metadata
INFO  : Starting task [Stage-1:COPY] in serial mode
INFO  : Copying data from hdfs://node01:8020/user/hive/warehouse/db_hive.db/stu to hdfs://node01:8020/kkb/studentexport/data
INFO  : Copying file: hdfs://node01:8020/user/hive/warehouse/db_hive.db/stu/000000_0
INFO  : Completed executing command(queryId=hadoop_20191105094343_87d41d16-e4cd-43ac-9593-86e799d23a6a); Time taken: 0.604 seconds
INFO  : OK

# View the data
[hadoop@node01 /kkb/install/hivedatas]$ hdfs dfs -ls /kkb/studentexport
19/11/17 20:29:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rwxr-xr-x   3 anonymous supergroup       1330 2019-11-05 09:43 /kkb/studentexport/_metadata
drwxr-xr-x   - anonymous supergroup          0 2019-11-05 09:43 /kkb/studentexport/data
[hadoop@node01 /kkb/install/hivedatas]$ hdfs dfs -ls /kkb/studentexport/data
19/11/17 20:29:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rwxr-xr-x   3 anonymous supergroup          9 2019-11-05 09:43 /kkb/studentexport/data/000000_0
[hadoop@node01 /kkb/install/hivedatas]$ hdfs dfs -cat /kkb/studentexport/data/000000_0
19/11/17 20:29:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1clyang

What is the difference between partitioning and bucketing?

A partitioned table separates its files into folders, one per partition value, so partitions are visible as directories. A bucketed table splits a file: rows are hashed on a chosen column and distributed modulo the bucket count into several file fragments. Each has its own use cases:

(1) Partitioning by date, for example by day or by hour, lets a query quickly locate the data it needs and avoids a slow full-table scan.

(2) Bucketing provides finer-grained storage: you specify a bucket count n, and the data file is split into n parts. For fast sampling queries, tablesample(bucket x out of y) can read just a subset of the buckets.

In addition, a partitioned table can also contain buckets inside each partition.
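
As a sketch of the bucketing described above (the table name stu_buck and the column choice are hypothetical, not from the exercises), a bucketed table is declared with clustered by, and tablesample then reads a subset of its buckets:

```sql
-- rows are assigned to one of 4 files by hashing id modulo 4
create table stu_buck(id int, name string)
clustered by(id) into 4 buckets
row format delimited fields terminated by '\t';

-- sampling query: read bucket 1 of the 4 for a quick look at the data
select * from stu_buck tablesample(bucket 1 out of 4 on id);
```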

If data is uploaded directly to a partition directory on HDFS, how can the partitioned table be associated with that data?

When you create the partitioned table first and then import data into a partition, the data lands in the corresponding partition directory and can be queried normally. If instead the data is first placed into a prepared partition directory and the partitioned table is created afterwards, the data cannot be found, because the Hive metastore holds no mapping between the table and the partition directories. In that case you need to repair the table with a command; there are also two other methods.

Method 1: msck repair table table_name

Prepare the partition directory and upload the data in advance.

[hadoop@node01 /kkb/install/hivedatas]$ hdfs dfs -ls /mystudentdatas/month=11/
19/11/17 12:36:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   3 hadoop supergroup        199 2019-11-17 12:36 /mystudentdatas/month=11/student.csv

Create the table

0: jdbc:hive2://node01:10000> create table student_partition_me(id string,name string,year string,gender string) partitioned by(month string) row format delimited fields terminated by '\t' location '/mystudentdatas';
INFO  : Compiling command(queryId=hadoop_20191117123838_5b1f3eaf-f2f2-4b2e-b87f-2fdd8415f9d4): create table student_partition_me(id string,name string,year string,gender string) partitioned by(month string) row format delimited fields terminated by '\t' location '/mystudentdatas'
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hadoop_20191117123838_5b1f3eaf-f2f2-4b2e-b87f-2fdd8415f9d4); Time taken: 0.149 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hadoop_20191117123838_5b1f3eaf-f2f2-4b2e-b87f-2fdd8415f9d4): create table student_partition_me(id string,name string,year string,gender string) partitioned by(month string) row format delimited fields terminated by '\t' location '/mystudentdatas'
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hadoop_20191117123838_5b1f3eaf-f2f2-4b2e-b87f-2fdd8415f9d4); Time taken: 0.271 seconds
INFO  : OK

Repair the table with msck. After the repair, the mapping relationship is established and the data in the table can be queried.

# Repair the table
0: jdbc:hive2://node01:10000> msck repair table student_partition_me;
INFO  : Compiling command(queryId=hadoop_20191117124141_f09531b3-29fd-48a4-95c7-bec7018cf631): msck repair table student_partition_me
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hadoop_20191117124141_f09531b3-29fd-48a4-95c7-bec7018cf631); Time taken: 0.011 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hadoop_20191117124141_f09531b3-29fd-48a4-95c7-bec7018cf631): msck repair table student_partition_me
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hadoop_20191117124141_f09531b3-29fd-48a4-95c7-bec7018cf631); Time taken: 0.263 seconds
INFO  : OK
No rows affected (0.311 seconds)
# Query; the last column is the partition column month
0: jdbc:hive2://node01:10000> select id,name,year,gender,month from student_partition_me;
INFO  : Compiling command(queryId=hadoop_20191117161313_257c8b24-4e53-4690-b343-b5f532c43e1a): select id,name,year,gender,month from student_partition_me
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:id, type:string, comment:null), FieldSchema(name:name, type:string, comment:null), FieldSchema(name:year, type:string, comment:null), FieldSchema(name:gender, type:string, comment:null), FieldSchema(name:month, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hadoop_20191117161313_257c8b24-4e53-4690-b343-b5f532c43e1a); Time taken: 0.133 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hadoop_20191117161313_257c8b24-4e53-4690-b343-b5f532c43e1a): select id,name,year,gender,month from student_partition_me
INFO  : Completed executing command(queryId=hadoop_20191117161313_257c8b24-4e53-4690-b343-b5f532c43e1a); Time taken: 0.0 seconds
INFO  : OK
+-----+-------+-------------+---------+--------+--+
| id  | name  |    year     | gender  | month  |
+-----+-------+-------------+---------+--------+--+
| 01  | 赵雷    | 1990-01-01  | 男       | 11     |
| 02  | 钱电    | 1990-12-21  | 男       | 11     |
| 03  | 孙风    | 1990-05-20  | 男       | 11     |
| 04  | 李云    | 1990-08-06  | 男       | 11     |
| 05  | 周梅    | 1991-12-01  | 女       | 11     |
| 06  | 吴兰    | 1992-03-01  | 女       | 11     |
| 07  | 郑竹    | 1989-07-01  | 女       | 11     |
| 08  | 王菊    | 1990-01-20  | 女       | 11     |
+-----+-------+-------------+---------+--------+--+
8 rows selected (0.214 seconds)

Method 2: alter table table_name add partition(col='xxx')

Upload the data to HDFS

# Note that the HDFS data directory here is changed to studentdatas
[hadoop@node01 /kkb/install/hivedatas]$ hdfs dfs -ls /studentdatas/month=12/
19/11/17 16:51:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   3 hadoop supergroup        199 2019-11-17 16:51 /studentdatas/month=12/student.csv

Create the table

0: jdbc:hive2://node01:10000> create table student_partition_pa(id string,name string,year string,gender string) partitioned by(month string) row format delimited fields terminated by '\t' location '/studentdatas';
INFO  : Compiling command(queryId=hadoop_20191117164141_666aa048-6fec-43fc-9bb1-4ea1ebd51699): create table student_partition_pa(id string,name string,year string,gender string) partitioned by(month string) row format delimited fields terminated by '\t' location '/studentdatas'
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hadoop_20191117164141_666aa048-6fec-43fc-9bb1-4ea1ebd51699); Time taken: 0.011 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hadoop_20191117164141_666aa048-6fec-43fc-9bb1-4ea1ebd51699): create table student_partition_pa(id string,name string,year string,gender string) partitioned by(month string) row format delimited fields terminated by '\t' location '/studentdatas'
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hadoop_20191117164141_666aa048-6fec-43fc-9bb1-4ea1ebd51699); Time taken: 0.097 seconds
INFO  : OK

Add the partition with alter table

0: jdbc:hive2://node01:10000> alter table student_partition_pa add partition(month='12');
INFO  : Compiling command(queryId=hadoop_20191117164242_c4ad4e93-7357-46a4-a59e-20d8e66bb662): alter table student_partition_pa add partition(month='12')
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hadoop_20191117164242_c4ad4e93-7357-46a4-a59e-20d8e66bb662); Time taken: 0.051 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hadoop_20191117164242_c4ad4e93-7357-46a4-a59e-20d8e66bb662): alter table student_partition_pa add partition(month='12')
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hadoop_20191117164242_c4ad4e93-7357-46a4-a59e-20d8e66bb662); Time taken: 0.116 seconds
INFO  : OK

Query the data; it works.

0: jdbc:hive2://node01:10000> select id,name,year,gender,month from student_partition_pa;
INFO  : Compiling command(queryId=hadoop_20191117170101_3b98c6e3-9756-4e54-b5e9-d3361351bceb): select id,name,year,gender,month from student_partition_pa
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:id, type:string, comment:null), FieldSchema(name:name, type:string, comment:null), FieldSchema(name:year, type:string, comment:null), FieldSchema(name:gender, type:string, comment:null), FieldSchema(name:month, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hadoop_20191117170101_3b98c6e3-9756-4e54-b5e9-d3361351bceb); Time taken: 0.092 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hadoop_20191117170101_3b98c6e3-9756-4e54-b5e9-d3361351bceb): select id,name,year,gender,month from student_partition_pa
INFO  : Completed executing command(queryId=hadoop_20191117170101_3b98c6e3-9756-4e54-b5e9-d3361351bceb); Time taken: 0.001 seconds
INFO  : OK
+-----+-------+-------------+---------+--------+--+
| id  | name  |    year     | gender  | month  |
+-----+-------+-------------+---------+--------+--+
| 01  | 赵雷    | 1990-01-01  | 男       | 12     |
| 02  | 钱电    | 1990-12-21  | 男       | 12     |
| 03  | 孙风    | 1990-05-20  | 男       | 12     |
| 04  | 李云    | 1990-08-06  | 男       | 12     |
| 05  | 周梅    | 1991-12-01  | 女       | 12     |
| 06  | 吴兰    | 1992-03-01  | 女       | 12     |
| 07  | 郑竹    | 1989-07-01  | 女       | 12     |
| 08  | 王菊    | 1990-01-20  | 女       | 12     |
+-----+-------+-------------+---------+--------+--+
8 rows selected (0.162 seconds)

Method 3: load data inpath 'hdfs_path' into table table_name partition(col='xxx')

Upload the data to HDFS

[hadoop@node01 /kkb/install/hivedatas]$ hdfs dfs -ls /
19/11/17 17:21:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 13 items
# person.txt was uploaded to HDFS
-rw-r--r--   3 hadoop      supergroup         68 2019-11-17 17:09 /person.txt

Create the table

0: jdbc:hive2://node01:10000> create table person_partition(name string,citys array<string>) partitioned by(age string) row format delimited fields terminated by '\t' collection items terminated by ',' location '/persondatas';
INFO  : Compiling command(queryId=hadoop_20191117171313_ce57a983-f4c2-4147-a94b-0c91ae143666): create table person_partition(name string,citys array<string>) partitioned by(age string) row format delimited fields terminated by '\t' collection items terminated by ',' location '/persondatas'
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hadoop_20191117171313_ce57a983-f4c2-4147-a94b-0c91ae143666); Time taken: 0.023 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hadoop_20191117171313_ce57a983-f4c2-4147-a94b-0c91ae143666): create table person_partition(name string,citys array<string>) partitioned by(age string) row format delimited fields terminated by '\t' collection items terminated by ',' location '/persondatas'
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hadoop_20191117171313_ce57a983-f4c2-4147-a94b-0c91ae143666); Time taken: 0.101 seconds
INFO  : OK

Load the HDFS file into the partition directory

0: jdbc:hive2://node01:10000> load data inpath '/person.txt' into table person_partition partition(age='25');
INFO  : Compiling command(queryId=hadoop_20191117172222_1f130af4-c5bd-465c-8720-4a0a32273f81): load data inpath '/person.txt' into table person_partition partition(age='25')
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hadoop_20191117172222_1f130af4-c5bd-465c-8720-4a0a32273f81); Time taken: 0.082 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hadoop_20191117172222_1f130af4-c5bd-465c-8720-4a0a32273f81): load data inpath '/person.txt' into table person_partition partition(age='25')
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Loading data to table myhive.person_partition partition (age=25) from hdfs://node01:8020/person.txt
INFO  : Starting task [Stage-1:STATS] in serial mode
INFO  : Partition myhive.person_partition{age=25} stats: [numFiles=1, numRows=0, totalSize=68, rawDataSize=0]
INFO  : Completed executing command(queryId=hadoop_20191117172222_1f130af4-c5bd-465c-8720-4a0a32273f81); Time taken: 0.382 seconds
INFO  : OK

Query the data; it works.

0: jdbc:hive2://node01:10000> select * from person_partition;
INFO  : Compiling command(queryId=hadoop_20191117172222_24fff923-f365-48ae-a9bf-02fdee10b392): select * from person_partition
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:person_partition.name, type:string, comment:null), FieldSchema(name:person_partition.citys, type:array<string>, comment:null), FieldSchema(name:person_partition.age, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hadoop_20191117172222_24fff923-f365-48ae-a9bf-02fdee10b392); Time taken: 0.099 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hadoop_20191117172222_24fff923-f365-48ae-a9bf-02fdee10b392): select * from person_partition
INFO  : Completed executing command(queryId=hadoop_20191117172222_24fff923-f365-48ae-a9bf-02fdee10b392); Time taken: 0.001 seconds
INFO  : OK
+------------------------+----------------------------------------------+-----------------------+--+
| person_partition.name  |            person_partition.citys            | person_partition.age  |
+------------------------+----------------------------------------------+-----------------------+--+
| yang                   | ["beijing","shanghai","tianjin","hangzhou"]  | 25                    |
| messi                  | ["changchu","chengdu","wuhan"]               | 25                    |
+------------------------+----------------------------------------------+-----------------------+--+

Can data be imported into a bucketed table directly with load?

No. A bucketed table hashes a chosen column and takes it modulo the bucket count, splitting the data into several files on HDFS. That computation is normally done by inserting from an ordinary intermediate table with insert ... select; load cannot do it, because load only moves a single file into HDFS as-is. Moreover, the files of a bucketed table are produced by a MapReduce job and are no longer in the original raw format, so a raw file placed directly on HDFS cannot serve as bucket data.
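
The hash-and-modulo placement described above can be sketched in Python (an illustration of the idea only; Python's hash is not the hash function Hive actually uses):

```python
def bucket_for(value, num_buckets):
    # A row goes to bucket hash(column value) mod bucket count -- the rule a
    # bucketed-table insert ... select applies when writing its output files,
    # and the computation a plain load never performs.
    return hash(value) % num_buckets

# Distribute some ids into 4 buckets, like the 4 output files of the insert job.
rows = [1, 2, 3, 4, 5, 6, 7, 8]
buckets = {b: [r for r in rows if bucket_for(r, 4) == b] for b in range(4)}
# e.g. buckets[1] == [1, 5], since hash(1) % 4 == hash(5) % 4 == 1
```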

Partitioning in Hive can improve query performance. Are more partitions always better, and why?

No. A Hive query is essentially executed as a MapReduce job. If there are too many partitions, the same volume of data is split into many more small file blocks, which in turn produce more metadata (block locations, sizes, and so on) and put heavy pressure on the NameNode.

In addition, when a Hive SQL statement is converted into a MapReduce job, each small partition file corresponds to a task, and each task to a JVM instance. Too many partitions spawn a large number of JVM instances, causing frequent JVM creation and destruction and reducing overall system performance.

Reference:
(1) https://www.cnblogs.com/tele-share/p/9829515.html


Originally published at www.cnblogs.com/youngchaolin/p/11877986.html