Hive: Creating an External Partitioned Table

(1) Suppose there is a partitioned table defined as follows:

hive> show create table partition_parquet;
OK
CREATE TABLE `partition_parquet`(
  `member_id` string,
  `name` string,
  `add_item` string)
PARTITIONED BY (
  `stat_date` string,
  `province` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'field.delim'='\t',
  'serialization.format'='\t')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet'
TBLPROPERTIES (
  'last_modified_by'='a6',
  'last_modified_time'='1525229204',
  'transient_lastDdlTime'='1525229204')
Time taken: 0.173 seconds, Fetched: 22 row(s)

Some of its data:

hive> SELECT * FROM partition_parquet where stat_date='20110527' and province ='liaoning';
OK
1	liujiannan	NULL	20110527	liaoning
2	wangchaoqun	NULL	20110527	liaoning
3	xuhongxing	NULL	20110527	liaoning
4	zhudaoyong	NULL	20110527	liaoning
5	zhouchengyu	NULL	20110527	liaoning

The storage directory looks like this:

bogon:bin a6$ hadoop dfs -ls -R hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

18/06/23 19:34:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
drwxr-xr-x   - a6 supergroup          0 2017-11-07 10:38 hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/province=liaoning
-rwxr-xr-x   1 a6 supergroup        437 2017-11-07 10:38 hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/province=liaoning/000000_0
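
Each stat_date=.../province=... directory above corresponds to one partition recorded in the metastore. As a quick cross-check, the registered partitions can also be listed from the Hive side. This is a minimal sketch, assuming the same partition_parquet table in the current database; output is omitted because it depends on the environment.

```sql
-- List every partition registered in the metastore for this table.
SHOW PARTITIONS partition_parquet;

-- Restrict the listing to a single stat_date (a partial partition spec is allowed).
SHOW PARTITIONS partition_parquet PARTITION (stat_date='20110527');
```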

(2) A bad example: creating an external partitioned table and loading data

CREATE external TABLE `partition_external_parquet`(
  `member_id` string,
  `name` string,
  `add_item` string)
PARTITIONED BY (
  `stat_date` string,
  `province` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'field.delim'='\t',
  'serialization.format'='\t')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/province=liaoning'

At this point, however, the table shows no data:

hive> SELECT * FROM partition_external_parquet;
OK
Time taken: 1.695 seconds
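
Reason: the partition has not been added. Hive reads a partitioned table only through the partitions registered in the metastore, so pointing the table's LOCATION at a partition directory is not enough.
Next, add the partition: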

hive> alter table partition_external_parquet add PARTITION(stat_date='20110527',province='liaoning') location 'hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/province=liaoning';
OK
Time taken: 0.836 seconds

Query the data again:

hive> SELECT * FROM partition_external_parquet;
OK
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
1	liujiannan	NULL	20110527	liaoning
2	wangchaoqun	NULL	20110527	liaoning
3	xuhongxing	NULL	20110527	liaoning
4	zhudaoyong	NULL	20110527	liaoning
5	zhouchengyu	NULL	20110527	liaoning
Time taken: 1.474 seconds, Fetched: 5 row(s)
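
To confirm which partitions the external table knows about and where each one points, statements like the following can be run before and after the ALTER TABLE step. This is a sketch, assuming the table created in this section; output is omitted because it depends on the environment.

```sql
-- Empty before the ALTER TABLE ... ADD PARTITION step, one entry afterwards.
SHOW PARTITIONS partition_external_parquet;

-- Show the metadata of one partition, including its Location field.
DESCRIBE FORMATTED partition_external_parquet
  PARTITION (stat_date='20110527', province='liaoning');
```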

(3) A good example: creating an external partitioned table and loading data

hive> create external table if not exists partition_external_parquet2 like partition_parquet;
OK
Time taken: 0.106 seconds
hive> show create table partition_external_parquet2;
OK
CREATE EXTERNAL TABLE `partition_external_parquet2`(
  `member_id` string,
  `name` string,
  `add_item` string)
PARTITIONED BY (
  `stat_date` string,
  `province` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'field.delim'='\t',
  'serialization.format'='\t')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_external_parquet2'
TBLPROPERTIES (
  'transient_lastDdlTime'='1529753081')
Time taken: 0.057 seconds, Fetched: 20 row(s)

Load the data by adding a fully static partition:

hive>  alter table partition_external_parquet2 add PARTITION(stat_date='20110527',province='liaoning') location 'hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/province=liaoning';
OK
Time taken: 0.078 seconds
hive> select * from partition_external_parquet2;
OK
1	liujiannan	NULL	20110527	liaoning
2	wangchaoqun	NULL	20110527	liaoning
3	xuhongxing	NULL	20110527	liaoning
4	zhudaoyong	NULL	20110527	liaoning
5	zhouchengyu	NULL	20110527	liaoning
Time taken: 0.133 seconds, Fetched: 5 row(s)
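
When there are many partition directories, adding them one at a time becomes tedious. One possible alternative, sketched here under assumptions, is to create the external table directly on top of the existing table directory and let Hive discover the partitions. The table name partition_external_parquet3 is hypothetical, and MSCK REPAIR TABLE only picks up directories that follow the stat_date=.../province=... layout beneath the table's LOCATION.

```sql
-- Hypothetical table name; LIKE copies the schema and partition columns,
-- and LOCATION points at the root directory of the existing table.
CREATE EXTERNAL TABLE IF NOT EXISTS partition_external_parquet3
LIKE partition_parquet
LOCATION 'hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet';

-- Scan the table location and register any partition directories
-- that are missing from the metastore.
MSCK REPAIR TABLE partition_external_parquet3;

-- The discovered partition should now be queryable.
SELECT * FROM partition_external_parquet3
WHERE stat_date='20110527' AND province='liaoning';
```

Because the table is external, dropping it removes only the metadata by default and leaves the underlying files in place.
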
Reference: https://blog.csdn.net/a2011480169/article/details/51991421

Reposted from blog.csdn.net/helloxiaozhe/article/details/80786195