Copyright notice: this is an original article by the blogger; please credit the source when reposting: https://blog.csdn.net/u010002184/article/details/89605107
All the examples below load data from the local filesystem, e.g.:
hive> load data local inpath 'sales_info.txt' overwrite into table sales_info partition(dt='2019-04-26');
Example 1:
Table DDL (no partition column, fields delimited by commas):
CREATE TABLE `sales_info`(
`sku_id` string COMMENT '商品id',
`sku_name` string COMMENT '商品名称',
`category_id3` string COMMENT '三级分类id',
`price` double COMMENT '销售价格',
`sales_count` bigint COMMENT '销售数量'
)
COMMENT '商品销售信息表'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
NULL DEFINED AS ''
STORED AS TEXTFILE
LOCATION
'hdfs://ns1/abc/sales_info';
File contents:
[abc]$ cat sales_info.txt
123,华为Mate10,31,999,20
456,华为Mate30,31,2999,30
789,小米5,31,800,20
1235,小米6,31,900,100
4562,OPPO Findx,31,3900,50
[abc]$
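As an aside, the sample file can be recreated with a heredoc (contents copied from the listing above):

```shell
# Recreate the comma-delimited sample file shown above.
cat > sales_info.txt <<'EOF'
123,华为Mate10,31,999,20
456,华为Mate30,31,2999,30
789,小米5,31,800,20
1235,小米6,31,900,100
4562,OPPO Findx,31,3900,50
EOF
# Each line carries 5 comma-separated fields, matching the 5 table columns.
```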
Load command (loading from the local filesystem; only the file name is given here): if the hive CLI was started from the directory containing the file, a relative path and an absolute path both work; otherwise the absolute path must be given, e.g. '/home/abc/sales_info.txt'. The file does not need execute permission.
hive> load data local inpath 'sales_info.txt' overwrite into table sales_info ;
Loading data to table gdm.sales_info
Table gdm.sales_info stats: [numFiles=1, numRows=0, totalSize=133, rawDataSize=0]
OK
Time taken: 0.347 seconds
Result:
hive> select * from sales_info ;
OK
123 华为Mate10 31 999.0 20
456 华为Mate30 31 2999.0 30
789 小米5 31 800.0 20
1235 小米6 31 900.0 100
4562 OPPO Findx 31 3900.0 50
Time taken: 0.038 seconds, Fetched: 5 row(s)
hive>
Result (the loaded file now sits under the table's HDFS location):
hive> dfs -ls hdfs://ns1/abc/sales_info;
Found 1 items
-rwxr-xr-x 3 a a 133 2019-04-27 16:33 hdfs://ns1/abc/sales_info/sales_info.txt
hive>
hive> set hive.cli.print.header=true;
hive> select * from sales_info ;
OK
sku_id sku_name category_id3 price sales_count
123 华为Mate10 31 999.0 20
456 华为Mate30 31 2999.0 30
789 小米5 31 800.0 20
1235 小米6 31 900.0 100
4562 OPPO Findx 31 3900.0 50
Time taken: 0.043 seconds, Fetched: 5 row(s)
hive>
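Since the table is comma-delimited, the field-to-column mapping can be sanity-checked outside Hive with cut (an illustration using one row from the file above; Hive splits text rows the same way, on each ','):

```shell
# One sample row, taken from the data file above.
printf '123,华为Mate10,31,999,20\n' > row.txt
# Pull out individual fields by position, as Hive maps them to columns.
cut -d, -f2 row.txt   # sku_name -> 华为Mate10
cut -d, -f4 row.txt   # price    -> 999
```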
Example 2:
Table DDL (with a partition column, fields delimited by commas):
CREATE TABLE `sales_info`(
`sku_id` string COMMENT '商品id',
`sku_name` string COMMENT '商品名称',
`category_id3` string COMMENT '三级分类id',
`price` double COMMENT '销售价格',
`sales_count` bigint COMMENT '销售数量'
)
COMMENT '商品销售信息表'
PARTITIONED BY(
`dt` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
NULL DEFINED AS ''
STORED AS TEXTFILE
LOCATION
'hdfs://ns1/abc/sales_info';
File contents (the partition value is not in the data):
[abc]$ cat sales_info.txt
123,华为Mate10,31,999,20
456,华为Mate30,31,2999,30
789,小米5,31,800,20
1235,小米6,31,900,100
4562,OPPO Findx,31,3900,50
[abc]$
Load commands (the first attempt fails because the partition clause is missing):
hive> load data local inpath 'sales_info.txt' overwrite into table sales_info ;
FAILED: SemanticException [Error 10062]: Need to specify partition columns because the destination table is partitioned
hive> load data local inpath 'sales_info.txt' overwrite into table sales_info partition(dt='2019-04-26') ;
Loading data to table gdm.sales_info partition (dt=2019-04-26)
Partition gdm.sales_info{dt=2019-04-26} stats: [numFiles=1, numRows=0, totalSize=133, rawDataSize=0]
OK
Time taken: 0.417 seconds
Query result:
hive> select * from sales_info;
OK
sku_id sku_name category_id3 price sales_count dt
123 华为Mate10 31 999.0 20 2019-04-26
456 华为Mate30 31 2999.0 30 2019-04-26
789 小米5 31 800.0 20 2019-04-26
1235 小米6 31 900.0 100 2019-04-26
4562 OPPO Findx 31 3900.0 50 2019-04-26
Time taken: 0.049 seconds, Fetched: 5 row(s)
hive>
File contents (with the partition value appended to each row; note the duplicated second line):
[abc]$ cat sales_info.txt
123,华为Mate10,31,999,20,2019-04-26
456,华为Mate30,31,2999,30,2019-04-26
456,华为Mate30,31,2999,30,2019-04-26
789,小米5,31,800,20,2019-04-26
1235,小米6,31,900,100,2019-04-26
4562,OPPO Findx,31,3900,50,2019-04-26
[abc]$
Load and query:
hive> load data local inpath 'sales_info.txt' overwrite into table sales_info partition(dt='2019-04-26') ;
Loading data to table gdm.sales_info partition (dt=2019-04-26)
Moved: 'hdfs://ns1/abc/sales_info/dt=2019-04-26/sales_info.txt' to trash at: hdfs://ns1/abc/.Trash/Current
Partition gdm.sales_info{dt=2019-04-26} stats: [numFiles=1, numRows=0, totalSize=228, rawDataSize=0]
OK
Time taken: 0.457 seconds
hive> select * from sales_info;
OK
sku_id sku_name category_id3 price sales_count dt
123 华为Mate10 31 999.0 20 2019-04-26
456 华为Mate30 31 2999.0 30 2019-04-26
456 华为Mate30 31 2999.0 30 2019-04-26
789 小米5 31 800.0 20 2019-04-26
1235 小米6 31 900.0 100 2019-04-26
4562 OPPO Findx 31 3900.0 50 2019-04-26
Time taken: 0.046 seconds, Fetched: 6 row(s)
hive>
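Note that the sixth field (the date) in this second file never reaches the dt column: the table declares only five columns, so Hive ignores the trailing field, and the dt shown in the query comes from the partition clause. The extra field is easy to see outside Hive (an illustration with one row from the file above):

```shell
# One row from the second data file, with the date appended as a 6th field.
printf '123,华为Mate10,31,999,20,2019-04-26\n' > row_with_dt.txt
# NF = number of comma-separated fields; the table has only 5 columns,
# so Hive drops the trailing 6th field on read.
awk -F, '{print NF}' row_with_dt.txt   # -> 6
```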
Example 3:
Table DDL (with a partition column, fields delimited by '\t'):
CREATE TABLE `sales_info`(
`sku_id` string COMMENT '商品id',
`sku_name` string COMMENT '商品名称',
`category_id3` string COMMENT '三级分类id',
`price` double COMMENT '销售价格',
`sales_count` bigint COMMENT '销售数量'
)
COMMENT '商品销售信息表'
PARTITIONED BY(
`dt` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
STORED AS TEXTFILE
LOCATION
'hdfs://ns1/abc/sales_info';
Data file (first attempt: the delimiter typed as the two literal characters '\t'):
[abc]$ cat sales_info.txt
123\t华为Mate10\t31\t999\t20
456\t华为Mate30\t31\t2999\t30
789\t小米5\t31\t800\t20
1235\t小米6\t31\t900\t100
4562\tOPPO Findx\t31\t3900\t50
[abc]$
Query after loading:
hive> select * from sales_info;
OK
sku_id sku_name category_id3 price sales_count dt
123\t华为Mate10\t31\t999\t20 NULL NULL NULL NULL 2019-04-26
456\t华为Mate30\t31\t2999\t30 NULL NULL NULL NULL 2019-04-26
789\t小米5\t31\t800\t20 NULL NULL NULL NULL 2019-04-26
1235\t小米6\t31\t900\t100 NULL NULL NULL NULL 2019-04-26
4562\tOPPO Findx\t31\t3900\t50 NULL NULL NULL NULL 2019-04-26
Time taken: 0.89 seconds, Fetched: 5 row(s)
hive>
Data file (second attempt: fields separated by pressing the Tab key; the cat listing is omitted since a real Tab is indistinguishable from spaces):
Query after loading:
hive> select * from sales_info;
OK
sku_id sku_name category_id3 price sales_count dt
123 华为Mate10 31 999 20 NULL NULL NULL NULL 2019-04-26
456 华为Mate30 31 2999 30 NULL NULL NULL NULL 2019-04-26
456 华为Mate30 31 2999 30 NULL NULL NULL NULL 2019-04-26
789 小米5 31 800 20 NULL NULL NULL NULL 2019-04-26
1235 小米6 1 900 100 NULL NULL NULL NULL 2019-04-26
4562 OPPO Findx 31 900 50 NULL NULL NULL NULL 2019-04-26
Time taken: 0.049 seconds, Fetched: 6 row(s)
hive>
Both results are wrong. Why? In the first file the delimiter was typed as the two literal characters backslash and t, not as a real Tab, so Hive treats each whole line as a single field and pads the remaining columns with NULL. The second result is still misaligned, which suggests the separators in the file still did not match the declared Tab delimiter (for example, an editor that converted Tabs to spaces or inserted extra ones).
The directory holding the data file:
hive> dfs -ls hdfs://ns1/abc/sales_info/dt=2019-04-26;
Found 1 items
-rwxr-xr-x 3 a a 187 2019-04-27 17:08 hdfs://ns1/abc/sales_info/dt=2019-04-26/sales_info.txt
hive>
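The first failure can be reproduced in a plain shell: typing the two characters backslash and t into a file is not the same as writing a real Tab, and the table's FIELDS TERMINATED BY '\t' splits only on the real Tab character (a minimal sketch; the file names are made up):

```shell
# What the first data file actually contained: backslash + 't', two characters.
printf '%s\n' '123\tHuaweiMate10' > literal.txt
# A real Tab, produced because printf interprets \t inside its format string.
printf '123\tHuaweiMate10\n' > realtab.txt
# Splitting on a real Tab, the way Hive's delimiter does:
awk -F'\t' '{print NF}' literal.txt   # -> 1: whole line lands in one column
awk -F'\t' '{print NF}' realtab.txt   # -> 2: columns split correctly
```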
Notes:
1. 'STORED AS TEXTFILE' in the DDL matters (it is what makes select * show the rows). If the table instead declares an LZO or ORC compressed storage format while the loaded file is plain text, select * returns nothing, even though select count(*) is correct and the file is present in the table's HDFS directory. For example:
CREATE TABLE `sales_info`(
...
)
COMMENT '商品销售信息表'
PARTITIONED BY (
`dt` string )
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
STORED AS INPUTFORMAT
'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://ns1/abc/sales_info';
2. When loading into a partitioned table, the data file may or may not contain the partition value: the partition is determined by the partition clause of the LOAD statement, and an extra trailing field in the file is simply ignored.