Importing local data files into a Hive table

Copyright notice: this is the author's original article; please credit the source when reposting: https://blog.csdn.net/u010002184/article/details/89605107

All of the examples below import from the local filesystem:

hive> load data local inpath 'sales_info.txt' overwrite into table sales_info partition(dt='2019-04-26');

Example 1:

Table DDL (no partition column; fields separated by commas):

CREATE TABLE `sales_info`(
`sku_id` string COMMENT '商品id', 
`sku_name` string COMMENT '商品名称', 
`category_id3` string COMMENT '三级分类id', 
`price` double COMMENT '销售价格', 
`sales_count` bigint COMMENT '销售数量'
)
COMMENT '商品销售信息表'
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
NULL DEFINED AS '' 
STORED AS TEXTFILE
LOCATION
'hdfs://ns1/abc/sales_info'

File contents:

[abc]$ cat sales_info.txt 
123,华为Mate10,31,999,20
456,华为Mate30,31,2999,30
789,小米5,31,800,20
1235,小米6,31,900,100
4562,OPPO Findx,31,3900,50
[abc]$ 
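For reproducibility, the sample file can also be generated with a short script (a sketch; the rows are copied verbatim from the transcript above):

```python
# Generate the comma-delimited sample file used in Example 1.
rows = [
    ("123", "华为Mate10", "31", "999", "20"),
    ("456", "华为Mate30", "31", "2999", "30"),
    ("789", "小米5", "31", "800", "20"),
    ("1235", "小米6", "31", "900", "100"),
    ("4562", "OPPO Findx", "31", "3900", "50"),
]

with open("sales_info.txt", "w", encoding="utf-8") as f:
    for row in rows:
        # join with a real comma, one record per line
        f.write(",".join(row) + "\n")
```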

Import command (loading from the local filesystem). Note that only the bare file name is given here: if you launch hive from the directory that contains the file, a relative path (or just the file name) works; otherwise you must give the absolute path, e.g. '/home/abc/sales_info.txt'. The file does not need to be executable.

hive> load data local inpath 'sales_info.txt' overwrite into table sales_info ;
Loading data to table gdm.sales_info
Table gdm.sales_info stats: [numFiles=1, numRows=0, totalSize=133, rawDataSize=0]
OK
Time taken: 0.347 seconds

Result:

hive> select * from sales_info ;
OK
123	华为Mate10	31	999.0	20
456	华为Mate30	31	2999.0	30
789	小米5	31	800.0	20
1235	小米6	31	900.0	100
4562	OPPO Findx	31	3900.0	50
Time taken: 0.038 seconds, Fetched: 5 row(s)
hive> 

Result (file location on HDFS, then the same query with column headers enabled):

hive> dfs  -ls hdfs://ns1/abc/sales_info;
Found 1 items
-rwxr-xr-x   3 a a       133 2019-04-27 16:33 hdfs://ns1/abc/sales_info/sales_info.txt
hive> 
hive> set hive.cli.print.header=true; 
hive> select * from sales_info ;
OK
sku_id	sku_name	category_id3	price	sales_count
123	华为Mate10	31	999.0	20
456	华为Mate30	31	2999.0	30
789	小米5	31	800.0	20
1235	小米6	31	900.0	100
4562	OPPO Findx	31	3900.0	50
Time taken: 0.043 seconds, Fetched: 5 row(s)
hive> 

Example 2:

Table DDL (with a partition column; fields separated by commas):

CREATE TABLE `sales_info`(
`sku_id` string COMMENT '商品id', 
`sku_name` string COMMENT '商品名称', 
`category_id3` string COMMENT '三级分类id', 
`price` double COMMENT '销售价格', 
`sales_count` bigint COMMENT '销售数量'
)
COMMENT '商品销售信息表'
PARTITIONED BY(
`dt` string)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
NULL DEFINED AS '' 
STORED AS TEXTFILE
LOCATION
'hdfs://ns1/abc/sales_info'

File contents (without the partition value in the data):

[abc]$ cat sales_info.txt 
123,华为Mate10,31,999,20
456,华为Mate30,31,2999,30
789,小米5,31,800,20
1235,小米6,31,900,100
4562,OPPO Findx,31,3900,50
[abc]$ 

Import commands:

hive> load data local inpath 'sales_info.txt' overwrite into table sales_info ;
FAILED: SemanticException [Error 10062]: Need to specify partition columns because the destination table is partitioned
hive> load data local inpath 'sales_info.txt' overwrite into table sales_info partition(dt='2019-04-26') ;
Loading data to table gdm.sales_info partition (dt=2019-04-26)
Partition gdm.sales_info{dt=2019-04-26} stats: [numFiles=1, numRows=0, totalSize=133, rawDataSize=0]
OK
Time taken: 0.417 seconds

Query result:

hive> select *  from sales_info;
OK
sku_id	sku_name	category_id3	price	sales_count	dt
123	华为Mate10	31	999.0	20	2019-04-26
456	华为Mate30	31	2999.0	30	2019-04-26
789	小米5	31	800.0	20	2019-04-26
1235	小米6	31	900.0	100	2019-04-26
4562	OPPO Findx	31	3900.0	50	2019-04-26
Time taken: 0.049 seconds, Fetched: 5 row(s)
hive> 

File contents (with the partition value included in the data):

[abc]$ cat sales_info.txt 
123,华为Mate10,31,999,20,2019-04-26
456,华为Mate30,31,2999,30,2019-04-26
456,华为Mate30,31,2999,30,2019-04-26
789,小米5,31,800,20,2019-04-26
1235,小米6,31,900,100,2019-04-26
4562,OPPO Findx,31,3900,50,2019-04-26
[abc]$ 

Import and query:

hive> load data local inpath 'sales_info.txt' overwrite into table sales_info partition(dt='2019-04-26') ;
Loading data to table gdm.sales_info partition (dt=2019-04-26)
Moved: 'hdfs://ns1/abc/sales_info/dt=2019-04-26/sales_info.txt' to trash at: hdfs://ns1/abc/.Trash/Current
Partition gdm.sales_info{dt=2019-04-26} stats: [numFiles=1, numRows=0, totalSize=228, rawDataSize=0]
OK
Time taken: 0.457 seconds
hive> select *  from sales_info;
OK
sku_id	sku_name	category_id3	price	sales_count	dt
123	华为Mate10	31	999.0	20	2019-04-26
456	华为Mate30	31	2999.0	30	2019-04-26
456	华为Mate30	31	2999.0	30	2019-04-26
789	小米5	31	800.0	20	2019-04-26
1235	小米6	31	900.0	100	2019-04-26
4562	OPPO Findx	31	3900.0	50	2019-04-26
Time taken: 0.046 seconds, Fetched: 6 row(s)
hive> 

Example 3:

Table DDL (with a partition column; fields separated by '\t'):

CREATE TABLE `sales_info`(
`sku_id` string COMMENT '商品id', 
`sku_name` string COMMENT '商品名称', 
`category_id3` string COMMENT '三级分类id', 
`price` double COMMENT '销售价格', 
`sales_count` bigint COMMENT '销售数量'
)
COMMENT '商品销售信息表'
PARTITIONED BY(
`dt` string)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t' 
NULL DEFINED AS '' 
STORED AS TEXTFILE
LOCATION
'hdfs://ns1/abc/sales_info'

Data file (first variant):

[abc]$ cat sales_info.txt 
123\t华为Mate10\t31\t999\t20
456\t华为Mate30\t31\t2999\t30
789\t小米5\t31\t800\t20
1235\t小米6\t31\t900\t100
4562\tOPPO Findx\t31\t3900\t50
[abc]$ 

Query after import:

hive> select  *  from sales_info;
OK
sku_id	sku_name	category_id3	price	sales_count	dt
123\t华为Mate10\t31\t999\t20	NULL	NULL	NULL	NULL	2019-04-26
456\t华为Mate30\t31\t2999\t30	NULL	NULL	NULL	NULL	2019-04-26
789\t小米5\t31\t800\t20	NULL	NULL	NULL	NULL	2019-04-26
1235\t小米6\t31\t900\t100	NULL	NULL	NULL	NULL	2019-04-26
4562\tOPPO Findx\t31\t3900\t50	NULL	NULL	NULL	NULL	2019-04-26
Time taken: 0.89 seconds, Fetched: 5 row(s)
hive> 

Data file (second variant, with fields separated by pressing the Tab key):

Query after import:

hive> select  *  from sales_info;
OK
sku_id	sku_name	category_id3	price	sales_count	dt
123 华为Mate10  31  999 20	NULL	NULL	NULL	NULL	2019-04-26
456 华为Mate30  31  2999    30	NULL	NULL	NULL	NULL	2019-04-26
456 华为Mate30  31  2999    30	NULL	NULL	NULL	NULL	2019-04-26
789 小米5   31  800 20	NULL	NULL	NULL	NULL	2019-04-26
1235    小米6   1   900 100	NULL	NULL	NULL	NULL	2019-04-26
4562    OPPO Findx  31  900 50	NULL	NULL	NULL	NULL	2019-04-26
Time taken: 0.049 seconds, Fetched: 6 row(s)
hive> 

Both results are wrong. Why? In the first file the "delimiter" is the two literal characters backslash and t, not a real Tab, so Hive finds no delimiter at all, puts the entire line into sku_id, and fills the remaining columns with NULL. The second file appears to have met the same fate: judging by the output, its fields ended up separated by spaces (Tab characters are often converted to spaces by editors or during copy-paste), so again no real Tab delimiter was present.

The directory holding the data file:

hive> dfs -ls hdfs://ns1/abc/sales_info/dt=2019-04-26;
Found 1 items
-rwxr-xr-x   3 a a        187 2019-04-27 17:08 hdfs://ns1/abc/sales_info/dt=2019-04-26/sales_info.txt
hive> 
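The failure can be demonstrated outside Hive. Hive's default delimited-text SerDe splits each line on a single delimiter character; a line that contains the two literal characters `\` and `t` has no real Tab in it, so the whole line lands in the first column and the remaining columns become NULL. Below is a small sketch of that splitting logic (an illustration, not Hive's actual code):

```python
# Simulate how a delimited-text reader splits one line into columns.
NUM_COLS = 5  # the table declares five data columns

def split_row(line: str, delim: str = "\t"):
    """Split on the delimiter; missing trailing fields become None (NULL)."""
    fields = line.split(delim)
    # Pad to the declared column count, mirroring Hive's NULLs for missing fields,
    # and drop any extra fields beyond the declared columns.
    return (fields + [None] * NUM_COLS)[:NUM_COLS]

# A line with real Tab characters parses into five proper fields.
good = "123\t华为Mate10\t31\t999\t20"
# A raw string: the two literal characters backslash + t, no Tab at all.
bad = r"123\t华为Mate10\t31\t999\t20"

print(split_row(good))  # five separate fields
print(split_row(bad))   # the entire line in the first field, rest None
```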

Notes:

1. In the DDL, 'STORED AS TEXTFILE' matters (it is what makes select * show the rows). If the table is instead declared with an LZO or ORC storage format, select * returns no results, while select count(*) is still correct and the file is still present in the table's HDFS directory.

For example:

CREATE TABLE `sales_info`(
  ...
  )
  COMMENT '商品销售信息表'
PARTITIONED BY ( 
  `dt` string )
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY '\t' 
  NULL DEFINED AS '' 
STORED AS INPUTFORMAT 
  'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://ns1/abc/sales_info'

2. When loading into a partitioned table, it does not matter whether the data file also contains the partition value as an extra trailing column: fields beyond the columns declared in the table are ignored, and the dt value always comes from the partition spec in the LOAD statement.
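One reliable way to avoid the literal-`\t` trap from Example 3 is to generate the file programmatically, so the delimiter is guaranteed to be a real Tab character (a sketch; the output file name `sales_info_tab.txt` is made up for illustration):

```python
# Write a genuinely Tab-delimited file for the '\t'-delimited table in Example 3.
rows = [
    ("123", "华为Mate10", "31", "999", "20"),
    ("456", "华为Mate30", "31", "2999", "30"),
    ("789", "小米5", "31", "800", "20"),
]

with open("sales_info_tab.txt", "w", encoding="utf-8") as f:
    for row in rows:
        # "\t" in source code is a real Tab character, not backslash + t
        f.write("\t".join(row) + "\n")
```

To check whether an existing file really contains Tab characters, `cat -A sales_info.txt` (tabs show as `^I` on GNU systems) or `sed -n l sales_info.txt` (tabs show as `\t`) can be used on the shell side.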
