Hive: importing a data file from HDFS into a table

Copyright notice: this is an original article by the blogger; please credit the source when reposting: https://blog.csdn.net/u010002184/article/details/89605368

hive> load data inpath 'hdfs://ns1/abc/sales_info/hello/sales_info.txt' overwrite into table sales_info partition(dt = '2019-04-26');

After the load, the source data file no longer exists at its original path: Hive moved it from the original path to a new path (verified at the end of this post).

The CREATE TABLE statement:

CREATE TABLE `sales_info`(
`sku_id` string COMMENT 'product id', 
`sku_name` string COMMENT 'product name', 
`category_id3` string COMMENT 'third-level category id', 
`price` double COMMENT 'sale price', 
`sales_count` bigint COMMENT 'sales quantity'
)
COMMENT 'product sales information table'
PARTITIONED BY(
`dt` string)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
NULL DEFINED AS '' 
STORED AS TEXTFILE
LOCATION
'hdfs://ns1/abc/sales_info';
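Because the table is partitioned by `dt`, each partition's data lives in a subdirectory named `dt=<value>` under the table's LOCATION. A quick way to check which partitions are registered (a standard Hive command; the expected output here is an assumption based on this post's load):

```sql
-- List the partitions recorded in the metastore for this table
SHOW PARTITIONS sales_info;
-- After the load described in this post, the list should include dt=2019-04-26,
-- whose files live under hdfs://ns1/abc/sales_info/dt=2019-04-26/
```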

Contents of the data file:

[abc]$ cat sales_info.txt 
12377,华为Mate10,31,999,20
45677,华为Mate30,31,2999,30
[abc]$ 

Create a new directory (hello) on HDFS, then put the local file into the target HDFS path:

hive> dfs -mkdir hdfs://ns1/abc/sales_info/hello;
hive> dfs -put sales_info.txt hdfs://ns1/abc/sales_info/hello;
hive> dfs -ls hdfs://ns1/abc/sales_info/hello;
Found 1 items
-rw-r--r--   3 a a 61 2019-04-27 17:34 

Load the data (the table had been loaded once after creation; this is the second load), then query the result (2 rows now, the latest data; previously there were 5 rows):

hive> load data inpath 'hdfs://ns1/abc/sales_info/hello/sales_info.txt' overwrite into table sales_info partition(dt = '2019-04-26');
Loading data to table gdm.sales_info partition (dt=2019-04-26)
Moved: 'hdfs://ns1/abc/sales_info/dt=2019-04-26/sales_info.txt' to trash at: hdfs://ns1/abc/.Trash/Current
Partition gdm.sales_info{dt=2019-04-26} stats: [numFiles=1, numRows=0, totalSize=61, rawDataSize=0]
OK
Time taken: 0.43 seconds
hive> select *  from sales_info;
OK
sku_id	sku_name	category_id3	price	sales_count	dt
12377	华为Mate10	31	999.0	20	2019-04-26
45677	华为Mate30	31	2999.0	30	2019-04-26
Time taken: 0.049 seconds, Fetched: 2 row(s)
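Note that LOAD DATA INPATH with an HDFS source moves the file into the partition directory (which is why it disappears from hello/), and OVERWRITE sends the partition's previous files to the trash. If you want to keep the source file, you can load it from the local filesystem instead: a LOCAL load copies rather than moves (standard Hive behavior; the local path below is a hypothetical example, not from this post):

```sql
-- LOCAL copies the file, leaving the source intact on the local filesystem;
-- without OVERWRITE, the new file is added alongside the partition's existing data.
LOAD DATA LOCAL INPATH '/home/abc/sales_info.txt'
INTO TABLE sales_info PARTITION (dt = '2019-04-26');
```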

Check the original data file again (it no longer exists; it was moved from the original path to the new path):

hive> dfs -ls hdfs://ns1/abc/sales_info/hello;
hive> 
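The load output above showed that the partition's previous file was moved to the trash at hdfs://ns1/abc/.Trash/Current. Until the trash is purged (controlled by fs.trash.interval), it can be inspected and, if needed, restored with ordinary HDFS shell commands. A hedged sketch, where the exact path under Current/ and the restore destination are placeholders:

```sql
-- Inspect what OVERWRITE sent to the trash
dfs -ls hdfs://ns1/abc/.Trash/Current;
-- If needed, move the old file back out before the trash is purged
-- (<path-under-Current> and the destination directory are hypothetical):
dfs -mv hdfs://ns1/abc/.Trash/Current/<path-under-Current> hdfs://ns1/abc/restore/;
```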

end
