Hive data import: LOAD

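filepath: an absolute path, a relative path, or a full URI

LOCAL: the file is read from the local file system (the current machine) and copied into the table

Without LOCAL, use a full URI

If the URI omits the scheme and authority, Hive takes them from the default file system configured in core-site.xml (the NameNode URL)

OVERWRITE: replaces the existing contents of the table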

Example:

Put id.txt onto HDFS:

[root@bigdata Documents]# pwd
/home/admin/Documents
[root@bigdata Documents]# ls
id.txt
[root@bigdata Documents]# hdfs dfs -ls /user/hive
Found 1 items
drwxr-xr-x   - root staff          0 2020-05-17 17:46 /user/hive/warehouse
[root@bigdata Documents]# hdfs dfs -put id.txt /user/hive
[root@bigdata Documents]# hdfs dfs -ls /user/hive
Found 2 items
-rw-r--r--   1 root staff         17 2020-05-19 15:13 /user/hive/id.txt
drwxr-xr-x   - root staff          0 2020-05-17 17:46 /user/hive/warehouse

If table t already exists in Hive, its existing data files can be removed as follows:
hdfs dfs -rm -R /user/hive/warehouse/t/*

hive> load data local inpath '/home/admin/Documents/id.txt' into table t;
Loading data to table default.t
OK
Time taken: 1.137 seconds
hive> select * from t ;
OK
1
2
3
4
5
6
7
8
NULL
Time taken: 0.38 seconds, Fetched: 9 row(s)
hive> load data inpath 'file:///home/admin/Documents/id.txt' into table t;
Loading data to table default.t
OK
Time taken: 1.444 seconds
hive> select * from t;
OK
1
2
3
4
5
6
7
8
NULL
1
2
3
4
5
6
7
8
NULL
Time taken: 0.342 seconds, Fetched: 18 row(s)

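After running the load data inpath 'file:///...' command, id.txt is removed from the source directory (the first ls is before the load, the second after):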

[root@bigdata Documents]# ls
id_copy.txt  id.txt
[root@bigdata Documents]# ls
id_copy.txt

Running load data local inpath, by contrast, does not remove the source file id.txt.

[root@bigdata Documents]# ls
id_copy.txt  id.txt
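
A side note on the NULL row that follows 8 in the query results above: the HDFS listing reports id.txt as 17 bytes, while eight single-digit lines with their newlines take only 16 bytes. One plausible explanation, which is an assumption and not something the post states, is that the file ends with an extra empty line, which Hive would read as an empty value and return as NULL for an int column. The byte counts can be checked with standard tools; the file names below are hypothetical stand-ins.

```shell
#!/bin/sh
# Sketch: compare the size of an eight-line file with and without a
# trailing empty line. File names are hypothetical stand-ins for id.txt.
dir=$(mktemp -d)

# Eight single-digit lines, each terminated by a newline.
printf '1\n2\n3\n4\n5\n6\n7\n8\n' > "$dir/clean.txt"
# The same content plus one extra empty line at the end.
printf '1\n2\n3\n4\n5\n6\n7\n8\n\n' > "$dir/extra.txt"

clean_bytes=$(wc -c < "$dir/clean.txt" | tr -d ' ')
extra_bytes=$(wc -c < "$dir/extra.txt" | tr -d ' ')
echo "clean.txt: $clean_bytes bytes"
echo "extra.txt: $extra_bytes bytes"

# Count empty lines; each one would show up as a row with a NULL id.
empty_lines=$(grep -c '^$' "$dir/extra.txt")
echo "empty lines in extra.txt: $empty_lines"

rm -rf "$dir"
```

If the real id.txt does contain such a line, removing it before the load should make the NULL row disappear.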

Example 3: re-create id.txt with vim, then use OVERWRITE to replace the contents of table t

hive> load data inpath 'file:///home/admin/Documents/id.txt' overwrite into table t;
Loading data to table default.t
OK
Time taken: 1.345 seconds
hive> select * from t;
OK
1
2
3
4
The deleted files end up in the HDFS trash, for example:

[root@bigdata Documents]# hdfs dfs -ls /user/root/.Trash/Current/user/hive/warehouse/t
Found 5 items
-rwxr-xr-x 1 root staff 17 2020-05-19 15:24 /user/root/.Trash/Current/user/hive/warehouse/t/id.txt
-rwxr-xr-x 1 root staff 8 2020-05-19 15:35 /user/root/.Trash/Current/user/hive/warehouse/t/id.txt1589874897302
-rwxr-xr-x 1 root staff 17 2020-05-19 15:26 /user/root/.Trash/Current/user/hive/warehouse/t/id_copy_1.txt
-rwxr-xr-x 1 root staff 8 2020-05-19 15:31 /user/root/.Trash/Current/user/hive/warehouse/t/id_copy_2.txt
-rwxr-xr-x 1 root staff 8 2020-05-19 15:34 /user/root/.Trash/Current/user/hive/warehouse/t/id_copy_3.txt

 

LOAD can also import every file in a directory. For example, create two files under /home/admin/Documents/: id.txt (containing 1 to 5) and id_copy.txt (containing 1 to 4).

Example 1:
hive> load data inpath 'file:///home/admin/Documents/' overwrite into table t;
Loading data to table default.t
OK
Time taken: 1.279 seconds
hive> select * from t;
OK
1
2
3
4
5
1
2
3
4
Time taken: 0.266 seconds, Fetched: 9 row(s)

[root@bigdata Documents]# ls
[root@bigdata Documents]#

Afterwards, the files under /home/admin/Documents/ are gone: the load has left the directory empty.
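
Because a load without LOCAL removes the source files, one way to keep the originals is to load from a throwaway copy. The following is a minimal sketch of that idea: the helper name stage_copy and the staging location are hypothetical, the script only prints the statement instead of calling Hive, and it assumes the Hive process can read and move files at the staging location.

```shell
#!/bin/sh
# Sketch: copy the files of a source directory into a staging directory,
# so that a non-LOCAL load consumes the copy and not the originals.
# stage_copy is a hypothetical helper, not part of Hive.
stage_copy() {
    src_dir=$1
    staging_dir=$(mktemp -d) || return 1
    # Copy regular files only; subdirectories are skipped.
    find "$src_dir" -maxdepth 1 -type f -exec cp {} "$staging_dir"/ \;
    echo "$staging_dir"
}

# Demo: a temporary directory stands in for /home/admin/Documents/.
src=$(mktemp -d)
printf '1\n2\n3\n4\n5\n' > "$src/id.txt"
printf '1\n2\n3\n4\n' > "$src/id_copy.txt"

staging=$(stage_copy "$src")

# Print the statement that would be run in the Hive CLI.
echo "load data inpath 'file://$staging/' overwrite into table t;"
```

After such a load, only the staging copy would be consumed, and the files in the original directory stay where they are.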

Example 2: re-create id.txt and id_copy.txt with vim (each containing 1 to 3), then overwrite using LOCAL

hive> load data local inpath '/home/admin/Documents/' overwrite into table t;
Loading data to table default.t
OK
Time taken: 1.265 seconds
hive> ;
hive> select * from t
> ;
OK
1
2
3
1
2
3
Time taken: 0.239 seconds, Fetched: 6 row(s)

[root@bigdata Documents]# ls
id_copy.txt id.txt

The filepath must not contain subdirectories:

Example: under Documents/ we create tmp/tmp1, and tmp/ contains the file test.txt

[root@bigdata Documents]# ls -pR
.:
id_copy.txt id.txt tmp/

./tmp:
test.txt tmp1/

./tmp/tmp1:
[root@bigdata Documents]#

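Note: ls -pR lists all files recursively. -p appends a "/" to directory names; -R recursively lists the files in each subdirectory of the given directory.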

hive> load data inpath 'file:///home/admin/Documents/tmp/' overwrite into table t;
FAILED: SemanticException Line 1:17 Invalid path ''file:///home/admin/Documents/tmp/'': source contains directory: file:/home/admin/Documents/tmp/tmp1
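
Since the load is rejected as soon as the source directory contains a subdirectory, the directory can be checked before running the statement. A minimal sketch, assuming only direct children matter (as the error message above suggests); has_subdirs is a hypothetical helper name and the layout is rebuilt in a temporary location.

```shell
#!/bin/sh
# Sketch: detect subdirectories directly under a load source directory.
# has_subdirs is a hypothetical helper, not part of Hive or HDFS.
has_subdirs() {
    # Succeeds (exit status 0) when at least one subdirectory exists.
    [ -n "$(find "$1" -mindepth 1 -maxdepth 1 -type d)" ]
}

# Rebuild the layout from the example: tmp/ holds test.txt and tmp1/.
base=$(mktemp -d)
mkdir -p "$base/tmp/tmp1"
touch "$base/tmp/test.txt"

if has_subdirs "$base/tmp"; then
    result="skip: $base/tmp contains a subdirectory"
else
    result="ok to load $base/tmp"
fi
echo "$result"
```

For the layout above the check reports that tmp/ should be skipped, which matches the SemanticException raised by Hive.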

Format mismatches when loading files:

Example 1: create table t2 and load data into it.

hive> create table t2(id int);

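Column mismatch (a file with two comma-separated values per line loaded into a one-column table): after the load, every value in t2 is NULL. t2 was created without a ROW FORMAT clause, so Hive uses its default field delimiter (Ctrl-A, \001) and not the comma; each whole line, such as 1,2, is read as a single field that is not a valid int.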

[root@bigdata Documents]# cat test.txt 
1,2
2,4

hive> load data local inpath '/home/admin/Documents/test.txt' overwrite into table t2;

hive> select * from t2;
OK
NULL
NULL
Time taken: 0.423 seconds, Fetched: 2 row(s)

Example 2: type mismatch between the data (string) and the column (int): the value '我' is loaded as NULL

[root@bigdata Documents]# cat test2.txt 
1
2
我
4

hive> load data local inpath '/home/admin/Documents/test2.txt' overwrite into table t2;

hive> select * from t2;
OK
1
2
NULL
4
Time taken: 0.209 seconds, Fetched: 4 row(s)
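
Values that cannot be parsed as int are loaded as NULL without any error, so it can help to scan a file before loading it into an int column. A minimal sketch using grep: the regular expression is a simplification of what Hive accepts as an int (an assumption), and the sample file is a temporary stand-in in which abc plays the role of the non-numeric value.

```shell
#!/bin/sh
# Sketch: report lines that do not look like a plain integer.
# The pattern is a simplification (assumption), not Hive's exact parser.
f=$(mktemp)
# Two valid rows, one non-numeric row, one row with two values.
printf '1\n2\nabc\n1,2\n' > "$f"

# -n prefixes the line number, -v inverts the match, -E enables ERE.
bad=$(grep -nvE '^-?[0-9]+$' "$f")
echo "lines that would likely load as NULL:"
echo "$bad"

rm -f "$f"
```

Lines reported here either need cleaning or call for a different column type or delimiter in the table definition.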

Example 3: fix the column type by changing int to string

hive> alter table t2 change column id id string;
OK
Time taken: 0.34 seconds
hive> desc t2;
OK
id                      string                                      
Time taken: 0.077 seconds, Fetched: 1 row(s)
hive> load data local inpath '/home/admin/Documents/test2.txt' overwrite into table t2;
Loading data to table default.t2
OK
Time taken: 0.927 seconds
hive> select * from t2;
OK
1
2
我
4
Time taken: 0.173 seconds, Fetched: 4 row(s)

If you repost this article, please credit the cnblogs link of the original post.

Reposted from www.cnblogs.com/watermarks/p/12917686.html