Verifying the OVERWRITE keyword of Hive's LOAD DATA statement

Procedure

hive> select count(*) from test;

2018-05-25 11:08:40,651 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 61.19 sec
MapReduce Total cumulative CPU time: 1 minutes 1 seconds 190 msec
Ended Job = job_1515037630689_0063
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 9  Reduce: 1   Cumulative CPU: 61.19 sec   HDFS Read: 820348819 HDFS Write: 107 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 1 seconds 190 msec
OK
7273391

Time taken: 462.62 seconds, Fetched: 1 row(s)


hive> load data inpath '/data/test/'  into table test;

Loading data to table test
OK
Time taken: 7.003 seconds

hive> select count(*) from test;

MapReduce Total cumulative CPU time: 56 seconds 140 msec
Ended Job = job_1515037630689_0064
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 9  Reduce: 1   Cumulative CPU: 56.14 sec   HDFS Read: 820348824 HDFS Write: 107 SUCCESS
Total MapReduce CPU Time Spent: 56 seconds 140 msec
OK
7273391

Time taken: 416.049 seconds, Fetched: 1 row(s)

Conclusion: loading the data again without OVERWRITE did not change the table; the row count is still 7273391.

hive> load data inpath '/data/test/'  overwrite into table test;
Loading data to table test
OK
Time taken: 6.97 seconds
hive> dfs -ls /data/test/;
hive> 

After adding the OVERWRITE keyword, the original files are gone: the directory listing above comes back empty.

hive> select count(*) from test;
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 1
2018-05-25 14:21:37,032 Stage-1 map = 0%,  reduce = 0%
2018-05-25 14:22:13,490 Stage-1 map = 0%,  reduce = 100%, Cumulative CPU 1.79 sec
MapReduce Total cumulative CPU time: 1 seconds 790 msec
Ended Job = job_1515037630689_0065
MapReduce Jobs Launched: 
Stage-Stage-1: Reduce: 1   Cumulative CPU: 1.79 sec   HDFS Read: 3984 HDFS Write: 101 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 790 msec
OK
0

Time taken: 129.468 seconds, Fetched: 1 row(s)

The row count has dropped to zero.

Summary: when OVERWRITE is specified, any data already in the target directory is deleted first, so take special care when loading data with it.
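
The behaviour seen in this transcript can be reproduced with a small toy model. The sketch below is not Hive's implementation. It assumes that LOAD DATA INPATH moves files from the source directory into the table directory, and that OVERWRITE empties the table directory first. It also assumes, which the transcript does not confirm, that the table's storage location is the same directory as the load path '/data/test/'. The names load_data, count_rows and demo are made up for this illustration.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Tuple


def load_data(src_dir: Path, table_dir: Path, overwrite: bool = False) -> None:
    """Toy model of LOAD DATA INPATH (illustration only, not Hive's code).

    Files are moved from src_dir into table_dir. With overwrite=True,
    everything already in table_dir is deleted before the move.
    """
    src_dir, table_dir = src_dir.resolve(), table_dir.resolve()

    if overwrite:
        # OVERWRITE: clear the target directory first.
        for entry in table_dir.iterdir():
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()

    if src_dir == table_dir:
        # Assumed case from the transcript: source and target coincide,
        # so there is nothing to move (and, after OVERWRITE, nothing left).
        return

    for f in sorted(src_dir.iterdir()):
        if f.is_file():
            target = table_dir / f.name
            if target.exists():
                # Name collisions are out of scope for this sketch.
                raise FileExistsError(str(target))
            shutil.move(str(f), str(target))


def count_rows(table_dir: Path) -> int:
    """Stand-in for select count(*): total lines in the directory's files."""
    return sum(
        len(f.read_text().splitlines())
        for f in table_dir.iterdir()
        if f.is_file()
    )


def demo() -> Tuple[int, int]:
    """Replay the two loads from the transcript on a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        table_dir = Path(tmp) / "data" / "test"
        table_dir.mkdir(parents=True)
        (table_dir / "part-0.txt").write_text("a\nb\nc\n")

        load_data(table_dir, table_dir, overwrite=False)
        after_plain = count_rows(table_dir)

        load_data(table_dir, table_dir, overwrite=True)
        after_overwrite = count_rows(table_dir)

        return after_plain, after_overwrite
```

Under these assumptions, demo() returns (3, 0): the plain load leaves the three rows untouched, while the OVERWRITE load wipes the only copy of the data. On a real cluster it would be prudent to check the table's location and keep a copy of the source files before running an OVERWRITE load.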


Reposted from blog.csdn.net/lepton126/article/details/80451492