Hive HQL query fails with a file-missing error

Hello everyone,

While running an HQL query today, I ran into a file-missing error. Below is the test process that reproduces it, along with the solution; I hope it is useful to you.

 

--- Create the test table

create table employ_test(
employ_id  BIGINT comment 'employee ID',
salary DECIMAL(20,2) COMMENT 'employee salary'
)
comment 'employee info test table, for testing deletion of partition files'
PARTITIONED BY (dept_no STRING comment 'department number');




--- Insert test data

insert into employ_test PARTITION (dept_no = '11')  values('11101',88);
insert into employ_test PARTITION (dept_no = '11')  values('11102',89);
insert into employ_test PARTITION (dept_no = '11')  values('11101',90);
insert into employ_test PARTITION (dept_no = '22')  values('22201',88);




--- Verify the data

hive> select * from employ_test;
OK
11101 88 11
11102 89 11
11101 90 11
22201 88 22




--- View the table's partitions

hive> show partitions employ_test;
OK
dept_no=11
dept_no=22




--- Verify the SQL statement

hive> select dept_no,sum(salary) as salary from employ_test group by dept_no;
(the intermediate conversion to a MapReduce job is omitted here)
11 267
22 88
Time taken: 6.054 seconds, Fetched: 2 row(s)


--- View the data files on HDFS

[bxapp@bzcrkmfx0ap1001 ~]$ hadoop fs -ls /user/testBT/dbc/employ_test
Found 2 items
drwx------   - testBT hdfs          0 2017-11-27 09:53 /user/testBT/dbc/employ_test/dept_no=11
drwx------   - testBT hdfs          0 2017-11-27 10:13 /user/testBT/dbc/employ_test/dept_no=22



--- Manually delete the data files on HDFS

[bxapp@bzcrkmfx0ap1001 ~]$ hadoop fs -rm -r /user/testBT/dbc/employ_test/dept_no=22
17/11/27 10:14:13 INFO fs.TrashPolicyDefault: Moved: 'hdfs://cluster/user/testBT/dbc/employ_test/dept_no=22' to trash at: hdfs://cluster/user/testBT/.Trash/Current/user/testBT/dbc/employ_test/dept_no=221511748853212
[bxapp@bzcrkmfx0ap1001 ~]$ hadoop fs -ls /user/testBT/dbc/employ_test
Found 1 items
drwx------   - testBT hdfs          0 2017-11-27 09:53 /user/testBT/dbc/employ_test/dept_no=11


--- Check the partition metadata again

hive> show partitions employ_test;
OK
dept_no=11
dept_no=22
Time taken: 0.56 seconds, Fetched: 2 row(s)


The result illustrates the key point: after the data files are manually deleted from HDFS, the partition metadata in the Hive metastore is not modified.
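You can confirm that the metastore still holds the stale entry by inspecting the dropped partition directly (a diagnostic sketch; the exact output layout varies by Hive version):

```sql
-- The metastore still returns metadata for the deleted partition,
-- including a Location that no longer exists on HDFS.
DESCRIBE FORMATTED employ_test PARTITION (dept_no = '22');
```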


--- Execute the HQL statement again

hive> select dept_no,sum(salary) as salary from employ_test group by dept_no;
Query ID = bxapp_20171127101701_bfc89d7a-0ec0-41bd-a215-c0bf58c9a7a1
Total jobs = 1
Launching Job 1 out of 1

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1                 FAILED     -1          0        0       -1       0       0
Reducer 2             KILLED      2          0        0        2       0       0
--------------------------------------------------------------------------------
VERTICES: 00/02  [>>--------------------------] 0%    ELAPSED TIME: 1511749120.00 s
--------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1509332682299_65302_2_00, diagnostics=[Vertex vertex_1509332682299_65302_2_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: employ_test initializer failed, vertex=vertex_1509332682299_65302_2_00 [Map 1], org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://cluster/user/testBT/dbc/employ_test/dept_no=22
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:294)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:236)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:322)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)


You can see the query fails because the input path for the deleted partition no longer exists.

If the query is restricted to a partition whose data is still present, it runs successfully:

hive> select dept_no,sum(salary) as salary from employ_test where dept_no='11' group by dept_no;
Query ID = bxapp_20171127101918_f5e6f62c-5c09-47c1-b2e8-85b02307874c
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1509332682299_65302)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      2          2        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 4.28 s     
--------------------------------------------------------------------------------
OK
11 267
Time taken: 5.805 seconds, Fetched: 1 row(s)
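The restricted query succeeds because Hive prunes partitions against the WHERE predicate before listing input paths, so the missing dept_no=22 directory is never touched. You can confirm which partitions are scanned with EXPLAIN (plan format varies by version and execution engine):

```sql
-- Partition pruning: only dept_no=11 appears in the table scan,
-- so the deleted dept_no=22 path is never listed on HDFS.
EXPLAIN
SELECT dept_no, SUM(salary) AS salary
FROM employ_test
WHERE dept_no = '11'
GROUP BY dept_no;
```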


--- Manually delete the partition information in the metadata

hive> alter table employ_test drop partition (dept_no = '22');
OK
Time taken: 0.457 seconds
hive> show partitions employ_test;
OK
dept_no=11
Time taken: 0.456 seconds, Fetched: 1 row(s)


---Execute the hql statement again

hive> select dept_no,sum(salary) as salary from employ_test group by dept_no;
Query ID = bxapp_20171127102150_12f76029-edf4-437f-9db7-84ea2bf6f2d8
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1509332682299_65302)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      2          2        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 3.85 s     
--------------------------------------------------------------------------------
OK
11 267
Time taken: 6.719 seconds, Fetched: 1 row(s)


The experiment shows that after Hive data files are deleted directly on HDFS, the metastore is not updated in sync. You must manually drop the corresponding partition from the metadata; otherwise, any query that touches the missing partition will fail.
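Besides dropping each stale partition by hand with ALTER TABLE ... DROP PARTITION, newer Hive releases (3.0 and later, to my knowledge) can reconcile the metastore with HDFS in one statement; a sketch, assuming a recent-enough Hive:

```sql
-- Hive 3.0+: remove metastore partitions whose directories are gone
-- from HDFS (older versions' MSCK REPAIR only ADDs partitions).
MSCK REPAIR TABLE employ_test DROP PARTITIONS;

-- Or synchronize in both directions (add missing, drop stale):
MSCK REPAIR TABLE employ_test SYNC PARTITIONS;
```

On versions that lack DROP/SYNC PARTITIONS, the ALTER TABLE ... DROP PARTITION approach shown above remains the way to go.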


Origin blog.csdn.net/zhaoxiangchong/article/details/78643047