Running an HQL query reports a missing-file error

Hello everyone,

While running an HQL query today I hit a file-not-found error. Below are the steps to reproduce it and the fix; I hope it is useful to everyone.

--- Create the test table

create table employ_test(
employ_id  BIGINT comment '员工编码',
salary DECIMAL(20,2) COMMENT '员工薪水'
)
comment '员工信息测试表,测试删除分区文件' 
PARTITIONED BY (dept_no STRING comment '部门号');

--- Insert test data

insert into employ_test PARTITION (dept_no = '11')  values('11101',88);
insert into employ_test PARTITION (dept_no = '11')  values('11102',89);
insert into employ_test PARTITION (dept_no = '11')  values('11101',90);
insert into employ_test PARTITION (dept_no = '22')  values('22201',88);

--- Verify the data

hive> select * from employ_test;
OK
11101 88 11
11102 89 11
11101 90 11
22201 88 22

--- List the table's partitions

hive> show partitions employ_test;
OK
dept_no=11
dept_no=22

--- Run the test query

hive> select dept_no,sum(salary) as salary from employ_test group by dept_no;
(intermediate MapReduce job output omitted here)
11 267
22 88
Time taken: 6.054 seconds, Fetched: 2 row(s)

--- View the data files on HDFS

[bxapp@bzcrkmfx0ap1001 ~]$ hadoop fs -ls /user/testBT/dbc/employ_test
Found 2 items
drwx------   - testBT hdfs          0 2017-11-27 09:53 /user/testBT/dbc/employ_test/dept_no=11
drwx------   - testBT hdfs          0 2017-11-27 10:13 /user/testBT/dbc/employ_test/dept_no=22

--- Manually delete the data files on HDFS

[bxapp@bzcrkmfx0ap1001 ~]$ hadoop fs -rm -r /user/testBT/dbc/employ_test/dept_no=22
17/11/27 10:14:13 INFO fs.TrashPolicyDefault: Moved: 'hdfs://cluster/user/testBT/dbc/employ_test/dept_no=22' to trash at: hdfs://cluster/user/testBT/.Trash/Current/user/testBT/dbc/employ_test/dept_no=221511748853212
[bxapp@bzcrkmfx0ap1001 ~]$ hadoop fs -ls /user/testBT/dbc/employ_test
Found 1 items
drwx------   - testBT hdfs          0 2017-11-27 09:53 /user/testBT/dbc/employ_test/dept_no=11

--- Check the table's metadata again

hive> show partitions employ_test;
OK
dept_no=11
dept_no=22
Time taken: 0.56 seconds, Fetched: 2 row(s)

The result makes one point clear: after the data files are deleted manually on HDFS, the table's metadata is not changed. Partition dept_no=22 is still listed.
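
One way to spot this kind of drift is to compare the partition names reported by the metastore with the directories that actually exist on HDFS. The following is only a minimal sketch: the helper name stale_partitions and the temporary file paths are hypothetical, and it assumes single-level partitions like those of employ_test, with one partition name per line in each input file.

```shell
# Hypothetical helper: print partitions that the metastore lists
# but that have no matching directory on HDFS.
#   $1: file with partition names from the metastore, one per line
#   $2: file with partition directory names found on HDFS, one per line
stale_partitions() {
    # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file:
    # keep the lines of $1 that do not appear as a whole line in $2.
    # grep exits with status 1 when it prints nothing, so "|| true" keeps
    # the function's exit status at zero when no partition is stale.
    grep -F -x -v -f "$2" "$1" || true
}

# Example wiring (assumes the hive and hadoop CLIs are on PATH):
#   hive -S -e 'show partitions employ_test' > /tmp/meta.txt
#   hadoop fs -ls /user/testBT/dbc/employ_test \
#     | awk '/dept_no=/ {n = split($NF, p, "/"); print p[n]}' > /tmp/hdfs.txt
#   stale_partitions /tmp/meta.txt /tmp/hdfs.txt
```

With the state shown above (two partitions in the metastore, one directory on HDFS), this should print dept_no=22.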

--- Run the HQL query again

hive> select dept_no,sum(salary) as salary from employ_test group by dept_no;
Query ID = bxapp_20171127101701_bfc89d7a-0ec0-41bd-a215-c0bf58c9a7a1
Total jobs = 1
Launching Job 1 out of 1

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1                 FAILED     -1          0        0       -1       0       0
Reducer 2             KILLED      2          0        0        2       0       0
--------------------------------------------------------------------------------
VERTICES: 00/02  [>>--------------------------] 0%    ELAPSED TIME: 1511749120.00 s
--------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1509332682299_65302_2_00, diagnostics=[Vertex vertex_1509332682299_65302_2_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: employ_test initializer failed, vertex=vertex_1509332682299_65302_2_00 [Map 1], org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://cluster/user/testBT/dbc/employ_test/dept_no=22
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:294)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:236)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:322)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)

As you can see, the error says that the input path for partition dept_no=22 does not exist.

This time, query only the partition that still has data; the query runs successfully.

hive> select dept_no,sum(salary) as salary from employ_test where dept_no='11' group by dept_no;
Query ID = bxapp_20171127101918_f5e6f62c-5c09-47c1-b2e8-85b02307874c
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1509332682299_65302)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      2          2        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 4.28 s     
--------------------------------------------------------------------------------
OK
11 267
Time taken: 5.805 seconds, Fetched: 1 row(s)

--- Manually drop the partition from the metadata

hive> alter table employ_test drop partition (dept_no = '22');
OK
Time taken: 0.457 seconds
hive> show partitions employ_test;
OK
dept_no=11
Time taken: 0.456 seconds, Fetched: 1 row(s)

--- Run the HQL query again

hive> select dept_no,sum(salary) as salary from employ_test group by dept_no;
Query ID = bxapp_20171127102150_12f76029-edf4-437f-9db7-84ea2bf6f2d8
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1509332682299_65302)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      2          2        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 3.85 s     
--------------------------------------------------------------------------------
OK
11 267
Time taken: 6.719 seconds, Fetched: 1 row(s)

The experiment shows that after a Hive table's data files are deleted directly on HDFS, the metadata is not updated to match. You have to drop the partition from the metadata manually; otherwise, any query that reads the nonexistent partition fails with an error.
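
If several partitions have gone stale, writing the cleanup statements by hand gets tedious. The sketch below generates them from a list of partition names. The helper name drop_partition_sql is hypothetical, and it assumes single-level partitions whose string values contain no quotes, as in employ_test.

```shell
# Hypothetical helper: read partition names such as dept_no=22 from stdin,
# one per line, and print one ALTER TABLE ... DROP PARTITION statement each.
#   $1: table name
drop_partition_sql() {
    # Split each line at the first "=" into the partition column and its value.
    while IFS='=' read -r key value; do
        [ -n "$key" ] || continue   # skip blank lines
        printf "ALTER TABLE %s DROP IF EXISTS PARTITION (%s='%s');\n" "$1" "$key" "$value"
    done
}

# Example: review the generated statements before running them in hive.
#   echo 'dept_no=22' | drop_partition_sql employ_test
```

For a managed table, ALTER TABLE ... DROP PARTITION also removes the partition's data, so using it instead of hadoop fs -rm avoids the inconsistency in the first place. Newer Hive releases (3.0 and later, as far as I know) also offer MSCK REPAIR TABLE employ_test DROP PARTITIONS for the same cleanup; check whether your version supports it before relying on it.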
