Azkaban Troubleshooting Log

Troubleshooting case 1:

Azkaban reports the following error when running a Kettle job:

02-08-2018 14:13:15 CST EXT_T_PMS_HR_EMPLOYEE_PMS INFO - 2018/08/02 14:13:15 - writeData.0 - Error updating batch
02-08-2018 14:13:15 CST EXT_T_PMS_HR_EMPLOYEE_PMS INFO - 2018/08/02 14:13:15 - writeData.0 - Column 'en_name' cannot be null
02-08-2018 14:13:15 CST EXT_T_PMS_HR_EMPLOYEE_PMS INFO - 2018/08/02 14:13:15 - writeData.0 - 
02-08-2018 14:13:15 CST EXT_T_PMS_HR_EMPLOYEE_PMS INFO - 2018/08/02 14:13:15 - writeData.0 -    at org.pentaho.di.core.database.Database.createKettleDatabaseBatchException(Database.java:1425)
02-08-2018 14:13:15 CST EXT_T_PMS_HR_EMPLOYEE_PMS INFO - 2018/08/02 14:13:15 - writeData.0 -    at org.pentaho.di.core.database.Database.emptyAndCommit(Database.java:1414)
02-08-2018 14:13:15 CST EXT_T_PMS_HR_EMPLOYEE_PMS INFO - 2018/08/02 14:13:15 - writeData.0 -    at org.pentaho.di.trans.steps.tableoutput.TableOutput.dispose(TableOutput.java:586)
02-08-2018 14:13:15 CST EXT_T_PMS_HR_EMPLOYEE_PMS INFO - 2018/08/02 14:13:15 - writeData.0 -    at org.pentaho.di.trans.step.RunThread.run(RunThread.java:97)
02-08-2018 14:13:15 CST EXT_T_PMS_HR_EMPLOYEE_PMS INFO - 2018/08/02 14:13:15 - writeData.0 -    at java.lang.Thread.run(Thread.java:748)

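The cause: when Kettle runs, it automatically treats the empty string '' as null, and inserting that null into a column that cannot be null raises the error above. I had already changed kettle.properties to deal with this (if you are not familiar with this setting, feel free to message me), yet the null error persisted, which puzzled me.
As we know, a kettle.properties file exists in the home directory of every user who runs Kettle. I had modified only the copy belonging to user guaishou, not the one belonging to root (or rather, root may not have had a kettle.properties file at all). At this point some readers might say: just edit the kettle.properties in root's home directory and the problem is solved, right? That would work, but we should not sidestep the underlying question.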
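
To see which accounts actually carry the setting, something like the following can be run on each machine. It is only a minimal sketch under two assumptions: that the setting in question is Kettle's KETTLE_EMPTY_STRING_DIFFERS_FROM_NULL=Y (the exact change is not shown in this post), and that the file lives at <home>/.kettle/kettle.properties. The function names has_null_setting and report_null_setting are made up for illustration.

```shell
# Hypothetical helper: succeeds if the given home directory contains a
# kettle.properties that sets KETTLE_EMPTY_STRING_DIFFERS_FROM_NULL=Y.
# Assumption: this is the setting that was changed for user guaishou.
has_null_setting() {
    props="$1/.kettle/kettle.properties"
    [ -f "$props" ] || return 1
    grep -Eq '^[[:space:]]*KETTLE_EMPTY_STRING_DIFFERS_FROM_NULL[[:space:]]*=[[:space:]]*Y[[:space:]]*$' "$props"
}

# Print "set" or "missing" for each home directory passed as an argument.
report_null_setting() {
    for home in "$@"; do
        if has_null_setting "$home"; then
            echo "$home: set"
        else
            echo "$home: missing"
        fi
    done
}
```

For example, report_null_setting /root /home/guaishou would compare the two accounts discussed here, assuming guaishou's home directory is /home/guaishou.
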
Every problem deserves some thought before settling on a course of action.
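When I run the Azkaban job myself as user guaishou, this error is not thrown. Why? This suggests that the job itself is fine and that the cause may lie in Azkaban. But what could be wrong with Azkaban?
Cause 1: Azkaban runs the job as a different user, which leads to the failure. The "No such directory" error is a similar symptom.
Cause 2: Azkaban schedules jobs and dispatches them to different executors. When a node does not have this Kettle setup deployed, the null error appears, because the two machines that can run Azkaban jobs do not share the same directory layout.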
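
Both hypotheses can be checked from inside a job. The sketch below is one possible diagnostic, not something taken from the original setup: a function (the name job_context is hypothetical) that prints the user, host, and home directory a job actually runs with, and reports which of the paths passed to it are missing on that host.

```shell
# Hypothetical diagnostic, meant to be called from an Azkaban command job.
# Prints who runs the job, on which host, and whether each path given
# as an argument exists on that host.
job_context() {
    echo "user=$(id -un 2>/dev/null || id -u)"
    echo "host=$(uname -n)"
    echo "home=${HOME:-unset}"
    for path in "$@"; do
        if [ -e "$path" ]; then
            echo "present=$path"
        else
            echo "missing=$path"
        fi
    done
}
```

Calling it with the Kettle install directory and the job's script directory as arguments should show, in the job log, whether the run landed on a host or user that lacks them.
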
The executor details can be looked up in the Azkaban database:

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| sys_azkaban        |
+--------------------+
2 rows in set (0.00 sec)

mysql> use sys_azkaban;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+--------------------------+
| Tables_in_sys_azkaban    |
+--------------------------+
| ...                      |
| executors                |
| ...                      |
+--------------------------+
28 rows in set (0.00 sec)

mysql> select * from executors;
+----+----------+-------+--------+
| id | host     | port  | active |
+----+----------+-------+--------+
| 12 | bi_5.109 | 12321 |      1 |
| 16 | bi_5.110 | 12321 |      1 |
+----+----------+-------+--------+
2 rows in set (0.00 sec)

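As shown, bi_5.109 is also an active executor. Both the null problem and the script-not-found problem originate on this executor: there are two executors, but my scripts exist only on bi_5.110, which produces the "No such directory" error. The log for the failing run shows this:
[Image: log of the failed execution]
It shows that the job was executed on bi_5.109, hence the error.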
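
The executor lookup above can also be scripted. The following is a rough sketch, assuming the mysql client is run in batch mode (-B, tab-separated output) without column names (-N); the helper name active_executors is hypothetical, and connection options are left out.

```shell
# Hypothetical helper: reads tab-separated rows (id, host, port, active)
# on stdin and prints host:port for every executor whose active flag is 1.
active_executors() {
    awk -F '\t' '$4 == 1 { print $2 ":" $3 }'
}

# Possible usage (credentials and host options omitted):
#   mysql -N -B -e 'SELECT id, host, port, active FROM executors' sys_azkaban | active_executors
```

Every host listed this way can receive a job, so each of them needs the same scripts, Kettle installation, and kettle.properties.
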

Troubleshooting case 2:

Azkaban fails when running a Kettle job (symptom: every run of the job ends up as failed). The Azkaban backend log reports:

2018-08-02 15:34:04 INFO  ExecutorManager:265 - Successfully refreshed executor: bi_5.110:12321 (id: 16) with executor info : ExecutorInfo{remainingMemoryPercent=96.25209633680396, remainingMemoryInMB=30812, remainingFlowCapacity=30, numberOfAssignedFlows=0, lastDispatchedTime=1533192900505, cpuUsage=0.0}
2018-08-02 15:34:04 INFO  ExecutorManager:1813 - Using dispatcher for execution id :4222
2018-08-02 15:34:04 ERROR ExecutorManager:1392 - Rolling back executor assignment for execution id:4222
azkaban.executor.ExecutorManagerException: java.io.IOException: java.nio.file.FileSystemException: executions/4222/azkabanJob/batch101/batch101.job -> /data/software/azkaban/exec/projects/7.5/azkabanJob/batch101/batch101.job: Operation not permitted
    at azkaban.executor.ExecutorApiGateway.callWithExecutionId(ExecutorApiGateway.java:78)
    at azkaban.executor.ExecutorApiGateway.callWithExecutable(ExecutorApiGateway.java:43)
    at azkaban.executor.ExecutorManager.dispatch(ExecutorManager.java:1389)
    at azkaban.executor.ExecutorManager.access$1500(ExecutorManager.java:65)
    at azkaban.executor.ExecutorManager$QueueProcessorThread.selectExecutorAndDispatchFlow(ExecutorManager.java:1750)
    at azkaban.executor.ExecutorManager$QueueProcessorThread.processQueuedFlows(ExecutorManager.java:1730)
    at azkaban.executor.ExecutorManager$QueueProcessorThread.run(ExecutorManager.java:1668)

Check the permissions under Azkaban's executor directory:

[root@bi_5 projects]# ll
total 0
drwxr-xr-x 2 guaishou guaishou  61 Jul 27 14:09 1.1
drwxrwxr-x 2 guaishou guaishou  21 Jul 27 14:09 11.1
drwxr-xr-x 2 guaishou guaishou 109 Jul 27 14:09 1.2
drwxr-xr-x 2 guaishou guaishou  19 Jul 27 14:09 1.3
drwxrwxr-x 2 guaishou guaishou  39 Aug  2 14:55 13.1
drwxrwxr-x 2 guaishou guaishou  24 Jul 27 14:09 2.1
drwxr-xr-x 2 guaishou guaishou  23 Jul 27 14:09 3.1
drwxr-xr-x 2 root     root      39 Jul 27 20:11 4.1
drwxrwxr-x 2 guaishou guaishou  24 Jul 27 14:09 5.1
drwxr-xr-x 2 root     root      43 Jul 30 15:02 6.1
drwxr-xr-x 3 root     root      23 Jul 31 15:29 7.1
drwxr-xr-x 3 root     root      23 Aug  1 17:36 7.3
drwxr-xr-x 3 root     root      23 Aug  2 10:46 7.5
drwxrwxr-x 2 guaishou guaishou  24 Jul 27 14:09 8.1
[root@bi_5 projects]# cd ..
[root@bi_5 exec]# cd ..
[root@bi_5 azkaban]# ll
total 0
drwxr-xr-x 11 guaishou guaishou 155 Aug  2 14:31 exec

Change the ownership of everything under ./exec/; the command and the result are shown below:
[root@bi_5 azkaban]# chown guaishou.guaishou ./exec/ -R
[root@bi_5 azkaban]# cd exec/
[root@bi_5 exec]# ll
total 12
drwxr-xr-x  2 guaishou guaishou  107 Jul 27 14:09 bin
drwxr-xr-x  2 guaishou guaishou   78 Jul 27 17:35 conf
-rw-r--r--  1 guaishou guaishou    6 Aug  2 14:31 currentpid
drwxrwsr-x  3 guaishou guaishou   17 Aug  2 15:34 executions
-rw-r--r--  1 guaishou guaishou    6 Aug  2 14:31 executor.port
drwxr-xr-x  2 guaishou guaishou    6 Jul 27 14:09 extlib
drwxr-xr-x  2 guaishou guaishou 4096 Jul 27 14:09 lib
drwxr-xr-x  2 guaishou guaishou   89 Jul 27 14:09 logs
drwxr-xr-x  3 guaishou guaishou   21 Jul 27 14:09 plugins
drwxr-xr-x 16 guaishou guaishou  148 Aug  2 14:55 projects
drwxr-xr-x  2 guaishou guaishou    6 Aug  2 14:55 temp
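
Because ll only lists one directory level, a recursive check is a useful follow-up to the chown. This is a small sketch built on find; the helper name not_owned_by is hypothetical.

```shell
# Hypothetical helper: lists everything under directory $1 that is NOT
# owned by user $2 (a user name or a numeric uid, as accepted by find).
not_owned_by() {
    find "$1" ! -user "$2" -print
}
```

For instance, not_owned_by ./exec guaishou should print nothing once the ownership is consistent. Any path it does print, such as the root-owned project directories listed earlier, is a candidate for the "Operation not permitted" error.
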

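The error page in the Web UI looks like this:
[Image: Web UI error page]
This pink page appears precisely because Azkaban lacks the permissions to run these jobs. Applying the ownership change above resolves it.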

Reposted from blog.csdn.net/liu16659/article/details/81367312