Oozie in Practice: Configuring the Oozie Sqoop Action


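Oozie Sqoop Action Configuration
  • The Sqoop Action runs a Sqoop job. The workflow waits until the Sqoop job on the current node has finished before moving on to the next node.
  • All element values of the Sqoop Action can use EL expressions.
  • To run a Sqoop job, the sqoop action must be configured with job-tracker, name-node, and the Sqoop command; additional arguments and configuration may also be needed.
  • Like the Shell Action, the Sqoop Action can be configured to create or delete HDFS directories before the Sqoop job starts.
  • The Sqoop application can be configured through the file referenced by the job-xml element, through the inline configuration element, or both. Inline values support EL expressions and override values from the external file. The inline configuration must not set the Hadoop properties mapred.job.tracker and fs.default.name.
  • As with MapReduce jobs, files and archives can be attached to a Sqoop job; see http://archive.cloudera.com/cdh/3/oozie/WorkflowFunctionalSpec.html#a3.2.2.1_Adding_Files_and_Archives_for_the_Job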
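
The precedence rule in the list above (inline configuration overrides job-xml, and two properties are not allowed inline) can be pictured with a small model. This is only a sketch of the rule as stated here, not Oozie code; the function name effective_config and the sample property values are made up for illustration.

```python
# Hypothetical model of the precedence rule; this is not how Oozie is implemented.
FORBIDDEN_INLINE = {"mapred.job.tracker", "fs.default.name"}


def effective_config(job_xml_props: dict, inline_props: dict) -> dict:
    """Merge job-xml properties with inline ones; inline values win."""
    rejected = FORBIDDEN_INLINE.intersection(inline_props)
    if rejected:
        raise ValueError(f"not allowed in inline configuration: {sorted(rejected)}")
    # Later entries in the merge overwrite earlier ones, so inline takes precedence.
    return {**job_xml_props, **inline_props}


# Sample values (assumed): the file says "false", the inline element says "true".
from_file = {"mapred.compress.map.output": "false", "mapred.job.queue.name": "default"}
inline = {"mapred.compress.map.output": "true"}
merged = effective_config(from_file, inline)
print(merged)
```

Properties that appear only in the file pass through unchanged; only keys present in both places are overridden.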

Sqoop Action Syntax
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <action name="[NODE-NAME]">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>[JOB-TRACKER]</job-tracker>
            <name-node>[NAME-NODE]</name-node>
            <prepare>
                <delete path="[PATH]"/>
                ...
                <mkdir path="[PATH]"/>
                ...
            </prepare>
            <configuration>
                <property>
                    <name>[PROPERTY-NAME]</name>
                    <value>[PROPERTY-VALUE]</value>
                </property>
                ...
            </configuration>
            <command>[SQOOP-COMMAND]</command>
            <arg>[SQOOP-ARGUMENT]</arg>
            ...
            <file>[FILE-PATH]</file>
            ...
            <archive>[FILE-PATH]</archive>
            ...
        </sqoop>
        <ok to="[NODE-NAME]"/>
        <error to="[NODE-NAME]"/>
    </action>
    ...
</workflow-app>
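  • prepare: if present, lists the HDFS paths to create or delete before the sqoop command runs. Paths must start with hdfs://HOST:PORT.
  • job-xml: if present, names a file used as the configuration for the Sqoop job. From schema 0.3 onward, multiple job-xml elements are allowed in order to reference multiple job.xml files.
  • configuration: passes properties to the Sqoop job.
Sqoop command
  • The Sqoop command can be given either as a single command element or as multiple arg elements.
  • With command, Oozie splits the command string on spaces into separate arguments.
  • With arg, Oozie passes the value of each arg element to Sqoop as one argument.
  • An argument that contains spaces must be given with arg.
  • All of the element values above can use EL expressions.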
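
To see why an argument containing spaces needs arg, the splitting rule above can be imitated in a few lines. This is a rough sketch: str.split() only approximates the splitting described here, and the shortened query is an illustrative stand-in rather than the one used later in this post.

```python
def split_command(command: str) -> list:
    """Approximate the space-based splitting described above; quotes get no special treatment."""
    return command.split()


# Illustrative command with a quoted query (shortened for readability).
command = "import --query 'select id from WF_JOBS where $CONDITIONS' --num-mappers 1"
tokens = split_command(command)

# With arg elements, each list entry corresponds to one <arg>, so the query stays whole.
args = ["import", "--query", "select id from WF_JOBS where $CONDITIONS", "--num-mappers", "1"]

print(len(tokens), tokens[2])  # the query is cut apart at its first space
print(len(args), args[2])      # the query is a single argument
```

In the split form the value following --query is only the fragment 'select, with the quote character still attached, and the rest of the query is scattered across further arguments.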
Sqoop Action Example 1: Import MySQL Data with Sqoop and Send a Notification Email on Success
1. Create job.properties
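# nameNode is referenced by oozie.wf.application.path below but was absent from the listing;
# the value here is assumed from the name-node element in workflow.xml
nameNode=hdfs://hadoop-node1.novalocal:8020
jobTracker=hadoop-node1.novalocal:8050
queueName=default
examplesRoot=xwj_test
jobOutput=/user/xwj/test
oozie.wf.application.path=${nameNode}/user/oozie/${examplesRoot}/apps/shell/sqoop_email/workflow.xml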
2. Create workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="email-wf">
    <start to="sqoop-node"/>

    <action name="sqoop-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>hadoop-node1.novalocal:8050</job-tracker>
            <name-node>hdfs://hadoop-node1.novalocal:8020</name-node>
            <prepare>
                <delete path="${jobOutput}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <command>sqoop import --connect jdbc:mysql://10.166.224.183:3306/oozie --username oozie --password oozie --query 'select id,app_name,app_path,user_name from WF_JOBS where $CONDITIONS LIMIT 100' --target-dir /user/xwj/test --delete-target-dir --num-mappers 1 --fields-terminated-by '\t'</command>
        </sqoop>
        <ok to="email-node"/>
        <error to="fail"/>
    </action>

    <action name="email-node">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>[email protected]</to>
            <cc>[email protected]</cc>
            <subject>Email notifications for ${wf:id()}</subject>
            <body>The wf ${wf:id()} successfully completed.</body>
        </email>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
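
The command element above carries a quoted query that contains spaces. Given the splitting rule described earlier, one alternative is to write the same import as a series of arg elements. The sketch below generates those elements from a Python list; the helper name render_args is made up for illustration, and the list starts at the tool name import, following the form used in the examples of the Oozie documentation (an assumption relative to the listing above, which starts with sqoop).

```python
import xml.etree.ElementTree as ET


def render_args(args):
    """Return one <arg> element per argument, with XML special characters escaped."""
    rendered = []
    for value in args:
        element = ET.Element("arg")
        element.text = value
        rendered.append(ET.tostring(element, encoding="unicode"))
    return "\n".join(rendered)


# Same options as the <command> above, one list entry per Sqoop argument.
sqoop_args = [
    "import",
    "--connect", "jdbc:mysql://10.166.224.183:3306/oozie",
    "--username", "oozie",
    "--password", "oozie",
    "--query", "select id,app_name,app_path,user_name from WF_JOBS where $CONDITIONS LIMIT 100",
    "--target-dir", "/user/xwj/test",
    "--delete-target-dir",
    "--num-mappers", "1",
    "--fields-terminated-by", "\\t",  # backslash followed by t, as in the original command
]
print(render_args(sqoop_args))
```

The printed lines would take the place of the command element inside the sqoop action. No shell is involved, so the query and the field delimiter are written without surrounding quotes.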
3. On the local test node, create a working directory
mkdir -p /opt/mydata/user/oozie/xwj_test/apps/shell/sqoop_email
4. Create the matching directory on HDFS
hdfs dfs -mkdir -p /user/oozie/xwj_test/apps/shell/sqoop_email
5. Place the two files above in the new local directory, then change into it
cd /opt/mydata/user/oozie/xwj_test/apps/shell/sqoop_email
6. Upload the local files to the HDFS directory
hdfs dfs -put ../sqoop_email/* /user/oozie/xwj_test/apps/shell/sqoop_email
7. Check that the files are present on HDFS
hdfs dfs -ls -R /user/oozie/xwj_test/apps/shell/sqoop_email
8. Switch to the yarn user and resubmit the job
su yarn
oozie job -oozie http://hadoop-node0.novalocal:11000/oozie -config /opt/mydata/user/oozie/xwj_test/apps/shell/sqoop_email/job.properties -run
The job fails with the following error:
ACTION[0000002-180412152846094-oozie-root-W@sqoop-node] Launcher exception: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SqoopMain not found
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SqoopMain not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2241)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:238)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SqoopMain not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2147)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2239)
... 9 more
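Fix: SqoopMain is shipped in the Oozie share lib, so add the following line to job.properties to put the share lib on the action's classpath:
oozie.use.system.libpath=true
Run the job again. This time it fails with a different error: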
2018-04-13 09:19:00,121 WARN SqoopActionExecutor:523 - SERVER[hadoop-node0.novalocal] USER[root] GROUP[-] TOKEN[] APP[email-wf] JOB[0000010-180412152846094-oozie-root-W] ACTION[0000010-180412152846094-oozie-root-W@sqoop-node] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
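9. Searching online for this error shows that many people have run into it, but few posts offer a solution, so the troubleshooting steps are recorded here.
9.1 Using the ID of the job started by Oozie, find the job in the Oozie web console and open its error details.
9.2 Click the failed node to view the detailed execution log for that node.
9.3 Drilling down level by level eventually leads to the log that contains the real error: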
java.sql.SQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'id,app_name,app_path,user_name from WF_JOBS where (1 = 0)' at line 1
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:536)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:513)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:115)
at com.mysql.cj.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:1983)
at com.mysql.cj.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1826)
at com.mysql.cj.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1923)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:777)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForQuery(SqlManager.java:253)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:337)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1853)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1653)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:488)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:197)
at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:179)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:58)
at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:48)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:240)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
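9.4 The log shows that the SQL Sqoop sent to MySQL was malformed: the select keyword was missing. After fixing the query and resubmitting, the job finally ran successfully. Although this was a careless mistake, it led to a reliable way of locating the underlying logs, which is an important milestone for a developer.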
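
The walkthrough above goes through the Oozie web console. As a complement, the job ID can also be taken from the "Launcher ERROR" log line and passed to the oozie command-line client, whose job subcommand accepts -info and -log. The sketch below only extracts the IDs and builds the command lines without running anything; the helper names extract_ids and lookup_commands are hypothetical, and the log line is shortened from the one shown earlier.

```python
import re

# Shortened copy of the "Launcher ERROR" line from the failed run.
LOG_LINE = (
    "2018-04-13 09:19:00,121 WARN SqoopActionExecutor:523 - SERVER[hadoop-node0.novalocal] "
    "USER[root] GROUP[-] TOKEN[] APP[email-wf] JOB[0000010-180412152846094-oozie-root-W] "
    "ACTION[0000010-180412152846094-oozie-root-W@sqoop-node] Launcher ERROR"
)


def extract_ids(line):
    """Return (job_id, action_id) from an Oozie log line, or None for a missing field."""
    job = re.search(r"\bJOB\[([^\]]+)\]", line)
    action = re.search(r"\bACTION\[([^\]]+)\]", line)
    return (job.group(1) if job else None, action.group(1) if action else None)


def lookup_commands(oozie_url, job_id):
    """Build (but do not run) the CLI calls that show a job's status and its log."""
    base = ["oozie", "job", "-oozie", oozie_url]
    return [base + ["-info", job_id], base + ["-log", job_id]]


job_id, action_id = extract_ids(LOG_LINE)
for cmd in lookup_commands("http://hadoop-node0.novalocal:11000/oozie", job_id):
    print(" ".join(cmd))
```

The SQL exception itself was found in the launcher's task log, so these commands are a starting point for locating the failed action rather than a replacement for steps 9.1 to 9.3.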

Reposted from www.cnblogs.com/wind-xwj/p/8946786.html