Oozie scheduling case

The previous post covered installing Oozie. This post walks through a simple Oozie use case.

Before getting to the use case, a few points need emphasizing:
1. An Oozie workflow must be a directed acyclic graph. In practice Oozie acts as a Hadoop client: when you need to run several related MR tasks, you write their execution order into workflow.xml and submit it through Oozie, which then manages the whole task flow.

2. Before using Oozie, start HDFS, YARN and the JobHistory server (the CDH version of Hadoop is recommended to avoid compatibility issues), then start Oozie by executing bin/oozied.sh start

3. Oozie is essentially a job coordination tool. Under the hood it converts the XML workflow definition into MapReduce programs that run only the map phase, which avoids the shuffle.

Case: scheduling a shell script
1) Decompress the official case template
[root@hadoop102 oozie-4.0.0-cdh5.3.6]# tar -zxvf oozie-examples.tar.gz

2) Create the oozie-apps directory
[root@hadoop102 oozie-4.0.0-cdh5.3.6]# mkdir oozie-apps

3) Copy the task template to oozie-apps
[root@hadoop102 oozie-4.0.0-cdh5.3.6]# cp -r examples/apps/shell/ oozie-apps

4) Modify job.properties in oozie-apps

Replace the content of job.properties with the following. Note that jobTracker must point to the machine where the YARN ResourceManager runs.

# HDFS address
nameNode=hdfs://hadoop102:8020
# ResourceManager address
jobTracker=hadoop103:8032
# Queue name
queueName=default
# Root directory of the job
examplesRoot=oozie-apps
# HDFS path of the Oozie shell application
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell
# Shell script to execute
EXEC=p1.sh
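
To see what oozie.wf.application.path resolves to, the substitution can be mimicked in plain shell. This is only a sketch: Oozie performs the ${...} substitution itself, and the value root for user.name is an assumption based on the root login and the /user/root upload path used in this post.

```shell
#!/bin/bash
# Mimic how the ${...} references in job.properties expand (illustration only).
nameNode="hdfs://hadoop102:8020"
user_name="root"          # assumption: stands in for ${user.name}
examplesRoot="oozie-apps"

app_path="${nameNode}/user/${user_name}/${examplesRoot}/shell"
echo "${app_path}"
# prints hdfs://hadoop102:8020/user/root/oozie-apps/shell
```

The resolved path is the HDFS directory that must contain workflow.xml and p1.sh, which is why step 7 uploads oozie-apps/ to /user/root.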

5) Create a new file p1.sh in the /opt/module/oozie-4.0.0-cdh5.3.6/oozie-apps/shell directory, add the following content, then save and exit

#!/bin/bash
date > /opt/module/p1.log

6) Check the workflow.xml file. By default it only prints a sentence; change it so that it executes our script instead.

Delete the default command lines of the shell action (the part marked with a red box in the original screenshot) and replace them with the content below

        <exec>${EXEC}</exec>
        <!-- <argument>my_output=Hello Oozie</argument> -->
        <file>/user/root/oozie-apps/shell/${EXEC}#${EXEC}</file>
        <capture-output/>

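Then change the value of the to attribute in the first marked block to end, delete the second marked block entirely, and save and exit. In the stock example these should be the ok transition of the shell action and the decision node that checks its output. (The workflow then simply ends if the script executes successfully, and goes to the fail node if it fails.)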
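
For reference, the edited workflow.xml might look like the sketch below. It assumes the layout of the stock shell example shipped in oozie-examples: the schema versions uri:oozie:workflow:0.4 and uri:oozie:shell-action:0.2 and the node names shell-node and fail are taken from that example and may differ in your copy. APP_DIR is a hypothetical variable for the local application directory.

```shell
#!/bin/bash
# Write a sketch of the edited workflow.xml (assumed layout of the stock shell example).
APP_DIR="${APP_DIR:-oozie-apps/shell}"   # hypothetical: local application directory
mkdir -p "${APP_DIR}"

# 'EOF' is quoted so the shell leaves every ${...} untouched for Oozie to resolve.
cat > "${APP_DIR}/workflow.xml" <<'EOF'
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>${EXEC}</exec>
            <file>/user/root/oozie-apps/shell/${EXEC}#${EXEC}</file>
            <capture-output/>
        </shell>
        <!-- success goes straight to end, failure goes to the kill node -->
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
EOF
```

The resulting graph is start --> shell-node --> end, with fail reachable only through the error transition.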

7) Upload task configuration

[root@hadoop102 oozie-4.0.0-cdh5.3.6]# /opt/module/cdh/hadoop-2.5.0-cdh5.3.6/bin/hadoop fs -put oozie-apps/ /user/root

Open the HDFS web UI to confirm that the upload succeeded

8) Run the task. A jobId is printed once the task has been submitted

[root@hadoop102 oozie-4.0.0-cdh5.3.6]# bin/oozie job -oozie http://hadoop102:11000/oozie -config oozie-apps/shell/job.properties -run
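
When the run is scripted, the jobId can be pulled out of the command output for later -info or -log calls. The sketch below assumes the CLI prints a line of the form "job: <jobId>"; extract_job_id is a hypothetical helper and not part of Oozie, and a sample line stands in for a live cluster.

```shell
#!/bin/bash
# Hypothetical helper: read "oozie job ... -run" output on stdin and print the jobId.
# Assumes the CLI prints a line of the form "job: <jobId>".
extract_job_id() {
    sed -n 's/^job:[[:space:]]*//p' | head -n 1
}

# Example with a sample line instead of a live cluster:
job_id=$(echo "job: 0000000-200801125303709-oozie-root-W" | extract_job_id)
echo "${job_id}"
# prints 0000000-200801125303709-oozie-root-W
```

On a real cluster the same function would be fed by the run command itself, for example by piping the output of the bin/oozie job ... -run command into extract_job_id.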

In the Oozie web console you can see that the task succeeded, and that it followed the expected path start --> shell --> end

If the web page is not displayed completely, this does not mean the task failed; it is most likely a browser problem, and Google Chrome is recommended

Checking /opt/module on hadoop102, hadoop103 and hadoop104 shows that the generated p1.log is on hadoop104, which means YARN assigned this task to hadoop104 for execution

Running cat on p1.log on hadoop104 shows the date written by the script above.

The history server also shows that the task was handed over to hadoop104 for execution.

Tip: You can also follow the execution in the Job DAG (directed acyclic graph) view of the task, which matches the Oozie workflow described earlier. A node turns green if it executes successfully and red if it fails.

Commonly used Oozie commands:

#Submit a task. -config specifies the location of the job.properties file of the Oozie task, and -submit submits the task. After a task is submitted it is placed on the server and a jobId is generated, but the task is not run
bin/oozie job -oozie http://hadoop102:11000/oozie/ -config /opt/module/oozie-4.0.0-cdh5.3.6/oozie-apps/shell/job.properties -submit

#Execute the task. 0000000-200801125303709-oozie-root-W is the jobId generated after the task is submitted; the id of each task is unique
bin/oozie job -oozie http://hadoop102:11000/oozie/ -start 0000000-200801125303709-oozie-root-W

#Run the task, run = submit + execute
bin/oozie job -oozie http://hadoop102:11000/oozie/ -config /opt/module/oozie-4.0.0-cdh5.3.6/oozie-apps/shell/job.properties -run

#Kill a task
bin/oozie job -oozie http://hadoop102:11000/oozie -kill 0000000-200801125303709-oozie-root-W

#View task information, including the status of each execution item
bin/oozie job -oozie http://hadoop102:11000/oozie -info 0000000-200801125303709-oozie-root-W

#View the task log, including the output and log content of each task
bin/oozie job -oozie http://hadoop102:11000/oozie -log 0000000-200801125303709-oozie-root-W

#Verify whether the workflow.xml file has syntax problems
bin/oozie validate -oozie http://hadoop102:11000/oozie workflow.xml
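
Every command in this list repeats the -oozie URL. The Oozie CLI also reads the server URL from the OOZIE_URL environment variable when -oozie is omitted, so the commands can be shortened roughly as follows. This is a sketch: the wrapper name oozie_job and the OOZIE_DIR variable are hypothetical, and the install path is simply the one used in this post.

```shell
#!/bin/bash
# The Oozie CLI falls back to OOZIE_URL when the -oozie option is not given.
export OOZIE_URL="http://hadoop102:11000/oozie"

# Hypothetical variable: install location, taken from the paths used in this post.
OOZIE_DIR="${OOZIE_DIR:-/opt/module/oozie-4.0.0-cdh5.3.6}"

# Hypothetical wrapper: run "oozie job" without repeating the server URL.
oozie_job() {
    "${OOZIE_DIR}/bin/oozie" job "$@"
}

# Usage on a machine with Oozie installed (not executed here):
#   oozie_job -info 0000000-200801125303709-oozie-root-W
#   oozie_job -log  0000000-200801125303709-oozie-root-W
```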

Origin blog.csdn.net/weixin_44080445/article/details/107730024