Azkaban parameters explained

1 Summary of parameter types

Parameters in an Azkaban workflow fall into the following types:

  • Azkaban UI page input parameters
  • Environment variable parameters
  • Parameters defined in the job file
  • User-defined properties files for the workflow
  • Parameters passed by upstream jobs to downstream jobs
  • System parameters generated when the workflow runs
  • Common job parameters

The parameter types and their scopes are as follows:

| Parameter type | Scope |
|---|---|
| UI page input parameters (workflow parameters) | Global to the flow |
| Properties files in the workflow ZIP archive (files ending in .properties) | Global to the flow; applies to the directory that contains the file and to its subdirectories |
| Workflow runtime parameters | Global to the flow |
| Environment variable parameters | Global to the flow |
| Common job parameters | Local to the job |
| Parameters defined in the job file | Local to the job |
| Parameters passed by upstream jobs to downstream jobs | Local to the job |

2 Job parameters: common parameters

In addition to the three parameters type, command, and dependencies, there are some reserved parameters that can be configured for each job:

| Parameter | Explanation |
|---|---|
| retries | Number of automatic retries for a failed job |
| retry.backoff | Interval between retries, in milliseconds |
| working.dir | Directory in which the command is invoked. The default working directory is the executions/${execution_ID} directory |
| env.property | Environment variable to set before the command is executed. "property" is the name of the variable, so env.VAR_NAME=VALUE creates the environment variable $VAR_NAME with the value VALUE |
| failure.emails | Comma-separated list of email addresses to notify when the job fails |
| success.emails | Comma-separated list of email addresses to notify when the job succeeds |
| notify.emails | Comma-separated list of email addresses to notify on either success or failure |

For a flow, only the email properties of the last job are applied; the email properties of all other jobs are ignored.

3 Passing parameters between jobs

First look at the description on the official website:

Parameter Passing
There is often a desire to pass these parameters to the executing job code. The method of passing these parameters is dependent on the jobtype that is run, but usually Azkaban writes these parameters to a temporary file that is readable by the job.
The path of the file is set in JOB_PROP_FILE environment variable. The format is the same key value pair property files. Certain built-in job types do this automatically for you. The java type, for instance, will invoke your Runnable and given a proper constructor, Azkaban can pass parameters to your code automatically.
Parameter Output
Properties can be exported to be passed to its dependencies. A second environment variable JOB_OUTPUT_PROP_FILE is set by Azkaban. If a job writes a file to that path, Azkaban will read this file and then pass the output to the next jobs in the flow.
The output file should be in json format. Certain built-in job types can handle this automatically, such as the java type.

In other words, JOB_OUTPUT_PROP_FILE and JOB_PROP_FILE are both environment variables, and each one holds a file path.

Parameters passed in:

Values output by an upstream node must be written in JSON format to the file named by JOB_OUTPUT_PROP_FILE. While a job runs, Azkaban stores the parameters passed in from upstream jobs, temporary parameters, parameters from the project's properties files, and parameters from the job definition in the runtime file ${JOB_PROP_FILE}, in key=value format. When the job executes a shell command, these values can be passed to it as arguments.

Parameters passed out:

An Azkaban job can write parameters to the ${JOB_OUTPUT_PROP_FILE} file. Once the job completes, Azkaban passes these parameters into the ${JOB_PROP_FILE} parameter file of each downstream dependent job. The content written to ${JOB_OUTPUT_PROP_FILE} must be in JSON format; otherwise a JSON parsing error is reported. Downstream nodes see the output in JOB_PROP_FILE as key-value pairs and can reference the values as ${key}.
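
A minimal shell sketch of the output side, assuming the job only needs to export flat string values. The helper names json_escape and write_output_param are hypothetical and not part of Azkaban; only the JOB_OUTPUT_PROP_FILE environment variable comes from the description above. The file is overwritten rather than appended to, so it always holds exactly one JSON object.

```shell
#!/bin/bash
# Hypothetical helpers for exporting one parameter to downstream jobs.

# Escape backslashes and double quotes so the text is safe inside a JSON string.
# (Control characters such as newlines are not handled in this sketch.)
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Write {"key":"value"} to the file Azkaban names in JOB_OUTPUT_PROP_FILE.
write_output_param() {
  local key value
  key="$(json_escape "$1")"
  value="$(json_escape "$2")"
  printf '{"%s":"%s"}\n' "$key" "$value" > "${JOB_OUTPUT_PROP_FILE}"
}

# Example call; skipped when the script is run outside Azkaban.
if [ -n "${JOB_OUTPUT_PROP_FILE:-}" ]; then
  write_output_param firstName John
fi
```

Escaping matters because a value containing a double quote would otherwise produce invalid JSON and trigger the parsing error mentioned above.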

Examples:

baseflow.flow

# baseflow.flow
nodes:
  - name: jobB
    type: command
    dependsOn:
      - jobA
    config:
      command: sh commandB.sh "${firstName}"

  - name: jobA
    type: command
    config:
      command: sh commandA.sh

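commandA.sh

#!/bin/bash
# Write the output parameters as a single JSON object for downstream jobs.
echo '{ "firstName":"John" , "lastName":"Doe" }' > "${JOB_OUTPUT_PROP_FILE}"

commandB.sh

#!/bin/bash
# Dump every parameter Azkaban resolved for this job.
cat "${JOB_PROP_FILE}" >> /root/azkaban.txt
# Append the first positional argument (firstName from jobA).
echo "$1" >> /root/azkaban.txt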

jobB depends on jobA. When jobA runs, it writes a JSON string to the file pointed to by ${JOB_OUTPUT_PROP_FILE}. jobB runs only after jobA has finished. When jobB runs, it writes the contents of its ${JOB_PROP_FILE}, which include jobA's output, to /root/azkaban.txt and then appends the firstName argument to the same file. Note that ${firstName} is substituted by Azkaban in the job's command line, so the script receives the value as its first positional argument ($1); the shell cannot expand ${firstName} by itself.
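
As an alternative to positional arguments, a downstream script can look a value up in the file named by JOB_PROP_FILE. The sketch below assumes plain key=value lines as described above. get_prop is a hypothetical helper, and it does not undo any escaping that the properties format may apply to special characters.

```shell
#!/bin/bash
# get_prop KEY FILE: print the value of KEY from a key=value properties file.
# Hypothetical helper; prints an empty line if the key is absent.
get_prop() {
  awk -F= -v k="$1" '
    $1 == k { sub(/^[^=]*=/, ""); v = $0 }   # keep the text after the first "="
    END     { print v }
  ' "$2"
}

# Inside jobB, Azkaban sets JOB_PROP_FILE; outside Azkaban this is skipped.
if [ -n "${JOB_PROP_FILE:-}" ]; then
  echo "firstName from JOB_PROP_FILE: $(get_prop firstName "$JOB_PROP_FILE")"
fi
```

If a key appears more than once, the last occurrence wins in this sketch.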

4 Job parameters: runtime properties

Runtime properties are added automatically while the job runs:

| Parameter | Explanation |
|---|---|
| azkaban.job.attempt | Attempt number of the job; starts at 0 and increases with each retry |
| azkaban.job.id | Job name |
| azkaban.flow.flowid | Name of the flow the job is running in |
| azkaban.flow.execid | Execution id of the flow |
| azkaban.flow.projectid | Project id |
| azkaban.flow.projectversion | Uploaded version of the project |
| azkaban.flow.uuid | UUID of the flow execution |
| azkaban.flow.start.timestamp | Flow start timestamp |
| azkaban.flow.start.year | Year of the flow start time |
| azkaban.flow.start.month | Month of the flow start time |
| azkaban.flow.start.day | Day of the flow start time |
| azkaban.flow.start.hour | Hour of the flow start time |
| azkaban.flow.start.minute | Minute of the flow start time |
| azkaban.flow.start.second | Second of the flow start time |
| azkaban.flow.start.milliseconds | Millisecond of the flow start time |
| azkaban.flow.start.timezone | Time zone of the flow start time |
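
Because these properties are resolved by Azkaban rather than by the shell, a command job would typically hand them to a script on the command line, for example: sh partition.sh "${azkaban.flow.start.year}" "${azkaban.flow.start.month}" "${azkaban.flow.start.day}". The script name partition.sh and the function partition_date are hypothetical. The sketch pads month and day itself, so it does not depend on whether Azkaban supplies them zero-padded.

```shell
#!/bin/bash
# partition_date YEAR MONTH DAY: print the date as YYYY-MM-DD.
partition_date() {
  # Strip one leading zero first: printf would otherwise read "08" as invalid octal.
  printf '%04d-%02d-%02d\n' "$1" "${2#0}" "${3#0}"
}

# Run only when all three arguments were supplied on the command line.
if [ "$#" -eq 3 ]; then
  partition_date "$1" "$2" "$3"
fi
```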

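5 Job parameters: parameter inheritance

Files with the .properties suffix are loaded as parameter files and are shared by every job in the flow. Properties files are inherited through the directory hierarchy.

For example, suppose the zip package has the following structure: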

system.properties 
baz.job 
myflow/myflow.properties 
myflow/myflow2.properties 
myflow/foo.job 
myflow/bar.job

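system.properties holds global properties and is used by baz.job as well as by foo.job and bar.job under the myflow directory. However, baz.job does not inherit the properties in myflow.properties and myflow2.properties, because those files sit in a directory below it.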
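
The inheritance rule can be imitated with a small shell function that lists which .properties files a given job would see. This only illustrates the rule as stated above, not how Azkaban loads properties internally; inherited_props is a hypothetical helper, and the order of the output says nothing about precedence.

```shell
#!/bin/bash
# inherited_props ROOT JOB: list the .properties files that JOB (a path
# relative to ROOT) inherits, i.e. those in its own directory or any ancestor.
inherited_props() {
  local root="$1" job="$2" f dir
  ( cd "$root" && find . -name '*.properties' | LC_ALL=C sort ) |
  while IFS= read -r f; do
    f="${f#./}"
    dir="$(dirname "$f")"
    case "$dir" in
      .) echo "$f" ;;                 # project root: visible to every job
      *) case "$job" in
           "$dir"/*) echo "$f" ;;     # job lives in or below this directory
         esac ;;
    esac
  done
}
```

For the structure above, inherited_props . baz.job lists only system.properties, while inherited_props . myflow/foo.job also lists the two files under myflow.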

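6 Job parameters: parameter substitution

Azkaban supports parameter substitution. The syntax is ${parameterName}: Azkaban replaces the parameter named inside the braces. Whether ${parameterName} appears in a job file, in a properties file, or in a runtime parameter, it is replaced with the corresponding value.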

shared.properties

# shared.properties
replaceparameter=bar

myjob.job

# myjob.job
param1=mytest
# ${replaceparameter} is replaced with bar
foo=${replaceparameter}
# ${param1} is replaced with mytest
param2=${param1}

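In the example above, before the myjob job runs, foo is assigned the value bar and param2 is assigned the value mytest.
Note: parameter names must not contain spaces, punctuation marks, or similar characters.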
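
The result of the substitution can be imitated with a plain text replacement. This is only an illustration of the outcome, not how Azkaban resolves parameters internally; substitute is a hypothetical helper, and it assumes names made of letters, digits, and underscores.

```shell
#!/bin/bash
# substitute TEXT NAME VALUE: replace every ${NAME} in TEXT with VALUE.
# Sketch only: NAME and VALUE must not contain "/", "&", or backslashes.
substitute() {
  printf '%s\n' "$1" | sed "s/[$]{$2}/$3/g"
}

# The two lines from myjob.job, resolved by hand:
substitute 'foo=${replaceparameter}' replaceparameter bar
substitute 'param2=${param1}' param1 mytest
```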

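7 Passing parameters to shell dynamically

How does a shell job in Azkaban receive parameters passed from the web UI?

7.1 Define the input parameter on the UI page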

ui_test=test111111111

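7.2 Reference the parameter in the job file myjob.job

## Receive the UI input parameter in the job definition file:
job_param4=${ui_test}

## Reference the UI input parameter on the script command line in the job definition file:
command=sh test_azkaban_job.sh "${job_param4}"

7.3 Contents of the shell script test_azkaban_job.sh

vim test_azkaban_job.sh

# Receives the parameter passed from the job file.
echo "input parameter: $1"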

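FAQ 1: When the job above is executed manually from the page, it fails if the UI parameter ui_test is not supplied for the execution. The error message is as follows: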

hello ERROR - Failed to build job executor for job hello Could not find variable substitution for variable(s) [param4->ui_test ]

When setting up a scheduled task, specify the workflow parameter ui_test in flowParameters to avoid this error.

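7.4 Notes on using parameters in shell

Runtime parameters re-entered on the UI page override the values that the system generates by default. Both runtime parameters and UI input parameters can be regarded as global parameters: anywhere in the job configuration of the workflow, they can be referenced as ${parameter name}.

  • Referencing common parameters, runtime system parameters, or UI input parameters directly inside a shell script does not work.
  • Only environment variables can be used directly in a shell script.
  • Common parameters, runtime system parameters, and UI input parameters can only be passed in as shell script arguments.
  • Environment variable parameters defined in the job file can be referenced directly in the shell script, but they are valid only for the current job.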
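
The two routes into a shell script can be sketched together. The sketch assumes a job file containing the hypothetical lines env.DATA_DIR=/data/in and command=sh report.sh "${ui_test}"; the names DATA_DIR, report.sh, and describe_inputs are illustrative only.

```shell
#!/bin/bash
# Hypothetical job file lines this sketch assumes:
#   env.DATA_DIR=/data/in
#   command=sh report.sh "${ui_test}"

# describe_inputs ARG: show where each value comes from.
describe_inputs() {
  # $1 carries the UI parameter, already substituted by Azkaban on the command line.
  # DATA_DIR is an ordinary environment variable, created from env.DATA_DIR.
  echo "ui_test=${1:-<missing>} data_dir=${DATA_DIR:-<unset>}"
}

describe_inputs "$@"
```

Writing ${ui_test} inside the script itself would be expanded by the shell, not by Azkaban, and would come out empty unless a shell variable of that name happened to exist.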

 

Reference article: http://www.manongjc.com/detail/12-afcbaaqegipvvnm.html

Origin www.cnblogs.com/hyunbar/p/12759149.html