Getting started with Azkaban: deployment, installation, and example cases

One: Introduction to Azkaban

Azkaban is a batch workflow job scheduler open-sourced by LinkedIn. It is used to run a set of jobs and processes in a specific order within a workflow. Azkaban defines a KV (key-value) file format for establishing dependencies between tasks and provides an easy-to-use web user interface for maintaining and tracking workflows.
It has the following features:
  1. Web user interface
  2. Easy workflow upload
  3. Easy setup of relationships between tasks
  4. Workflow scheduling
  5. Authentication/authorization
  6. Ability to kill and restart workflows
  7. Modular and pluggable plug-in mechanism
  8. Project workspaces
  9. Logging and auditing of workflows and tasks

Two: the deployment and installation of Azkaban

2.1: Install azkaban

1. Create an azkaban directory under /opt/module on hadoop102
[root@hadoop102 module]# mkdir azkaban

2. Extract the Azkaban tar packages into that directory. Here, all of my Azkaban tar packages are in the /opt/software/Azkaban_tars directory. Run the following commands in order.

[root@hadoop102 Azkaban_tars]# tar -zxvf azkaban-executor-server-2.5.0.tar.gz -C /opt/module/azkaban/

[root@hadoop102 Azkaban_tars]# tar -zxvf azkaban-sql-script-2.5.0.tar.gz -C /opt/module/azkaban/

[root@hadoop102 Azkaban_tars]# tar -zxvf azkaban-web-server-2.5.0.tar.gz -C /opt/module/azkaban/

[root@hadoop102 azkaban]# ll
total 4
drwxr-xr-x. 2 root root 4096 Aug  6 09:37 azkaban-2.5.0
drwxr-xr-x. 7 root root   92 Aug  6 09:37 azkaban-executor-2.5.0
drwxr-xr-x. 8 root root  103 Aug  6 09:38 azkaban-web-2.5.0

3. Enter the /opt/module/azkaban directory and rename the extracted directories

[root@hadoop102 azkaban]# mv azkaban-web-2.5.0/ server
[root@hadoop102 azkaban]# mv azkaban-executor-2.5.0/ executor
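
The extraction and renaming steps above can also be wrapped in one small function. This is only a sketch: install_azkaban is a helper name made up for illustration, and it assumes the three 2.5.0 tarballs are named exactly as in this article.

```shell
# Sketch: unpack the three Azkaban 2.5.0 tarballs and rename the directories.
# install_azkaban is a hypothetical helper, not something shipped with Azkaban.
install_azkaban() {
    local src_dir="$1"    # directory holding the tarballs
    local dest_dir="$2"   # target directory, e.g. /opt/module/azkaban
    local name

    mkdir -p "$dest_dir" || return 1
    for name in executor-server sql-script web-server; do
        tar -zxf "$src_dir/azkaban-$name-2.5.0.tar.gz" -C "$dest_dir" || return 1
    done

    # Same renames as in step 3
    mv "$dest_dir/azkaban-web-2.5.0" "$dest_dir/server" || return 1
    mv "$dest_dir/azkaban-executor-2.5.0" "$dest_dir/executor"
}

# Usage with the paths from this article:
# install_azkaban /opt/software/Azkaban_tars /opt/module/azkaban
```

The function stops at the first failing step, so a missing tarball does not leave you with half-renamed directories.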

4. Log in to mysql with mysql -uroot -p123456 and run the following statements. They execute the SQL script bundled with Azkaban, which creates the tables Azkaban needs.

mysql> create database azkaban;
mysql> use azkaban;
mysql> source /opt/module/azkaban/azkaban-2.5.0/create-all-sql-2.5.0.sql

Finally, if show tables in the azkaban database lists the following tables, the import succeeded, and you can type quit to exit mysql.

mysql> show tables;
+------------------------+
| Tables_in_azkaban      |
+------------------------+
| active_executing_flows |
| active_sla             |
| execution_flows        |
| execution_jobs         |
| execution_logs         |
| project_events         |
| project_files          |
| project_flows          |
| project_permissions    |
| project_properties     |
| project_versions       |
| projects               |
| properties             |
| schedules              |
| triggers               |
+------------------------+
15 rows in set (0.00 sec)
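
If you would rather not compare the table list by eye, the check can be scripted. The sketch below assumes the 15 table names shown above are the complete set for 2.5.0; missing_tables is a name made up for this example.

```shell
# Tables expected after the import, taken from the show tables output above.
expected_tables="active_executing_flows active_sla execution_flows
execution_jobs execution_logs project_events project_files project_flows
project_permissions project_properties project_versions projects
properties schedules triggers"

# Reads table names (one per line) on stdin and prints every expected
# table that is missing. Prints nothing when all tables are present.
missing_tables() {
    local actual
    local t
    actual=$(cat)
    for t in $expected_tables; do
        printf '%s\n' "$actual" | grep -qx "$t" || echo "$t"
    done
}

# Usage (assumes the mysql client and the credentials used in this article;
# -N suppresses the column header so only table names are piped in):
# mysql -uroot -p123456 -N -e 'show tables' azkaban | missing_tables
```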

2.2: Generate key pair and certificate

1. Background on key pairs and certificates:
keytool is Java's key and certificate management tool. It lets users manage their own public/private key pairs and the associated certificates.
-keystore Specifies the name and location of the keystore (everything that is generated is stored in the .keystore file)
-genkey (or -genkeypair) Generates a key pair
-alias Specifies an alias for the generated key pair; if omitted, the default is mykey
-keyalg Specifies the key algorithm, RSA or DSA; the default is DSA

2. In the /opt/module/azkaban/server directory on hadoop102, run
[root@hadoop102 server]# keytool -keystore keystore -alias jetty -genkey -keyalg RSA

keytool prompts interactively for a keystore password and for the certificate's identity fields. Use the same password that you will later set as jetty.password in azkaban.properties (123456 in this article).

Running ls afterwards shows a new keystore file in the current directory, which is the one just generated.
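
The prompts can be skipped by passing the answers on the command line. The following is a minimal sketch, not the procedure used in this article: make_keystore is a hypothetical helper, and the CN value is a placeholder you should replace with your own host name.

```shell
# Sketch: generate the Jetty keystore without interactive prompts.
# make_keystore is a hypothetical helper name; -storepass, -keypass and
# -dname are standard keytool options that replace the prompts.
make_keystore() {
    local keystore_path="$1"   # where to write the keystore file
    local password="$2"        # keytool requires at least 6 characters
    local host="$3"            # used as the certificate's common name

    keytool -genkeypair \
        -keystore "$keystore_path" \
        -alias jetty \
        -keyalg RSA \
        -storepass "$password" \
        -keypass "$password" \
        -dname "CN=$host"
}

# Usage with the values from this article:
# make_keystore /opt/module/azkaban/server/keystore 123456 hadoop102
```

Keep in mind that a password given on the command line ends up in the shell history, so the interactive form is the safer choice outside a test cluster.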

2.3: Time synchronization configuration

Why time synchronization is needed: Azkaban's scheduling depends on accurate time, so the clocks and time zone of all nodes in the cluster must be kept in sync.

1. First configure the time zone on the server node. If the time zone file Asia/Shanghai does not exist in the /usr/share/zoneinfo/ directory, use tzselect to configure it. If you do not know how, see https://www.cnblogs.com/liuxinrong/articles/12739198.html

2. After the first step, copy the time zone file over the system's local time zone configuration. Run the following command on the nodes of the Hadoop cluster.

cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
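
Because cp overwrites /etc/localtime without asking, it can be worth checking that the zone file exists first. A small sketch under that assumption (set_timezone is a made-up helper name):

```shell
# Sketch: copy a zone file over the local time zone configuration,
# but only if the zone file actually exists.
set_timezone() {
    local zone_file="$1"   # e.g. /usr/share/zoneinfo/Asia/Shanghai
    local target="$2"      # e.g. /etc/localtime

    if [ ! -f "$zone_file" ]; then
        echo "missing zone file: $zone_file" >&2
        return 1
    fi
    cp "$zone_file" "$target"
}

# Usage with the paths from this article:
# set_timezone /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
```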

2.4: Web server and executor server configuration

1. Enter the conf directory of the azkaban web server installation directory and edit the azkaban.properties file

After opening the file in vim, you can run :set nu first to display line numbers.
1) Change lines 6 and 7 to

# Default directory where the web server keeps its web files
web.resource.dir=/opt/module/azkaban/server/web/
# Default time zone, changed to Asia/Shanghai (the shipped default is a US time zone)
default.timezone.id=Asia/Shanghai

2) Change line 11 to

# File used by the default user manager class (absolute path)
user.manager.xml.file=/opt/module/azkaban/server/conf/azkaban-users.xml

3) Lines 17 to 22 contain the mysql settings. Set the mysql username, password, and database host according to your own environment.

# Database host
mysql.host=hadoop102
# Database name
mysql.database=azkaban
# Database username
mysql.user=root
# Database password
mysql.password=123456
# Maximum number of connections
mysql.numconnections=100

4) Change lines 32 to 36 to the following (you can save and exit once this is done)

# SSL keystore file (absolute path)
jetty.keystore=/opt/module/azkaban/server/keystore
# SSL keystore password
jetty.password=123456
# Jetty key password, same as the keystore password
jetty.keypassword=123456
# SSL truststore file (absolute path)
jetty.truststore=/opt/module/azkaban/server/keystore
# SSL truststore password
jetty.trustpassword=123456
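
Line numbers can differ between Azkaban builds, so an alternative to editing by line number is to set each property by key. This is a sketch, assuming GNU sed (for sed -i) and values that do not contain the characters |, & or \; set_prop is a hypothetical helper.

```shell
# Sketch: set key=value in a .properties file. An existing line for the key
# is replaced; otherwise the pair is appended at the end of the file.
set_prop() {
    local file="$1"
    local key="$2"
    local value="$3"
    local pattern

    # Escape the dots in the key so they match literally in the regex.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')

    if grep -q "^$pattern=" "$file"; then
        # '|' is the delimiter because the values here are file paths.
        sed -i "s|^$pattern=.*|$key=$value|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Usage with a few of the settings from this section:
# conf=/opt/module/azkaban/server/conf/azkaban.properties
# set_prop "$conf" default.timezone.id Asia/Shanghai
# set_prop "$conf" mysql.host hadoop102
# set_prop "$conf" jetty.keystore /opt/module/azkaban/server/keystore
```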

2. Enter the /opt/module/azkaban/executor/conf directory and edit the azkaban.properties file
1) Change the time zone on line 2 to

default.timezone.id=Asia/Shanghai

2) Change line 8 to

executor.global.properties=/opt/module/azkaban/executor/conf/global.properties

3) Update the mysql settings on lines 13 to 16 (you can save and exit once this is done)

mysql.host=hadoop102
mysql.database=azkaban
mysql.user=root
mysql.password=123456

3. Web server user configuration
In the conf directory of the Azkaban web server installation (/opt/module/azkaban/server/conf), edit the azkaban-users.xml file as follows to add an administrator user.

Add <user username="admin" password="admin" roles="admin,metrics"/> inside the <azkaban-users> element, then save and exit.


2.5: Start the executor server and web server

1. Execute the start command in the executor server directory
[root@hadoop102 executor]# bin/azkaban-executor-start.sh

2. Execute the start command in the azkaban web server directory
[root@hadoop102 server]# bin/azkaban-web-start.sh

Note:
1) The scripts that stop the servers are named shutdown, not stop.
2) They are azkaban-web-shutdown.sh and azkaban-executor-shutdown.sh.
3) When starting, start the executor server first; when shutting down, shut down the web server first.

Both start scripts block the terminal they run in. Open another terminal window and run jps to check the processes; you should see that both the executor and the web server have started
[root@hadoop102 server]# jps
7504 Jps
7478 AzkabanWebServer
7432 AzkabanExecutorServer
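
The start and stop order described in the notes can be captured in two small functions. This is a sketch only: start_azkaban, stop_azkaban, the AZKABAN_HOME variable, and the executor.out and web.out log names are all made up for this example. Each script is run from its own installation directory, as in the commands above.

```shell
# Installation root used in this article; override by exporting AZKABAN_HOME.
AZKABAN_HOME="${AZKABAN_HOME:-/opt/module/azkaban}"

# Start order: executor first, then the web server. Both start scripts
# block, so each runs in the background with its output sent to a log file.
start_azkaban() {
    (cd "$AZKABAN_HOME/executor" &&
        exec nohup bin/azkaban-executor-start.sh > executor.out 2>&1) &
    (cd "$AZKABAN_HOME/server" &&
        exec nohup bin/azkaban-web-start.sh > web.out 2>&1) &
}

# Stop order: web server first, then the executor.
stop_azkaban() {
    (cd "$AZKABAN_HOME/server" && bin/azkaban-web-shutdown.sh)
    (cd "$AZKABAN_HOME/executor" && bin/azkaban-executor-shutdown.sh)
}

# Usage:
# start_azkaban
# jps            # should list AzkabanExecutorServer and AzkabanWebServer
# stop_azkaban
```

Running the servers in the background this way means you no longer need a second terminal window just to run jps.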

3. Check the Azkaban web UI
Open https://hadoop102:8443/ in Google Chrome. On the first visit the browser warns that the connection is not private. Ignore the warning, click Advanced, and then click the link to proceed to the site.


Enter the account (admin) and the password set for it in azkaban-users.xml to log in to the web UI


4. Overview of the tabs on the home page:

    projects: lists all workflows and tasks
    scheduling: lists all scheduled tasks
    executing: shows the tasks that are currently running
    history: shows the tasks that have finished running

Three: Azkaban application cases

3.1: A single job case

Before going through the cases, note that Azkaban's built-in job types include command and java.

1. Create a new file on Windows and name it hello.job (the file must be named xxx.job, otherwise it will not be recognized), then enter the following content
type=command
command=echo 'hello world'

2. Package the file into a zip archive (it must be a zip file, for example xxx.zip, because Azkaban does not accept other archive formats)
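
If you work on Linux instead of Windows, both steps can be done from the shell. A minimal sketch, assuming the zip utility is installed; write_command_job and package_jobs are hypothetical helper names.

```shell
# Writes DIR/NAME.job as a command-type job that runs the given command line.
write_command_job() {
    local dir="$1"
    local name="$2"
    local cmd="$3"

    mkdir -p "$dir" || return 1
    printf 'type=command\ncommand=%s\n' "$cmd" > "$dir/$name.job"
}

# Zips every .job file in DIR into ZIP_PATH. The -j option stores the files
# without their directory path, so they sit at the top level of the archive.
package_jobs() {
    local dir="$1"
    local zip_path="$2"

    zip -j "$zip_path" "$dir"/*.job
}

# Usage:
# write_command_job /tmp/hello hello "echo 'hello world'"
# package_jobs /tmp/hello /tmp/hello.zip
```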

3. Upload and run
1) In the Azkaban web UI, create a project and upload the zip package containing the job. Create the project first. Any name will do, as long as it is not already used by another project. The description below the name is up to you.


2) After you click Create Project, you are taken to the project page. Click Upload in the upper right corner to upload the zip file; the page then shows that the file was uploaded successfully.


3) Click Execute Flow to run the flow. Since this job only prints one line, we run it immediately.


Click Continue


4) View the execution of the task


Open Details to see the details of the job


3.2: Scheduling shell script case

1. Create a new azkaban_job directory under /opt/module/azkaban on hadoop102, then create a script file date.sh in that directory with the following content, and save and exit

#!/bin/bash
date >>  /opt/module/azkaban/azkaban_job/date.txt

2. Create test2.job on Windows with the following content. Because the script has not been given execute permission, the job runs it with sh.

type=command
command=sh /opt/module/azkaban/azkaban_job/date.sh
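
Both files of this case can be generated in one go. The sketch below is only an illustration: make_date_job is a made-up helper, and it takes the job directory as a parameter so that the same code can be tried in a scratch directory before pointing it at /opt/module/azkaban/azkaban_job.

```shell
# Sketch: create date.sh and test2.job for the scheduling case in JOB_DIR.
make_date_job() {
    local job_dir="$1"

    mkdir -p "$job_dir" || return 1

    # The script appends the current date to date.txt in the same directory.
    cat > "$job_dir/date.sh" <<EOF
#!/bin/bash
date >> "$job_dir/date.txt"
EOF

    # The job runs the script through sh, so no execute permission is needed.
    printf 'type=command\ncommand=sh %s/date.sh\n' "$job_dir" > "$job_dir/test2.job"
}

# Usage with the directory from this article:
# make_date_job /opt/module/azkaban/azkaban_job
```

Note that test2.job is written next to the script here only for convenience; it still has to be zipped and uploaded through the web UI.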

3. Package test2.job into a zip file and upload it to the Azkaban web UI. This time we schedule it to run periodically, once every minute, to see the effect.


Scheduled tasks can be viewed under Scheduling


In the history, you can see that the task has been successfully executed 3 times.


In the directory that the script writes to, we can see that Azkaban has successfully executed our script

[root@hadoop102 azkaban_job]# pwd
/opt/module/azkaban/azkaban_job
[root@hadoop102 azkaban_job]# ls
date.sh  date.txt
[root@hadoop102 azkaban_job]# cat date.txt
Thu Aug  6 18:16:45 CST 2020
Thu Aug  6 18:17:45 CST 2020
Thu Aug  6 18:18:44 CST 2020
Thu Aug  6 18:19:45 CST 2020
[root@hadoop102 azkaban_job]#

Finally, to stop a scheduled task, click Remove Schedule under Scheduling. Once the schedule is removed, the task is no longer executed.

Origin blog.csdn.net/weixin_44080445/article/details/107831663