Azkaban: introduction, deployment, principles, and usage
Introduction to Azkaban
Azkaban is a simple task scheduling service consisting of three parts: a web server, a DB server, and an executor server.
Azkaban is an open-source Java project from LinkedIn: a batch workflow task scheduler used to run a set of tasks and processes in a specific order within a workflow.
Azkaban defines a KV (key-value) file format to establish dependencies between tasks and provides an easy-to-use web user interface to maintain and track your workflows.
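For illustration, a job in this KV format is just a plain Java properties file; the following is a hypothetical sketch (concrete examples appear in the usage section below), where the job name and command are made up:

```properties
# example.job: a hypothetical sketch
# type of job: run a shell command
type=command
command=echo "hello azkaban"
# optional: run only after the named upstream job finishes
dependencies=some_upstream_job
```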
Project official website: https://azkaban.github.io/
Features of Azkaban
1. Web user interface
2. Convenient workflow upload
3. Convenient configuration of relationships between tasks
4. Workflow scheduling
5. Authentication/authorization
6. Ability to kill and restart workflows
7. Modular and pluggable plug-in mechanism
8. Project workspace
9. Logging and auditing of workflows and tasks
Azkaban installation and deployment
Preparation:
Installation and deployment require three packages:
azkaban-executor-server-2.5.0.tar.gz
azkaban-sql-script-2.5.0.tar.gz
azkaban-web-server-2.5.0.tar.gz
Network disk share link: https://pan.baidu.com/s/1mMuIuVv9Ji6yO2A2b8Ibrg
extraction code: seld
[Note:] Deploy the MySQL service in advance; MySQL installation is not covered here.
Install components:
# Upload the installation packages
wangting@ops01:/opt/software/azkaban >ll
total 22612
-rw-r--r-- 1 root root 11157302 May 16 10:45 azkaban-executor-server-2.5.0.tar.gz
-rw-r--r-- 1 root root 1928 May 16 10:45 azkaban-sql-script-2.5.0.tar.gz
-rw-r--r-- 1 root root 11989669 May 16 10:45 azkaban-web-server-2.5.0.tar.gz
# Create an application directory so that all extracted components live under one management directory
wangting@ops01:/opt/software/azkaban >mkdir /opt/module/azkaban
wangting@ops01:/opt/software/azkaban >tar -xf azkaban-executor-server-2.5.0.tar.gz -C /opt/module/azkaban/
wangting@ops01:/opt/software/azkaban >tar -xf azkaban-sql-script-2.5.0.tar.gz -C /opt/module/azkaban/
wangting@ops01:/opt/software/azkaban >tar -xf azkaban-web-server-2.5.0.tar.gz -C /opt/module/azkaban/
wangting@ops01:/opt/software/azkaban >ls /opt/module/azkaban/
azkaban-2.5.0 azkaban-executor-2.5.0 azkaban-web-2.5.0
wangting@ops01:/opt/software/azkaban >
wangting@ops01:/opt/software/azkaban >cd /opt/module/azkaban/
wangting@ops01:/opt/module/azkaban >ll
total 12
drwxrwxr-x 2 wangting wangting 4096 May 16 10:48 azkaban-2.5.0
drwxrwxr-x 7 wangting wangting 4096 May 16 10:47 azkaban-executor-2.5.0
drwxrwxr-x 8 wangting wangting 4096 May 16 10:48 azkaban-web-2.5.0
# Rename the directories for easier management and navigation
wangting@ops01:/opt/module/azkaban >mv azkaban-executor-2.5.0 executor
wangting@ops01:/opt/module/azkaban >mv azkaban-web-2.5.0 server
wangting@ops01:/opt/module/azkaban >ll
total 12
drwxrwxr-x 2 wangting wangting 4096 May 16 10:48 azkaban-2.5.0
drwxrwxr-x 7 wangting wangting 4096 May 16 10:47 executor
drwxrwxr-x 8 wangting wangting 4096 May 16 10:48 server
wangting@ops01:/opt/module/azkaban >
# The SQL files under azkaban-2.5.0/ will be used later to initialize the azkaban database
wangting@ops01:/opt/module/azkaban >ls azkaban-2.5.0/
create.active_executing_flows.sql create.execution_flows.sql create.project_events.sql create.project_permissions.sql create.project_versions.sql create.triggers.sql update-all-sql-2.2.sql
create.active_sla.sql create.execution_jobs.sql create.project_files.sql create.project_properties.sql create.properties.sql database.properties update.execution_logs.2.1.sql
create-all-sql-2.5.0.sql create.execution_logs.sql create.project_flows.sql create.projects.sql create.schedules.sql update-all-sql-2.1.sql update.project_properties.2.1.sql
# Check the local IP and verify that the MySQL service is running
wangting@ops01:/opt/module/azkaban >ifconfig eth0 |grep "inet "
inet 11.8.37.50 netmask 255.255.255.0 broadcast 11.8.37.255
wangting@ops01:/opt/module/azkaban >netstat -tnlpu|grep 3306
tcp6 0 0 :::3306 :::* LISTEN -
# Log in to MySQL
wangting@ops01:/opt/module/azkaban >mysql -uroot -pwangting
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 37069
Server version: 5.7.26 MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
# Create the azkaban database
mysql> create database azkaban;
Query OK, 1 row affected (0.00 sec)
mysql> use azkaban;
Database changed
mysql> show tables;
Empty set (0.00 sec)
# Initialize the schema
mysql> source /opt/module/azkaban/azkaban-2.5.0/create-all-sql-2.5.0.sql
mysql> show tables;
+------------------------+
| Tables_in_azkaban |
+------------------------+
| active_executing_flows |
| active_sla |
| execution_flows |
| execution_jobs |
| execution_logs |
| project_events |
| project_files |
| project_flows |
| project_permissions |
| project_properties |
| project_versions |
| projects |
| properties |
| schedules |
| triggers |
+------------------------+
15 rows in set (0.00 sec)
# Done; exit
mysql> exit
Bye
wangting@ops01:/opt/module/azkaban >
wangting@ops01:/opt/module/azkaban >cd server
wangting@ops01:/opt/module/azkaban/server >pwd
/opt/module/azkaban/server
# Generate the SSL keystore; the file name "keystore" and alias "jetty" match the names referenced in the configuration file
wangting@ops01:/opt/module/azkaban/server >keytool -keystore keystore -alias jetty -genkey -keyalg RSA
Enter keystore password: # wangting (choose your own password)
Re-enter new password: # wangting (repeat the password)
What is your first and last name? # press Enter
[Unknown]:
What is the name of your organizational unit? # press Enter
[Unknown]:
What is the name of your organization? # press Enter
[Unknown]:
What is the name of your City or Locality? # press Enter
[Unknown]:
What is the name of your State or Province? # press Enter
[Unknown]:
What is the two-letter country code for this unit? # press Enter
[Unknown]:
Is CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown correct? # type y
[no]: y
Enter key password for <jetty>
(RETURN if same as keystore password): # press Enter
Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore keystore -destkeystore keystore -deststoretype pkcs12".
wangting@ops01:/opt/module/azkaban/server >
# Check the time zone
wangting@ops01:/opt/module/azkaban/server >cat /etc/localtime
(binary TZif data, not human-readable)
CST-8
# The output must end with CST-8 (UTC+8, East Eight); if it does not, set the system time zone to Asia/Shanghai
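Since /etc/localtime is a binary TZif file, `date +%Z` is an easier way to check the effective zone; forcing TZ shows what the correct output looks like:

```shell
# Print the current time zone abbreviation; it should be CST on a
# correctly configured East-8 (Asia/Shanghai) machine.
date +%Z
# Force the expected zone for comparison:
TZ=Asia/Shanghai date +%Z
```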
wangting@ops01:/opt/module/azkaban/server >cd conf/
# Edit the web server configuration
wangting@ops01:/opt/module/azkaban/server/conf >ls
azkaban.properties azkaban-users.xml
wangting@ops01:/opt/module/azkaban/server/conf >vim azkaban.properties
default.timezone.id=Asia/Shanghai # change to Asia/Shanghai
database.type=mysql
mysql.port=3306
mysql.host=11.8.37.50 # change to the IP where MySQL is deployed
mysql.database=azkaban # the azkaban database created earlier
mysql.user=root
mysql.password=wangting
mysql.numconnections=100
# Azkaban Jetty server properties.
jetty.maxThreads=25
jetty.ssl.port=8443
jetty.port=8081
jetty.keystore=keystore # the keystore generated with keytool
jetty.password=wangting # set all passwords to the one chosen during keytool
jetty.keypassword=wangting
jetty.truststore=keystore
jetty.trustpassword=wangting
# Add users (this serves as account registration)
wangting@ops01:/opt/module/azkaban/server/conf >vim azkaban-users.xml
<azkaban-users>
<user username="azkaban" password="azkaban" roles="admin" groups="azkaban" />
<user username="metrics" password="metrics" roles="metrics"/>
<user username="wangting" password="wangting" roles="admin, metrics"/> # custom username/password for logging in to the web UI
<role name="admin" permissions="ADMIN" />
<role name="metrics" permissions="METRICS"/>
</azkaban-users>
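Roles map names to Azkaban permission sets; for example, a hypothetical read-only account could be added alongside the entries above (a sketch; the user and role names are made up, and READ is one of Azkaban's built-in permissions):

```xml
<!-- hypothetical read-only user (sketch) -->
<user username="viewer" password="viewer" roles="readonly"/>
<role name="readonly" permissions="READ"/>
```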
# Edit the executor configuration
wangting@ops01:/opt/module/azkaban/server/conf >cd /opt/module/azkaban/executor/conf/
wangting@ops01:/opt/module/azkaban/executor/conf >ls
azkaban.private.properties azkaban.properties global.properties
wangting@ops01:/opt/module/azkaban/executor/conf >vim azkaban.properties
#Azkaban
default.timezone.id=Asia/Shanghai # change to Asia/Shanghai
# Azkaban JobTypes Plugins
azkaban.jobtype.plugin.dir=plugins/jobtypes
#Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects
database.type=mysql # database settings, same as the web server
mysql.port=3306
mysql.host=11.8.37.50
mysql.database=azkaban
mysql.user=root
mysql.password=wangting
mysql.numconnections=100
# Azkaban Executor settings
executor.maxThreads=50
executor.port=12321
executor.flow.threads=30
Start the services:
wangting@ops01:/opt/module/azkaban/executor/conf >cd /opt/module/azkaban/server/
wangting@ops01:/opt/module/azkaban/server >bin/azkaban-web-start.sh
Using Hadoop from /opt/module/hadoop-3.1.3
Using Hive from /opt/module/hive
bin/..
2021/05/16 11:26:42.425 +0800 INFO [log] [Azkaban] Started [email protected]:8443
2021/05/16 11:26:42.425 +0800 INFO [AzkabanWebServer] [Azkaban] Server running on ssl port 8443.
wangting@ops01:/opt/module/azkaban/server >cd /opt/module/azkaban/executor/
wangting@ops01:/opt/module/azkaban/executor >bin/azkaban-executor-start.sh
Using Hadoop from /opt/module/hadoop-3.1.3
Using Hive from /opt/module/hive
bin/..
Starting AzkabanExecutorServer on port 12321 ...
2021/05/16 11:29:20.076 +0800 INFO [log] [Azkaban] Started [email protected]:12321
2021/05/16 11:29:20.076 +0800 INFO [AzkabanExecutorServer] [Azkaban] Azkaban Executor Server started on port 12321
Visit the page
Open https://11.8.37.50:8443 (the SSL port configured above) in a browser and log in with a user from azkaban-users.xml. After a successful login, the deployment is complete.
Introduction to using Azkaban
Projects: the most important part. Create a project here; all flows run inside a project.
Scheduling: displays scheduled tasks
Executing: displays currently running tasks
History: displays historical task runs
Single independent task
1. Create project
Create a project
project_1
Description information
2. Define a job
How a task is executed and what it specifically does are defined in the job file
# Create a command.job file locally; make sure lines have no trailing spaces. Contents:
# command.job
type=command
command=mkdir /opt/module/ztdata_0516
3. Package the job definition file into a zip package
After editing the command.job file, use compression software to package it into a zip file, such as command.zip
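On Linux, the same packaging can be done from the shell (a sketch, assuming the common `zip` utility is installed):

```shell
# Write the job definition, then package it; the zip name is arbitrary.
printf 'type=command\ncommand=mkdir /opt/module/ztdata_0516\n' > command.job
zip -q command.zip command.job
ls -l command.zip
```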
4. Upload the task compressed package to the project
After uploading, the job's content can be inspected in the web UI, which parses and displays the task definition.
5. View tasks and execute tasks
Click the command task under Flows to open the task's detail page; click Execute Flow to run it.
[Note:] Because everything is driven through the web interface, the job files can be edited, created, and zipped directly on a local Windows machine.
6. Historical task records
After the task is executed, you can view the task history in history;
7. Verify execution results
wangting@ops01:/opt/module >ll
total 52
drwxrwxr-x 5 wangting wangting 4096 May 16 10:51 azkaban
drwxrwxr-x 2 wangting wangting 4096 Apr 4 11:01 datas
drwxr-xr-x 12 wangting wangting 4096 Apr 24 16:37 flume
-rw-rw-r-- 1 wangting wangting 30 Apr 25 11:33 group.log
drwxr-xr-x 12 wangting wangting 4096 Mar 12 11:38 hadoop-3.1.3
drwxrwxr-x 8 wangting wangting 4096 May 10 11:55 hbase
drwxrwxr-x 11 wangting wangting 4096 Apr 2 15:14 hive
drwxr-xr-x 7 wangting wangting 4096 Apr 29 11:07 kafka
drwxr-xr-x 5 wangting wangting 4096 Jun 27 2018 phoenix
drwxrwxr-x 3 wangting wangting 4096 Apr 10 16:25 tez
drwxrwxr-x 5 wangting wangting 4096 Apr 2 15:03 tez-0.9.2_bak0410
drwxr-xr-x 8 wangting wangting 4096 Mar 25 11:02 zookeeper-3.5.7
drwxrwxr-x 2 wangting wangting 4096 May 16 11:51 ztdata_0516
wangting@ops01:/opt/module >
After the task completes, verify: a new directory ztdata_0516 appears under /opt/module/, indicating that the task was successfully scheduled and executed.
Multiple task workflow
[Note:] Subsequent examples are not illustrated with screenshots step by step; the process is the same as in example 1.
Create project
Create a project
project_2
Description information
Define job tasks
Create 2 job files locally
one.job
# one.job
type=command
command=mkdir /opt/module/one
two.job
# two.job
type=command
dependencies=one
command=touch /opt/module/one/two.txt
[Note:] dependencies=one means that job two depends on job one. Declaring this parameter makes the jobs run sequentially: two executes only after one has completed.
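dependencies can also list several comma-separated parents; a hypothetical three.job that waits for both one and two might look like:

```properties
# three.job (hypothetical)
type=command
dependencies=one,two
command=echo "runs after both one and two complete"
```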
Package the job definition file into a zip package
The name of the zip file is arbitrary
Upload the task zip to the project
On the home page, click the Projects tab, open the project_2 project, click Upload in the upper right corner, then upload the zip file
Execute the task
Click Flows, click the top-level task two, and execute it
Verify the results
wangting@ops01:/opt/module >ll
total 56
drwxrwxr-x 5 wangting wangting 4096 May 16 10:51 azkaban
drwxrwxr-x 2 wangting wangting 4096 Apr 4 11:01 datas
drwxr-xr-x 12 wangting wangting 4096 Apr 24 16:37 flume
-rw-rw-r-- 1 wangting wangting 30 Apr 25 11:33 group.log
drwxr-xr-x 12 wangting wangting 4096 Mar 12 11:38 hadoop-3.1.3
drwxrwxr-x 8 wangting wangting 4096 May 10 11:55 hbase
drwxrwxr-x 11 wangting wangting 4096 Apr 2 15:14 hive
drwxr-xr-x 7 wangting wangting 4096 Apr 29 11:07 kafka
drwxrwxr-x 2 wangting wangting 4096 May 16 12:34 one
drwxr-xr-x 5 wangting wangting 4096 Jun 27 2018 phoenix
drwxrwxr-x 3 wangting wangting 4096 Apr 10 16:25 tez
drwxrwxr-x 5 wangting wangting 4096 Apr 2 15:03 tez-0.9.2_bak0410
drwxr-xr-x 8 wangting wangting 4096 Mar 25 11:02 zookeeper-3.5.7
drwxrwxr-x 2 wangting wangting 4096 May 16 11:51 ztdata_0516
wangting@ops01:/opt/module >cd one/
wangting@ops01:/opt/module/one >ls
two.txt
wangting@ops01:/opt/module/one >
After the task completes, verify: a new directory one appears under /opt/module/, indicating that job one was successfully scheduled and executed.
Inside the one directory, the two.txt file exists, indicating that job two was also successfully scheduled and executed.
Calling a task script to execute the task
Real server tasks are usually more involved, for example a business script that runs Hive, HDFS, and similar commands. Create the script task:
wangting@ops01:/opt/module/test >vim test_azkaban.sh
#!/bin/bash
echo "123"
echo "123123"
echo "123123123"
ls -l /opt/module/ >> /opt/module/test/shell_log_0516.log
hdfs dfs -ls / >> /opt/module/test/shell_log_0516.log
NOW=$(date | awk '{print $4}')
echo "Current time: $NOW"
wangting@ops01:/opt/module/test >chmod +x test_azkaban.sh
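The NOW line grabs the fourth whitespace-separated field of `date`'s output, which is locale-dependent; `date +%T` is an equivalent, locale-independent sketch:

```shell
# Locale-independent way to get the current HH:MM:SS
NOW=$(date +%T)
echo "Current time: $NOW"
```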
Create project
Create a project
project_3
Description information
Define job tasks
# run_bash.job
type=command
command=bash /opt/module/test/test_azkaban.sh
Package the job definition file into a zip package
Same as the case above
Upload the task zip to the project
On the home page, click the Projects tab, open the project_3 project, click Upload in the upper right corner, then upload the zip file
Execute the task
Click Flows, click the top-level run_bash task, and execute it
Verify the results
wangting@ops01:/opt/module/test >ll
total 8
-rw-rw-r-- 1 wangting wangting 1801 May 16 12:49 shell_log_0516.log
-rwxrwxr-x 1 wangting wangting 226 May 16 12:44 test_azkaban.sh
wangting@ops01:/opt/module/test >
# Check that the output contains the /opt/module listing and the HDFS root directory contents
wangting@ops01:/opt/module/test >cat shell_log_0516.log
total 60
drwxrwxr-x 5 wangting wangting 4096 May 16 10:51 azkaban
drwxrwxr-x 2 wangting wangting 4096 Apr 4 11:01 datas
drwxr-xr-x 12 wangting wangting 4096 Apr 24 16:37 flume
-rw-rw-r-- 1 wangting wangting 30 Apr 25 11:33 group.log
drwxr-xr-x 12 wangting wangting 4096 Mar 12 11:38 hadoop-3.1.3
drwxrwxr-x 8 wangting wangting 4096 May 10 11:55 hbase
drwxrwxr-x 11 wangting wangting 4096 Apr 2 15:14 hive
drwxr-xr-x 7 wangting wangting 4096 Apr 29 11:07 kafka
drwxrwxr-x 2 wangting wangting 4096 May 16 12:34 one
drwxr-xr-x 5 wangting wangting 4096 Jun 27 2018 phoenix
drwxrwxr-x 2 wangting wangting 4096 May 16 12:49 test
drwxrwxr-x 3 wangting wangting 4096 Apr 10 16:25 tez
drwxrwxr-x 5 wangting wangting 4096 Apr 2 15:03 tez-0.9.2_bak0410
drwxr-xr-x 8 wangting wangting 4096 Mar 25 11:02 zookeeper-3.5.7
drwxrwxr-x 2 wangting wangting 4096 May 16 11:51 ztdata_0516
2021-05-16 12:49:16,801 INFO [main] Configuration.deprecation (Configuration.java:logDeprecation(1395)) - No unit for dfs.client.datanode-restart.timeout(30) assuming SECONDS
Found 10 items
drwxr-xr-x - wangting supergroup 0 2021-03-17 11:44 /20210317
drwxr-xr-x - wangting supergroup 0 2021-03-19 10:51 /20210319
drwxr-xr-x - wangting supergroup 0 2021-04-24 17:05 /flume
-rw-r--r-- 3 wangting supergroup 338075860 2021-03-12 11:50 /hadoop-3.1.3.tar.gz
drwxr-xr-x - wangting supergroup 0 2021-05-13 15:31 /hbase
drwxr-xr-x - wangting supergroup 0 2021-04-04 11:07 /test.db
drwxr-xr-x - wangting supergroup 0 2021-03-19 11:14 /testgetmerge
drwxr-xr-x - wangting supergroup 0 2021-04-10 16:23 /tez
drwx------ - wangting supergroup 0 2021-04-02 15:14 /tmp
drwxr-xr-x - wangting supergroup 0 2021-04-02 15:25 /user
wangting@ops01:/opt/module/test >
After the task completes, verify: the shell_log_0516.log file was created under /opt/module/test, indicating that the run_bash task was successfully scheduled and executed.