Introduction to Azkaban: deployment, principles, and usage

Introduction to Azkaban

Azkaban is a batch workflow scheduler written in Java and open-sourced by LinkedIn. It runs a set of tasks and processes in a specific order within a workflow. A deployment consists of three parts: the web server, the executor server, and a database server (MySQL in this walkthrough).
Azkaban uses a key-value (KV) file format to define dependencies between tasks, and it provides an easy-to-use web interface for maintaining and tracking workflows.

Project official website: https://azkaban.github.io/

Features of Azkaban

1. Web user interface
2. Easy workflow uploads
3. Easy definition of dependencies between tasks
4. Workflow scheduling
5. Authentication and authorization
6. Ability to kill and restart workflows
7. Modular design with a pluggable plug-in mechanism
8. Project workspaces
9. Logging and auditing of workflows and tasks

Azkaban installation and deployment

Preparation:

Installation requires three packages:

azkaban-executor-server-2.5.0.tar.gz
azkaban-sql-script-2.5.0.tar.gz
azkaban-web-server-2.5.0.tar.gz

Download link (Baidu network disk): https://pan.baidu.com/s/1mMuIuVv9Ji6yO2A2b8Ibrg
Extraction code: seld
[Note:] MySQL must be deployed in advance. Installing MySQL is not covered here.

Install components:

# Upload the installation packages
wangting@ops01:/opt/software/azkaban >ll
total 22612
-rw-r--r-- 1 root root 11157302 May 16 10:45 azkaban-executor-server-2.5.0.tar.gz
-rw-r--r-- 1 root root     1928 May 16 10:45 azkaban-sql-script-2.5.0.tar.gz
-rw-r--r-- 1 root root 11989669 May 16 10:45 azkaban-web-server-2.5.0.tar.gz
# Create an application directory so that all components are extracted under one managed directory
wangting@ops01:/opt/software/azkaban >mkdir /opt/module/azkaban
wangting@ops01:/opt/software/azkaban >tar -xf azkaban-executor-server-2.5.0.tar.gz -C /opt/module/azkaban/
wangting@ops01:/opt/software/azkaban >tar -xf azkaban-sql-script-2.5.0.tar.gz -C /opt/module/azkaban/
wangting@ops01:/opt/software/azkaban >tar -xf azkaban-web-server-2.5.0.tar.gz -C /opt/module/azkaban/
wangting@ops01:/opt/software/azkaban >ls /opt/module/azkaban/
azkaban-2.5.0  azkaban-executor-2.5.0  azkaban-web-2.5.0
wangting@ops01:/opt/software/azkaban >
wangting@ops01:/opt/software/azkaban >cd /opt/module/azkaban/
wangting@ops01:/opt/module/azkaban >ll
total 12
drwxrwxr-x 2 wangting wangting 4096 May 16 10:48 azkaban-2.5.0
drwxrwxr-x 7 wangting wangting 4096 May 16 10:47 azkaban-executor-2.5.0
drwxrwxr-x 8 wangting wangting 4096 May 16 10:48 azkaban-web-2.5.0
# Rename the directories so they are easier to manage and navigate
wangting@ops01:/opt/module/azkaban >mv azkaban-executor-2.5.0 executor
wangting@ops01:/opt/module/azkaban >mv azkaban-web-2.5.0 server
wangting@ops01:/opt/module/azkaban >ll
total 12
drwxrwxr-x 2 wangting wangting 4096 May 16 10:48 azkaban-2.5.0
drwxrwxr-x 7 wangting wangting 4096 May 16 10:47 executor
drwxrwxr-x 8 wangting wangting 4096 May 16 10:48 server
wangting@ops01:/opt/module/azkaban >
# The SQL files in the azkaban-2.5.0 directory are used later to initialize the azkaban database
wangting@ops01:/opt/module/azkaban >ls azkaban-2.5.0/
create.active_executing_flows.sql  create.execution_flows.sql  create.project_events.sql  create.project_permissions.sql  create.project_versions.sql  create.triggers.sql     update-all-sql-2.2.sql
create.active_sla.sql              create.execution_jobs.sql   create.project_files.sql   create.project_properties.sql   create.properties.sql        database.properties     update.execution_logs.2.1.sql
create-all-sql-2.5.0.sql           create.execution_logs.sql   create.project_flows.sql   create.projects.sql             create.schedules.sql         update-all-sql-2.1.sql  update.project_properties.2.1.sql
# Check the local IP address and confirm that the MySQL service is running
wangting@ops01:/opt/module/azkaban >ifconfig eth0 |grep "inet "
        inet 11.8.37.50  netmask 255.255.255.0  broadcast 11.8.37.255
wangting@ops01:/opt/module/azkaban >netstat -tnlpu|grep 3306
tcp6       0      0 :::3306                 :::*                    LISTEN      -                   
# Log in to MySQL
wangting@ops01:/opt/module/azkaban >mysql -uroot -pwangting
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 37069
Server version: 5.7.26 MySQL Community Server (GPL)

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
# Create the azkaban database
mysql> create database azkaban;
Query OK, 1 row affected (0.00 sec)

mysql> use azkaban;
Database changed
mysql> show tables;
Empty set (0.00 sec)
# Initialize the schema
mysql> source /opt/module/azkaban/azkaban-2.5.0/create-all-sql-2.5.0.sql
mysql> show tables;
+------------------------+
| Tables_in_azkaban      |
+------------------------+
| active_executing_flows |
| active_sla             |
| execution_flows        |
| execution_jobs         |
| execution_logs         |
| project_events         |
| project_files          |
| project_flows          |
| project_permissions    |
| project_properties     |
| project_versions       |
| projects               |
| properties             |
| schedules              |
| triggers               |
+------------------------+
15 rows in set (0.00 sec)
# Done; exit
mysql> exit
Bye
wangting@ops01:/opt/module/azkaban >
wangting@ops01:/opt/module/azkaban >cd server
wangting@ops01:/opt/module/azkaban/server >pwd
/opt/module/azkaban/server

# Generate the SSL keystore; "keystore" and "jetty" are the names referenced in the configuration file
wangting@ops01:/opt/module/azkaban/server >keytool -keystore keystore -alias jetty -genkey -keyalg RSA
Enter keystore password:  			# wangting (you may choose your own password)
Re-enter new password: 				# wangting (repeat the password)
What is your first and last name?		# press Enter
  [Unknown]:  
What is the name of your organizational unit?	# press Enter
  [Unknown]:  
What is the name of your organization?		# press Enter
  [Unknown]:  
What is the name of your City or Locality?	# press Enter
  [Unknown]:  
What is the name of your State or Province?	# press Enter
  [Unknown]:  
What is the two-letter country code for this unit?	# press Enter
  [Unknown]:  
Is CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown correct?		# y
  [no]:  y

Enter key password for <wangting>
	(RETURN if same as keystore password):  	# press Enter

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore keystore -destkeystore keystore -deststoretype pkcs12".
wangting@ops01:/opt/module/azkaban/server >
# Check the time zone
wangting@ops01:/opt/module/azkaban/server >cat /etc/localtime 
(binary tzdata content omitted)
CST-8

# The last line should read CST-8 (UTC+8). If it does not, set the server's time zone to UTC+8 first.
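
Because /etc/localtime is a binary tzdata file, dumping it with cat is hard to read. As a rough alternative, the UTC offset can be checked with date. The sketch below assumes a date implementation that supports the %z format; the helper name is_utc_plus_8 is made up for illustration.

```shell
# Hypothetical helper: succeeds only when the local UTC offset is +0800.
is_utc_plus_8() {
    [ "$(date +%z)" = "+0800" ]
}

if is_utc_plus_8; then
    echo "time zone OK: $(date +%Z) $(date +%z)"
else
    echo "time zone offset is $(date +%z); expected +0800" >&2
fi
```

This only checks the offset of the shell that runs it; the default.timezone.id setting in azkaban.properties still has to be set separately.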

wangting@ops01:/opt/module/azkaban/server >cd conf/
# Edit the web server configuration
wangting@ops01:/opt/module/azkaban/server/conf >ls
azkaban.properties  azkaban-users.xml
wangting@ops01:/opt/module/azkaban/server/conf >vim azkaban.properties 
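# Keep comments on their own lines: in a properties file, text after a value (including "# ...") is read as part of the value
# Change the time zone to Asia/Shanghai
default.timezone.id=Asia/Shanghai

database.type=mysql
mysql.port=3306
# IP address of the host where MySQL is deployed
mysql.host=11.8.37.50
# The azkaban database created above
mysql.database=azkaban
mysql.user=root
mysql.password=wangting
mysql.numconnections=100

# Azkaban Jetty server properties.
jetty.maxThreads=25
jetty.ssl.port=8443
jetty.port=8081
# The keystore file generated by keytool above
jetty.keystore=keystore
# Set all passwords to the one entered when running keytool
jetty.password=wangting
jetty.keypassword=wangting
jetty.truststore=keystore
jetty.trustpassword=wangting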

# Add a user (the equivalent of registering an account)
wangting@ops01:/opt/module/azkaban/server/conf >vim azkaban-users.xml 

<azkaban-users>
        <user username="azkaban" password="azkaban" roles="admin" groups="azkaban" />
        <user username="metrics" password="metrics" roles="metrics"/>
        <!-- Custom username and password, used to log in to the web UI -->
        <user username="wangting" password="wangting" roles="admin, metrics"/>

        <role name="admin" permissions="ADMIN" />
        <role name="metrics" permissions="METRICS"/>
</azkaban-users>

# Edit the executor configuration
wangting@ops01:/opt/module/azkaban/server/conf >cd /opt/module/azkaban/executor/conf/
wangting@ops01:/opt/module/azkaban/executor/conf >ls
azkaban.private.properties  azkaban.properties  global.properties
wangting@ops01:/opt/module/azkaban/executor/conf >vim azkaban.properties 

#Azkaban
# Change the time zone to Asia/Shanghai
default.timezone.id=Asia/Shanghai

# Azkaban JobTypes Plugins
azkaban.jobtype.plugin.dir=plugins/jobtypes

#Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects

# Database settings (changed to MySQL)
database.type=mysql
mysql.port=3306
mysql.host=11.8.37.50
mysql.database=azkaban
mysql.user=root
mysql.password=wangting
mysql.numconnections=100

# Azkaban Executor settings
executor.maxThreads=50
executor.port=12321
executor.flow.threads=30
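
The web server and the executor have to point at the same database. A minimal sketch for cross-checking the two azkaban.properties files edited above follows; get_prop and compare_db_props are hypothetical helpers, not part of Azkaban, and the paths assume the directory layout used in this walkthrough.

```shell
# Print the value of KEY ($2) from a properties FILE ($1); first match wins.
get_prop() {
    awk -F= -v key="$2" '$1 == key { sub(/^[^=]*=/, ""); print; exit }' "$1"
}

# Report every database key whose value differs between two files.
# Returns 0 when all values match, 1 otherwise.
compare_db_props() {
    status=0
    for key in database.type mysql.host mysql.port mysql.database mysql.user; do
        a=$(get_prop "$1" "$key")
        b=$(get_prop "$2" "$key")
        if [ "$a" != "$b" ]; then
            echo "MISMATCH $key: '$a' vs '$b'"
            status=1
        fi
    done
    return "$status"
}

web=/opt/module/azkaban/server/conf/azkaban.properties
exe=/opt/module/azkaban/executor/conf/azkaban.properties
# Only run the comparison where both files actually exist.
if [ -f "$web" ] && [ -f "$exe" ]; then
    compare_db_props "$web" "$exe" && echo "database settings match"
fi
```

The comparison is purely textual, so a value followed by stray text on the same line will show up as a mismatch, which is usually what you want to catch.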

Start the service:

wangting@ops01:/opt/module/azkaban/executor/conf >cd /opt/module/azkaban/server/
wangting@ops01:/opt/module/azkaban/server >bin/azkaban-web-start.sh 
Using Hadoop from /opt/module/hadoop-3.1.3
Using Hive from /opt/module/hive
bin/..
2021/05/16 11:26:42.425 +0800 INFO [log] [Azkaban] Started [email protected]:8443
2021/05/16 11:26:42.425 +0800 INFO [AzkabanWebServer] [Azkaban] Server running on ssl port 8443.

wangting@ops01:/opt/module/azkaban/server >cd /opt/module/azkaban/executor/
wangting@ops01:/opt/module/azkaban/executor >bin/azkaban-executor-start.sh 
Using Hadoop from /opt/module/hadoop-3.1.3
Using Hive from /opt/module/hive
bin/..
Starting AzkabanExecutorServer on port 12321 ...
2021/05/16 11:29:20.076 +0800 INFO [log] [Azkaban] Started [email protected]:12321
2021/05/16 11:29:20.076 +0800 INFO [AzkabanExecutorServer] [Azkaban] Azkaban Executor Server started on port 12321

Accessing the web UI

Open https://11.8.37.50:8443 in a browser (the jetty.ssl.port configured above) and log in with one of the accounts defined in azkaban-users.xml.

After a successful login, the deployment is complete.

Using Azkaban

The web UI has four main tabs:

Projects: the most important part. Create a project here; all flows run inside a project.

Scheduling: shows scheduled tasks.

Executing: shows tasks that are currently running.

History: shows tasks that have already run.

Single independent task

1. Create a project

In the web UI, create a project named project_1 and enter a description.

2. Define a job

What a task does and how it is executed are defined in a job file.

# Create a file named command.job locally with the content below; lines must not end with trailing spaces:

# command.job
type=command
command=mkdir /opt/module/ztdata_0516



3. Package the job file into a zip archive

After editing command.job, compress it into a zip archive, for example command.zip.
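
For readers who prefer the command line to a desktop compression tool, the same two steps can be sketched in shell. The helper write_command_job is hypothetical, the sketch works in a temporary directory, and it assumes the zip utility is installed (the packaging step is skipped otherwise).

```shell
# Hypothetical helper: write NAME.job ($2) into DIR ($1) running COMMAND ($3).
# printf emits exactly two lines with no trailing spaces.
write_command_job() {
    printf 'type=command\ncommand=%s\n' "$3" > "$1/$2.job"
}

workdir=$(mktemp -d)
write_command_job "$workdir" command 'mkdir /opt/module/ztdata_0516'

# Put the job file at the top level of the archive.
if command -v zip >/dev/null 2>&1; then
    (cd "$workdir" && zip -q command.zip *.job)
fi

ls "$workdir"
```

Generating the file with printf avoids the trailing-space problem mentioned above, which is easy to introduce when editing by hand.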

4. Upload the zip archive to the project

Open the project and upload the zip archive. After the upload, you can open the command job in the project to view its parsed content.

5. View and execute the task

Under Flows, click the command task to open its detail page, then click Execute Flow to run it.

[Note:] Because everything is done through the web UI, the job files can be created, edited, and zipped directly on a local Windows computer.

6. Task history

After the task has run, its execution record can be viewed under History.

7. Verify execution results

wangting@ops01:/opt/module >ll
total 52
drwxrwxr-x  5 wangting wangting 4096 May 16 10:51 azkaban
drwxrwxr-x  2 wangting wangting 4096 Apr  4 11:01 datas
drwxr-xr-x 12 wangting wangting 4096 Apr 24 16:37 flume
-rw-rw-r--  1 wangting wangting   30 Apr 25 11:33 group.log
drwxr-xr-x 12 wangting wangting 4096 Mar 12 11:38 hadoop-3.1.3
drwxrwxr-x  8 wangting wangting 4096 May 10 11:55 hbase
drwxrwxr-x 11 wangting wangting 4096 Apr  2 15:14 hive
drwxr-xr-x  7 wangting wangting 4096 Apr 29 11:07 kafka
drwxr-xr-x  5 wangting wangting 4096 Jun 27  2018 phoenix
drwxrwxr-x  3 wangting wangting 4096 Apr 10 16:25 tez
drwxrwxr-x  5 wangting wangting 4096 Apr  2 15:03 tez-0.9.2_bak0410
drwxr-xr-x  8 wangting wangting 4096 Mar 25 11:02 zookeeper-3.5.7
drwxrwxr-x  2 wangting wangting 4096 May 16 11:51 ztdata_0516
wangting@ops01:/opt/module >

After the task completes, verify the result: the new directory ztdata_0516 now exists under /opt/module/, which shows that the task was triggered and executed successfully.

Multi-task workflow

[Note:] The following examples no longer include step-by-step screenshots; the procedure is the same as in the first example.

Create project

Create a project named project_2 and enter a description.

Define job tasks

Create 2 job files locally

one.job

# one.job
type=command
command=mkdir /opt/module/one

two.job

# two.job
type=command
dependencies=one
command=touch /opt/module/one/two.txt

[Note:] dependencies=one means that job two depends on job one. With this parameter set, the jobs run in sequence: two starts only after one has completed.
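
To make the ordering rule concrete, the dependencies lines can be turned into "dependency job" pairs and handed to the standard tsort utility, which prints one valid execution order. This is only an illustration of the dependency graph, not how Azkaban itself resolves it; job_order is a hypothetical helper, and it assumes multiple dependencies are comma-separated.

```shell
# Print one valid execution order for the .job files in DIR ($1).
job_order() {
    for f in "$1"/*.job; do
        [ -f "$f" ] || continue
        job=$(basename "$f" .job)
        # An identical pair tells tsort the job exists even without dependencies.
        echo "$job $job"
        deps=$(sed -n 's/^dependencies=//p' "$f" | tr ',' ' ')
        for dep in $deps; do
            echo "$dep $job"
        done
    done | tsort
}

# Recreate the two job files from this example in a temporary directory.
demo=$(mktemp -d)
printf 'type=command\ncommand=mkdir /opt/module/one\n' > "$demo/one.job"
printf 'type=command\ndependencies=one\ncommand=touch /opt/module/one/two.txt\n' > "$demo/two.job"

job_order "$demo"   # prints "one" and then "two", one per line
rm -rf "$demo"
```

If two.job were run first, touch would fail because the directory /opt/module/one would not exist yet, which is why the dependency is needed.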

Package the job files into a zip archive

Put both job files into one zip archive; the name of the archive is arbitrary.

Upload the zip archive to the project

On the home page, click the Projects tab at the top, open project_2, click Upload in the upper right corner, and upload the zip file.

Execute the tasks

Click Flows, open the flow named two (the final job in the chain), and execute it.

Verify the result

wangting@ops01:/opt/module >ll
total 56
drwxrwxr-x  5 wangting wangting 4096 May 16 10:51 azkaban
drwxrwxr-x  2 wangting wangting 4096 Apr  4 11:01 datas
drwxr-xr-x 12 wangting wangting 4096 Apr 24 16:37 flume
-rw-rw-r--  1 wangting wangting   30 Apr 25 11:33 group.log
drwxr-xr-x 12 wangting wangting 4096 Mar 12 11:38 hadoop-3.1.3
drwxrwxr-x  8 wangting wangting 4096 May 10 11:55 hbase
drwxrwxr-x 11 wangting wangting 4096 Apr  2 15:14 hive
drwxr-xr-x  7 wangting wangting 4096 Apr 29 11:07 kafka
drwxrwxr-x  2 wangting wangting 4096 May 16 12:34 one
drwxr-xr-x  5 wangting wangting 4096 Jun 27  2018 phoenix
drwxrwxr-x  3 wangting wangting 4096 Apr 10 16:25 tez
drwxrwxr-x  5 wangting wangting 4096 Apr  2 15:03 tez-0.9.2_bak0410
drwxr-xr-x  8 wangting wangting 4096 Mar 25 11:02 zookeeper-3.5.7
drwxrwxr-x  2 wangting wangting 4096 May 16 11:51 ztdata_0516
wangting@ops01:/opt/module >cd one/
wangting@ops01:/opt/module/one >ls
two.txt
wangting@ops01:/opt/module/one >

After the flow completes, verify the result: the new directory one exists under /opt/module/, which shows that job one was triggered and executed successfully.

The one directory contains the file two.txt, which shows that job two was also triggered and executed successfully.

Running a script from a job

On the server, write a script that simulates a more complex business task, such as one that calls hive or hdfs:

wangting@ops01:/opt/module/test >vim test_azkaban.sh 

#!/bin/bash
echo "123"
echo "123123"
echo "123123123"
ls -l /opt/module/ >> /opt/module/test/shell_log_0516.log
hdfs dfs -ls / >> /opt/module/test/shell_log_0516.log
# Take the time directly from date instead of parsing its locale-dependent default output
NOW=$(date +%H:%M:%S)
echo "Current time: $NOW"

wangting@ops01:/opt/module/test >chmod +x test_azkaban.sh 
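
As written, test_azkaban.sh exits with the status of its last echo, so a failing hdfs command would go unnoticed. Assuming that a command job is judged by the exit status of its command, adding set -e near the top of the script makes it stop at the first failing step. The snippet below only demonstrates that shell behavior; exit_status_of is a hypothetical helper, and false stands in for a failing step.

```shell
# Hypothetical helper: run a script body ($1) in a child shell and print its exit status.
exit_status_of() {
    status=0
    sh -c "$1" >/dev/null 2>&1 || status=$?
    echo "$status"
}

plain='false; echo "later step still ran"'
strict='set -e; false; echo "later step still ran"'

exit_status_of "$plain"    # prints 0: the failure is masked by the final echo
exit_status_of "$strict"   # prints 1: the script stops at the failing step
```

Whether stopping early is desirable depends on the script; for steps that are allowed to fail, the status can be checked explicitly instead.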

Create project

Create a project named project_3 and enter a description.

Define job tasks

# run_bash.job
type=command
command=bash /opt/module/test/test_azkaban.sh

Package the job file into a zip archive

Same as in the previous examples.

Upload the zip archive to the project

On the home page, click the Projects tab at the top, open project_3, click Upload in the upper right corner, and upload the zip file.

Execute the task

Click Flows, open the run_bash flow, and execute it.

Verify the result

wangting@ops01:/opt/module/test >ll
total 8
-rw-rw-r-- 1 wangting wangting 1801 May 16 12:49 shell_log_0516.log
-rwxrwxr-x 1 wangting wangting  226 May 16 12:44 test_azkaban.sh
wangting@ops01:/opt/module/test >
# Check that the log contains both the directory listing and the HDFS root listing
wangting@ops01:/opt/module/test >cat shell_log_0516.log 
total 60
drwxrwxr-x  5 wangting wangting 4096 May 16 10:51 azkaban
drwxrwxr-x  2 wangting wangting 4096 Apr  4 11:01 datas
drwxr-xr-x 12 wangting wangting 4096 Apr 24 16:37 flume
-rw-rw-r--  1 wangting wangting   30 Apr 25 11:33 group.log
drwxr-xr-x 12 wangting wangting 4096 Mar 12 11:38 hadoop-3.1.3
drwxrwxr-x  8 wangting wangting 4096 May 10 11:55 hbase
drwxrwxr-x 11 wangting wangting 4096 Apr  2 15:14 hive
drwxr-xr-x  7 wangting wangting 4096 Apr 29 11:07 kafka
drwxrwxr-x  2 wangting wangting 4096 May 16 12:34 one
drwxr-xr-x  5 wangting wangting 4096 Jun 27  2018 phoenix
drwxrwxr-x  2 wangting wangting 4096 May 16 12:49 test
drwxrwxr-x  3 wangting wangting 4096 Apr 10 16:25 tez
drwxrwxr-x  5 wangting wangting 4096 Apr  2 15:03 tez-0.9.2_bak0410
drwxr-xr-x  8 wangting wangting 4096 Mar 25 11:02 zookeeper-3.5.7
drwxrwxr-x  2 wangting wangting 4096 May 16 11:51 ztdata_0516
2021-05-16 12:49:16,801 INFO  [main] Configuration.deprecation (Configuration.java:logDeprecation(1395)) - No unit for dfs.client.datanode-restart.timeout(30) assuming SECONDS
Found 10 items
drwxr-xr-x   - wangting supergroup          0 2021-03-17 11:44 /20210317
drwxr-xr-x   - wangting supergroup          0 2021-03-19 10:51 /20210319
drwxr-xr-x   - wangting supergroup          0 2021-04-24 17:05 /flume
-rw-r--r--   3 wangting supergroup  338075860 2021-03-12 11:50 /hadoop-3.1.3.tar.gz
drwxr-xr-x   - wangting supergroup          0 2021-05-13 15:31 /hbase
drwxr-xr-x   - wangting supergroup          0 2021-04-04 11:07 /test.db
drwxr-xr-x   - wangting supergroup          0 2021-03-19 11:14 /testgetmerge
drwxr-xr-x   - wangting supergroup          0 2021-04-10 16:23 /tez
drwx------   - wangting supergroup          0 2021-04-02 15:14 /tmp
drwxr-xr-x   - wangting supergroup          0 2021-04-02 15:25 /user
wangting@ops01:/opt/module/test >

After the task completes, verify the result: the file shell_log_0516.log exists in the /opt/module/test directory, which shows that the run_bash job was triggered and executed successfully.

Origin blog.csdn.net/wt334502157/article/details/116891032