Abstract: This article introduces how to build Azkaban from scratch on HUAWEI CLOUD and instructs users on how to submit jobs to MRS.
This article is shared from HUAWEI CLOUD Community "Practice of Open Source Workflow Engine Azkaban in MRS", author: Ah YeYe.
Environment
Versions used in this practice: Apache Azkaban 4.0.0 (the stand-alone version is used as an example; the cluster version is configured similarly) and an MRS 3.1.0 normal cluster.
Azkaban plugin address
Azkaban official website
Azkaban source address
Install azkaban-solo-server
Azkaban does not provide binary packages. Users need to download the source code, then compile and package it to obtain "azkaban-solo-server.zip" and "azkaban-db.zip".
1. Environment preparation
- Purchase a Linux elastic cloud server ECS from HUAWEI CLOUD to install and run the MRS cluster client and Azkaban, and bind the elastic public IP.
- Install and run the MRS cluster client on the ECS. For example, the installation directory is "/opt/client".
- To prepare the data table, refer to the MySQL tutorial.
- Install MySQL and grant access from the local host. Note: Azkaban 4.0.0 is compatible with MySQL 5.1.28 by default.
- Create an Azkaban database, extract "azkaban-db.zip" to obtain "create-all-sql-*.sql", and initialize it.
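The database initialization above can be sketched as the following MySQL session. The database name, account, password, and the "/opt/azkaban" extraction path are example assumptions; the exact schema file name depends on the compiled version:

```sql
-- Sketch only: run inside the mysql client as a privileged user.
-- Database name and credentials below are example values.
CREATE DATABASE azkaban;
CREATE USER 'azkaban'@'%' IDENTIFIED BY 'azkaban';
GRANT SELECT,INSERT,UPDATE,DELETE,CREATE,DROP,INDEX ON azkaban.* TO 'azkaban'@'%';
USE azkaban;
-- Load the schema extracted from azkaban-db.zip (replace * with the actual version)
SOURCE /opt/azkaban/create-all-sql-*.sql;
```

The MySQL host, database, user, and password chosen here must match what is later written into "azkaban.properties".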
2. Upload the installation package and unzip it
- Upload "azkaban-solo-server.zip" to the "/opt/azkaban" directory
- Execute the following commands to decompress and delete the installation package
unzip azkaban-solo-server.zip
rm -f azkaban-solo-server.zip
3. Modify the configuration file "azkaban-solo-server/conf/azkaban.properties"
The ports can be modified according to the actual situation; "jetty.port" and "mysql.port" can keep their default values.
jetty.port=8081
database.type=mysql
mysql.port=3306
mysql.host=x.x.x.x
mysql.database=azkaban
mysql.user=xxx
mysql.password=xxx
4. Start azkaban-solo-server
source /opt/client/bigdata_env
cd /opt/azkaban/azkaban-solo-server
sh bin/start-solo.sh
5. Access Azkaban WEB UI
Enter "http://<ECS elastic IP>:<port>" in the browser to open the Azkaban WebUI login page, and enter the user information to log in to the Azkaban service.
Note:
Default port (port): 8081;
username/password: azkaban/azkaban;
user account configuration file: /opt/azkaban/azkaban-solo-server/conf/azkaban-users.xml
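For reference, "azkaban-users.xml" follows the shape below; this is a sketch of adding an extra administrator account, where the new username and password are placeholders to replace with your own:

```xml
<azkaban-users>
  <user username="azkaban" password="azkaban" roles="admin" groups="azkaban"/>
  <!-- Hypothetical extra account; choose your own credentials -->
  <user username="myadmin" password="myadmin-password" roles="admin"/>
  <role name="admin" permissions="ADMIN"/>
</azkaban-users>
```

Restart azkaban-solo-server after editing this file so the new account takes effect.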
azkaban-hdfs-viewer plugin configuration guide
To connect to HDFS, users need to compile the source code to obtain "az-hdfs-viewer.zip"; azkaban-solo-server must already be installed.
1. Environment preparation
- Configure an Azkaban user and add it to the supergroup user group to grant access to HDFS
- Add the Azkaban proxy user to the HDFS configuration file "core-site.xml"
a. Log in to the Manager page and select "Cluster > Service > HDFS > Configuration > All Configuration > HDFS (Service) > Custom"
b. In the parameter file "core-site.xml", add the following configuration items:
c. After the configuration is complete, click "Save" in the upper left corner
d. Select "Overview > More > Restart Service" and enter the password to restart the HDFS service
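The proxy-user entries follow the standard Hadoop pattern. The sketch below assumes the proxy user is named "azkaban" and uses permissive wildcard values, which you may want to restrict to specific hosts and groups in production:

```xml
<property>
  <name>hadoop.proxyuser.azkaban.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.azkaban.groups</name>
  <value>*</value>
</property>
```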
2. Upload the installation package and unzip it
- Upload "az-hdfs-viewer.zip" to "/opt/azkaban/azkaban-solo-server/plugins/viewer" directory
- Execute the following commands to decompress and delete the installation package
unzip az-hdfs-viewer.zip
rm -f az-hdfs-viewer.zip
- Rename the unzipped file to "hdfs"
mv az-hdfs-viewer hdfs
3. Modify and save the configuration file
- In the "azkaban-solo-server/plugins/viewer/hdfs/conf/plugin.properties" file, change the proxy user to the Azkaban proxy user configured in step 1, and set the "execute-as-user" storage directory to the Azkaban installation directory, such as "/opt/azkaban/azkaban-solo-server".
viewer.name=HDFS
viewer.path=hdfs
viewer.order=1
viewer.hidden=false
viewer.external.classpaths=extlib/*
viewer.servlet.class=azkaban.viewer.hdfs.HdfsBrowserServlet
hadoop.security.manager.class=azkaban.security.HadoopSecurityManager_H_2_0
azkaban.should.proxy=false
# Azkaban proxy user name configured in the MRS cluster
proxy.user=azkaban
allow.group.proxy=true
file.max.lines=1000
# Specify the error message users see when they do not have permissions
viewer.access_denied_message=The folder you are trying to access is protected.
execute.as.user=false
# Directory where execute-as-user is stored
azkaban.native.lib=/opt/azkaban/azkaban-solo-server
If the file does not exist, create it manually and add the content above.
4. Copy the required packages of the HDFS plugin to the "/opt/azkaban/azkaban-solo-server/extlib" directory
cp /opt/client/HDFS/hadoop/share/hadoop/hdfs/*.jar /opt/azkaban/azkaban-solo-server/extlib
cp /opt/client/HDFS/hadoop/share/hadoop/client/hadoop-client-api-3.1.1-mrs-2.0.jar /opt/azkaban/azkaban-solo-server/extlib
cp /opt/client/HDFS/hadoop/share/hadoop/common/*.jar /opt/azkaban/azkaban-solo-server/extlib
Different MRS versions ship different Hadoop-related JAR versions; you can locate them with "find /opt/client".
5. Check the directory structure
The directory structure should be:
- azkaban-solo-server
- bin
- conf
- extlib (third-party JARs required by the Hadoop-related plugins)
- lib
- logs
- plugins
- jobtypes (job plugin directory)
- commonprivate.properties
- hive
- plugin.properties
- private.properties
- hadoopJava
- plugin.properties
- private.properties
- viewer
- hdfs
- conf
- plugin.properties
- lib (extracted from az-hdfs-viewer.zip)
- temp
- web
6. Restart the Azkaban-solo-server service
cd /opt/azkaban/azkaban-solo-server
sh bin/shutdown-solo.sh
sh bin/start-solo.sh
7. Access HDFS Browser
- Enter "http://<ECS elastic IP>:8081" in the browser to open the Azkaban WebUI login page, and enter the user information to log in to the Azkaban service
- Click "HDFS"
Deploy and run a plugins-jobtypes hadoop-job
After installing azkaban-solo-server, deploy and verify a hadoop-job.
1. Environment preparation
- Get the "azkaban-plugins-3.0.0.zip" archive
- Compile to obtain the hadoopJava WordCount example package "az-hadoop-jobtype-plugin.jar" provided by Azkaban
2. Upload the plugin configuration file
- Unzip "azkaban-plugins-3.0.0.zip" and get the "hadoopJava" folder under "azkaban-plugins-3.0.0/plugins/jobtype/jobtypes"
- Upload the "hadoopJava" folder to the "/opt/azkaban/azkaban-solo-server/plugins/jobtypes" directory. If the directory does not exist, create it first
3. Modify the configuration file "azkaban-solo-server/plugins/jobtypes/commonprivate.properties"
# set execute-as-user
execute.as.user=false
hadoop.security.manager.class=azkaban.security.HadoopSecurityManager_H_2_0
azkaban.should.proxy=false
obtain.binary.token=false
# Azkaban proxy user name configured in the MRS cluster
proxy.user=azkaban
allow.group.proxy=true
# Directory where execute-as-user is stored
azkaban.native.lib=/opt/azkaban/azkaban-solo-server
# hadoop
# /opt/client is the MRS cluster client installation directory
hadoop.home=/opt/client/HDFS/hadoop
hive.home=/opt/client/Hive/Beeline
spark.home=/opt/client/Spark/spark
hadoop.classpath=${hadoop.home}/etc/hadoop,${hadoop.home}/share/hadoop/common/*,${hadoop.home}/share/hadoop/common/lib/*,${hadoop.home}/share/hadoop/hdfs/*,${hadoop.home}/share/hadoop/hdfs/lib/*,${hadoop.home}/share/hadoop/yarn/*,${hadoop.home}/share/hadoop/yarn/lib/*,${hadoop.home}/share/hadoop/mapreduce/*,${hadoop.home}/share/hadoop/mapreduce/lib/*
jobtype.global.classpath=${hadoop.home}/etc/hadoop,${hadoop.home}/share/hadoop/common/*,${hadoop.home}/share/hadoop/common/lib/*,${hadoop.home}/share/hadoop/hdfs/*,${hadoop.home}/share/hadoop/hdfs/lib/*,${hadoop.home}/share/hadoop/yarn/*,${hadoop.home}/share/hadoop/yarn/lib/*,${hadoop.home}/share/hadoop/mapreduce/*,${hadoop.home}/share/hadoop/mapreduce/lib/*
4. Sample program verification
- Prepare the test data "input.txt" file. The content of the file can be customized by referring to the following format. The storage path is "/opt/input.txt"
Ross male 33 3674
Julie male 42 2019
Gloria female 45 3567
Carol female 36 2813
- Upload the test data "input.txt" to the HDFS directory "/tmp/azkaban_test" through the HDFS client
a. Log in to the node where the client is installed as the client installation user
b. Execute the following command to switch to the client installation directory:
cd /opt/client
c. Execute the following command to configure the environment variables:
source bigdata_env
d. Execute the HDFS shell command to upload the file:
hdfs dfs -put /opt/input.txt /tmp/azkaban_test
- Write and save a "wordcount.job" file locally with the following contents
type=hadoopJava
job.extend=false
job.class=azkaban.jobtype.examples.java.WordCount
classpath=./lib/*,/opt/azkaban-solo-server-0.1.0-SNAPSHOT/lib/*
force.output.overwrite=true
input.path=/tmp/azkaban_test
output.path=/tmp/azkaban_test_out
- Enter "http://<ECS elastic IP>:<port>" in the browser, log in to the Azkaban service with the user information, and submit the job for verification.
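To submit the job, Azkaban expects the .job file to be packaged in a zip archive and uploaded through a project on the WebUI. A minimal sketch of preparing such an archive follows; the local paths and the "wordcount_flow" name are examples:

```shell
# Sketch only: Azkaban uploads a flow as a zip archive containing the .job files.
mkdir -p /tmp/wordcount_flow
# Write the job definition from the article into the flow directory
cat > /tmp/wordcount_flow/wordcount.job <<'EOF'
type=hadoopJava
job.extend=false
job.class=azkaban.jobtype.examples.java.WordCount
classpath=./lib/*,/opt/azkaban-solo-server-0.1.0-SNAPSHOT/lib/*
force.output.overwrite=true
input.path=/tmp/azkaban_test
output.path=/tmp/azkaban_test_out
EOF
# Package the flow; upload the resulting zip on the WebUI project page
# (guarded in case the zip tool is not installed on this node)
if command -v zip >/dev/null; then
  (cd /tmp/wordcount_flow && zip -q wordcount.zip wordcount.job)
fi
```

After uploading, execute the flow from the project page and check "/tmp/azkaban_test_out" on HDFS for the result.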
Spark command job (see Client Commands)
A Spark task can be run in two ways: command mode and Spark jobtype mode.
- Command mode: spark_home must be specified as /opt/client/Spark/spark/
On a client node of the MRS cluster, you can obtain the actual Spark installation path with "echo $SPARK_HOME".
Set the global environment variable on the ECS where Azkaban is located; after adding "source {MRS client}", restart Azkaban for the change to take effect.
- Jobtype mode: refer to the hadoop-job deployment section above.
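As a sketch of the command mode, a .job file can invoke spark-submit directly. The class below is the stock Spark Pi example, and the example jar path and wildcard are assumptions based on the client directory used in this article:

```properties
# Command-mode Spark job (sketch; the example jar path is an assumption)
type=command
command=/opt/client/Spark/spark/bin/spark-submit --master yarn --class org.apache.spark.examples.SparkPi /opt/client/Spark/spark/examples/jars/spark-examples_*.jar
```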