Exploring the practice of the open source workflow engine Azkaban in MRS

Abstract: This article describes how to build Azkaban from scratch on HUAWEI CLOUD and walks users through submitting jobs to MRS.

This article is shared from the HUAWEI CLOUD Community article "Practice of Open Source Workflow Engine Azkaban in MRS" by Ah YeYe.

Environment information

Versions used in this practice: Apache Azkaban 4.0.0 (the stand-alone version is used as the example; the cluster version is configured similarly) and an MRS 3.1.0 normal cluster.

  • Azkaban plugin address
  • Azkaban official website
  • Azkaban source address

Install azkaban-solo-server

Azkaban does not provide binary packages; you need to download the source code, then compile and package it to obtain "azkaban-solo-server.zip" and "azkaban-db.zip".
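
As a sketch of how this build typically looks (the release tag name and Gradle targets are assumptions based on the standard Azkaban Gradle build; verify against the official docs):

# clone the source and check out the release tag (tag name is an assumption, verify with "git tag")
git clone https://github.com/azkaban/azkaban.git
cd azkaban
git checkout 4.0.0
# build the distribution archives, skipping tests to save time
./gradlew build installDist -x test
# the zips are produced under each module's build/distributions directory
ls azkaban-solo-server/build/distributions azkaban-db/build/distributions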

1. Environment preparation

  • Purchase a Linux Elastic Cloud Server (ECS) on HUAWEI CLOUD to install and run the MRS cluster client and Azkaban, and bind an elastic public IP to it.
  • Install and run the MRS cluster client on the ECS; in this example the installation directory is "/opt/client".
  • Prepare the database tables; refer to the MySQL tutorial.
  • Install MySQL and grant local access. Note: Azkaban 4.0.0 is compatible with MySQL 5.1.28 by default.
  • Create the Azkaban database, extract "azkaban-db.zip" to obtain "create-all-sql-*.sql", and initialize the database with it, as sketched after this list.
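
A minimal sketch of the grant and initialization steps (the account, password, and exact script name are placeholders; adjust them to your environment):

# create the Azkaban database
mysql -u root -p -e "CREATE DATABASE azkaban;"
# create a local account for Azkaban and grant it access (credentials are placeholders)
mysql -u root -p -e "CREATE USER 'azkaban'@'%' IDENTIFIED BY '<password>';"
mysql -u root -p -e "GRANT SELECT,INSERT,UPDATE,DELETE ON azkaban.* TO 'azkaban'@'%';"
# initialize the schema with the script extracted from azkaban-db.zip
mysql -u root -p azkaban < create-all-sql-<version>.sql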

2. Upload the installation package and unzip it

  • Upload "azkaban-solo-server.zip" to the "/opt/azkaban" directory
  • Execute the following commands to decompress the package and then delete the archive
unzip azkaban-solo-server.zip
rm -f azkaban-solo-server.zip

3. Modify the configuration file "azkaban-solo-server/conf/azkaban.properties"

Adjust the ports to your environment as needed; "jetty.port" and "mysql.port" can keep their default values.

# Azkaban web server port (default 8081)
jetty.port=8081
# back the solo server with MySQL instead of the embedded H2 database
database.type=mysql
mysql.port=3306
# address of the MySQL server prepared above
mysql.host=x.x.x.x
mysql.database=azkaban
mysql.user=xxx
mysql.password=xxx

4. Start azkaban-solo-server

source /opt/client/bigdata_env
cd /opt/azkaban/azkaban-solo-server
sh bin/start-solo.sh

5. Access Azkaban WEB UI

Enter the " http:// ECS Elastic IP:port" URL in the browser to enter the Azkaban WebUI login interface, and enter the user information to log in to the Azkaban service.

Note:

  • Default port: 8081
  • Default username/password: azkaban/azkaban
  • User account configuration file: /opt/azkaban/azkaban-solo-server/conf/azkaban-users.xml
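
To add or change accounts, edit "azkaban-users.xml". A sketch in the standard Azkaban user-manager format (the additional account and its password are hypothetical placeholders):

<azkaban-users>
  <!-- default admin account shipped with the solo server -->
  <user username="azkaban" password="azkaban" roles="admin"/>
  <!-- hypothetical additional admin account -->
  <user username="myuser" password="mypassword" roles="admin"/>
  <role name="admin" permissions="ADMIN"/>
</azkaban-users>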

azkaban-hdfs-viewer plugin configuration guide

To connect to HDFS, download the source code and compile it to obtain "az-hdfs-viewer.zip"; the azkaban-solo-server installation must already be complete.

1. Environment preparation

  • Configure an Azkaban user and add it to the supergroup user group to grant it access to HDFS
  • Add the Azkaban proxy user to the HDFS configuration file "core-site.xml"
    a. Log in to the Manager page and select "Cluster > Service > HDFS > Configuration > All Configuration > HDFS (Service) > Custom"
    b. Add the proxy-user configuration items to the parameter file "core-site.xml" (see the sketch after this list)
    c. After the configuration is complete, click "Save" in the upper left corner
    d. Select "Overview > More > Restart Service" and enter the password to restart the HDFS service

2. Upload the installation package and unzip it

  • Upload "az-hdfs-viewer.zip" to "/opt/azkaban/azkaban-solo-server/plugins/viewer" directory
  • Execute the following commands to decompress and delete the installation package
unzip az-hdfs-viewer.zip
rm -f az-hdfs-viewer.zip
  • Rename the unzipped folder to "hdfs"
    mv az-hdfs-viewer hdfs

3. Modify and save the configuration file

  • In the "azkaban-solo-server/plugins/viewer/hdfs/conf/plugin.properties" file, set the proxy user to the Azkaban proxy user configured in step 1, and set "azkaban.native.lib" (the directory holding "execute-as-user") to the Azkaban installation directory, for example "/opt/azkaban/azkaban-solo-server".
viewer.name=HDFS
viewer.path=hdfs
viewer.order=1
viewer.hidden=false
viewer.external.classpaths=extlib/*
viewer.servlet.class=azkaban.viewer.hdfs.HdfsBrowserServlet
hadoop.security.manager.class=azkaban.security.HadoopSecurityManager_H_2_0
azkaban.should.proxy=false
# the Azkaban proxy user name configured in the MRS cluster
proxy.user=azkaban
allow.group.proxy=true
file.max.lines=1000
# specify the error message users get when they lack permissions
viewer.access_denied_message=The folder you are trying to access is protected.
execute.as.user=false
# directory where execute-as-user is stored
azkaban.native.lib=/opt/azkaban/azkaban-solo-server

If the file does not exist, create it manually and add the content above.

4. Copy the required packages of the HDFS plugin to the "/opt/azkaban/azkaban-solo-server/extlib" directory

cp /opt/client/HDFS/hadoop/share/hadoop/hdfs/*.jar /opt/azkaban/azkaban-solo-server/extlib
cp /opt/client/HDFS/hadoop/share/hadoop/client/hadoop-client-api-3.1.1-mrs-2.0.jar /opt/azkaban/azkaban-solo-server/extlib
cp /opt/client/HDFS/hadoop/share/hadoop/common/*.jar /opt/azkaban/azkaban-solo-server/extlib

Different MRS versions ship different Hadoop-related JAR versions; you can locate the actual directories with "find /opt/client", as shown below.
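
For example, a quick way to confirm the actual JAR locations on your client node (paths follow the MRS client layout used above):

# locate the hadoop-client-api JAR that ships with this MRS client
find /opt/client -name "hadoop-client-api-*.jar"
# confirm the share directories referenced by the cp commands above
ls /opt/client/HDFS/hadoop/share/hadoop/hdfs/ /opt/client/HDFS/hadoop/share/hadoop/common/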

5. Check the directory structure

The directory structure should be:

- azkaban-solo-server
  - bin
  - conf
  - extlib (third-party Hadoop-related plugin JARs)
  - lib
  - logs
  - plugins
    - jobtypes (jobtype plugin directory)
      - commonprivate.properties
      - hive
        - plugin.properties
        - private.properties
      - hadoopJava
        - plugin.properties
        - private.properties
    - viewer
      - hdfs
        - conf
          - plugin.properties
        - lib (lib directory from the unzipped az-hdfs-viewer.zip)
  - temp
  - web

6. Restart the Azkaban-solo-server service

cd /opt/azkaban/azkaban-solo-server
sh bin/shutdown-solo.sh
sh bin/start-solo.sh

7. Access HDFS Browser

  • Enter the " http:// ECS Elastic IP:8081" URL in the browser to enter the Azkaban WebUI login interface, enter the user information to log in to the Azkaban service
  • Click "HDFS"

Deploy and run the plugins-jobtypes hadoop-job

After installing azkaban-solo-server, deploy and verify the hadoop-job plugin.

1. Environment preparation

Download and compile the Azkaban plugin source code to obtain "azkaban-plugins-3.0.0.zip"; azkaban-solo-server must already be installed.

2. Upload the plugin configuration file

  • Unzip "azkaban-plugins-3.0.0.zip" and get the "hadoopJava" folder under "azkaban-plugins-3.0.0\plugins\jobtype\jobtypes"
  • Upload the "hadoopJava" folder to the "/plugin" directory. If the directory does not exist, you need to create a new one

3. Modify the configuration file "azkaban-solo-server/plugins/jobtypes/commonprivate.properties"

# set execute-as-user
execute.as.user=false

hadoop.security.manager.class=azkaban.security.HadoopSecurityManager_H_2_0
azkaban.should.proxy=false
obtain.binary.token=false
# the Azkaban proxy user name configured in the MRS cluster
proxy.user=azkaban
allow.group.proxy=true
# directory where execute-as-user is stored
azkaban.native.lib=/opt/azkaban/azkaban-solo-server

# hadoop
# /opt/client is the MRS cluster client installation directory
hadoop.home=/opt/client/HDFS/hadoop
hive.home=/opt/client/Hive/Beeline
spark.home=/opt/client/Spark/spark
hadoop.classpath=${hadoop.home}/etc/hadoop,${hadoop.home}/share/hadoop/common/*,${hadoop.home}/share/hadoop/common/lib/*,${hadoop.home}/share/hadoop/hdfs/*,${hadoop.home}/share/hadoop/hdfs/lib/*,${hadoop.home}/share/hadoop/yarn/*,${hadoop.home}/share/hadoop/yarn/lib/*,${hadoop.home}/share/hadoop/mapreduce/*,${hadoop.home}/share/hadoop/mapreduce/lib/*
jobtype.global.classpath=${hadoop.home}/etc/hadoop,${hadoop.home}/share/hadoop/common/*,${hadoop.home}/share/hadoop/common/lib/*,${hadoop.home}/share/hadoop/hdfs/*,${hadoop.home}/share/hadoop/hdfs/lib/*,${hadoop.home}/share/hadoop/yarn/*,${hadoop.home}/share/hadoop/yarn/lib/*,${hadoop.home}/share/hadoop/mapreduce/*,${hadoop.home}/share/hadoop/mapreduce/lib/*

4. Sample program verification

  • Prepare the test data file "input.txt". Its content can be customized along the lines of the following format; store it at "/opt/input.txt"
Ross    male    33  3674 
Julie   male    42  2019 
Gloria  female  45  3567 
Carol   female  36  2813
  • Upload the test data "input.txt" to the HDFS directory "/tmp/azkaban_test" through the HDFS client
    a. Log in to the node where the client is installed as the client installation user
    b. Switch to the client installation directory: cd /opt/client
    c. Configure the environment variables: source bigdata_env
    d. Upload the file with the HDFS shell: hdfs dfs -put /opt/input.txt /tmp/azkaban_test
  • Write a "wordcount.job" file locally with the following contents and save it
# run through the hadoopJava jobtype deployed above
type=hadoopJava
job.extend=false
job.class=azkaban.jobtype.examples.java.WordCount
classpath=./lib/*,/opt/azkaban-solo-server-0.1.0-SNAPSHOT/lib/*
# overwrite the output directory if it already exists
force.output.overwrite=true
input.path=/tmp/azkaban_test
output.path=/tmp/azkaban_test_out
  • Enter the " http:// ECS Elastic IP:port" URL in the browser to enter the Azkaban WebUI login interface, enter the user information to log in to the Azkaban service, and submit the job for verification.

Spark command job (see client commands)

There are two ways to run Spark tasks: command mode and Spark jobtype mode.

  1. Command mode: specify SPARK_HOME as /opt/client/Spark/spark/.
     On a client node of the MRS cluster, you can obtain the actual Spark installation path with echo $SPARK_HOME.
     Set the global environment variables on the ECS where Azkaban runs; after adding "source {MRS client}/bigdata_env", restart Azkaban for the change to take effect. A job-file sketch follows this list.
  2. Jobtype mode: refer to "Deploy and run the plugins-jobtypes hadoop-job" above.
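
A minimal sketch of a command-mode Spark job file; the SparkPi example class and jar path are assumptions based on the standard Spark client layout and may differ in your MRS version:

type=command
# spark-submit from the MRS client installation; adjust the examples jar path to your client
command=/opt/client/Spark/spark/bin/spark-submit --master yarn --class org.apache.spark.examples.SparkPi /opt/client/Spark/spark/examples/jars/spark-examples_*.jar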

 
