Big Data Study Notes 26: Task Scheduling with Azkaban

 1: Azkaban Overview

Azkaban is a distributed workflow manager built at LinkedIn to solve the problem of dependencies between Hadoop jobs. Jobs need to run in a fixed order, from ETL work through to data analysis products.

 

2: Why a workflow scheduling system is needed

1) A complete data analysis system is usually made up of a large number of task units:
  shell scripts, Java programs, MapReduce programs, Hive scripts, and so on.


2) The task units have a time ordering and upstream/downstream dependencies on one another.


3) Organizing the execution of such a complex set of programs well requires a workflow scheduling system.
  For example, suppose a business system generates 20 GB of raw data every day and we have to process it daily. The processing steps are as follows:
  (1) First sync the raw data to HDFS using Hadoop;
  (2) Process the raw data with the MapReduce computing framework and store the results in several Hive tables, organized as partitioned tables;
  (3) JOIN the data from several Hive tables to produce one large, detailed Hive table;
  (4) Run complex statistical analysis on the detailed data to produce the report results;
  (5) Sync the result data from the statistical analysis to the business system so that the business side can use it.
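To make the dependency idea concrete, here is a minimal Python sketch (not part of Azkaban) that orders the five steps by their prerequisites. The step names are hypothetical labels for the stages listed above, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each key is a step; its value is the set of steps that must finish first.
# The step names are hypothetical labels for the five stages described above.
dependencies = {
    "sync_to_hdfs": set(),
    "mapreduce_to_hive": {"sync_to_hdfs"},
    "hive_join": {"mapreduce_to_hive"},
    "statistics": {"hive_join"},
    "export_results": {"statistics"},
}

# static_order() yields every step only after all of its prerequisites.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```

A workflow scheduler does essentially this, and additionally launches each step, tracks failures, and records logs.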

2: Features:

1) Provides users with a very friendly visual interface -> the web UI

2) Workflows are very easy to upload -> packaged as an archive

3) Dependencies between tasks can be configured

4) Permission settings -> guards against someone deleting the database and running off

5) Modular

6) Tasks can be stopped and started at any time

7) Log records can be viewed

3: Comparison with Oozie

Compared with Oozie, Azkaban is a lightweight scheduling tool.

As long as the features an enterprise application needs are mainstream rather than niche, Azkaban is sufficient.

 

1) Features

Both schedulers can run workflow tasks built from MapReduce, Java, and script jobs

Both can run workflows on a timed schedule ...

 

2) Usage

Azkaban: parameters are passed directly

Oozie: parameters are passed directly, and EL expressions are supported ...

 

3) Scheduling

Azkaban: scheduled tasks are triggered by time

Oozie: tasks can be triggered by time and by data

 

4) Resource management

Azkaban: strict access control

Oozie: no strict access control

 

4: Azkaban installation and deployment

Preparation

1) Take a snapshot

2) Upload the installation package

alt + p (opens an SFTP session in SecureCRT)

3) Unpack and rename

tar -zxvf

mv

4) Import the Azkaban SQL script in MySQL

source /root/hd/azkaban/azkaban-2.5.0/create-all-sql-2.5.0.sql

 

Installation and deployment

1) Create the SSL (secure connection) configuration

Server requires a certificate

keytool -keystore keystore -alias jetty -genkey -keyalg RSA

 

2) Time synchronization

Generate the time zone file

Generate it with tzselect

5->9->1->yes

Copy the time zone file

cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

Synchronize time across the cluster

In SecureCRT, use the interactive window to send the command to every session at once

sudo date -s '2018-11-28 20:41:33'

 

3) Modify the configuration files

4) Start the web server

bin/azkaban-web-start.sh

 

5) Start the executor server

bin/azkaban-executor-start.sh

 

6) Access the web UI

https://192.168.50.183:8443

 

The installation steps I have written here are very rough; you can refer to this article for installation and deployment:

https://www.cnblogs.com/chenmingjun/p/10506488.html

 

 

Hands-on practice

Case I: A single job of the command type

Create a job description file:

Then package it into a zip file and upload it to Azkaban.
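The original job file was not preserved, so the following is only a sketch of what this step might look like. It assumes Azkaban's key=value job format with `type=command` and a `command` line; the file name `command.job` and the echoed text are made up for illustration.

```python
import tempfile
import zipfile
from pathlib import Path

# Hypothetical job: the file name and the echoed text are illustrative only.
job_text = "type=command\ncommand=echo 'hello azkaban'\n"

# Write the job file into a scratch directory.
work_dir = Path(tempfile.mkdtemp())
job_file = work_dir / "command.job"
job_file.write_text(job_text)

# Store the job file at the root of the archive, ready for upload.
archive = work_dir / "command.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.write(job_file, arcname=job_file.name)

print(archive)
```

The resulting zip is what gets uploaded to a project in the Azkaban web UI.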

 

 

Case II: Multiple jobs of the command type

Create f.job

Create b.job

where b depends on f

Then package these 2 job files into a zip and upload it to Azkaban.
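As a rough sketch of the two files, assuming the `dependencies` key that Azkaban job files use to name upstream jobs: only the file names f.job and b.job and the "b depends on f" relationship come from the text, while the echo commands and the archive name are placeholders.

```python
import tempfile
import zipfile
from pathlib import Path

# Hypothetical commands; only the file names and the b -> f dependency
# come from the text above.
jobs = {
    "f.job": "type=command\ncommand=echo 'running f'\n",
    # "dependencies" names the upstream job without the .job extension.
    "b.job": "type=command\ndependencies=f\ncommand=echo 'running b'\n",
}

# Write both job files straight into one archive.
archive = Path(tempfile.mkdtemp()) / "flow.zip"
with zipfile.ZipFile(archive, "w") as zf:
    for name, text in jobs.items():
        zf.writestr(name, text)

print(archive)
```

With this layout, b should only start after f has finished successfully.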

 

 

Case III: HDFS operation task

 

Create a job file

Note: the hdfs command in the job must be given by its full path on the Linux machine

 

Then package this job file into a zip file and upload it to Azkaban.
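One way to avoid getting the full path wrong is to look it up rather than type it. The sketch below assumes it runs on the machine where the executor will run the job; the fallback path `/opt/hadoop/bin/hdfs` and the directory `/azkaban_demo` are invented for illustration.

```python
import os
import shutil

# Look up the hdfs launcher on PATH; the fallback is a hypothetical install path.
found = shutil.which("hdfs")
hdfs_bin = os.path.abspath(found) if found else "/opt/hadoop/bin/hdfs"

# /azkaban_demo is a made-up target directory.
job_text = f"type=command\ncommand={hdfs_bin} dfs -mkdir -p /azkaban_demo\n"
print(job_text)
```

The printed text can be saved as a .job file and zipped in the same way as in the earlier cases.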

Case IV: Running a MapReduce program

Here we use one of the example programs that ships with Hadoop. Next, write the job file.

 

 

 

Package the word-count jar file and the job file together and upload them to the Azkaban task.
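The job file itself was lost with the screenshots, so this is only a guess at its shape. It assumes the job calls `hadoop jar` with the `wordcount` program from the Hadoop examples jar, and that the jar is referenced by a relative name because it sits in the same zip. Every argument value below is a placeholder.

```python
def wordcount_job(hadoop_bin: str, jar_name: str, input_dir: str, output_dir: str) -> str:
    """Return the text of a command-type job that runs the wordcount example."""
    command = f"{hadoop_bin} jar {jar_name} wordcount {input_dir} {output_dir}"
    return f"type=command\ncommand={command}\n"


# All four arguments are placeholders, not values from the original post.
job_text = wordcount_job(
    hadoop_bin="/opt/hadoop/bin/hadoop",
    jar_name="hadoop-mapreduce-examples.jar",
    input_dir="/wordcount/input",
    output_dir="/wordcount/output",
)
print(job_text)
```

Keep in mind that the output directory must not already exist on HDFS, or the MapReduce job will refuse to start.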

 

 

 

 

Case V: Hive script task

1: Create the Hive script

 

2: Write the job file

 

Then package this job file and the sql file into a zip and upload it to Azkaban.
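Neither file survived in the post, so the sketch below uses stand-ins: a one-line `test.sql` and a `hive.job` that runs it with `hive -f`. The file names, the SQL, and the path `/opt/hive/bin/hive` are all assumptions. The sketch also checks that the script named in the job is actually inside the archive, which is an easy thing to forget; it makes no attempt to diagnose the Hive failure described below.

```python
import io
import shlex
import zipfile

# Hypothetical script and job contents; the original files were not preserved.
files = {
    "test.sql": "SHOW DATABASES;\n",
    # /opt/hive/bin/hive is an assumed install path.
    "hive.job": "type=command\ncommand=/opt/hive/bin/hive -f test.sql\n",
}

# Build the archive in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for name, text in files.items():
        zf.writestr(name, text)

# Sanity check: the script passed to "hive -f" must be inside the archive.
command = files["hive.job"].splitlines()[1].removeprefix("command=")
args = shlex.split(command)
script = args[args.index("-f") + 1]
with zipfile.ZipFile(buffer) as zf:
    script_is_bundled = script in zf.namelist()

print(script, script_is_bundled)
```

Writing `buffer.getvalue()` to a file then gives the zip to upload.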

 

Something is off here!! For some reason, running Hive fails.

Error message from the executor:

 

 

Error message from the web server:

 


Origin www.cnblogs.com/hidamowang/p/10935402.html