Summary of commonly used scheduling tools: Oozie, Azkaban, Airflow

The significance of scheduling: Scheduling is indispensable in a project. A project generally includes multiple tasks, such as programs and Hive scripts. We usually wrap each task in a shell script and then chain all the tasks together with a scheduling tool.

Airflow is a programmable workflow scheduling and monitoring platform based on DAGs (directed acyclic graphs). It supports distributed deployment and execution, but it has no queue of its own, so a distributed setup needs a third-party component such as Redis or RabbitMQ. Workflows are developed in Python, which allows rich task processing, including running bash commands, calling Python code, sending mail, and sending HTTP requests.
airflow webserver -D starts the Airflow web interface as a background daemon.
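
As a rough illustration of what "based on DAG" means, the sketch below orders a few tasks by their dependencies using only Python's standard library. It is not Airflow code: the task names are hypothetical, and a real Airflow workflow would declare the same dependencies with Airflow's own operators.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "load_to_hive": {"extract"},
    "run_hive_script": {"load_to_hive"},
    "send_mail": {"run_hive_script"},
}


def execution_order(deps):
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because the graph must be acyclic, a circular dependency makes TopologicalSorter raise graphlib.CycleError instead of returning an order.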

Azkaban is a workflow monitoring and scheduling tool that supports distributed deployment. Tasks are configured in properties format in job files, which are then packaged into a zip archive for scheduling. Internally, Azkaban consists of three parts: ExecutorServer, WebServer, and MySQL, which are responsible for task execution, web interface display, and scheduling information storage respectively.
Azkaban is a lightweight scheduler.
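
The packaging step described above can be sketched in Python. This is a minimal example, assuming the classic Azkaban job format with type, command, and dependencies keys; the job names and the scripts they call (extract.sh, load.sh) are hypothetical.

```python
import io
import zipfile

# Hypothetical two-step flow: "load" runs only after "extract" has finished.
jobs = {
    "extract.job": "type=command\ncommand=sh extract.sh\n",
    "load.job": "type=command\ncommand=sh load.sh\ndependencies=extract\n",
}


def build_flow_zip(job_files):
    """Pack the .job files into an in-memory zip archive and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, body in job_files.items():
            archive.writestr(name, body)
    return buffer.getvalue()


flow_zip = build_flow_zip(jobs)
```

Writing flow_zip to a file gives an archive of the kind that is uploaded to an Azkaban project; the shell scripts themselves would also need to be included in a real package.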

Oozie is a distributed workflow scheduling framework based on Hadoop. Workflows are defined by configuring tasks in XML files, and Oozie starts MapReduce (MR) jobs when a workflow is executed. It depends on the Hadoop platform and is a heavyweight framework.
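
To give a feel for the XML involved, the sketch below generates a one-action workflow definition with Python's standard library. The workflow name, action name, and script are hypothetical, and the schema versions (workflow 0.5, shell-action 0.2) are assumptions that should be checked against the Oozie release in use.

```python
import xml.etree.ElementTree as ET


def build_workflow(name, script):
    """Build a one-action workflow definition and return it as an XML string."""
    app = ET.Element("workflow-app", {"name": name, "xmlns": "uri:oozie:workflow:0.5"})
    ET.SubElement(app, "start", {"to": "run-script"})

    # A single shell action; ${jobTracker} and ${nameNode} are placeholders
    # that would normally be supplied by a job.properties file.
    action = ET.SubElement(app, "action", {"name": "run-script"})
    shell = ET.SubElement(action, "shell", {"xmlns": "uri:oozie:shell-action:0.2"})
    ET.SubElement(shell, "job-tracker").text = "${jobTracker}"
    ET.SubElement(shell, "name-node").text = "${nameNode}"
    ET.SubElement(shell, "exec").text = script
    ET.SubElement(shell, "file").text = script
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})

    # Failure path and normal end node.
    kill = ET.SubElement(app, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Script failed"
    ET.SubElement(app, "end", {"name": "end"})
    return ET.tostring(app, encoding="unicode")


workflow_xml = build_workflow("demo-wf", "etl.sh")
print(workflow_xml)
```

Even this single-step flow needs start, action, ok, error, kill, and end nodes, which hints at why XML workflows are often considered verbose.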

Differences between the scheduling frameworks
Timing aspects:
1. Azkaban triggers scheduled tasks based on time only.
2. Oozie triggers scheduled tasks based on time and on input data.
3. Airflow can trigger scheduled tasks based on time and on data, implemented in Python code.
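
The difference between time-based and time-plus-data triggering can be sketched as a single check. This is a simplified illustration, not the logic of any of the three tools; the function name and the idea of using a marker file as the "input data" are assumptions.

```python
from datetime import datetime
from pathlib import Path


def should_run(now, scheduled_at, input_path=None):
    """Return True when the scheduled time has passed and, if an input
    path is given, that input also exists."""
    time_ready = now >= scheduled_at
    data_ready = input_path is None or Path(input_path).exists()
    return time_ready and data_ready


# Time-only trigger: no input path is checked.
print(should_run(datetime(2020, 6, 4, 2, 5), datetime(2020, 6, 4, 2, 0)))
```

Passing an input_path turns the same check into a time-and-data trigger: the task waits until the file or directory is present even if the scheduled time has already passed.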
Development:
1. Azkaban uses properties files to define workflows.
2. Oozie uses XML to define workflows.
3. Airflow uses Python code to define workflows.
Advantages and disadvantages:
1. Azkaban: lightweight, relatively simple to develop for, and has a good web interface for viewing and monitoring tasks. However, its scheduled tasks are triggered only by time, not by data.
2. Airflow: moderately heavyweight. Distributed deployment relies on third-party components, and development requires Python, which adds a certain degree of difficulty.
3. Oozie: heavyweight. It can be installed easily through CM (Cloudera Manager) or Ambari, but configuring workflows in XML files is difficult.

Origin blog.csdn.net/qq_39719415/article/details/106547283