Distributed Task Scheduling: Introduction to PowerJob

1. Project introduction

1. Product Features

PowerJob (formerly OhMyScheduler) is a new-generation distributed task scheduling and computing framework. Its main features are as follows:

  • Easy to use: provides a web console in which developers can visually manage scheduled tasks (create, delete, update, query), monitor task running status, and view execution logs.
  • Complete timing strategies: supports four scheduling strategies: CRON expression, fixed frequency, fixed delay, and API.
  • Rich execution modes: supports four execution modes: stand-alone, broadcast, Map, and MapReduce. The Map/MapReduce processors let developers obtain cluster-wide distributed computing capability with just a few lines of code.
  • Workflow support: task dependencies (DAG) can be configured online and tasks arranged visually; data can be passed between upstream and downstream tasks, and multiple node types are supported (decision nodes and nested workflow nodes).
  • Broad processor support: supports Spring Beans and built-in/external Java classes; Shell, Python, HTTP, SQL, and other processors can be integrated simply by adding the official dependency package.
  • Convenient operation and maintenance: supports online logs; logs produced by the executor are displayed in real time in the web console, which reduces debugging cost and improves development efficiency.
  • Minimal dependencies: the only required dependency is a relational database (MySQL/PostgreSQL/Oracle/MS SQLServer…).
  • High availability and high performance: the scheduling server replaces the database-lock-based strategy used by other scheduling frameworks with lock-free scheduling. Deploying multiple scheduling servers provides both high availability and higher performance (horizontal scaling is supported without a fixed limit).
  • Failover and recovery: after a task fails, it is retried according to the configured retry strategy. As long as the executor cluster has enough computing nodes, the task can complete successfully.
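
The workflow feature in the list above describes task dependencies as a DAG. As a rough illustration of what that implies for execution order, the sketch below derives one valid order with Kahn's algorithm (a standard topological sort). It is a plain-JDK sketch, not PowerJob's scheduler: the class name `DagOrderSketch`, the method `executionOrder`, and the job names are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: class and method names are hypothetical, not PowerJob API.
class DagOrderSketch {

    /**
     * Returns one valid execution order for a DAG given as
     * "upstream job -> list of downstream jobs" (Kahn's algorithm).
     * Throws if the graph contains a cycle.
     */
    static List<String> executionOrder(Map<String, List<String>> downstream) {
        // Count incoming edges for every job that appears in the graph.
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : downstream.entrySet()) {
            inDegree.putIfAbsent(e.getKey(), 0);
            for (String child : e.getValue()) {
                inDegree.merge(child, 1, Integer::sum);
            }
        }
        // Jobs without any upstream dependency can start immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String job = ready.poll();
            order.add(job);
            for (String child : downstream.getOrDefault(job, List.of())) {
                // A job becomes ready once all of its upstream jobs are done.
                if (inDegree.merge(child, -1, Integer::sum) == 0) {
                    ready.add(child);
                }
            }
        }
        if (order.size() != inDegree.size()) {
            throw new IllegalArgumentException("workflow contains a cycle");
        }
        return order;
    }
}
```

For a diamond-shaped workflow (A feeds B and C, which both feed D), the method returns A first and D last; a graph with a cycle is rejected, which is why a workflow has to be acyclic.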

Online trial: https://www.yuque.com/powerjob/guidence/hnbskn

2. Applicable scenarios

  • Business scenarios that require scheduled execution: for example, a full data synchronization every morning, generating business reports, or cancelling unpaid orders after a timeout.
  • Business scenarios that require all machines to execute together: for example, using broadcast execution mode to clean up logs across the cluster.
  • Business scenarios that require distributed processing: for example, when a large amount of data must be updated and a single machine would take too long, Map/MapReduce processors can distribute the work and use the whole cluster to speed up the computation.
  • Business scenarios that require delayed execution of certain tasks: for example, order expiration handling.
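
To make the distributed-processing scenario more concrete, here is a minimal plain-JDK simulation of the map/reduce idea: a large update is split into batches, the batches run in parallel, and the results are combined. It deliberately does not use the PowerJob processor API; a local thread pool stands in for the worker cluster, and `MapReduceSketch`, `split`, `processBatch`, and `run` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: a local simulation, not the PowerJob processor API.
class MapReduceSketch {

    /** "Map" step: split the id range [0, total) into batches of at most batchSize. */
    static List<int[]> split(int total, int batchSize) {
        List<int[]> batches = new ArrayList<>();
        for (int start = 0; start < total; start += batchSize) {
            batches.add(new int[] {start, Math.min(start + batchSize, total)});
        }
        return batches;
    }

    /** Stand-in for the real work on one batch; returns the number of records handled. */
    static int processBatch(int[] batch) {
        return batch[1] - batch[0];
    }

    /** Runs all batches on a pool of worker threads, then sums the results ("reduce"). */
    static int run(int total, int batchSize, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int[] batch : split(total, batchSize)) {
                results.add(pool.submit(() -> processBatch(batch)));
            }
            int processed = 0;
            for (Future<Integer> result : results) {
                processed += result.get(); // wait for each sub-task to finish
            }
            return processed;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("batch processing failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `MapReduceSketch.run(1000, 128, 4)` splits 1000 records into 8 batches and reports 1000 processed records. In a real deployment the batches would be dispatched to different machines rather than to local threads.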

3. Design goals

PowerJob is designed to be an enterprise-level distributed task scheduling platform, that is, a company's internal task scheduling middleware. The company deploys the scheduling center, powerjob-server, once for the whole organization; each business-line application then only needs to depend on powerjob-worker and connect to the scheduling center to obtain task scheduling and distributed computing capabilities.

4. Comparison of similar products

| Feature | Quartz | xxl-job | SchedulerX 2.0 | PowerJob |
| --- | --- | --- | --- | --- |
| Timing types | CRON | CRON | CRON, fixed frequency, fixed delay, OpenAPI | CRON, fixed frequency, fixed delay, OpenAPI |
| Task types | Built-in Java | Built-in Java, GLUE Java, Shell, Python and other scripts | Built-in Java, external Java (FatJar), Shell, Python and other scripts | Built-in Java, external Java (container), Shell, Python and other scripts |
| Distributed tasks | None | Static sharding | MapReduce dynamic sharding | MapReduce dynamic sharding |
| Online task management | Not supported | Supported | Supported | Supported |
| Online log viewing | Not supported | Supported | Not supported | Supported |
| Scheduling method and performance | Based on database locks; performance bottleneck | Based on database locks; performance bottleneck | Unknown | Lock-free design; performance scales without a fixed upper limit |
| Alarm monitoring | None | Email | SMS | Email, with an interface that developers can extend |
| System dependencies | Relational databases (MySQL, Oracle...) | MySQL | RMB (that is, it is a paid product) | Any relational database supported by Spring Data JPA (MySQL, Oracle...) |
| DAG workflow | Not supported | Not supported | Supported | Supported |

2. Basic concepts

This section explains the terminology used in the framework, to help developers understand and use it.

Grouping concepts:

  • appName: the application name, used for business grouping and isolation. It is recommended to keep it consistent with the name of the application that actually connects to PowerJob. One appName corresponds to one business cluster, that is, one actual Java project.

Core concepts:

  • Task (Job): Describes the task information that needs to be scheduled by PowerJob, including task name, scheduling time, processor information, etc.

  • Task instance (JobInstance, referred to as Instance): A task (Job) will generate a task instance (Instance) after it is scheduled for execution, and the task instance records the runtime information of the task (the relationship between a task and a task instance is similar to the relationship between a class and an object).

  • Subtask (Task): the execution unit of a task instance. Each JobInstance contains at least one Task, according to the following rules:

    • Standalone task (STANDALONE): one JobInstance corresponds to one Task
    • Broadcast task (BROADCAST): one JobInstance corresponds to N Tasks, where N is the number of machines in the cluster, that is, each machine generates one Task
    • Map/MapReduce task: one JobInstance corresponds to several Tasks, which are created explicitly by the developer's code
  • Workflow (Workflow): A set of tasks (Job) described by DAG (Directed Acyclic Graph) for task orchestration.

  • Workflow instance (WorkflowInstance): After the workflow is scheduled for execution, a workflow instance will be generated, which records the runtime information of the workflow.
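
The rules above for how many Tasks a JobInstance produces can be summarized in a few lines. This is only a sketch of the stated rules: the enum reuses the mode names from the text, while `TaskFanOutSketch`, `taskCount`, and the parameters `workerCount` and `mappedSubTasks` are hypothetical and not part of any PowerJob API.

```java
// Illustrative only: mirrors the rules in the text; names are hypothetical.
class TaskFanOutSketch {

    enum ExecuteType { STANDALONE, BROADCAST, MAP, MAP_REDUCE }

    /**
     * Number of Tasks that one JobInstance produces.
     *
     * @param workerCount    number of machines in the worker cluster
     * @param mappedSubTasks number of sub-tasks created by the developer's code
     */
    static int taskCount(ExecuteType type, int workerCount, int mappedSubTasks) {
        switch (type) {
            case STANDALONE:
                return 1;                            // one Task per JobInstance
            case BROADCAST:
                return workerCount;                  // one Task per machine
            case MAP:
            case MAP_REDUCE:
                return Math.max(1, mappedSubTasks);  // a JobInstance has at least one Task
            default:
                throw new IllegalArgumentException("unknown type: " + type);
        }
    }
}
```

For example, on a five-machine cluster a broadcast job yields five Tasks, while a MapReduce job that maps twelve sub-tasks yields twelve regardless of cluster size.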

Extended concepts:

  • JVM container: a set of Java files (Java processors written by developers) organized as a Maven project, which can be published dynamically through the web console and loaded by the executor, providing strong extensibility and flexibility.
  • OpenAPI: lets developers perform through an API the operations that would otherwise be done manually, making the system more flexible as a whole. Developers can easily extend PowerJob's built-in functions on top of the API.
  • Lightweight tasks: tasks that run on a single machine and are not scheduled at a fixed frequency or with a fixed delay (>= v4.2.1).
  • Heavyweight tasks: tasks that do not run on a single machine, or that are scheduled at a fixed frequency or with a fixed delay (>= v4.2.1).
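
The lightweight/heavyweight distinction above reduces to a simple predicate. The sketch below encodes it under two assumptions: "executed on a single machine" is taken to mean the stand-alone execution mode, and the enum constants are illustrative names for the execution modes and timing types described in this article, not identifiers confirmed from the PowerJob code base.

```java
// Illustrative only: enum and method names are hypothetical.
class TaskWeightSketch {

    enum ExecuteType { STANDALONE, BROADCAST, MAP, MAP_REDUCE }

    enum TimeType { API, CRON, FIXED_RATE, FIXED_DELAY, WORKFLOW }

    /** Lightweight: runs on a single machine AND is not a fixed-frequency/fixed-delay task. */
    static boolean isLightweight(ExecuteType executeType, TimeType timeType) {
        boolean singleMachine = executeType == ExecuteType.STANDALONE; // assumption, see lead-in
        boolean fixedRateOrDelay =
                timeType == TimeType.FIXED_RATE || timeType == TimeType.FIXED_DELAY;
        return singleMachine && !fixedRateOrDelay;
    }

    /** Heavyweight is simply the complement of lightweight. */
    static boolean isHeavyweight(ExecuteType executeType, TimeType timeType) {
        return !isLightweight(executeType, timeType);
    }
}
```

Under these assumptions a stand-alone CRON task is lightweight, whereas a stand-alone fixed-delay task or a broadcast CRON task is heavyweight.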

Timing task types:

  • API: This task will only be triggered by the OpenAPI interface provided in powerjob-client, and the server will not actively schedule it.
  • CRON: The scheduling time of this task is specified by a CRON expression.
  • Fixed frequency: a second-level task that runs once every specified number of milliseconds; it behaves like java.util.concurrent.ScheduledExecutorService#scheduleAtFixedRate.
  • Fixed delay: a second-level task that runs again a specified number of milliseconds after the previous run has finished; it behaves like java.util.concurrent.ScheduledExecutorService#scheduleWithFixedDelay.
  • Workflow: The task will only be scheduled and executed by the workflow it belongs to, and the server will not actively schedule the task. If the task does not belong to any workflow, the task will not be scheduled.

Remarks: Fixed-delay and fixed-frequency tasks are collectively referred to as second-level tasks. A running second-level task cannot simply be stopped; it only truly stops when the task itself is disabled or deleted.
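
The difference between fixed frequency and fixed delay is easiest to see from the start times of successive runs. The sketch below models the timing documented for the two JDK methods named above, assuming an initial delay of 0 and a constant run time per execution. It is a simplified model rather than PowerJob code, and `FixedRateVsDelaySketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a timing model, not PowerJob code.
class FixedRateVsDelaySketch {

    /** Start times (ms) of the first n runs under fixed-rate scheduling. */
    static List<Long> fixedRateStarts(long periodMs, long runMs, int n) {
        List<Long> starts = new ArrayList<>();
        long previousEnd = 0;
        for (int i = 0; i < n; i++) {
            // Runs are planned on a fixed grid (0, period, 2 * period, ...),
            // but a run never starts before the previous one has finished.
            long start = Math.max(i * periodMs, previousEnd);
            starts.add(start);
            previousEnd = start + runMs;
        }
        return starts;
    }

    /** Start times (ms) of the first n runs under fixed-delay scheduling. */
    static List<Long> fixedDelayStarts(long delayMs, long runMs, int n) {
        List<Long> starts = new ArrayList<>();
        long start = 0;
        for (int i = 0; i < n; i++) {
            starts.add(start);
            // The delay is measured from the END of the previous run.
            start = start + runMs + delayMs;
        }
        return starts;
    }
}
```

With a 1000 ms interval and runs that take 300 ms, fixed rate starts at 0, 1000, 2000, 3000 ms while fixed delay starts at 0, 1300, 2600, 3900 ms. If a run takes longer than the period (for example 1500 ms), the fixed-rate model starts each run as soon as the previous one ends (0, 1500, 3000 ms), so runs do not overlap.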

3. Project address

PowerJob main project: https://github.com/PowerJob/PowerJob

PowerJob front-end project: https://github.com/PowerJob/PowerJob-Console

PowerJob official website project: https://github.com/PowerJob/Official-Website

Origin blog.csdn.net/zhanggqianglovec/article/details/131501227