Understanding the distributed task scheduling platform XXL-JOB in one article

This article introduces the distributed task scheduling platform XXL-JOB (version v2.1.0), covering its features, implementation principles, advantages and disadvantages, and a comparison with similar frameworks.

Basic introduction

In project development, the following scenarios often call for distributed task scheduling:

  • Several instances of the same service run the same task, the runs are mutually exclusive, and they need central coordination
  • Scheduled tasks need high availability, monitoring, operations support and failure alerting
  • The scheduled tasks of every service node need to be managed and tracked in one place, together with task attributes such as the service a task belongs to and the person responsible for it

XXL-JOB was created to meet these needs.
XXL-JOB is an open-source, lightweight distributed task scheduling platform. Its core design goals are rapid development, ease of learning, light weight and easy extension, and it works out of the box. "XXL" is the abbreviated name of its lead author, Xu Xueli of Dianping.

Since it was open-sourced in 2015, it has been adopted in the online product lines of hundreds of companies, in scenarios such as e-commerce, O2O services and big data jobs.

Features

The main features are as follows:

  • Simple and flexible
    provides a web console for task management, with user management and access control;
    supports containerized deployment;
    supports cross-platform task scheduling over generic HTTP;

  • Rich task management
    supports creating, reading, updating and deleting tasks from the web console;
    supports writing script tasks, command-line tasks and Java code tasks in the console and executing them;
    supports cascading tasks: after a parent task finishes, its child tasks are triggered;
    supports setting task priority;
    supports choosing the routing strategy for executor nodes, including round robin, random, broadcast, failover and busy transfer;
    supports triggering tasks by Cron expression, by task dependency, or through the dispatch center's API

  • High performance
    the dispatch center triggers tasks through a multi-threaded thread pool, and fast and slow tasks are scheduled in separate, isolated thread pools, which improves system performance and stability;
    the scheduling process is fully asynchronous (asynchronous scheduling, asynchronous execution, asynchronous callbacks), which effectively smooths traffic peaks under dense scheduling;

  • High availability
    the dispatch center and the executor nodes are both deployed as clusters, with support for dynamic scaling and failover;
    supports a failover routing strategy: when an executor node is unavailable, the task is automatically moved to another node;
    supports task timeout control and failure retry configuration;
    supports blocking strategies, which decide what happens when triggers arrive faster than a busy executor node can run them: serial, discard and cover

  • Easy to monitor and operate
    supports email alerts on task failure, with reserved interfaces for SMS and DingTalk alerts;
    supports real-time charts of task execution data, task progress monitoring, and complete task execution logs;

System design

1 Design ideas

Scheduling behavior is abstracted into a common platform, the "dispatch center". The platform itself carries no business logic; it is only responsible for initiating scheduling requests.
Tasks are abstracted into separate JobHandlers that are managed by an "executor". The executor receives scheduling requests and runs the business logic of the corresponding JobHandler.
"Scheduling" and "tasks" are therefore decoupled from each other, which improves the stability and scalability of the whole system.

2 System components

  • Scheduling module (dispatch center): manages scheduling information and issues scheduling requests according to the scheduling configuration; it carries no business code itself. Decoupling the scheduling system from the tasks improves availability and stability, and scheduling performance is no longer limited by the task modules. It supports simple, visual, dynamic management of scheduling information, including creating, updating and deleting tasks and configuring task alerts, and all of these operations take effect immediately. It also supports monitoring of scheduling results and execution logs, and executor failover.

  • Execution module (executor): receives scheduling requests and runs the task logic. The task module only has to focus on executing tasks, which makes development and maintenance simpler and more efficient. It accepts execution requests, termination requests and log requests from the dispatch center.

Functional Architecture

3 How it works

XXL-JOB task execution process

  • The executor registers itself automatically with the dispatch center, using the dispatch center address in its configuration
  • When a task's trigger condition is met, the dispatch center dispatches the task to an executor
  • The executor runs the task on a thread pool, puts the execution result into an in-memory queue, and writes the execution log to a log file
  • A callback thread in the executor consumes the results from the in-memory queue and actively reports them to the dispatch center
  • When a user views a task log in the dispatch center, the dispatch center requests it from the executor, which reads the task's log file and returns the log details
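The execution and callback steps above can be sketched roughly as follows. This is a minimal illustration, not XXL-JOB's actual code: the class `ExecutorFlowSketch`, its `trigger` method and the `reported` list (a stand-in for the dispatch center) are all hypothetical names, and writing the log file is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical sketch of the executor side; all names are illustrative.
final class ExecutorFlowSketch {

    /** Result of one task run, waiting to be reported to the dispatch center. */
    static final class CallbackParam {
        final int logId;
        final String result;
        CallbackParam(int logId, String result) { this.logId = logId; this.result = result; }
    }

    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    private final BlockingQueue<CallbackParam> callbackQueue = new LinkedBlockingQueue<>();
    // Stand-in for the dispatch center: it only remembers what was reported.
    final List<String> reported = Collections.synchronizedList(new ArrayList<>());
    private final Thread callbackThread;
    private volatile boolean stopped = false;

    ExecutorFlowSketch() {
        // Callback thread: consumes results from the in-memory queue and reports them.
        callbackThread = new Thread(() -> {
            try {
                while (!stopped || !callbackQueue.isEmpty()) {
                    CallbackParam p = callbackQueue.poll(50, TimeUnit.MILLISECONDS);
                    if (p != null) {
                        reported.add(p.logId + ":" + p.result);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "callback-thread");
        callbackThread.setDaemon(true);
        callbackThread.start();
    }

    /** Runs the handler on the thread pool and queues its result for the callback thread. */
    void trigger(int logId, Function<String, String> handler, String param) {
        workers.submit(() -> {
            String result;
            try {
                result = handler.apply(param);
            } catch (RuntimeException e) {
                result = "FAIL: " + e.getMessage();
            }
            callbackQueue.add(new CallbackParam(logId, result));
        });
    }

    /** Waits for running tasks, then lets the callback thread drain the queue. */
    void shutdown() {
        workers.shutdown();
        try {
            workers.awaitTermination(5, TimeUnit.SECONDS);
            stopped = true;
            callbackThread.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Separating execution from reporting through a queue means a slow or unreachable dispatch center does not block the worker threads; results simply wait in memory until the callback thread can deliver them.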

4 High availability design

4.1 Dispatch center high availability

The dispatch center supports multi-node deployment. A database row lock ensures that only one dispatch center node triggers task scheduling at a time; see com.xxl.job.admin.core.thread.JobScheduleHelper#start:

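Connection conn = XxlJobAdminConfig.getAdminConfig().getDataSource().getConnection();
boolean connAutoCommit = conn.getAutoCommit();
conn.setAutoCommit(false);
// Take the row lock; other dispatch center nodes block here until it is released
PreparedStatement preparedStatement = conn.prepareStatement("select * from xxl_job_lock where lock_name = 'schedule_lock' for update");
preparedStatement.execute();

// trigger task scheduling

// commit the transaction, which releases the row lock
conn.commit();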

4.2 Task scheduling high availability

  • Routing strategy
    the dispatch center picks the executor node for a task according to the routing strategy. XXL-JOB provides the following routing strategies to keep task scheduling highly available:
    Busy transfer strategy: before dispatching the task, the dispatch center sends a heartbeat RPC request to the executor node to ask whether it is busy. If the executor reports that it is busy, the dispatch center moves on to another executor node (see com.xxl.job.admin.core.route.strategy.ExecutorRouteBusyover)
    Failover strategy: before dispatching the task, the dispatch center sends a heartbeat RPC request to the executor node to check whether it is online. If the executor does not respond or reports that it is unavailable, the dispatch center moves on to another executor node (see com.xxl.job.admin.core.route.strategy.ExecutorRouteFailover)
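Both strategies follow the same pattern: probe the candidates in order and dispatch to the first one that passes. The sketch below illustrates that pattern only. `ProbeRoutingSketch`, `route` and the probe predicate are hypothetical names, and in a real deployment the probe would be an RPC call rather than a local lambda.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch; the names are illustrative, not XXL-JOB's API.
final class ProbeRoutingSketch {

    /**
     * Walks the registered executor addresses in order and returns the first
     * one that passes the probe. For failover the probe asks "is the node
     * online?"; for busy transfer it asks "is the node idle?".
     */
    static Optional<String> route(List<String> addresses, Predicate<String> probe) {
        for (String address : addresses) {
            boolean ok;
            try {
                ok = probe.test(address);
            } catch (RuntimeException e) {
                ok = false; // an unreachable node counts as a failed probe
            }
            if (ok) {
                return Optional.of(address);
            }
        }
        return Optional.empty(); // no node passed: the dispatch fails
    }

    public static void main(String[] args) {
        List<String> executors = Arrays.asList("10.0.0.1:9999", "10.0.0.2:9999", "10.0.0.3:9999");
        // Pretend the first node is down.
        Predicate<String> heartbeat = address -> !address.startsWith("10.0.0.1");
        System.out.println(route(executors, heartbeat).orElse("no executor available"));
    }
}
```

With the sample data in `main`, the first address fails the probe and the second one is selected.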

  • Blocking strategy
    when several triggers of the same task id arrive at an executor node before the earlier ones have finished, the blocking strategy decides which of them run:
    serial (the default strategy: triggers queue up and run in order), discard the old task, or discard the new task
    (reference: com.xxl.job.core.biz.impl.ExecutorBizImpl#run)
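The three blocking strategies can be modelled with a small single-threaded state machine. This is a simplified sketch under stated assumptions: the class `BlockStrategySketch` and the enum constants are hypothetical names, and "discard the old task" is modelled as dropping both the running trigger and anything queued behind it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch; enum and class names are illustrative only.
final class BlockStrategySketch {

    enum BlockStrategy { SERIAL, DISCARD_NEW, DISCARD_OLD }

    private final Deque<String> pending = new ArrayDeque<>(); // triggers waiting to run
    private String running;                                   // trigger being executed, or null
    final List<String> dropped = new ArrayList<>();           // triggers that were given up

    /** Called when a new trigger for the same task id reaches the executor. */
    void onTrigger(String trigger, BlockStrategy strategy) {
        if (running == null) {
            running = trigger; // executor is idle: run immediately
            return;
        }
        switch (strategy) {
            case SERIAL:
                pending.addLast(trigger); // queue behind the running trigger
                break;
            case DISCARD_NEW:
                dropped.add(trigger); // keep the old work, give up the new trigger
                break;
            case DISCARD_OLD:
                dropped.add(running); // give up the old work and run the new trigger
                dropped.addAll(pending);
                pending.clear();
                running = trigger;
                break;
        }
    }

    /** Called when the running trigger finishes; promotes the next queued one. */
    void onFinished() {
        running = pending.pollFirst();
    }

    String running() { return running; }

    int pendingCount() { return pending.size(); }
}
```

Serial keeps every trigger at the cost of growing delay, while the two discard strategies bound the backlog by giving up either the newest or the oldest work.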

Comparison with similar frameworks

| Feature | quartz | elastic-job-lite | xxl-job | LTS |
| --- | --- | --- | --- | --- |
| Dependencies | MySQL, JDK | JDK, ZooKeeper | MySQL, JDK | JDK, ZooKeeper, Maven |
| High availability | Multi-node deployment; nodes compete for a database lock so that only one node executes a task | Registration and discovery through ZooKeeper; servers can be added dynamically | Nodes compete for a database lock so that only one node executes a task; supports horizontal scaling. Scheduled tasks can be added manually, tasks can be started and paused, and monitoring is available | Cluster deployment; servers can be added dynamically. Scheduled tasks can be added manually, tasks can be started and paused, and monitoring is available |
| Task sharding | × | ✓ | ✓ | ✓ |
| Management UI | × | ✓ | ✓ | ✓ |
| Difficulty | Simple | Simple | Simple | Slightly complicated |
| Advanced features | - | Elastic scaling, multiple job modes, failover, run-state collection, multi-threaded data processing, idempotency, fault tolerance, Spring namespace support | Elastic scaling, sharding broadcast, failover, rolling real-time log, GLUE (online code editing, no release needed), task progress monitoring, task dependencies, data encryption, email alerts, run reports, internationalization | Spring and Spring Boot support, business logger, SPI extension support, failover, node monitoring, diversified task execution results, FailStore fault tolerance, dynamic scaling |
| Version updates | No update for half a year | No update for 2 years | Recently updated | No update for 1 year |

Usage

Quick Start

For a quick start, see the official documentation at http://www.xuxueli.com/xxl-job/. It is detailed and clear, so it is not repeated here.

Precautions

  • 1 Clock synchronization
    the clocks of the dispatch center and the executors must be synchronized to within 3 minutes of each other, otherwise the request is rejected with an error
    Reference: com.xxl.rpc.remoting.provider.XxlRpcProviderFactory#invokeService
if (System.currentTimeMillis() - xxlRpcRequest.getCreateMillisTime() > 3*60*1000) {
    xxlRpcResponse.setErrorMsg("The timestamp difference between admin and executor exceeds the limit.");
    return xxlRpcResponse;
}
  • 2 Time zones
    tasks are triggered by the dispatch center according to the cron expression configured for each task, so pay attention to the time zone of the machine on which the dispatch center is deployed and write cron expressions for that time zone

  • 3 Executor service crashes
    after the dispatch center has dispatched a task, if the executor service suddenly goes down while running it, the task stays in the "executing" state in the dispatch center, and the dispatch center does not trigger a failure retry. Even if the task has a timeout configured, the dispatch center console never shows a timeout while the executor is down, because the timeout is detected by the executor and then reported to the dispatch center

So when a task has been running for a long time without completing, check whether the executor service has gone down unexpectedly

  • 4 Graceful shutdown
    the executor runs tasks asynchronously on a thread pool, so a restart may interrupt tasks that have not finished, and the executor has to be stopped gracefully. XxlJobExecutor.destroy() can be called directly for this. Note that in versions before v2.0.2 a bug in this method prevented a graceful stop; it was fixed in v2.0.2 and later (reference: https://github.com/xuxueli/xxl-job/issues/727)

  • 5 Failure retry
    when an executor node is only partially unavailable, for example its disk is damaged but it still appears online in the dispatch center, the dispatch center may keep routing to that node under any routing strategy (including failover), so retries keep failing until the retry count is exhausted. For this reason, try not to use fixed routing strategies such as "first" or "last"
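The graceful shutdown in precaution 4 comes down to a standard thread pool pattern: stop accepting new work, wait for running tasks, and interrupt them only after a timeout. The sketch below shows that generic java.util.concurrent pattern. It is not the implementation of XxlJobExecutor.destroy(), and `GracefulStopSketch` and `stopGracefully` are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Generic thread pool shutdown pattern; names are illustrative only.
final class GracefulStopSketch {

    /**
     * Stops accepting new work, waits up to the timeout for queued and running
     * tasks, and only then interrupts whatever is still running.
     * Returns true if every task finished within the timeout.
     */
    static boolean stopGracefully(ExecutorService pool, long timeout, TimeUnit unit) {
        pool.shutdown(); // no new tasks; queued and running tasks continue
        try {
            if (pool.awaitTermination(timeout, unit)) {
                return true;
            }
            pool.shutdownNow(); // timeout reached: interrupt the remaining tasks
            return false;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> {
                try {
                    Thread.sleep(100); // simulated task work
                } catch (InterruptedException e) {
                    return; // interrupted by shutdownNow
                }
                completed.incrementAndGet();
            });
        }
        boolean clean = stopGracefully(pool, 5, TimeUnit.SECONDS);
        System.out.println("clean=" + clean + ", completed=" + completed.get());
    }
}
```

The timeout is a trade-off: too short and long-running tasks are cut off, too long and a restart stalls, so it is best chosen from the longest task the executor is expected to run.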

Summary

XXL-JOB is fairly easy to get started with, and its source code is concise and easy to read. Reading it is a good way to deepen your understanding of distributed system design, network communication and multi-thread cooperation, so it is recommended reading.

References

XXL-JOB GitHub repository
XXL-JOB official documentation

Origin www.cnblogs.com/caison/p/11641161.html