CreditEase Open Source | SIA-TASK Distributed Task Scheduling Platform: Architecture Design and Operating Processes

 

1. Background of Distributed Task Scheduling

Both Internet applications and enterprise applications are full of batch jobs, and we often need a scheduling system to run them. As microservices have matured, architectures have gradually evolved from monoliths into distributed microservice architectures. In this context, many existing task scheduling platforms no longer meet the needs of business systems, which has led to a number of distributed task scheduling platforms.

1.1 Evolution of distributed task scheduling

In day-to-day business development we often need scheduled tasks. The usual solutions are Crontab or Spring's cron support, which work well when there are few machines and the tasks are few and simple. However, as application complexity grows, the number of scheduled tasks increases, and dependencies appear between tasks, managing schedules through Crontab configuration becomes confusing and inefficient. This leads to a series of problems:

  • Task management is chaotic, and task life cycles cannot be coordinated and managed in a unified way;
  • When tasks depend on one another, they are hard to orchestrate.

As the Internet has developed, distributed service architectures have become increasingly popular. Correspondingly, a distributed task scheduling system is needed to manage the scheduled tasks of a distributed architecture.

1.2 Distributed scheduling architecture

Distributed Task Scheduling Design

As vertical applications multiply, the interactions between them become more complex. We generally adopt a distributed or microservice architecture and extract the core business capabilities into separate services. These independent microservices gradually form a stable service center, so that business applications can respond more quickly to changing market demands.

At this point, a distributed service framework that improves service reuse and integration becomes key. Moreover, because each service is independent, its scheduled tasks can generally be changed independently, with little impact on the overall system. We usually separate scheduling from task execution (as shown above): the task execution logic does not need to care about scheduling and orchestration, both the scheduler and the executors can be made highly available, and the system is easy to develop and maintain.

1.3 Advantages of distributed task scheduling

In a distributed service architecture there may be many independent services. If each service implements its own scheduled tasks, they become hard to manage, and changing a schedule may force the business service to restart. A separate distributed task scheduling system is therefore necessary to manage all scheduled tasks in one place. Moving task configuration out of the services and into the scheduling system also means that changing a scheduled task affects neither the business services nor the rest of the system. The benefits are:

  • Scheduling is separated from task execution and managed centrally, greatly reducing development and maintenance costs;
  • Distributed deployment provides high availability, scalability, and load balancing, and improves fault tolerance;
  • Scheduled tasks can be deployed and managed from a console, which is convenient, flexible, and efficient;
  • Tasks are persisted to a database to avoid data loss on downtime, backed by a sound failure-retry mechanism and detailed task tracking and alerting policies.

2. Technology Selection for Distributed Task Scheduling

2.1 Distributed scheduling considerations

sia-task- design

  • Task orchestration: scheduled tasks that span several businesses must run in a defined process order.
  • Task sharding: a large task needs to be split into shards that run in parallel.
  • Cross-platform: besides projects on the Java technology stack (SpringBoot, Spring, etc.), applications written in other languages must also be supported.
  • Non-invasive: the business should not be tightly coupled to scheduling and should only care about implementing business logic.
  • Failover: when task execution hits a problem, compensation measures are in place to reduce manual intervention.
  • High availability: the scheduling system itself must be highly available.
  • Real-time monitoring: the execution state of tasks is available in real time.
  • Visualization: task scheduling operations are provided through a visual, easy-to-use page.
  • Dynamic editing: a task's schedule or parameters may change with the business, without stopping and redeploying the service.

2.2 SIA-TASK compared with other distributed task scheduling technologies

SIA (short for Simple is Awesome) is a development platform built by CreditEase, and SIA-TASK (a microservice task scheduling platform) is one of its important products. SIA-TASK fits the current microservice architecture model and offers cross-platform support, task orchestration, high availability, non-invasiveness, consistency, asynchronous parallelism, dynamic scaling, and real-time monitoring.

Open Source Address: https://github.com/siaorg/sia-task

Let's compare the mainstream open-source distributed task scheduling frameworks on the market, analyze their strengths and weaknesses, and then introduce our technology selection.

  • Quartz: Quartz is an open-source project from OpenSymphony in the task scheduling field, written entirely in Java. The project was acquired by Terracotta in 2009 and is now a Terracotta project. Compared with the timers provided by the JDK or Spring, Quartz offers very fine-grained control over individual tasks, and its powerful features and flexibility have made it widely used in enterprise applications. However, Quartz supports neither task orchestration (dependencies between tasks) nor task sharding.
  • TBSchedule: TBSchedule is a framework that supports distributed scheduling. A batch of tasks, or a continually changing set of tasks, can be dynamically distributed across thread groups in multiple JVMs on multiple hosts and executed in parallel. It is written in pure Java, based on ZooKeeper, and open-sourced by Alibaba. TBSchedule focuses on task distribution and supports task sharding, but has no task orchestration and is not cross-platform.
  • Elastic-Job: Elastic-Job is a distributed scheduling solution open-sourced by Dangdang, consisting of two independent subprojects, Elastic-Job-Lite and Elastic-Job-Cloud. Elastic-Job supports task sharding (with shard consistency for jobs), but has no task orchestration and is not cross-platform.
  • Saturn: Saturn is a distributed, highly available scheduling service open-sourced by Vipshop. Saturn is a secondary development on top of Elastic-Job and supports monitoring, task sharding, and cross-platform use, but has no task orchestration.
  • Antares: Antares is a Quartz-based distributed scheduler that supports sharding and tree-shaped task dependencies, but is not cross-platform.
  • Uncode-Schedule: Uncode-Schedule is a distributed task scheduling component based on ZooKeeper. It ensures that every task runs in the cluster without duplication or omission, and supports adding and deleting tasks dynamically. However, it supports neither task sharding nor task orchestration, and is not cross-platform.
  • XXL-JOB: XXL-JOB is a lightweight distributed task scheduling platform whose core design goals are rapid development, ease of learning, light weight, and easy extension. XXL-JOB supports sharding and simple task dependencies (child-task dependencies), but is not cross-platform.

Here is a brief comparison of SIA-TASK with these task scheduling frameworks:

Framework        Task orchestration     Task sharding  Cross-platform  High availability  Failover  Real-time monitoring
SIA-TASK         √                      √              √               √                  √         √
Quartz           ×                      ×              .NET            √                  ×         API monitoring
TBSchedule       ×                      √              ×               √                  √         √
Elastic-Job      ×                      √              ×               √                  √         √
Saturn           ×                      √              √               √                  √         √
Antares          √                      √              ×               √                  √         √
Uncode-Schedule  ×                      ×              ×               √                  √         √
XXL-JOB          Sub-task dependencies  √              ×               √                  √         √

As the comparison shows, these scheduling frameworks generally support high availability, failover, and real-time monitoring, but each has a different emphasis when it comes to task orchestration, task sharding, and cross-platform support. SIA-TASK supports all of these features.

3. Introduction to SIA-TASK

3.1 SIA-TASK technology selection

sia-task-technology

  • REST: a software architectural style. Executors are required to expose HTTP interfaces, which is what makes the platform cross-platform.
  • AOP: aspect-oriented programming. Used in Hunter, the Spring project extension package, to ensure that calls to a Task are serialized (single instance, single thread).
  • Quartz: powerful and flexible, with very fine-grained control of individual tasks; used as the clock component of the scheduling center.
  • MySQL: stores metadata and (temporarily) logs.
  • Elastic: a Lucene-based search server that provides a distributed, multi-tenant full-text search engine; used to store and query logs.
  • SpringCloud: a development framework with an active community, and also the company's designated unified development framework. Used for rapid development and fast iteration.
  • MyBatis: an excellent persistence framework that supports custom SQL, stored procedures, and advanced mappings. Used to simplify persistence-layer development.
  • Zookeeper: a mature registry. Used to solve high availability of the scheduling center, distributed consistency, and related problems.

3.2 SIA-TASK design ideas

SIA-TASK borrows from microservice design ideas: it collects the metadata of the tasks (Tasks) on each distributed executor node and reports it to the registry. Tasks can be orchestrated and scheduled online, and task schedules can be modified dynamically. HTTP is used as the transport protocol for interaction, and JSON as the unified data exchange format. When a user operates the orchestration center (described below), an event is triggered; the scheduling center receives the event, parses the schedule, runs the task flow, and sends task notifications.

3.3 SIA-TASK basic concepts

SIA-TASK separates tasks from scheduling: task logic is completely decoupled from scheduling logic. The system involves the following core concepts:

  • Task: the basic execution unit; an HTTP interface exposed by an executor.
  • Job: one or more Tasks with logical relationships to one another (serial/parallel); the smallest unit that the scheduling center schedules.
  • Plan: an ordered sequence of Jobs. Each Job has its own execution cycle; the Plan itself has none.
  • Task scheduling center (Scheduler): schedules each Job according to its execution cycle, that is, issues HTTP requests to Tasks following the logic of the Plan and the Job.
  • Task orchestration center (Config): uses Tasks to create Plans and Jobs.
  • Task executor (Executer): receives HTTP requests and executes the business logic.
  • Hunter: a Spring project extension package that captures the Tasks on an executor and uploads them to the registry; business code can depend on this component to write Tasks.
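
To make the Task and Job concepts concrete, here is a minimal sketch of how a Job's Tasks and their serial/parallel relationships could be modelled. It is an illustration only: the class `JobModelSketch`, its `Task` fields, and the `stages` method are hypothetical names, not SIA-TASK's own classes. Tasks whose predecessors have all finished form one stage and may run in parallel; stages run one after another.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a Job as Tasks with dependencies (not SIA-TASK's classes). */
final class JobModelSketch {

    /** A Task is an HTTP endpoint plus the keys of the Tasks that must finish before it. */
    static final class Task {
        final String key;
        final String httpUrl;
        final List<String> dependsOn;

        Task(String key, String httpUrl, List<String> dependsOn) {
            this.key = key;
            this.httpUrl = httpUrl;
            this.dependsOn = dependsOn;
        }
    }

    /**
     * Groups a Job's Tasks into stages: Tasks in one stage may run in parallel,
     * and the stages themselves run serially, in order.
     */
    static List<List<String>> stages(List<Task> tasks) {
        Map<String, Task> pending = new LinkedHashMap<>();
        for (Task task : tasks) {
            pending.put(task.key, task);
        }
        List<String> finished = new ArrayList<>();
        List<List<String>> result = new ArrayList<>();
        while (!pending.isEmpty()) {
            // Collect every pending Task whose predecessors have all finished.
            List<String> stage = new ArrayList<>();
            for (Task task : pending.values()) {
                if (finished.containsAll(task.dependsOn)) {
                    stage.add(task.key);
                }
            }
            if (stage.isEmpty()) {
                throw new IllegalStateException("Cyclic or unknown dependency in Job");
            }
            for (String key : stage) {
                pending.remove(key);
            }
            finished.addAll(stage);
            result.add(stage);
        }
        return result;
    }
}
```

For a Job in which B and C both follow A, and D follows B and C, `stages` yields three stages: A alone, then B and C together, then D.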

3.4 SIA-TASK system architecture

SIA-TASK consists of three modules (the scheduling center, the orchestration center, and the executors) and two components (persistent storage and the registry). Their roles are as follows:

  • Task scheduling center: preempts Jobs, schedules Tasks, and migrates tasks; it is the core module of SIA-TASK.
  • Task orchestration center: handles online orchestration of task logic and provides real-time monitoring and log viewing.
  • Task executor: receives scheduling requests and executes the task logic.
  • Task registry (ZK): coordinates the workflow of Jobs, Tasks, and Schedulers.
  • Persistent storage (DB): records Job and Task data and stores logs.

SIA-TASK uses SpringBoot as its base framework and builds on Quartz and Zookeeper through secondary development to support the features above. The logical architecture of SIA-TASK is shown below:

Logic chart

3.5 SIA-TASK Module Description

3.5.1 Task scheduling center

The task scheduling center is responsible for scheduling: it manages scheduling information and issues scheduling requests according to the scheduling configuration, and carries no business code itself. Decoupling scheduling from tasks improves the availability and stability of the system, and scheduling performance is no longer limited by the task modules. Scheduling information can be managed simply and dynamically through a visual interface, including creating, updating, and deleting tasks and configuring alerts, and all of these operations take effect immediately. The scheduling center also supports monitoring of scheduling results, scheduling logs, and failure recovery.

3.5.2 Task orchestration center

The task orchestration center is the component that supports online orchestration of task models for the distributed scheduling center; task orchestration is completed through the web UI.

Complex scheduling models can be composed from the basic models, for example:

Scheduling model

The orchestration UI of SIA-TASK:

UI interface layout

After orchestration is complete, the orchestration information can be viewed as shown below:

Schedule information

The orchestration center also provides home page statistics, scheduling monitoring, Job management, Task management, and log management.

3.5.3 Task executor

The task executor receives scheduling requests and executes the task logic. Because the task module only has to focus on executing tasks, development and maintenance become simpler and more efficient.

Two types of executor are supported:

(1) With sia-task-hunter: if the project is a SpringBoot or Spring project, introduce sia-task-hunter, the Task capture client. HTTP interfaces that follow the conventions (called Tasks) are captured automatically and uploaded to the registry;

(2) Without sia-task-hunter: the project only needs to provide a callable HTTP interface. In this case the Task must be entered manually, and the project itself must control concurrent calls to the Task.

3.5.4 Task Registry (Zookeeper)

Zookeeper, a distributed coordination framework, is used as the registry.

Registry

(1) Task registration

The scheduling center and the executors use the Zookeeper cluster as their registry. All data is registered as nodes and node content, and each host keeps itself alive in Zookeeper by reporting its state periodically.

(2) Metadata storage

The registry not only provides registration but also stores information about each executor (including executor instance information, the Task metadata uploaded by the executor, and some temporary state data of Tasks at run time).

(3) Event publishing

Tasks are published through Zookeeper's event push mechanism, and the scheduler's balancing algorithm ensures that Jobs are preempted in a balanced way.

(4) Load balancing

The number of Jobs that each scheduler acquires is kept balanced, to avoid putting pressure on a single node.

3.5.5 Persistent storage (DB)

MySQL is used as the data persistence solution.

Apart from the dynamic Task metadata stored in the registry, all other metadata is stored in MySQL, including but not limited to: manually entered Tasks, Job configuration, Task orchestration dependencies, scheduling logs, operation logs of business staff, and Task execution logs.

3.6 SIA-TASK key operating processes

3.6.1 Task publishing process

Task publishing process

(1) The user creates a Job through the UI. When creating a Job, the user can choose the Job type, set an alert mailbox, and add a Job description, and then orchestrates Tasks for the created Job.

(2) Once the Job has been created and its Task relationships have been set, the Job can be published through the UI (activate, run once, stop, and delete operations).

(3) A user's Tasks can be captured automatically by Hunter or created manually through the UI.

3.6.2 execution process

Implementation process

(1) After a Job has been created, it can be activated to trigger the scheduled task;

(2) When the Job's scheduled time arrives, the scheduling center triggers the Job, notifies the executors over HTTP to execute Tasks according to the predefined Task orchestration logic, and listens asynchronously for the Task execution results;

(3) If a Task succeeds, the scheduling center checks whether a successor Task exists. If so, it schedules the next Task; if not, the Job is finished and this run ends. If a Task fails, the recovery strategy is triggered: stop immediately, ignore the failure, retry several times, or transfer to another executor.

3.6.3 state transfer

A Job has four states over its life cycle: stopped (NULL), ready (READY), running (RUNNING), and abnormally stopped (STOP). The state flow and its conditions are shown below.

State transfer

3.7 SIA-TASK module design

The physical network topology of SIA-TASK is as follows:

Network topology

The interaction between the SIA-TASK modules is designed as follows:

(1) Tasks are created through the orchestration center or captured automatically by Hunter, and Task information is saved to the DB asynchronously. When a Job is created and activated, a JobKey is created in zookeeper.

(2) The scheduling center listens for JobKey creation events in zookeeper and then tries to preempt the newly created Job. After a successful preemption, it adds the Job to Quartz as a scheduled task, and the Job runs when its time arrives. The scheduling center calls the executor services asynchronously to run the Job's Tasks (there may be several; on failure the configured failure policy is followed), and the results are returned to the scheduling center.

(3) The Job's execution status is updated in zookeeper as it changes and can be queried through the orchestration center's query interface.

(4) After a run completes, the Job waits for its next trigger.

3.7.1 Task orchestration center design

The orchestration center exchanges data with the DB and zookeeper. Its functions fall into three main areas:

  • Data persistence interface service;
  • Spring zookeeper data changes;
  • Data visualization: viewing the system's various statistics.

The monitoring home page of the orchestration center is shown below:

Home monitoring

3.7.2 Task scheduling center design

The scheduling center interacts mainly with the DB, ZK, and the executors. Its functions fall into the following areas:

  • Logging Job execution
  • Changing Job status in ZK
  • Calling executor services to run Jobs
  • High availability of the scheduling center
  • Job scheduling thread pool

3.7.3 Task executor design

The executor interacts with ZK and the scheduling center. Its functions fall into two main areas:

  • Receiving scheduling requests from the scheduling center, executing the scheduled Task, and returning the result to the scheduling center;
  • Automatically capturing the Tasks on the executor and submitting them to ZK.

Executor Task example:

@OnlineTask(description = "Online task example", enableSerial = true)
@RequestMapping(value = "/example", method = { RequestMethod.POST }, produces = "application/json;charset=UTF-8")
@CrossOrigin(methods = { RequestMethod.POST }, origins = "*")
@ResponseBody
public String example(@RequestBody String json) {
    // TODO: client business logic goes here
    Map<String, String> info = new HashMap<String, String>();
    info.put("status", "success");
    info.put("result", "as you need");
    return JSONHelper.toString(info);
}

 

As the example shows, writing a Task is very simple.

3.8 SIA-TASK high availability design

Distributed services generally have to consider high availability, and SIA-TASK is no exception. To ensure high availability, we strengthened the different service components along different dimensions.

3.8.1 High availability of the task orchestration center

SIA-TASK makes the orchestration center highly available through front-end/back-end separation and service splitting. If one instance in the cluster fails, the other instances are unaffected, and the orchestration center remains usable through the rest of the cluster with no special operation required.

3.8.2 High availability of the task scheduling center

3.8.2.1 Failure migration

If a scheduling center instance in the cluster goes down, all of its Jobs migrate smoothly to the instances that are still available, so no scheduled executions are lost. Once the crashed instance has been repaired and rejoins the cluster, it resumes preempting Jobs.

3.8.2.2 Configurable thread pool

Scheduling uses a thread pool, which avoids the delays that blocking would cause if tasks were scheduled on a single thread. The number of threads in the pool defaults to 10; when many time-consuming tasks run concurrently, choose the pool size according to the characteristics of the business.

org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount = 60
org.quartz.threadPool.threadPriority = 5
org.quartz.threadPool.threadsInheritContextClassLoaderOfInitializingThread = true

 

On top of the threadPool that Quartz itself provides, SIA-TASK defines a second level of thread pools and assigns each Job its own pool. The size of each pool scales dynamically with the number of Tasks orchestrated in the Job, so the scheduling threads of each Job are completely independent, and a sharp increase in the number of Tasks in one Job does not exhaust the scheduling threads of the others. Thread pool recycling logic is also provided, so that the pool allocated to a Job is reclaimed some time after the Job has been permanently terminated.

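public static ExecutorService getExecutorService(String JobKey) {
    // Look up the thread pool dedicated to this Job.
    ExecutorService exec = executorPool.get(JobKey);
    if (exec == null) {
        LOGGER.info(Constants.LOG_PREFIX + "Initialize thread pool for running Jobs,Job is {}", JobKey);
        exec = Executors.newCachedThreadPool();
        // putIfAbsent keeps whichever pool was registered first if two threads race here,
        // so re-read the map to return the pool that actually won.
        executorPool.putIfAbsent(JobKey, exec);
        exec = executorPool.get(JobKey);
    }
    return exec;
}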
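
The recycling step mentioned above is not shown in the snippet. As a rough sketch only, assuming a registry keyed by JobKey, creation and reclamation could look like the following. The class `JobExecutorPools` and its `get` and `release` methods are hypothetical names for illustration, not SIA-TASK's actual API, and the sketch reclaims the pool immediately rather than after a delay.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical per-Job thread pool registry (illustrative, not SIA-TASK's code). */
final class JobExecutorPools {
    private static final ConcurrentMap<String, ExecutorService> POOLS = new ConcurrentHashMap<>();

    /** Returns the pool dedicated to this Job, creating it atomically on first use. */
    static ExecutorService get(String jobKey) {
        return POOLS.computeIfAbsent(jobKey, key -> Executors.newCachedThreadPool());
    }

    /** Reclaims the pool of a permanently stopped Job; returns false if none existed. */
    static boolean release(String jobKey) {
        ExecutorService pool = POOLS.remove(jobKey);
        if (pool == null) {
            return false;
        }
        // shutdown() lets Tasks already submitted finish but accepts no new ones.
        pool.shutdown();
        return true;
    }
}
```

A cached thread pool grows and shrinks with the number of Tasks submitted, which matches the dynamic sizing described in the text; `computeIfAbsent` avoids creating a spare pool when two threads ask for the same Job at once.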

 

3.8.2.3 Full log tracking

SIA-TASK tracks the entire life cycle of Job scheduling, using AOP-enhanced logging. Every time the scheduling center triggers a Job, the scheduling is logged, and the execution of each Task orchestrated in the Job is logged as well.

Logs are divided into Job logs and Task logs:

  • Job log: scheduling information, scheduling time, scheduling status, and other additional properties.
  • Task log: execution information, execution time, execution status, return information, and other additional properties.

3.8.2.4 Call encapsulation

  • From the start, SIA-TASK was designed with the cost that remote Task calls impose on the scheduling center's concurrent threads in mind. Remote Task calls made during Job scheduling are wrapped and are all asynchronous, so the time-consuming logic of each Task request is very lightweight: it amounts to issuing a single HTTP request.
  • Users can customize the timeout of each Task. Two kinds of timeout are supported, connecttimeout and readtimeout, so users can set timeouts according to how the business is actually implemented.

public interface RestTemplate {

    /**
     * Asynchronous POST method.
     *
     * @param request
     * @param responseType
     * @param uriVariables
     * @param <T>
     * @return
     */
    <T> ListenableFuture<ResponseEntity<T>> postAsyncForEntity(Request request, Class<T> responseType, Object... uriVariables);
}
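
For readers who want to experiment with the idea of an asynchronous call with separate timeouts, the following sketch uses the JDK's own `java.net.http.HttpClient` (Java 11+) rather than SIA-TASK's `RestTemplate` wrapper. The class `AsyncTaskCaller` and its method names are hypothetical. Note that the JDK's per-request timeout bounds the wait for the response and is only an approximation of a socket read timeout.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;

/** Hypothetical asynchronous Task caller built on the JDK HTTP client. */
final class AsyncTaskCaller {
    final HttpClient client;
    final Duration readTimeout;

    AsyncTaskCaller(Duration connectTimeout, Duration readTimeout) {
        // The connect timeout is configured once on the client.
        this.client = HttpClient.newBuilder().connectTimeout(connectTimeout).build();
        this.readTimeout = readTimeout;
    }

    /** Builds a JSON POST request whose response must arrive within readTimeout. */
    HttpRequest buildRequest(String taskUrl, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(taskUrl))
                .timeout(readTimeout)
                .header("Content-Type", "application/json;charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    /** Sends the request without blocking the calling (scheduling) thread. */
    CompletableFuture<HttpResponse<String>> postAsync(String taskUrl, String jsonBody) {
        return client.sendAsync(buildRequest(taskUrl, jsonBody), HttpResponse.BodyHandlers.ofString());
    }
}
```

A caller would typically attach a callback to the returned future, for example with `whenComplete`, so that the scheduling thread is free as soon as the request has been handed off.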

 

3.8.2.5 Custom scheduler resource pools

The scheduler resource pool

SIA-TASK treats schedulers as physical scheduling resources and pools them. To handle some special cases, schedulers are grouped into separate pools, and a scheduler can move between states, and so change its ability to run Jobs, through different operations.

  • Working scheduler resource pool: manages schedulers that are able to acquire Jobs and are actually allowed to acquire them.
  • Offline scheduler resource pool: manages schedulers that are able to acquire Jobs but are not actually allowed to.
  • Down scheduler resource pool: manages schedulers from the offline scheduler resource pool that have gone down.

3.8.3 High availability of the task executor

  • To cope with network instability, SIA-TASK includes an important design: before scheduling a Task, it tests the connectivity of the executor instance node and checks its health. Knowing the health of a Task's instance nodes ahead of time keeps Task scheduling highly available.

  • To handle executor instances whose connection is broken by network problems, SIA-TASK reworks the zookeeper reconnection mechanism. An executor instance node that lost its link because of a network problem keeps trying to reconnect, and once it recovers it rejoins the pool of executors that receive scheduled Tasks.

  • Executors are generally deployed as a cluster. If a Task fails on one machine in the executor cluster, the scheduling center fails over according to the failover policy. There are two failover policies: polling transfer and maximum compensation transfer. Polling transfer polls the list of available executors: if any executor succeeds, the Task succeeds; if all of them fail, the Task fails. Maximum compensation transfer first retries the current executor several times; if a retry succeeds, no transfer takes place, and if the retries fail, the polling transfer policy is applied.
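
The two failover policies can be sketched in a few lines. This is a simplified illustration under stated assumptions, not SIA-TASK's implementation: the class `FailoverSketch`, the method names, and the `call` predicate (which stands in for "invoke the Task on this executor and report success") are all hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of the two failover policies described above. */
final class FailoverSketch {

    /** Polling transfer: try each available executor in turn; the first success wins. */
    static <E> boolean pollingTransfer(List<E> executors, Predicate<E> call) {
        for (E executor : executors) {
            if (call.test(executor)) {
                return true;
            }
        }
        return false; // every executor failed, so the Task fails
    }

    /**
     * Maximum compensation transfer: retry the current executor up to maxRetries
     * times, and fall back to polling transfer only if all retries fail.
     */
    static <E> boolean maxCompensationTransfer(E current, int maxRetries,
                                               List<E> executors, Predicate<E> call) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (call.test(current)) {
                return true; // recovered on the same executor, no transfer needed
            }
        }
        return pollingTransfer(executors, call);
    }
}
```

Passing the call as a `Predicate` keeps the policy independent of how the executor is reached, so the same logic could wrap an HTTP call or a test stub.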

4. Summary

This article has given a brief introduction to SIA-TASK, the microservice task scheduling platform, covering its background, architecture design, and the functions and features of its components. SIA-TASK largely meets current business needs and provides a simple, efficient orchestration and scheduling service. SIA-TASK will continue to iterate to provide better service, and related technical and user documentation will be published later.

Links Guide

Open Source Address: https://github.com/siaorg/sia-task

Further reading: CreditEase open-sources its microservice task scheduling platform (SIA-TASK)

Author: Mao Masae / ALLEN / Liang Xin

First published by: SpringCloud community

Origin www.cnblogs.com/yixinjishu/p/10972905.html