CreditEase Micro-Task Scheduling Service Platform: Construction Practice | Talk Notes

Content source: CreditEase Institute of Technology, Tech Salon No. 4 (online live session) | CreditEase micro-task scheduling service platform construction practice

Speaker: Liang Xin, senior architect at CreditEase and head of the development platform

Introduction: Today, both Internet applications and enterprise applications run large numbers of batch jobs, which usually need a scheduling system to manage them. At the same time, system architecture has gradually evolved from monoliths to distributed architectures and then to microservices.

In this context, many earlier task scheduling platforms can no longer meet the needs of business systems, and a number of distributed task management platforms have emerged. Each has its own strengths but also its own shortcomings. Examples include no support for task orchestration, tight coupling with business code, and no cross-platform support. None of them fit our company's needs well, so we developed a micro-task scheduling service platform (SIA-TASK). This talk covers the SIA platform: the background of its development, its technical architecture and design ideas, and how it supports business teams.

1. Why SIA-TASK Was Built

1.1 Background

Internet applications and enterprise applications alike are full of batch jobs that need a scheduling system to manage them. As microservice architecture has matured, systems have evolved from monoliths to distributed architectures and then to microservices.

In this context, many earlier task scheduling platforms and components can no longer meet the needs of business systems, and some distributed task management platforms have emerged. These platforms have their own strengths but also shortcomings, such as no support for task orchestration, tight coupling with business code, and no cross-platform support.

1.2 Types

Based on the relationship between tasks and time, we divide batch jobs into three categories: aircraft type, metro type and bus type.

  • Aircraft type: the task runs at a fixed time once per year, month, week or day. Such tasks are very common in our business systems. One example is a batch job that runs at 1:00 every day to clear the previous day's logs. Another is a job that pays the company's salaries on the 10th of each month.
  • Metro type: the task runs at fixed intervals and never concurrently. We often encounter batch jobs like this: if the previous run has not finished, the next run is not executed.
  • Bus type: the task runs at fixed times and may run concurrently. For a bus-type task, the next run starts on schedule even if the previous run has not finished yet.
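Metro-type and bus-type jobs differ only in what happens when a trigger fires while the previous run is still active. The following minimal Python sketch simulates both behaviors. The function name `simulate_runs` and the integer time units are hypothetical. It also assumes that a metro-type trigger arriving during a run is simply skipped rather than queued.

```python
from typing import List, Optional, Tuple

def simulate_runs(triggers: List[int], duration: int,
                  allow_overlap: bool) -> List[Tuple[int, int]]:
    """Return (start, end) for every run that actually starts.

    allow_overlap=False: metro type, a trigger is skipped while a run is active.
    allow_overlap=True:  bus type, every trigger starts a run on schedule.
    """
    runs: List[Tuple[int, int]] = []
    busy_until: Optional[int] = None
    for t in sorted(triggers):
        if not allow_overlap and busy_until is not None and t < busy_until:
            continue  # previous run still active: no concurrent run
        runs.append((t, t + duration))
        busy_until = t + duration
    return runs

# Triggers every 10 time units; each run takes 15 units.
print(simulate_runs([0, 10, 20, 30], 15, allow_overlap=False))
# [(0, 15), (20, 35)]
print(simulate_runs([0, 10, 20, 30], 15, allow_overlap=True))
# [(0, 15), (10, 25), (20, 35), (30, 45)]
```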

1.3 Problems


You may have run into the following problems when running batch tasks:

  • Forgotten tasks: a scheduled task is forgotten but keeps running. This once happened at our company. One winter several years ago, a project team spent three months building a project. After it had run for a while, the results turned out to be poor, so the related programs were stopped. However, one node running a batch job was forgotten and kept running. Two years later, the logs generated by that node filled up the disk and triggered a monitoring alarm, and only then did we discover it.
  • Single point of failure, with no hot standby: a scheduled batch task runs on a single node, and if that node fails, failover has to be handled manually.
  • Dependencies enforced by time gaps: relying on time offsets to sequence dependent data processing repeatedly causes problems. Projects often depend on each other. For example, batch process A must finish before batch process B. The team schedules A at 2:00 and B at 4:00 and relies on the gap to guarantee the order. If A ever runs longer than two hours, B works on incomplete data, and the resulting data problems have to be fixed manually.

1.4 Relationships

We mentioned relationships between tasks earlier. What relationships actually exist? I think there are three:

  • Serial: one task depends on another. Task A executes first, and task B executes only after task A has finished.
  • Parallel: two tasks can execute concurrently. For example, tasks B and C should both run after task A. Once A finishes, B and C can run simultaneously, so B and C are in a parallel relationship.
  • Branch: the next task depends on the result returned by the predecessor task, and different results lead to different tasks. For example, if the predecessor returns 0, task A runs; if it returns 1, task B runs.
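The three relationships can be combined in one small flow. The following Python sketch is only an illustration and not SIA-TASK's engine. A predecessor runs first (serial). Its return code selects a group of successors (branch). The tasks in that group then run concurrently (parallel). All task names and return codes are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

def run_flow(predecessor: Callable[[], int],
             successors_by_result: Dict[int, List[Callable[[], str]]]) -> List[str]:
    """Serial: run the predecessor first.
    Branch: pick the successor group matching its return code.
    Parallel: run every task in that group concurrently."""
    code = predecessor()
    group = successors_by_result.get(code, [])
    with ThreadPoolExecutor(max_workers=max(1, len(group))) as pool:
        # pool.map returns results in the same order as `group`
        return list(pool.map(lambda task: task(), group))

flow = {
    0: [lambda: "B done", lambda: "C done"],  # B and C run in parallel
    1: [lambda: "D done"],                    # alternative branch
}
print(run_flow(lambda: 0, flow))  # ['B done', 'C done']
print(run_flow(lambda: 1, flow))  # ['D done']
```

A real scheduler would issue HTTP calls instead of local lambdas. The structure of "run, inspect result, fan out" stays the same.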

1.5 Considerations

Given these relationships, we considered two aspects when building the task scheduling platform:

  • Platform. Project teams want to spend more of their energy on business development and hand as much non-business work as possible to a framework. They want a platform that executes tasks for them: they write the business logic and plug it into the platform, and the platform does everything else. The project team then only needs to care about business logic.
  • Microservices. To better meet project needs, we wanted to separate each task's business logic from task orchestration and scheduling. We also wanted to build the platform on a service registration and discovery mechanism, so that project teams handle the business-related part and the task platform handles the rest.

1.6 Factors

Beyond these two considerations, we also had to account for the following eight factors:

  • Task orchestration. Scheduled tasks across business processes have an order among them. As mentioned above, tasks can have serial, parallel and branch relationships, and we wanted the platform to orchestrate and support all of them.
  • Task sharding. A large task can be split into shards that run in parallel.
  • Cross-platform. Besides projects on the Java stack (Spring Boot, Spring, etc.), the platform must also support applications written in other languages.
  • Non-invasive. Business teams don't want tight coupling with the scheduler; they care only about implementing business logic. The platform should be non-invasive to their code and keep its impact to a minimum.
  • High availability / failover. The scheduling system itself must be highly available with no single point of failure. When a task runs into problems there must be remedies that reduce manual intervention.
  • Visualization. Task scheduling operations are performed through a visual web page that is easy to use.
  • Real-time monitoring. The platform needs a real-time monitoring system that reports the execution state of tasks in real time.
  • Dynamic editing. A task's clock and business parameters may change. Changes made through the visual interface take effect in the business systems in real time, without stopping or redeploying anything.
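The talk does not describe how SIA-TASK actually shards a task. The following Python sketch only illustrates the general idea of task sharding. It assumes the work is a contiguous range of record ids that can be split into near-equal shards and processed in parallel. `shard_ranges` and `process_shard` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

def shard_ranges(total: int, shards: int) -> List[range]:
    """Split record indices [0, total) into `shards` contiguous, near-equal ranges."""
    if shards <= 0:
        raise ValueError("shards must be positive")
    base, extra = divmod(total, shards)
    ranges, start = [], 0
    for i in range(shards):
        size = base + (1 if i < extra else 0)  # the first `extra` shards get one more
        ranges.append(range(start, start + size))
        start += size
    return ranges

def process_shard(r: range) -> int:
    # Hypothetical stand-in for the business logic: sum the record ids.
    return sum(r)

shards = shard_ranges(10, 3)
print(shards)  # [range(0, 4), range(4, 7), range(7, 10)]
with ThreadPoolExecutor() as pool:
    print(sum(pool.map(process_shard, shards)))  # 45
```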

Against this background, and with these considerations in mind, we built the micro-task scheduling service platform SIA-TASK.

2. SIA-TASK Core Design Ideas

2.1 Introduction

SIA is short for "Simple is Awesome".

SIA-TASK (the micro-task scheduling service platform) is one of the important products in the SIA family. It fits the current microservice architecture model. Its features include cross-platform support, orchestration, high availability, non-invasiveness, consistency, asynchronous parallel execution, dynamic scaling and real-time monitoring.

SIA-TASK is a task scheduling solution. It collects task metadata, lets users orchestrate tasks visually, schedules them, and monitors the whole process, all in an easy-to-use way. It is completely non-invasive to business code: simple, flexible configuration is enough to produce the desired scheduling model.

SIA-TASK borrows from microservice design. Task metadata distributed across the task executors is collected and uploaded to the task registry. Tasks are orchestrated online, and their clocks can be modified dynamically. Scheduling uses the HTTP protocol with a unified JSON data format. The scheduling center parses the clocks, drives the task flows and sends task notifications.

2.2 Terminology

A brief glossary of SIA-TASK terms:

  • Task: the basic execution unit, an HTTP interface exposed by an executor;
  • Job: a set of one or more tasks with logical relationships (serial/parallel) among them; it is the smallest unit the scheduling center dispatches;
  • Plan: an ordered execution of several Jobs; each Job has its own execution cycle, while the Plan itself has none;
  • Task scheduling center (Scheduler): schedules each Job according to its execution cycle, that is, it issues HTTP requests following the orchestrated logic of Plans, Jobs and Tasks; each Job is handled by a single scheduler node at a time;
  • Task orchestration center (Config): where tasks are orchestrated to create Plans and Jobs;
  • Task executor (Executer): receives HTTP requests and executes the business logic;
  • Hunter: an extension package for Spring projects that captures an executor's tasks and uploads them to the registry; business teams can depend on this component to write Tasks.

Relationship between Task, Job and Plan

A Task is the basic unit of business execution: an HTTP interface exposed by an executor. Several Tasks make up a Job, and a Plan consists of several Jobs executed in sequence.

Why is a Plan needed? Sometimes two tasks have both an ordering relationship (task B runs after task A has finished) and timing requirements. For example, task A runs at 10:00 and task B runs at 14:00, and B must be sure that the 10:00 run of A has completed.

An analogy: a football match is broadcast live at 20:00 tonight. If I'm not home by 20:00, I can't watch it live. But if I leave work early and get home a little after 18:00, I still have to wait until 20:00 for the match to start. That is where the idea of a Plan comes from.
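The timing rule behind a Plan can be written as "start = max(own clock, predecessor's finish time)". The following Python sketch uses hours as integers and a hypothetical `PlannedJob` type. It assumes that when A overruns, B simply starts as soon as A completes. The talk does not say what the platform actually does in that case.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PlannedJob:
    name: str
    scheduled_at: int  # hour of day the Job is allowed to start
    duration: int      # hours the Job actually takes

def plan_timeline(jobs: List[PlannedJob]) -> List[Tuple[str, int, int]]:
    """Run Jobs in order; each starts at max(its own clock, predecessor's end)."""
    timeline = []
    previous_end = 0
    for job in jobs:
        start = max(job.scheduled_at, previous_end)
        end = start + job.duration
        timeline.append((job.name, start, end))
        previous_end = end
    return timeline

# A finishes early at 12:00, but B still waits for its 14:00 clock.
print(plan_timeline([PlannedJob("A", 10, 2), PlannedJob("B", 14, 1)]))
# [('A', 10, 12), ('B', 14, 15)]
# A overruns until 15:00, so B starts only after A completes (assumed behavior).
print(plan_timeline([PlannedJob("A", 10, 5), PlannedJob("B", 14, 1)]))
# [('A', 10, 15), ('B', 15, 16)]
```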

2.3 Components


The SIA-TASK task management platform consists of the following components:

  • Task executor: where your business code lives; this part belongs to the project team.
  • Task registry: we use ZooKeeper.
  • Task orchestration center
  • Persistent storage: we use MySQL.
  • Task scheduling center

2.4 How It Runs


Next, let's walk through SIA-TASK's operating logic in detail.

First, the executor's annotated tasks are crawled and reported to the task registry. When the task executor starts, every controller method that carries the @OnlineTask annotation has its HTTP interface crawled automatically. That interface is then reported to the task registry, which in our case is ZooKeeper.

Next, the task orchestration center fetches task data from the registry and saves it to persistent storage. In other words, the executor crawls its HTTP interfaces and uploads the instance information (URL, IP address and port) to ZooKeeper. The orchestration center then reads this task information from ZooKeeper and stores the task itself in MySQL.

Here we must distinguish between a task and a task instance. Their relationship is a bit like that between a class and an object. The same business logic code can be deployed on several nodes, and the code on each of those nodes is identical. During crawling, the code on every node is captured. From the business point of view it is one task, but each IP address and port is a separate task instance. This is the case with hot standby for high availability, for example. After processing, we save the information about the task itself to persistent storage, while the instance information stays only in ZooKeeper.
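The class/object distinction can be shown with a small data model. In the following Python sketch, crawled instances are grouped by task. The task keys correspond to what would be persisted as task records (MySQL). The IP:port lists correspond to the instance data kept in ZooKeeper. The `TaskInstance` type and the "app:path" key format are hypothetical, and plain dictionaries stand in for both stores.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List

@dataclass(frozen=True)
class TaskInstance:
    task_key: str  # identifies the task itself, e.g. app name + HTTP path
    ip: str
    port: int

def group_by_task(instances: Iterable[TaskInstance]) -> Dict[str, List[str]]:
    """One task (the 'class') maps to many instances (the 'objects')."""
    tasks: Dict[str, List[str]] = defaultdict(list)
    for inst in instances:
        tasks[inst.task_key].append(f"{inst.ip}:{inst.port}")
    return dict(tasks)

crawled = [
    TaskInstance("billing:/task/settle", "10.0.0.1", 8080),
    TaskInstance("billing:/task/settle", "10.0.0.2", 8080),  # hot standby node
    TaskInstance("billing:/task/report", "10.0.0.1", 8080),
]
tasks = group_by_task(crawled)
print(sorted(tasks))                  # task records, the part persisted
print(tasks["billing:/task/settle"])  # instance list, the part kept in the registry
```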

The orchestration center combines the instance information in ZooKeeper with the task information in MySQL. Based on the crawled tasks, users configure each Task's clock and strategies and orchestrate Jobs and Plans. This information is then saved to MySQL.

The task scheduling center reads the orchestration information (Jobs, Plans, clocks and strategies) from persistent storage. Following that logic, it schedules the Tasks crawled from the executors.

That is SIA-TASK's operating logic. We also store the scheduling logs in Kafka.

2.5 Features

1) Annotation-based automatic task crawling

Add the @OnlineTask annotation to a method that exposes an HTTP service. @OnlineTask then automatically captures the method's IP address, port, request path, request method and request parameter format. It uploads them to the task registry (ZooKeeper) and synchronously writes the task information to persistent storage.
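@OnlineTask itself is a Java annotation for Spring projects. The following Python sketch is only an analogue of the same idea, not the Hunter component. A decorator records each endpoint's path, HTTP method and parameter names when the module is loaded. At startup, the instance's IP and port are attached before registration. `online_task`, `build_registrations` and the in-memory `_DISCOVERED` list (standing in for ZooKeeper) are all hypothetical.

```python
import inspect
from typing import Callable, Dict, List

# Hypothetical in-process list standing in for what gets uploaded to ZooKeeper.
_DISCOVERED: List[Dict[str, object]] = []

def online_task(path: str, method: str = "POST") -> Callable:
    """Python analogue of @OnlineTask: record the endpoint's metadata at import time."""
    def decorator(func: Callable) -> Callable:
        _DISCOVERED.append({
            "path": path,
            "method": method,
            "handler": func.__name__,
            "params": list(inspect.signature(func).parameters),
        })
        return func  # the business function itself is left unchanged
    return decorator

def build_registrations(ip: str, port: int) -> List[Dict[str, object]]:
    """At startup, attach this instance's address to every discovered task."""
    return [{**meta, "instance": f"{ip}:{port}"} for meta in _DISCOVERED]

@online_task("/task/clean-logs")
def clean_logs(day: str) -> str:
    return f"cleaned logs for {day}"

for reg in build_registrations("10.0.0.1", 8080):
    print(reg)
```

Because the decorator returns the original function, business code calls `clean_logs` exactly as before. This mirrors the non-invasive goal described above.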

2) Annotation-based, non-invasive thread control

A single task instance must run single-threaded. The framework automatically applies single-thread control to methods annotated with @OnlineTask, so a task that is already running is not scheduled again. The whole process is completely transparent to developers.

In other words, on any one task instance, the task is guaranteed to run in a single thread. Whether this applies is actually up to the users. If a task must be single-threaded, the control can be turned on; if multithreading is acceptable, it can be left off. No extra code is needed; the behavior is configured through the annotation alone.
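The guard described above boils down to a non-blocking try-lock per task instance: a call that arrives while the task is running is rejected instead of queued. The following Python sketch is one way to express that. The decorator name `single_threaded` and the returned status dictionaries are hypothetical, and SIA-TASK's actual Java implementation is not shown in the talk.

```python
import functools
import threading
from typing import Any, Callable, Dict

def single_threaded(func: Callable) -> Callable:
    """Reject a call while a previous call of `func` on this instance is running."""
    lock = threading.Lock()

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        if not lock.acquire(blocking=False):  # already running: reject, don't wait
            return {"status": "REJECTED", "reason": "task already running"}
        try:
            return {"status": "OK", "result": func(*args, **kwargs)}
        finally:
            lock.release()
    return wrapper

started, release = threading.Event(), threading.Event()

@single_threaded
def settle_accounts() -> str:
    started.set()
    release.wait()  # simulate a long-running batch job
    return "settled"

first: list = []
worker = threading.Thread(target=lambda: first.append(settle_accounts()))
worker.start()
started.wait()
second = settle_accounts()  # arrives while the first call is still running
print(second)               # {'status': 'REJECTED', 'reason': 'task already running'}
release.set()
worker.join()
print(first[0])             # {'status': 'OK', 'result': 'settled'}
```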

3) Highly flexible task orchestration

SIA-TASK treats tasks as atomic units, and several tasks are combined into a Job according to their relationships. At runtime, responsibilities are split between the task orchestration center and the task scheduling center, which keeps the composition of Jobs and the scheduling of Jobs separate from each other. When a workflow needs to change, we only need to edit it in the orchestration center. The orchestration center supports serial, parallel and branch relationships between tasks. It also supports orchestrating different tasks on the same task instance in various ways. The entire orchestration is done on a web page, and this ease of use is one of the highlights of the SIA-TASK platform.

4) Adaptive task allocation by the scheduler

When an exception occurs during task execution, the task can be re-triggered according to customizable strategies, so that the work is not interrupted. We defined several strategies for questions such as: what should happen when a Task fails? Should it be woken up again, ignored, or should an alert be sent for manual intervention?

2.6 Key Technical Points

With the features covered, let's go through the key technical points of SIA-TASK.

  • Task flow. Flow relationships between tasks can be configured, forming a directed acyclic graph (DAG). A task flow can be started by a timer (a Cron expression) or by an external request (through a provided API address), and it then executes the DAG logic.
  • Metadata management. The metadata of every task in the microservices is captured and recorded synchronously.
  • Intelligent operations. Tasks are monitored visually in real time, and everything can be seen on web pages. A real-time alerting mechanism emails or texts the relevant people when something goes wrong. Semi-automated self-healing detects failures and retries without manual intervention.
  • Resource isolation. Resources are isolated both between processes and within a process, which improves system throughput and stability. Clocks run on a Core Schedule, and within the scheduling center each project group uses its own Core Schedule. Project groups scheduled at the same time on the same scheduler are therefore isolated from one another, and a failure in one project group does not affect the others. This isolation also amounts to a form of load balancing.
  • Load balancing. The scheduling center schedules tasks whose execution times differ: some take longer, some take less time. Scheduler resources also differ, since some machines have more CPU capacity than others. How do we balance the scheduling load, and how do we balance load across isolated resources? We combine each task's scheduling history (how long it takes) with the performance of the machines themselves. The aim is for the tasks on each scheduler in the scheduling center to consume roughly the same resources. This is a new kind of load balancing, not simple traffic-based balancing.
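The talk gives the inputs to load balancing (historical task duration and machine performance) but not the algorithm. The following Python sketch swaps in a well-known greedy heuristic under that assumption; it is not SIA-TASK's actual algorithm. Jobs are assigned costliest first, and each goes to the scheduler whose capacity-normalized load would be lowest after taking it. The `capacity` values are a hypothetical relative performance factor per machine.

```python
from typing import Dict, List

def assign_jobs(avg_cost: Dict[str, float],
                capacity: Dict[str, float]) -> Dict[str, List[str]]:
    """Greedy assignment: costliest Job first, each to the scheduler whose
    capacity-normalized load would be lowest after taking it."""
    load = {s: 0.0 for s in capacity}
    plan: Dict[str, List[str]] = {s: [] for s in capacity}
    for job, cost in sorted(avg_cost.items(), key=lambda kv: kv[1], reverse=True):
        target = min(capacity, key=lambda s: (load[s] + cost) / capacity[s])
        load[target] += cost
        plan[target].append(job)
    return plan

# Historical average run time per Job (seconds) and relative machine capacity.
history = {"settle": 10.0, "report": 6.0, "cleanup": 4.0}
print(assign_jobs(history, {"sched-1": 1.0, "sched-2": 1.0}))
# {'sched-1': ['settle'], 'sched-2': ['report', 'cleanup']}
print(assign_jobs(history, {"sched-1": 2.0, "sched-2": 1.0}))
# {'sched-1': ['settle', 'cleanup'], 'sched-2': ['report']}
```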

3. SIA-TASK Modules

3.1 Home Page

The home page of task scheduling management has three main parts: scheduling information, scheduling count and connected-project details.

  • Scheduling information: the number of schedulers in the scheduling center.
  • Scheduling count: the cumulative number of Job runs dispatched by the scheduling center's schedulers.
  • Connected-project details: the total number of project teams connected to the scheduling center, and the total number of Jobs.

The SIA-TASK platform currently serves 51 projects with more than 600 Jobs. Since the version released this year went live, Jobs have run more than 30 million times.

Each scheduler has three indicators worth knowing:

  • Job upper limit: the dynamic threshold for the number of Jobs the scheduler can load;
  • Running Jobs: the number of Jobs currently running on the scheduler;
  • Job warning value: when the number of Jobs running on a scheduler exceeds this value, an email is sent to notify the administrator.
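The two thresholds interact as follows. The upper limit decides whether a scheduler may take another Job at all. The warning value only triggers a notification. The following Python sketch models this with a hypothetical `SchedulerNode` class. It records alerts in a list instead of sending email, and it treats the upper limit as a fixed number, although the platform computes it dynamically.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SchedulerNode:
    upper_limit: int     # most Jobs this scheduler may hold (dynamic in SIA-TASK)
    warning_value: int   # above this count, notify the administrator
    running: int = 0
    alerts: List[str] = field(default_factory=list)

    def try_preempt(self, job_key: str) -> bool:
        """Take a Job if below the upper limit; record an alert past the warning value."""
        if self.running >= self.upper_limit:
            return False  # full: leave the Job for another scheduler
        self.running += 1
        if self.running > self.warning_value:
            # A real deployment would send an email here; this sketch only records it.
            self.alerts.append(f"{job_key}: running={self.running} > warning={self.warning_value}")
        return True

node = SchedulerNode(upper_limit=3, warning_value=2)
print([node.try_preempt(f"job-{i}") for i in range(4)])  # [True, True, True, False]
print(node.alerts)  # ['job-2: running=3 > warning=2']
```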

3.2 Scheduler Management


There is some scheduler information worth knowing. As the figure shows, clicking a scheduler (a bar in the histogram) displays the details of the Jobs it has preempted:

  • JobKey: the configured Job name; each Job has its own name.
  • Type: the Job's timing type, either Cron or fixRate.
  • Type value: for Cron, the six-field expression; for fixRate, the interval between runs.
  • Warning email: the warning mailbox configured for this Job.
  • Description: describes what the Job does, and lets administrators quickly find which scheduler has preempted a given Job.
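The two timing types can be modelled as one configuration record with type-specific validation. In the following Python sketch, `JobConfig` and `fix_rate_fire_times` are hypothetical. It assumes fixRate intervals are given in seconds and that the six Cron fields are ordered second, minute, hour, day of month, month, day of week, as in Spring's cron format. The validator only checks the field count; it does not parse the expression.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class JobConfig:
    job_key: str
    trigger_type: str   # "Cron" or "fixRate", as shown on the scheduler page
    trigger_value: str  # six-field cron expression, or an interval in seconds
    alarm_email: str = ""
    description: str = ""

    def validate(self) -> None:
        if self.trigger_type == "Cron":
            # second minute hour day-of-month month day-of-week
            if len(self.trigger_value.split()) != 6:
                raise ValueError("Cron Jobs need a six-field expression")
        elif self.trigger_type == "fixRate":
            if int(self.trigger_value) <= 0:
                raise ValueError("fixRate interval must be a positive number of seconds")
        else:
            raise ValueError(f"unknown trigger type: {self.trigger_type}")

def fix_rate_fire_times(first_fire: int, interval: int, count: int) -> List[int]:
    """Fire times for a fixRate Job: one every `interval` seconds from `first_fire`."""
    return [first_fire + i * interval for i in range(count)]

JobConfig("clean-logs", "Cron", "0 0 1 * * *").validate()  # 1:00 every day
JobConfig("poll-orders", "fixRate", "30").validate()
print(fix_rate_fire_times(0, 30, 4))  # [0, 30, 60, 90]
```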

Schedulers fall into three groups: working schedulers, offline schedulers and dead schedulers. There is also a whitelist.

  • Working schedulers: schedulers that can preempt and schedule Jobs. When a working scheduler is taken offline, it immediately loses the ability to preempt Jobs. The Jobs it has already preempted are released automatically once they finish executing, and other schedulers can then preempt them. The scheduler itself moves to the offline list. The working list provides single and batch offline operations. In short, working schedulers are the ones currently doing the scheduling.
  • Offline schedulers: the process is still alive, but the scheduler can no longer preempt or schedule Jobs. Bringing such a scheduler online moves it into the working list, and it starts preempting and scheduling Jobs again. The offline list provides single and batch online operations. In other words, an offline scheduler is still alive and finishes the Jobs it already holds, but it takes no new ones. Bringing it online again restores its ability to preempt Jobs, and it becomes a working scheduler.
  • Dead schedulers: the process is no longer alive. When an offline scheduler's process dies, the scheduler automatically moves to the dead list. Once its process is restarted, it automatically returns to the offline list. The dead list provides single and batch delete operations. A dead scheduler usually indicates a problem: the process may have crashed, or there may be a network failure.
  • Whitelist: an IP added to the whitelist has permission to call all execution instances. The whitelist supports single and batch deletion, and a deleted IP automatically loses this permission.
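The scheduler lists behave like a small state machine. The following Python sketch encodes the transitions as a lookup table. Working goes to offline and back via manual operations. An offline scheduler's process dying moves it to dead, and a restart returns it to offline, as the text describes. The transition from a working scheduler straight to dead is an assumption, since the text only mentions offline schedulers dying. The event names are hypothetical.

```python
from enum import Enum

class SchedulerState(Enum):
    WORKING = "working"  # alive, preempts and schedules Jobs
    OFFLINE = "offline"  # alive, finishes held Jobs but takes no new ones
    DEAD = "dead"        # process no longer alive

# (current state, event) -> next state
TRANSITIONS = {
    (SchedulerState.WORKING, "take_offline"): SchedulerState.OFFLINE,
    (SchedulerState.OFFLINE, "bring_online"): SchedulerState.WORKING,
    (SchedulerState.OFFLINE, "process_died"): SchedulerState.DEAD,
    (SchedulerState.WORKING, "process_died"): SchedulerState.DEAD,  # assumed
    (SchedulerState.DEAD, "process_restarted"): SchedulerState.OFFLINE,
}

def next_state(state: SchedulerState, event: str) -> SchedulerState:
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.name}") from None

def can_preempt(state: SchedulerState) -> bool:
    return state is SchedulerState.WORKING

s = SchedulerState.WORKING
for event in ["take_offline", "process_died", "process_restarted", "bring_online"]:
    s = next_state(s, event)
    print(event, "->", s.name, "| can preempt:", can_preempt(s))
```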

3.3 Scheduling Monitoring


The figure shows SIA-TASK's scheduling monitoring page, where each region belongs to a different project group. SIA-TASK currently serves 51 projects. At the moment shown, more than 500 Jobs were in the ready state and 25 were running.

Some Jobs finish very quickly, within a few seconds, while others take a long time. When we capture Job states, only the long-running Jobs are caught and displayed as running. Short Jobs are missed even though they do execute, so Jobs that were not caught running are displayed as ready.

Some Jobs don't need to run for a while and can be stopped manually. Any other stopped Job has stopped abnormally and triggers an email alert.

We also provide search, and each team can log in to check how its own projects are running.

3.4 Task Management

The Task management page groups Tasks by project team. It mainly provides Task configuration, modification and deletion. Tasks come from two sources. The first is Tasks captured automatically through the standard annotation by the sia-Task-hunter component; these cannot be modified. The second is Tasks added manually by users who know the URL and HTTP address. Manually added Tasks are what enable cross-platform support, and they can be modified or deleted.

Each Task entry shows the project name, application name, task name, machine address and description, along with view, modify and connectivity-test operations. Entries with the same Task name but different machine addresses are different instances of the same task.

3.5 Job Management


As introduced earlier, a Job consists of several Tasks. In the figure, each column represents a different project name. Clicking the drop-down list shows all projects, and you can filter, add, view status and perform other operations.

Status operations can be performed manually: you can stop or activate a Job. A newly configured Job is inactive and must be activated. You can also modify a Job's information, its Task configuration and so on.

How do you add a Job? Suppose I want to add a Job triggered by a Cron expression: what needs to be filled in?

Since the Job is triggered by a Cron expression, I first enter the six-field expression. Next I add a warning email address and the Job description. Each Job has a key, so finally I add the Job_key. With that, the new Job has been added.

Next, the Job's Tasks need to be configured, which is a more involved process. A Job consists of several Tasks, and we define the ordering relationships among them by dragging and dropping. Different colors distinguish different projects. Only administrators can see all projects, while each project owner sees only their own projects.

Tasks are uploaded with parameters, so parameter handling is also involved: parameter type, parameter value and expiration time. Let me focus on the expiration time.

Calling a Task over HTTP raises a problem: when will the Task actually finish? To solve this, each Task has an expiration time. Once that time passes, the platform switches to another strategy, such as abandoning the call or handing it over for manual processing. Because the call is asynchronous, we cannot wait forever for the client to return a result.

There is another possible situation: the call is reported as timed out, but the task actually ran correctly and simply returned its result late. We designed a queue-based compensation mechanism for this, but it seems of little significance. So far it has only been a theoretical possibility; since the platform went online, it has never actually occurred.
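The expiration time amounts to a bounded wait on an asynchronous call. The following Python sketch uses a thread pool as a stand-in for the scheduler's asynchronous HTTP client. It reports `"EXPIRED"` once the deadline passes so that a fallback strategy can take over. The function name `call_with_expiry` and the `"EXPIRED"` marker are hypothetical. The timed-out work keeps running in the background, which is exactly the late-result case just mentioned.

```python
import concurrent.futures
import time
from typing import Callable

# Shared pool standing in for the scheduler's asynchronous HTTP client.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def call_with_expiry(task: Callable[[], str], expire_seconds: float) -> str:
    """Wait at most `expire_seconds` for the Task; after that, report EXPIRED
    so a fallback strategy (abandon, manual handling, ...) can take over."""
    future = _POOL.submit(task)
    try:
        return future.result(timeout=expire_seconds)
    except concurrent.futures.TimeoutError:
        # The Task may still finish later and return a late result.
        return "EXPIRED"

def slow_task() -> str:
    time.sleep(0.5)
    return "late result"

print(call_with_expiry(lambda: "ok", expire_seconds=1.0))  # ok
print(call_with_expiry(slow_task, expire_seconds=0.05))    # EXPIRED
```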

The platform currently offers two Task instance selection strategies:

  • Random: an instance (IP + port) is chosen at random from the list of available instances;
  • Fixed IP: the user manually specifies one instance from the list of available instances.

The platform supports four Task call-failure strategies:

  • STOP: if the call fails, the entire Job stops and no subsequent Tasks are executed;
  • IGNORE: if the call fails, the Task is skipped and the subsequent Tasks continue;
  • TRANSFER: another instance is selected to execute the Task; if that also fails, the STOP strategy applies;
  • MULTI_CALLS_TRANSFER: the Task is called several more times; if it still fails, the TRANSFER strategy applies.
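The selection and failure strategies compose naturally. In the following Python sketch, `call` is a hypothetical stand-in for the HTTP invocation and returns True on success. The number of extra calls in MULTI_CALLS_TRANSFER is an assumed parameter. TRANSFER is assumed to try each remaining instance in turn before falling back to STOP. The strategy names match the list above, but the function names and return values are illustrative only.

```python
import random
from typing import Callable, List, Optional

def select_instance(instances: List[str], strategy: str,
                    fixed: Optional[str] = None) -> str:
    """RANDOM: any registered IP:port. FIXED: the one the user specified."""
    if strategy == "RANDOM":
        return random.choice(instances)
    if strategy == "FIXED":
        if fixed not in instances:
            raise ValueError("the fixed instance is not registered")
        return fixed
    raise ValueError(f"unknown selection strategy: {strategy}")

def run_task(call: Callable[[str], bool], instances: List[str], primary: str,
             policy: str, extra_calls: int = 2) -> str:
    """Return "OK", "SKIP" (Job continues) or "STOP" (Job stops)."""
    if call(primary):
        return "OK"
    if policy == "STOP":
        return "STOP"
    if policy == "IGNORE":
        return "SKIP"
    if policy == "MULTI_CALLS_TRANSFER":
        for _ in range(extra_calls):  # call the same instance again
            if call(primary):
                return "OK"
        policy = "TRANSFER"           # still failing: fall back to TRANSFER
    if policy == "TRANSFER":
        for other in instances:
            if other != primary and call(other):
                return "OK"
        return "STOP"                 # transfer failed too: behave like STOP
    raise ValueError(f"unknown failure policy: {policy}")

instances = ["10.0.0.1:8080", "10.0.0.2:8080"]

def only_second_works(inst: str) -> bool:
    return inst == "10.0.0.2:8080"

for policy in ["STOP", "IGNORE", "TRANSFER", "MULTI_CALLS_TRANSFER"]:
    print(policy, run_task(only_second_works, instances, "10.0.0.1:8080", policy))
```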

3.6 Scheduling Log

Log management displays Job run logs grouped by project team. The key elements of a Job log are:

  • Execution state: the result of the Job's execution;
  • Scheduling time: the time at which the scheduler scheduled the Job;
  • Completion time: the time at which the Job finished executing;
  • Scheduling information: the scheduler instance that issued the Job's requests;
  • Execution information: details of the Job's execution, linked to the execution logs of the Tasks the Job references.

Logs are retained for seven days by default.

4. Open Source

As an important product of the SIA team, SIA-TASK has been adopted by dozens of projects within the company and runs hundreds of Jobs, and its stability has been proven in production.

The SIA-TASK micro-task scheduling platform was open-sourced in May. The repository is at https://github.com/siaorg/sia-Task; interested readers can visit it for details.

Shared by: Liang Xin

Source: CreditEase Institute of Technology


Origin: blog.51cto.com/14159827/2444455