[Scheduling and Configuration Practice of Big Data Development Kit] - Dependency Configuration of Different Periodic Tasks

Read the full text http://click.aliyun.com/m/23308/

In big data development, tasks with different run cycles often depend on one another. The common cases are a daily task that depends on an hourly task, and an hourly task that depends on a minute-level task. How can these two scenarios be built with the big data development kit?

This article starts from these two scenarios and, drawing on scheduling dependencies, scheduling parameters, and scheduled execution, introduces best practices for configuring dependencies between tasks with different periods.

Before that, let's clarify a few concepts:

Business date: the date on which the business data was generated; here it refers to one complete day of business data. In the big data development kit, the most recent complete day of business data that a daily task can process is yesterday's, so business date = daily scheduling date - 1 day.
Dependency: a semantic connection between two or more nodes/workflows, in which the running state of the upstream node/workflow can affect the running state of the downstream node/workflow, but not the other way around.
Scheduling instance: when the scheduling system of the big data development kit schedules and executes a periodic task, it first instantiates the task according to its configuration. Each instance carries attributes such as its specific scheduled time, its status, and its upstream and downstream dependencies.
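
The business date rule above (business date = daily scheduling date - 1 day) can be illustrated with a minimal Python sketch. The function name `business_date` and the sample date are made up for illustration and are not part of the big data development kit.

```python
from datetime import date, timedelta


def business_date(scheduling_date: date) -> date:
    """Return the business date for a daily run.

    Hypothetical helper: a daily task scheduled on a given date
    processes the last complete day of data, which is the day before.
    """
    return scheduling_date - timedelta(days=1)


# A daily instance scheduled on 2018-03-02 processes data for 2018-03-01.
print(business_date(date(2018, 3, 2)))  # 2018-03-01
```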

Note: at present, the instances that the big data development kit schedules automatically each day are generated at 23:30 the previous night.
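
To make the idea of a scheduling instance more concrete, here is a rough sketch of how a periodic task could be expanded into one day of instances. It only models the concept described above and does not reflect the kit's internals. The `Instance` class, the `generate_instances` function, the task names, the status value, and the assumption that the daily task is scheduled at 00:00 are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Instance:
    """Illustrative instance record: scheduled time, status, upstream links."""
    task: str
    scheduled_time: datetime
    status: str = "not_run"
    upstream: list = field(default_factory=list)


def generate_instances(task: str, day: datetime, every: timedelta) -> list:
    """Create one instance per period within the given scheduling day."""
    start = datetime(day.year, day.month, day.day)
    end = start + timedelta(days=1)
    instances = []
    current = start
    while current < end:
        instances.append(Instance(task, current))
        current += every
    return instances


# Instances for scheduling day 2018-03-02 (generated the night before).
hourly = generate_instances("hourly_stats", datetime(2018, 3, 2), timedelta(hours=1))
daily = generate_instances("daily_summary", datetime(2018, 3, 2), timedelta(days=1))
print(len(hourly), len(daily))  # 24 1
```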

Scheduling rules: a scheduled task instance runs only when the following conditions are met, checked in this order:

1) All upstream task instances have run successfully. Once they have, the instance is triggered and enters the waiting-for-time state.
2) The scheduled time of the task instance has arrived. After entering the waiting-for-time state, the instance checks whether its own scheduled time has been reached; if so, it enters the waiting-for-resources state.
3) The current scheduling resources are sufficient. After entering the waiting-for-resources state, the instance checks whether the project currently has enough scheduling resources; if so, it can run.
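
The three checks above can be sketched as a small state function. This is only a simplified model of the rules as described here, not the scheduler's actual implementation. The state names, the `next_state` function, and the `free_slots` parameter (a stand-in for "scheduling resources") are assumptions.

```python
from datetime import datetime


def next_state(upstream_ok: bool, scheduled_time: datetime,
               now: datetime, free_slots: int) -> str:
    """Return the state of an instance after applying the three checks in order."""
    # 1) All upstream instances must have succeeded.
    if not upstream_ok:
        return "waiting_for_upstream"
    # 2) The instance's own scheduled time must have arrived.
    if now < scheduled_time:
        return "waiting_for_time"
    # 3) The project must have spare scheduling resources.
    if free_slots <= 0:
        return "waiting_for_resources"
    return "running"


scheduled = datetime(2018, 3, 2, 1, 0)
print(next_state(True, scheduled, datetime(2018, 3, 2, 0, 30), 4))  # waiting_for_time
print(next_state(True, scheduled, datetime(2018, 3, 2, 1, 5), 4))   # running
```

Note that an instance whose upstream finishes late runs as soon as the upstream succeeds, because its scheduled time has already passed by then.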
Daily tasks depend on hourly tasks
Business Scenario
The system needs to compute the business data increment for each hour. After the last hour of the day has been aggregated, a further task is needed to aggregate the data for the full day.

Requirement analysis
1) Hourly increment: each hourly run computes the data for the previous hour. You need to configure a task that is scheduled every hour on the hour. The data for the last hour of each day is computed by the first instance of the next day.

2) The final summary task runs once a day and must run only after the statistics for the last hour of the day have been completed. You therefore need to configure a daily task that depends on the first instance of the hourly task.
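
The two points above can be checked with a short sketch: the hourly instance scheduled at 00:00 covers the last hour of the previous day, which is why the daily summary waits for that first instance. The function names `hourly_window` and `daily_upstream` and the sample date are hypothetical, and the code only illustrates the time arithmetic, not how the dependency is configured in the kit.

```python
from datetime import datetime, timedelta


def hourly_window(scheduled_time: datetime):
    """Return the (start, end) hour of data processed by an hourly instance.

    Each on-the-hour instance processes the hour that has just ended.
    """
    end = scheduled_time.replace(minute=0, second=0, microsecond=0)
    return end - timedelta(hours=1), end


def daily_upstream(scheduling_day: datetime) -> datetime:
    """Scheduled time of the hourly instance the daily summary depends on:
    the first hourly instance (00:00) of the same scheduling day."""
    return datetime(scheduling_day.year, scheduling_day.month, scheduling_day.day)


day = datetime(2018, 3, 2)
first = daily_upstream(day)
start, end = hourly_window(first)
# The 00:00 instance covers the final hour of the business date 2018-03-01.
print(start, end)  # 2018-03-01 23:00:00 2018-03-02 00:00:00
```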

The scheduling pattern that results from this analysis is as follows:
Full text link http://click.aliyun.com/m/23308/
