Data Platform Scheduling System

  In this first article, let's talk about one of the core components of a big data development platform: the job scheduling system.

  A job scheduling system is a relatively complex thing to build: it covers many scenarios, implementations vary widely, and it demands both theory and practice.

This article focuses on the theory. We will start by classifying the scheduling systems on the market from a high-level scenario perspective, then explore the architectural schools of thought behind workflow job scheduling systems and their implementations, briefly analyzing the pros and cons of each, so that we end up with a general picture of what a job scheduling system should do and how to build one.

What counts as a scheduling system?

By "scheduling system" here, we mean more specifically:

  Job scheduling systems (Job Scheduler)

  Workflow scheduling systems (Workflow Scheduler)

Beyond single-machine timers and libraries such as Crontab and Quartz, there are many well-known distributed open source job scheduling systems, for example Oozie, Azkaban, Chronos, and Zeus, as well as Alibaba's TBSchedule and SchedulerX, Tencent's Lhotse, Dangdang's Elastic-Job, Vipshop's Saturn, and so on.

It is fair to say that almost every data platform team of any scale has its own scheduling system, either developed in-house or built by wrapping and modifying an open source base (for example, many companies adopt the approach of wrapping Oozie).

  The difference between a job scheduling system and a resource scheduling system (or cluster scheduling system)

  Typical examples of the latter: YARN / Mesos / Omega / Borg, as well as Alibaba's Fuxi, Tencent's Gaia, Baidu's Normandy, and so on.

A resource scheduling system focuses on allocating and managing underlying physical resources. Its goal is to maximize utilization of the cluster's CPU / disk / network resources; it is usually not directly tied to specific business logic, and the objects it manages are generally things like program processes.

  A job scheduling system, by contrast, focuses on starting the right job at the right point in time, ensuring that jobs execute on time and accurately in the correct dependency order. Resource utilization is usually not the first concern; the correctness of the business process is what matters most.

  A job scheduling system sometimes considers load balancing, but balancing the load is more about the robustness of the system itself; rational use of resources is treated as an optimization point, often delegated to the underlying resource scheduling system.

  So why are there so many job scheduling system projects on the market? Why hasn't the job scheduling component converged to a relatively standardized solution the way HDFS / Hive / HBase have? In the final analysis, it is determined by the complexity of the business that a job scheduling system must handle.

  A mature job scheduling system that is easy to use, manage, and maintain needs to integrate with a large number of peripheral components, including not only storage and compute frameworks, but possibly also lineage management, access control, load and flow control, monitoring and alerting, quality analysis, and other services or systems.

  The various open source job scheduling systems on the market either lack features in some areas, cost a lot to use and operate, and require substantial secondary development; or they target only specific business scenarios, are simplistic in form, and lack flexibility; or parts of their functionality are closed and self-contained, making them hard to integrate with external systems.

  So what should an ideal, complete job scheduling system actually handle? What difficulties must be overcome to get those things done, what options are there, and how do the open source scheduling systems on the market deal with these issues? Let's explore together.

 

Two types of job scheduling system

First, let's classify job scheduling systems at a high level. Depending on their actual functional positioning, the systems on the market mainly fall into two categories:

  1. Timer-and-sharding based job scheduling systems

  2. DAG workflow based job scheduling systems

The architectures and implementations of these two categories typically differ a great deal, so let's briefly compare them.

 

Timer-and-sharding based job scheduling systems

  Systems in this direction focus on the sharded execution of timed tasks. Representative systems include TBSchedule, SchedulerX, Elastic-Job, and Saturn. Our in-house Vacuum is also such a system.

  The earliest origin and starting point of systems with this functional positioning is often the need for a distributed Crontab / Quartz.

  In the beginning, each business team runs its own stand-alone timed tasks in its own way. Then, as the business grows, the various timed tasks multiply and the cost of decentralized management rises; meanwhile, as data volumes grow, some tasks need to be executed concurrently on multiple machines in a distributed fashion to improve throughput. At this point, distributed sharding scheduling systems are born.

  The actual application scenarios of such systems are often related to routine maintenance work or business logic that needs to run periodically. For example: periodically cleaning up disk space on a batch of machines, periodically refreshing a batch of product listings, periodically rebuilding indexes over a batch of data, periodically sending push notifications to a group of users, and so on.

 

The core objectives of such systems basically boil down to two things:

  • Support for sharded execution: splitting a large task into multiple small tasks assigned to different servers. The challenge is to shard with no misses and no duplicates, keep the load balanced, and automatically migrate tasks when a node crashes.

  • Precise timing guarantees: since the tasks are often tied to the timeliness and accuracy of real business processes, strongly reliable, real-time task triggering usually has to be guaranteed.
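As a rough Python sketch of the first point, sharding an ID range with no misses, no duplicates, and an even spread might look like this (the function names and shard format are invented for illustration, not any particular system's API):

```python
def shard_range(start, end, num_shards):
    """Split the half-open ID range [start, end) into num_shards
    contiguous shards with no overlap and no gap."""
    total = end - start
    base, extra = divmod(total, num_shards)
    shards, cursor = [], start
    for i in range(num_shards):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        shards.append((cursor, cursor + size))
        cursor += size
    return shards

def assign(shards, workers):
    """Round-robin shard assignment; if a worker crashes, calling this
    again with the surviving worker list re-balances its shards."""
    plan = {w: [] for w in workers}
    for i, shard in enumerate(shards):
        plan[workers[i % len(workers)]].append(shard)
    return plan
```

A real system would additionally persist the assignment (e.g. in ZooKeeper) and watch worker liveness to drive the re-assignment automatically.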

 

Accordingly, load balancing, elastic scaling, state synchronization, and failover are usually the key considerations in the architectural design of this type of scheduling system.

From the perspective of onboarding, because sharding logic and failover must be supported, such scheduling systems usually place intrusive requirements on the tasks they schedule.

A common practice is to require the user's job to depend on the scheduling system's client library and extend a job scheduling class as the entry point for triggering the job. The job generally needs to receive and process sharding information in order to handle its data shard correctly. In addition, a number of interfaces usually need to be implemented to satisfy the server side's management needs, such as registering a node, registering a job, starting a task, stopping a task, and reporting status. Some systems also require a resident daemon process on the execution nodes to coordinate local job management and communication.

From the perspective of trigger logic, in order to guarantee strictly precise timing even with a massive number of tasks, in a large portion of such systems the timing trigger is actually executed locally on the node that runs the job. That is, once the job or daemon process is running, it registers with the server; the server distributes sharding information and timing configuration to the client; but the actual timed trigger is performed by the client library, using Quartz-like timing logic wrapped inside it.

The primary purpose of doing so is to guarantee trigger accuracy and efficiency and to reduce server load. A further benefit is that if the server goes down for a short time, then as long as the job configuration remains unchanged, jobs continue to trigger normally on the clients.
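A toy stand-in for such client-side triggering, assuming a plain fixed-interval schedule rather than full Quartz cron expressions (class and method names are hypothetical):

```python
import time

class LocalTrigger:
    """Minimal client-side trigger: fires a callback every `interval`
    seconds, aligned to the epoch, so trigger times stay stable even
    while the server is briefly unreachable."""
    def __init__(self, interval, callback):
        self.interval = interval
        self.callback = callback

    def next_fire_time(self, now):
        # Align to interval boundaries: e.g. interval=3600 fires on the hour.
        return (int(now) // self.interval + 1) * self.interval

    def run_once(self, now):
        fire_at = self.next_fire_time(now)
        time.sleep(max(0, fire_at - now))
        self.callback(fire_at)
```

The alignment in `next_fire_time` is what lets a restarted client compute the same schedule as before, without asking the server.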

Some systems, such as SchedulerX, instead use server-side triggering. This places much higher demands on the server, because the server must now not only coordinate the sharding logic but also maintain the trigger queue. For server-side triggered systems, then, you first need to guarantee the server's high availability, and then its performance; a clustered deployment is usually adopted.

DAG workflow based job scheduling systems

  Systems in this direction focus on correctly handling the scheduling dependencies between tasks. Sharded execution is usually not the concern of the system's core logic, or at least not a core process; if some tasks really do care about sharding, it is typically delegated to the backend cluster (for example MapReduce's own task-splitting ability) or to the executor of a particular task type.

  Representatives of this category include Oozie, Azkaban, Chronos, Zeus, and Lhotse, as well as the visually-defined workflow services offered by public clouds large and small. Our in-house Jarvis scheduling system also belongs to this category.

  DAG workflow scheduling systems target scenarios with many kinds of jobs and complex flow dependencies between jobs, such as the offline report processing typical of a big data development platform: from data collection and cleaning, through summary jobs at every reporting level, to finally serving the data to external systems. A complete business process may involve hundreds of interwoven, interdependent jobs.

So the focuses of DAG workflow scheduling systems usually include:

A rich and flexible enough dependency/trigger mechanism: for example time-triggered tasks, dependency-triggered tasks, and mixed-trigger tasks

  • Dependency triggering itself may have to consider: multiple upstream dependencies; dependencies across different cycle lengths (e.g. an hourly task depending on a daily task, or vice versa); the scope of the dependency judgment (e.g. the downstream triggers when the most recent run succeeded, or only when all runs in the past week succeeded); a task's dependency on its own historical runs; serial versus parallel trigger mechanisms; and so on.
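The simplest mixed-trigger case, time plus all-upstream-success, can be sketched in a few lines of Python (the task and status shapes here are invented for illustration):

```python
from datetime import datetime

def ready_to_run(task, now, upstream_status):
    """Hypothetical mixed-trigger check: a task fires only when its
    scheduled time has arrived AND every declared upstream instance
    has succeeded. `upstream_status` maps upstream task ids to
    'success' / 'failed' / 'running'."""
    time_ok = now >= task["scheduled_time"]
    deps_ok = all(upstream_status.get(u) == "success"
                  for u in task["upstreams"])
    return time_ok and deps_ok
```

Everything listed in the bullet above (cycle-length mismatches, scope-of-judgment rules, self-dependence) amounts to making `deps_ok` vastly more elaborate than this single `all(...)`.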

 

Management and synchronization of job plan changes and execution history

  • A timer-and-sharding scheduling system of course has to consider this too, but there it is usually relatively simple.

  • In a DAG workflow scheduling system, because of the flexibility of job dependencies and the complexity of task trigger mechanisms, this requirement is especially important; more sophisticated functionality must be provided, and the concrete problems involve plenty of difficulties.

 

Task priority management, business isolation, permission management, and so on

  • In a timer-and-sharding scheduling system, isolation between business lines is in many cases natural: a specific task registers onto the specific service nodes that will execute it. Also, since business chains are usually short and real-time requirements strong, the demand for priority management is not high; isolation is basically achieved by dedicating resources, so there is little contention, and the same goes for permission management.

  • In a DAG workflow scheduling system, jobs often share a large pool of execution resources, so priority, load isolation, and permission control problems become prominent.

     

Handling of various special flows, such as pausing tasks, backfilling historical data, manually marking a failed task as successful, coordinating ad-hoc runs with periodic tasks, and so on

  • Such requirements likewise stem from the complexity of the business processes themselves: business logic changes, a script was written wrong, upstream data has a problem, a downstream system died, and the like. Combined with the correlations between jobs, many factors must be considered when handling these problems, and the handling mechanisms need to be powerful and flexible enough.

 

A comprehensive monitoring, alerting, and notification mechanism

    • The simplest examples are failure alerts and timeout alerts; going further, load monitoring of jobs and monitoring/prediction of business progress; and, pushing further still, business health monitoring and analysis, performance optimization suggestions, and even an expert system for problem diagnosis.

 

 

Summary

Note that the positioning of these two categories of systems is not absolutely contradictory. However, satisfying both categories of requirements at the same time is very difficult from an implementation standpoint; because the emphases differ, every architecture makes trade-offs in some respects, and no current system manages to do both perfectly. That does not mean they are necessarily irreconcilable, though. Much like offline batch computing frameworks and real-time stream computing frameworks, which for a long time each went their own way, with the progress of theory and practice the possibility is emerging of a unified framework handling both categories of workloads.

DAG workflow scheduling systems: two schools of inner principles

  There are many open source implementations of DAG workflow scheduling systems, and large companies often have their own in-house implementations as well. These systems differ vastly in development language, supported task types, monitoring and alerting, business onboarding, completeness of peripheral management tools, and so on. Most of these differences are differences in product form, mere "moves". But there is one difference that goes beyond moves; I consider it a difference in "inner principles", a difference of school, and it shapes a system's core framework design to a large extent.

The difference is this: whether concrete task execution relies on a statically generated execution list or a dynamically computed one. What does that mean? Bear with me while I explain in detail.

 

Two concepts: job plan (Job Plan) and task instance (Task Instance)

To discuss the schools of execution-list thinking, we first have to make two concepts clear: the job plan and the task instance.

Normally, once you hand a job over to a scheduling system to manage and execute, then apart from one-off jobs, most jobs are to be run periodically and repeatedly. Some are driven purely by time, while others also have upstream task dependencies to handle.

So: is this job executed once at the end of each month, once a day at 2:00, or once an hour between 9 am and 6 pm? Is the task triggered when all of its upstream tasks succeed, or when any one of them succeeds? Does it additionally require that its own previous run succeeded? The answers to these questions constitute the so-called job plan (Plan).

And when the job concretely lands on a particular day and actually executes at a particular moment, that concrete execution is the so-called task execution instance (Instance).
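The distinction can be made concrete with two tiny Python data classes (the field names are illustrative, not any particular system's schema):

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class JobPlan:
    """The recurring schedule and trigger policy for a job:
    answers 'when and under what conditions should this run?'"""
    job_id: str
    cron: str                            # e.g. "0 2 * * *" = daily at 02:00
    upstreams: list = field(default_factory=list)
    trigger_rule: str = "all_success"    # or "any_success"

@dataclass
class TaskInstance:
    """One concrete run of a plan at a specific scheduled time:
    answers 'what actually executed (or will execute) today?'"""
    job_id: str
    scheduled_time: datetime
    state: str = "pending"               # pending / running / success / failed
```

The two schools discussed next differ precisely in when and how `TaskInstance` objects are produced from `JobPlan` objects.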

 

Static execution list vs. dynamic execution list

Back to the discussion. The so-called static execution list school means that a job's concrete execution instances are computed and generated in advance from the job plan, producing an execution list, and the scheduling system then runs tasks against this pre-generated list. Going further, some systems don't even distinguish between the job plan and the execution list: the two are one and the same. The user manually defines a list of tasks with fixed dependencies, and this entire list is executed periodically.

For systems that do distinguish between the job plan and the execution list, the common practice is: near midnight, analyze all job plans and the dependencies among jobs, then generate the full task list for the next day, freezing the concrete execution time and interdependencies of each task instance. The scheduling system then traverses and checks this list, triggering the tasks whose conditions are met.
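That nightly materialization pass might be sketched like this in Python (the plan shape, with hypothetical keys `job_id`, `hours`, and `upstreams`, is invented for illustration):

```python
from datetime import datetime

def materialize_day(plans, day):
    """Expand each plan's daily slots into concrete instances for `day`,
    freezing run times and dependency lists up front. From this point on
    the scheduler consults only the returned list, not the plans."""
    instances = []
    for p in plans:
        for h in p["hours"]:
            instances.append({
                "job_id": p["job_id"],
                "run_at": datetime(day.year, day.month, day.day, h),
                "upstreams": list(p["upstreams"]),  # frozen copy
                "state": "pending",
            })
    # sort so the scheduler can traverse the list in time order
    instances.sort(key=lambda i: i["run_at"])
    return instances
```

Note that because `upstreams` is copied into each instance, editing the list afterwards (skipping a task, dropping a dependency) is trivial; that is exactly the "hackability" advantage discussed later.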

Oozie, Azkaban, most public cloud workflow services, and the first generation of our Jarvis scheduling system basically belong to this school. Among them, Oozie and Azkaban essentially merge the job plan and the execution list into one, while our first-generation Jarvis kept the two separate, with the execution list generated periodically from the job plans.

The so-called dynamic execution list school, then, means that a job's concrete execution instances are not frozen and computed in advance. Instead, when its upstream tasks finish (for purely time-driven periodic tasks, when the previous cycle's run finishes), the instance is dynamically computed at that moment based on the latest job schedule and dependency configuration.

Zeus, Chronos, and our second-generation Jarvis scheduling system basically belong to this school.
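The dynamic school's core loop is event-driven rather than list-driven: nothing is instantiated until an upstream finishes. A minimal sketch, with an invented in-memory graph/status representation:

```python
def on_task_finished(finished_id, graph, status):
    """When a task finishes, walk its declared downstreams and report
    any whose upstreams are now all successful; only at that moment
    would a real system create their task instances. `graph` maps
    job_id -> list of downstream job_ids; `status` maps job_id -> state."""
    status[finished_id] = "success"
    runnable = []
    for down in graph.get(finished_id, []):
        ups = [u for u, ds in graph.items() if down in ds]
        if all(status.get(u) == "success" for u in ups):
            runnable.append(down)  # instance is computed only now
    return runnable
```

Because the dependency configuration is consulted at trigger time, a plan change made this afternoon affects tonight's runs immediately, with no list to regenerate.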

Neither school is absolutely better or worse; each has the scenarios it handles well and the scenarios it handles poorly. So concrete systems are not always purely one or the other; on some features there are trade-offs and mixtures.

So why are there two schools at all? When a pre-generated execution list needs to be regenerated, what do you do? What are each school's main difficulties and problems?

The root of the problem

The reason there are two schools is that, at root, the job plan and the execution instance list serve different objects.

From the perspective of managing a periodic job, you should of course operate on the job plan: when you want to change a periodic job's execution policy, what you modify is the plan itself. But the scheduling system, when executing a concrete task, operates first and foremost on the task's execution instance, not on the plan.

So when the plan changes, you run into the problem of synchronizing the plan with the execution list.

The strength of the static execution list school lies in the determinism of its task list: once the execution list is generated, you can make arbitrary modifications to it, hack it in various ways, without needing to consider the job plan's original schedule and dependencies. For example, if you temporarily want to skip some tasks today, just remove them from the instance list and delete them from the dependency lists of their downstream tasks.

For the dynamic execution list school, such temporary hacks are harder to handle, because instances are strictly generated from the plan by rule. To skip today's instance of a task, you may have to modify the job plan, and modifying the plan may also affect subsequent instances, such as tomorrow's.

However, for scenarios where instances or their concrete dependencies are hard to determine and generate in advance, such as dependencies across different cycle lengths (e.g. a month-end task depending on daily tasks) or any-success trigger conditions where the number of triggering instances is uncertain, it is nearly impossible to generate a static execution list in advance.

Another example: if the schedule or dependencies of short-cycle tasks are changed mid-day, when part of the task list has already executed, a static execution list scheme faces great challenges in updating the list quickly and correctly.

Yet another example: among today's changes, some are temporary and some are long-term. How does a static execution list scheme handle this? For a system that merges plan and execution list into one, it is nearly impossible; the best you can do is generate a temporary copy of the list to execute, which treats the symptom, not the cause. For a system that periodically generates the execution list from plans, some changes must be applied to the already-instantiated task list and some to the not-yet-instantiated job plans; in this mode, ensuring that the two sides' modifications do not conflict, deciding which side wins when they do, or even detecting a conflict at all, is often very difficult.

 

Summary

So the conclusion is simple. The static execution list scheme is good at handling known, one-off changes within a determined time range (preferably the current cycle), provided you clearly understand how to hack the execution list. The dynamic execution list scheme is much better at long-term plan changes that have not yet taken effect, at dependencies across different cycle lengths, and at the timeliness of changes to short-cycle tasks; temporary one-off changes need to be assisted by other means.

Of course, for the scenarios each school is bad at, various remedies can more or less be found; neither is entirely helpless. It is just a question of the complexity and cost of the remedies.

We have practiced both schools. Overall, with a static execution list, the system architecture is relatively simple, the system logic is relatively straightforward, and problems are easy to analyze, but the range of scenarios it can handle is relatively limited. The dynamic execution list scheme covers a wider range of scenarios and responds to plan changes faster, but the architecture is relatively complex, the operational logic is more involved, and with so many factors at play, it is sometimes hard to keep the logic straight.

So for relatively simple business scenarios, where task dependencies are easy to sort out, a static execution list system will cost relatively little to maintain; otherwise, you should consider building a dynamic execution list system.

Finally, the two schemes are not entirely mutually exclusive. Our second-generation Jarvis scheduling system uses ideas from the static execution list in some local features to help handle the problems that are harder for the dynamic scheme. For example, users need to know which tasks will execute today and when; that requires an instance list to show. We can hardly tell users "our task instances are generated dynamically, so tasks that haven't executed yet simply can't be shown" :)

 

Workflow scheduling systems: feature and requirement analysis

  Having covered the inner principles, let's talk about the moves. Whatever the school, a system must ultimately be implemented. From the system's perspective, the question is which features improve stability and reduce management and maintenance costs; from the user's perspective, the concern is which features improve efficiency and reduce development costs.

How to define a workflow

Since we are discussing workflow scheduling systems, the first question a user faces is, of course, how to define and manage workflows.

Statically and explicitly defined workflows

  Most static execution list systems, such as Oozie, Azkaban, and the various public cloud workflow services, include a workflow (Flow) creation step: you must define which jobs a specific workflow contains and what their dependencies are. The differences lie in the means by which the user defines and describes the workflow:

  Oozie requires the user to provide an XML file (submitted through an API) that describes, in a prescribed format, the dependency topology of the workflow and each job, the detailed configuration of each task type, and so on.

  Azkaban requires the user to describe each job's dependencies in .job files, then creates a workflow for every job that has no downstream dependents; nested sub-workflows must be created and declared explicitly. All of the .job files, together with everything the jobs need at execution time, are packed into a zip archive and uploaded to the server, which finally creates the workflows and presents them to the user.

  From a system design standpoint, the approaches taken by Oozie and Azkaban keep the system's external couplings and dependencies relatively small: each is a relatively closed, self-contained environment, free to evolve on its own. But the biggest problem with both is that the surrounding operations tooling is sorely lacking and usability is poor. They can serve as tools, but as service platforms they are missing too much, and the cost of defining and maintaining workflows is too high. Hence many companies do secondary development and wrapping on top of Oozie or Azkaban to reduce the difficulty of use.

  The various public cloud workflow services mostly let the user define workflows explicitly by dragging and dropping job nodes graphically. In essence this is no different from the Oozie approach, except that the visual manipulation shields the user from configuration syntax details, lowering the difficulty of workflow definition.

 

Dynamically and implicitly defined workflows

  Chronos, Zeus, and both generations of our Jarvis scheduling system go the other way: the system does not ask the user to explicitly define workflows at all. In these systems, the unit of management is the job, and what the user defines is the dependencies between jobs. Which jobs constitute a workflow is not something the system actually cares about, and users need not declare it; the system is simply responsible for scheduling, by rule, every task whose trigger conditions are satisfied. The act of carving a set of tasks out into a Flow is simply not required by the system. You can even view all of the system's jobs as one large, concurrently executing multiple-input multiple-output Flow.
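To see why no explicit Flow object is needed, note that a global execution order already falls out of the per-job upstream declarations alone. A minimal Python sketch using a topological sort (Kahn's algorithm; the job names are made up):

```python
from collections import deque

def run_order(upstreams):
    """Each job declares only its own upstream jobs; no Flow object
    exists anywhere. A valid global execution order emerges from the
    declarations via a topological sort."""
    indeg = {j: len(ups) for j, ups in upstreams.items()}
    downs = {j: [] for j in upstreams}
    for j, ups in upstreams.items():
        for u in ups:
            downs[u].append(j)
    ready = deque(sorted(j for j, d in indeg.items() if d == 0))
    order = []
    while ready:
        j = ready.popleft()
        order.append(j)
        for d in sorted(downs[j]):
            indeg[d] -= 1
            if indeg[d] == 0:
                ready.append(d)
    return order
```

Adding or re-wiring a job is just editing one entry in `upstreams`; the "workflow" reshapes itself automatically, which is exactly the flexibility claimed below.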

 

Comparison

  You might ask: in such systems, how do Flow-level concepts like permission isolation and scheduling configuration get handled? In fact, those concepts are not necessarily tied to the concept of a linked set of tasks. A Flow is about dependencies; the other concepts above are about controlling resources. The objects the two involve can overlap, but need not, and sometimes should not.

So what are the pros and cons of the two approaches?

With explicitly defined workflows, the advantage is that users know exactly which tasks form a group. This suits small-scale sets of jobs processed within one workflow, in scenarios where cross-workflow dependencies are rare and changes are infrequent; the user's sense of control is strong. Conversely, for large-scale, intricately related, frequently changing jobs it is actually unsuitable, and in addition its support for dependency and trigger logic is rather limited (explained in the next section).

Without explicitly defined workflows, users need not care about or manually maintain the workflow definition and concept. Usage is flexible: changes to inter-job dependencies, business restructuring, and the like are automatically reflected in the overall execution process. For the user, management pressure is lower and process changes are simple. The relative weakness is that there is no Flow concept to hang group management on; resource management must be achieved by other means.

Managing job run cycles

Systems with explicit, statically defined workflows usually manage job run cycles at the granularity of the entire workflow. When the scheduled time arrives, the whole workflow starts, and the jobs inside execute according to the workflow's dependencies. So if a job inside a workflow needs to be scheduled on a different cycle, it is hard to handle; various indirect workarounds are needed, such as splitting the workflow, creating sub-workflows, or copying a job multiple times.

Systems with implicit, dynamically defined workflows usually manage run cycles per job (there is no fixed Flow unit to manage anyway), so users simply define the run cycle on each job as needed. In exchange, the implementation difficulty for the scheduling system's developers is greater, because the logic for correctly and automatically deciding dependency-based triggering becomes more complex.

Managing job dependencies

  Before getting into dependency management, let's look at the conditions normally used to decide whether a concrete task instance of a job can start running.

  First, of course, is time dependence: a large number of periodic tasks rely on time to decide whether the run condition is met. Second is task dependence: the current task decides whether its run condition is met according to the execution status of its upstream tasks.

  Under normal circumstances, these two dependencies constitute the core basis on which the majority of scheduling systems decide to start tasks. But sometimes there is a third: data dependence, where whether to start a task is decided by checking whether the data the task depends on exists.

  In theory, if the run conditions and output data of all tasks can be controlled by, or made known to, the scheduling system, then data dependence is a false proposition: it is neither necessary nor, in most cases, the best solution.

  Why? Because data dependence means that, to the scheduling system, the business logic is no longer transparent. On one hand you need a way to obtain the data; on the other, determining whether the data a task depends on is sufficient is itself not easy, and fault tolerance is often poor.

  For example, if you judge by whether a file exists: what if the file's content is wrong? What if the task that generates the file failed halfway and the file is incomplete? Even if you can guarantee the file's correctness and atomicity, what if the upstream task re-runs and refreshes the data? How do you detect that?
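The typical marker-file workaround makes these limits concrete. A sketch, assuming the producer follows the Hadoop-style convention of writing an empty `_SUCCESS` file only after its output is complete:

```python
import os

def partition_ready(path):
    """Naive data-dependence check via a marker file. Note what this
    does NOT catch: wrong file content, or an upstream that later
    re-runs and rewrites the data after the marker already exists --
    exactly the failure modes discussed above."""
    return os.path.exists(os.path.join(path, "_SUCCESS"))
```

This is why data dependence tends to push fault handling back onto humans: the check answers "does something exist?", not "is it correct and final?".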

  Overall, I personally think that the more a scheduling system relies on data dependence, the less mature and complete the system as a whole is likely to be, and the more manual intervention it is likely to need. Of course, nothing is absolute: in a number of scenarios, data dependence is the most reasonable and effective solution. And even the most complete scheduling system has a boundary; inevitably some dependency decisions will be made on the basis of external data.

Back to managing job dependencies.

  In systems where users explicitly define workflows, managing job dependencies is largely implemented through managing the workflow's topology: when the user changes the workflow's topology, the dependencies between jobs actually change. And the scope of a job's dependencies is essentially bounded by the current workflow.

In systems without explicitly defined workflows, users manage job dependencies directly; such systems usually provide the user with interfaces for configuring a task's upstream tasks and trigger conditions. By changing the dependencies between jobs, users indirectly affect the topology of the associated jobs' execution process.

  So, you may ask: these two ways of defining and managing job dependencies look like just different viewpoints with different names; whichever you pick, the process differs but the actual effect is the same, right? Actually, not entirely. For example, from the user's point of view:

  The former way of managing requires the user to understand the internals of the workflow well, and to have a clear enough picture of the current workflow's topology to place a new job in the correct position. On the other hand, as long as the dependencies are satisfied, it gives the user greater freedom in arranging job nodes.

  As a simple example, suppose jobs B and C both depend on job A, with no other dependencies between them. Then I can run B and C in parallel after A, or, to control resource usage, place B and C in series after A (which amounts to making job C manually dependent on job B).
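The A/B/C example can be made concrete with a small topological "wave" scheduler: a job becomes runnable once all its upstreams have run. This is a generic sketch, not any particular system's algorithm:

```python
def run_waves(upstreams):
    """Group jobs into execution waves: a job joins the next wave as
    soon as all of its upstreams are done. upstreams maps each job to
    the list of jobs it depends on."""
    remaining = {job: set(ups) for job, ups in upstreams.items()}
    done, waves = set(), []
    while remaining:
        ready = sorted(j for j, ups in remaining.items() if ups <= done)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        done.update(ready)
        for j in ready:
            del remaining[j]
    return waves

# Parallel layout: B and C both hang directly off A.
parallel = {"A": [], "B": ["A"], "C": ["A"]}
# Serial layout: C is manually made dependent on B to limit concurrency.
serial = {"A": [], "B": ["A"], "C": ["B"]}
```

With the parallel layout the waves are `A` then `B, C` together; with the serial layout each job gets its own wave, which is exactly the resource-control trade-off the example describes.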

  With the latter kind of management, the user only needs to care about the immediate upstream of the current job, which demands much less of them. However, the topology is then unique and automatically generated (in the A/B/C example above, B and C can only run in parallel after A). The drawback is that you cannot freely adjust the shape of the workflow, but in practice you probably have no need to. If the goal is to control job priority, that can be implemented by other means.

  The latter approach also has a big advantage: if task dependencies can be analyzed automatically (for example, by parsing a Hive task's script to determine its upstream and downstream data tables, and then finding the related upstream and downstream tasks through those tables), then in most cases the user does not even need to configure job dependencies; they just add the specific job and are done, and the system automatically analyzes, adds, and adjusts the topology of the entire workflow. For example, our scheduling system Jarvis, combined with a Hive metadata lineage analysis tool, comes close to achieving this ;)
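For simple HiveQL, the lineage idea can be approximated with a crude textual parse. Real tools (such as lineage analysis over Hive metadata, as mentioned for Jarvis) work on the query AST and are far more robust; treat this regex sketch, with invented table and job names, as illustration only:

```python
import re

def table_lineage(sql: str):
    """Very rough lineage extraction: output tables come from
    INSERT OVERWRITE/INTO TABLE x, input tables from FROM/JOIN clauses.
    A real system would walk the parsed query tree instead."""
    sql = sql.lower()
    outputs = set(re.findall(r"insert\s+(?:overwrite|into)\s+table\s+([\w.]+)", sql))
    inputs = set(re.findall(r"(?:from|join)\s+([\w.]+)", sql)) - outputs
    return inputs, outputs

def infer_edges(jobs):
    """jobs: {job_name: sql}. A job depends on whichever job produces
    one of the tables it reads -- dependencies need not be configured."""
    lineage = {j: table_lineage(sql) for j, sql in jobs.items()}
    producer = {t: j for j, (_, outs) in lineage.items() for t in outs}
    return {j: sorted({producer[t] for t in ins if t in producer})
            for j, (ins, _) in lineage.items()}
```

Given a loading job that writes `dw.orders` and an aggregation job that reads it, `infer_edges` derives the load-before-aggregate edge automatically.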

  Finally, the mode where users explicitly define workflows has difficulty handling dependencies between tasks across workflows. What could be achieved through task dependencies inside a single workflow can usually only be assisted by data dependencies once it crosses workflow boundaries. But, as described earlier, doing so may require the user to supply custom data-detection logic, and in scenarios such as re-running to refresh data, tasks may require further manual intervention.

 

Job exception management and system monitoring

If you often walk by the river, how can you keep your shoes dry? Run enough jobs and something will eventually go wrong. So for users, how well a workflow scheduling system handles job exceptions is also an important consideration.

For example, if a task in the middle of a workflow has a logic error in its script and both it and all of its downstream tasks need to be re-run, how is that handled? How does the user accomplish it? Must they manually create a new workflow? Or can they simply pick the failed job and have the system automatically find its downstream tasks?
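The "re-run a job and everything downstream" operation amounts to computing the reachable set in the dependency graph. A minimal sketch (graph shape and names are invented for illustration):

```python
from collections import deque

def downstream_closure(edges, start):
    """edges: {job: [jobs that depend on it]}. Return `start` plus
    everything reachable downstream -- i.e. the set of jobs to re-run
    after fixing `start` -- via a breadth-first traversal."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

A scheduler exposing this as "re-run with downstream" spares the user from rebuilding a workflow by hand after a mid-stream failure.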

Another example: a task fails, but the resulting data has been repaired by other means. How do you skip that task and continue running the subsequent ones?

And again: should a failed task retry automatically? Are there preconditions for retrying, and is any preprocessing needed first? When a task fails, should an alert be raised? To whom, and through what channel? Under what circumstances should alerting stop? Should an alert fire when a task runs slower than usual? How do you know it is slower than before, and how much slower before alerting? Can different tasks be treated differently? And so on.
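Questions like these are usually answered with a per-task retry/alert policy. The following is a bare-bones sketch of the retry-with-escalation part only; the parameter names and the callable-based alert hook are assumptions, not any system's real API:

```python
def run_with_policy(task, max_retries=3, alert=print):
    """Run `task` (a zero-argument callable) with bounded retries.
    Alert on every failed attempt, and escalate when retries are
    exhausted. In a real scheduler, who to page, alert channels, and
    slowness thresholds would hang off a per-task config object."""
    for attempt in range(1, max_retries + 1):
        try:
            return attempt, task()
        except Exception as exc:
            alert(f"attempt {attempt} failed: {exc}")
    alert("retries exhausted, escalating to on-call")
    raise RuntimeError("task failed after retries")
```

Even this toy version shows why policy matters: a retry that always fires, regardless of preconditions, can happily hammer a broken downstream system three times before anyone is paged.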

All of these aspects determine how practical and easy to use the system is for its users, and they may also influence the overall design of the system's framework.

Open-source workflow scheduling systems are usually fairly simplistic in these areas, which is why they are a key focus of many companies' secondary development and engineering efforts.

 

Resources and access control

Where there are people, there are rivalries. Once there are many tasks, resource and permission controls become necessary.

The most immediate question: if many tasks meet their run conditions at once but resources are limited, which ones start first? How are task priorities defined and managed?
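One common answer is to pair a priority queue with a bounded pool of execution slots. A minimal sketch, with an invented priority convention (lower number means higher priority):

```python
import heapq

def dispatch(ready_jobs, free_slots):
    """ready_jobs: list of (priority, job_name) tuples, lower priority
    number = more urgent. Launch at most `free_slots` of the highest
    priority jobs; everything else keeps waiting in priority order."""
    heap = list(ready_jobs)
    heapq.heapify(heap)
    launched = []
    while heap and len(launched) < free_slots:
        _, job = heapq.heappop(heap)
        launched.append(job)
    waiting = [job for _, job in sorted(heap)]
    return launched, waiting
```

With three ready jobs and two free slots, the two most urgent jobs launch and the third waits for the next scheduling tick.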

Going further, how do you know which resources are the bottleneck? If the scheduling system manages many types of tasks that can run on different machines or clusters, how do you determine how much of which resources each task needs, and which machine or cluster is short on resources? Can tasks be distinguished and managed by different criteria, with concurrency control by category or by priority?

Finally, who can edit, run, or manage a task? How are user roles defined? And how does the system integrate with the permission systems of peripheral systems such as Hadoop/Hive?
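The edit/run/manage question is typically answered with role-based access control. A toy sketch; the role and action names are invented, and syncing with Hadoop/Hive permissions is out of scope here:

```python
# Role -> allowed actions. A real deployment would reconcile these
# with the peripheral systems' (e.g. Hadoop/Hive) permission model.
ROLES = {
    "viewer":   {"view"},
    "operator": {"view", "run", "rerun"},
    "owner":    {"view", "run", "rerun", "edit", "delete"},
}

def can(user_roles, action):
    """True if any of the user's roles grants the requested action."""
    return any(action in ROLES.get(role, set()) for role in user_roles)
```

The hard part in practice is not this check but keeping it consistent with the external systems' own ACLs, which is exactly why generic open-source schedulers struggle here.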

In these areas, most open-source workflow scheduling systems are not well developed, or find it hard to offer general-purpose solutions, because many of the features can only be completed through deep integration with peripheral systems.

 

System operation and maintenance capabilities

The system's operation and maintenance capabilities include: whether the system has monitoring metrics for its own state; whether it produces business operation logs and event streams that make troubleshooting easy; whether it can be maintained, upgraded, and taken offline quickly; whether it supports gray (canary) releases; and so on.

 

Summary

  As a core component of a big data development platform, a workflow scheduling system touches numerous peripheral systems, and its business logic is inherently complex. Depending on scenario complexity and focus, there are many open-source options on the market.

  But because of its importance and the high complexity of the business environment, most companies with the capability will build their own, or do secondary development on an existing system, sometimes even maintaining multiple systems to support their business needs.


Origin www.cnblogs.com/muzhongjiang/p/12641027.html