DolphinScheduler source code analysis: the Master fault-tolerance processing flow


Apache DolphinScheduler (incubating), "DS" for short, has the Chinese name "Dolphin Scheduling". The dolphin is smart and attuned to people, and the two halves of its brain can take turns resting, so it never has to fully fall asleep. True to its name, DolphinScheduler aims to be a flexible, easy-to-use, "out-of-the-box" big data task scheduling system.

Official website address: https://dolphinscheduler.apache.org/

Today I'd like to share a source code analysis of DolphinScheduler's Master fault-tolerance processing flow.

The Master fault-tolerance flow works as follows:

1. When ZooKeeper detects that a Master node has gone down, the other Masters are notified so that they can perform fault tolerance.

2. Each Master that receives the notification tries to "grab" the fault-tolerance job through a distributed lock. The Master that acquires the lock performs the fault-tolerance processing.

3. Fault-tolerance processing proceeds as follows:

    3.1 Use the address of the offline Master and the array of running workflow states to obtain the list of ProcessInstances that need fault tolerance.

    3.2 Traverse the list to process each workflow:

        3.2.1 Set the workflow instance's Host (that is, the Master responsible for scheduling it) to empty and update the instance in the database.

        3.2.2 Construct a Command of type RECOVER_TOLERANCE_FAULT_PROCESS and insert it into the Command table
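To make steps 2 and 3 concrete, here is a minimal Java sketch under stated assumptions. It is not DolphinScheduler's code. The classes ProcessInstance, Command, ExecutionStatus and MasterFailoverSketch, the method handleMasterDown, the choice of which states count as "running", and the in-memory lists standing in for the database tables are all hypothetical. A local ReentrantLock stands in for the ZooKeeper distributed lock, which in a real cluster must be shared across processes.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;
import java.util.stream.Collectors;

// Hypothetical, simplified workflow states (not DolphinScheduler's real enum).
enum ExecutionStatus { RUNNING_EXECUTION, READY_PAUSE, READY_STOP, SUCCESS, FAILURE }

// Hypothetical row of the process-instance table.
class ProcessInstance {
    final int id;
    String host;            // address of the Master currently scheduling this instance
    ExecutionStatus state;

    ProcessInstance(int id, String host, ExecutionStatus state) {
        this.id = id;
        this.host = host;
        this.state = state;
    }
}

// Hypothetical row of the Command table.
class Command {
    final String commandType;
    final int processInstanceId;

    Command(String commandType, int processInstanceId) {
        this.commandType = commandType;
        this.processInstanceId = processInstanceId;
    }
}

class MasterFailoverSketch {
    // Assumed "still running" states, playing the role of the running-state array in step 3.1.
    static final Set<ExecutionStatus> RUNNING_STATES = EnumSet.of(
            ExecutionStatus.RUNNING_EXECUTION, ExecutionStatus.READY_PAUSE, ExecutionStatus.READY_STOP);

    // In-memory stand-ins for the database tables.
    final List<ProcessInstance> processInstanceTable = new ArrayList<>();
    final List<Command> commandTable = new ArrayList<>();

    // Local stand-in for the distributed failover lock (step 2); only valid inside one JVM.
    final ReentrantLock failoverLock = new ReentrantLock();

    // Hypothetical handler invoked when the Master at deadMasterHost goes offline.
    void handleMasterDown(String deadMasterHost) {
        failoverLock.lock();
        try {
            // Step 3.1: instances owned by the dead Master that are still in a running state.
            List<ProcessInstance> needFailover = processInstanceTable.stream()
                    .filter(pi -> deadMasterHost.equals(pi.host) && RUNNING_STATES.contains(pi.state))
                    .collect(Collectors.toList());

            // Step 3.2: handle each instance.
            for (ProcessInstance pi : needFailover) {
                pi.host = "";   // 3.2.1: clear the Host so no Master owns the instance
                commandTable.add(new Command("RECOVER_TOLERANCE_FAULT_PROCESS", pi.id)); // 3.2.2
            }
        } finally {
            failoverLock.unlock();
        }
    }
}
```

In this sketch, step 3.2.1 clears the Host. Calling handleMasterDown again for the same address therefore finds nothing to recover, so a Master that gets the lock after the winner does not insert duplicate Commands. A real implementation would also want the Host update and the Command insert in one database transaction. Otherwise a crash between them could leave an instance with no Host and no recovery Command.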

The basic flow by which a Master schedules workflows:

The MasterSchedulerService thread tries to acquire a distributed lock. Once it holds the lock, it fetches a Command from the database and executes it. So once the fault-tolerance Command has been inserted into the table, whichever Master fetches it next will pick it up and process it.
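Here is a similarly hedged Java sketch of one pass of that scheduling loop. SchedulerLoopSketch, PendingCommand, scheduleOnce and the in-memory tables are hypothetical names used for illustration; they are not MasterSchedulerService's real API. The local lock again only stands in for the distributed one. The point it shows is why the recovery Command is enough: whichever Master consumes it records itself as the instance's new Host.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical row of the Command table (not DolphinScheduler's real entity).
class PendingCommand {
    final String commandType;
    final int processInstanceId;

    PendingCommand(String commandType, int processInstanceId) {
        this.commandType = commandType;
        this.processInstanceId = processInstanceId;
    }
}

class SchedulerLoopSketch {
    // In-memory stand-in for the Command table, oldest command first.
    final Deque<PendingCommand> commandTable = new ArrayDeque<>();
    // In-memory stand-in for the Host column of the process-instance table.
    final Map<Integer, String> instanceHost = new HashMap<>();
    // Local stand-in for the distributed scheduling lock; only valid inside one JVM.
    final ReentrantLock schedulerLock = new ReentrantLock();

    // One pass of a hypothetical scheduler loop for the Master at localHost.
    // Returns the command it consumed, or null if the table was empty.
    PendingCommand scheduleOnce(String localHost) {
        schedulerLock.lock();
        try {
            // Fetch and remove the oldest command while holding the lock,
            // so two Masters cannot take the same command.
            PendingCommand cmd = commandTable.pollFirst();
            if (cmd != null) {
                // The Master that takes the command becomes the instance's new Host;
                // this is how a recovered workflow moves to a live Master.
                instanceHost.put(cmd.processInstanceId, localHost);
            }
            return cmd;
        } finally {
            schedulerLock.unlock();
        }
    }
}
```

A real scheduler would call scheduleOnce repeatedly from a background thread and back off briefly when it returns null. That outer loop is omitted here.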

That concludes my analysis of DolphinScheduler's Master fault-tolerance processing. Corrections are welcome.

Notice

To improve scalability and performance, the DolphinScheduler refactoring discussion will start at 19:00 next Friday evening. Anyone interested is welcome to join.

Did you know?

There are many ways to participate in the DolphinScheduler community, including documentation, translation, Q&A, testing, code, and evangelism talks, and the community puts documentation contributions first. Hands-on practice articles of all kinds are also very welcome. The DolphinScheduler open source community looks forward to your participation.

We also hope your first PR (documentation or code) is a simple one. Imagine how daunting it is for reviewers when a newcomer's very first PR changes dozens of files.

Documentation GitHub repository: https://github.com/apache/incubator-dolphinscheduler-website

Of course, if you love coding, the community also welcomes "show me the code".

Head over to DolphinScheduler's GitHub repository to join in, and feel free to give it a star first~


Origin blog.csdn.net/DolphinScheduler/article/details/109882474