Distributed Task Scheduling

Zeus is a complete Hadoop job platform.

Zeus supports the entire life cycle of a task, from the debug run of a Hadoop task to the periodic scheduling of a production task.

Functionally, it supports:

Debug run of Hadoop MapReduce tasks

Debug run of Hive tasks

Running shell tasks

Visual query and data preview of Hive metadata

Automatic scheduling of Hadoop tasks

Complete document management

 

Zeus is open source: not only the technology but also the product itself is open.

 

 

 

Course introduction: A detailed look at Zeus, the Hadoop job platform

 

Course Outline:

Introduction to Zeus

Zeus architecture

Comparison of Zeus with other scheduling systems

zeus2 with YARN support

Precautions for using Zeus

Follow-up plans for zeus2

 

Intended Audience:

1. System architect, system analyst, senior programmer, senior developer.  

2. Those responsible for the operation, planning, and design of data centers that handle big data processing.

3. Heads of organizations that generate big data, such as government agencies, finance and insurance firms, and mobile and Internet companies.

4. Project leaders of universities and research institutes involved in big data and distributed data processing.  

5. Data warehouse managers, modelers, analysts and developers, system administrators, database administrators, and others interested in data warehouses. 

 

 

The following Q&A is from the video session:

 

Is this similar to TWS scheduling?

Answer: I don't know much about TWS, so I can't say for sure. Zeus is similar to Oozie.

 

Is Zeus also an open source Apache component? Where is the code hosted?

Answer: It is not an Apache project; it comes from Alibaba. The GitHub address is https://github.com/alibaba/zeus

 

Will a worker continue to execute its job after the master goes down?

Answer: No. The worker kills its own tasks and then connects to the new master.
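
The behavior described in this answer can be pictured with a minimal sketch. The `Worker` class, its fields, and the master names below are hypothetical and are not taken from the Zeus code base; the sketch only shows the order of the two steps (kill local tasks first, then attach to the new master).

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical worker model; names are illustrative, not Zeus APIs."""
    master: str
    running: set = field(default_factory=set)
    killed: list = field(default_factory=list)

    def on_master_lost(self, new_master: str) -> None:
        # Step 1: stop every task that was started under the old master.
        for task_id in sorted(self.running):
            self.killed.append(task_id)
        self.running.clear()
        # Step 2: attach to the newly elected master.
        self.master = new_master


worker = Worker(master="master-a", running={"job-1", "job-2"})
worker.on_master_lost("master-b")
print(worker.master, worker.killed)
```

Killing local tasks before reconnecting means the new master never has to reconcile tasks it did not start itself; it can re-issue them from its own records.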

 

What role does ZooKeeper play in it?

Answer: It is mainly used for task-failure notification; it is not required.

 

Does Taobao no longer use this? It hasn't been updated on GitHub for a year. Which tasks does Zeus mainly handle at Alibaba?

Answer: As far as I know, Taobao does use it. The original code has not been updated, so there is a new version, zeus2: https://github.com/michael8335/zeus2

 

Taobao also seems to have an open source task scheduling system, tbschedule. What is the difference between it and Zeus?

Answer: tbschedule is also a batch scheduling engine, but Zeus is more focused on Hadoop.

 

Workers compete for a distributed lock. Can they deadlock?

Answer: No. Acquiring the lock is an atomic operation.
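
The talk does not say how Zeus implements its lock, so the following is only a sketch of what "atomic acquisition" can look like. It assumes a one-row lock table in a database and uses a conditional `UPDATE` as a compare-and-set; the table name `scheduler_lock` and the node names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scheduler_lock (id INTEGER PRIMARY KEY, owner TEXT)")
conn.execute("INSERT INTO scheduler_lock (id, owner) VALUES (1, NULL)")
conn.commit()


def try_acquire(conn, node):
    """Take the lock only if nobody holds it; a single statement, so atomic."""
    cur = conn.execute(
        "UPDATE scheduler_lock SET owner = ? WHERE id = 1 AND owner IS NULL",
        (node,),
    )
    conn.commit()
    return cur.rowcount == 1  # exactly one row changed means this node won


def release(conn, node):
    """Release the lock, but only if this node is the current owner."""
    conn.execute(
        "UPDATE scheduler_lock SET owner = NULL WHERE id = 1 AND owner = ?",
        (node,),
    )
    conn.commit()


first = try_acquire(conn, "node-a")   # lock is free, node-a wins
second = try_acquire(conn, "node-b")  # lock is held, node-b loses immediately
release(conn, "node-a")
third = try_acquire(conn, "node-b")   # lock is free again
print(first, second, third)
```

A contender that loses returns at once instead of waiting while holding another resource, so there is no cycle of waiters and therefore no deadlock.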

 

Can you give a practical example of Zeus in use?

Answer: Many companies use it for Hadoop cluster scheduling; the most commonly scheduled jobs are MapReduce and Hive.

 

Is it better to use zeus or zeus2?

Answer: It depends on your situation. If you are on Hadoop 1, it is best to use zeus directly. If you are on Hadoop 2, I personally recommend zeus2.

 

Where is the list of tasks currently executing on all workers stored? If the current master goes down, how can the new master get it and re-send the tasks?

Answer: Every key state change of a task is recorded in the database, so the new master can read the list directly from the database.

 

How does the new master know about all the tasks that were executing before, so that it can re-issue them?

Answer: The new master can obtain the executing tasks from the task history table in the database.
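
As a rough illustration of this recovery step, the sketch below uses an in-memory SQLite database. The table name `task_history`, its columns, the status values, and the job names are all assumptions made for the example; the actual Zeus schema is not described in the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical history table: one row per task run, updated at each key state change.
conn.execute(
    "CREATE TABLE task_history ("
    "id INTEGER PRIMARY KEY, job_id TEXT NOT NULL, status TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO task_history (job_id, status) VALUES (?, ?)",
    [
        ("hive-daily", "success"),
        ("mr-sessionize", "running"),
        ("shell-cleanup", "running"),
        ("hive-report", "failed"),
    ],
)
conn.commit()


def recover_running_jobs(conn):
    """Return the jobs that were still marked as running when the old master died."""
    rows = conn.execute(
        "SELECT job_id FROM task_history WHERE status = 'running' ORDER BY id"
    ).fetchall()
    return [job_id for (job_id,) in rows]


to_redispatch = recover_running_jobs(conn)
print(to_redispatch)
```

Because the state lives in the database rather than in the old master's memory, a newly elected master needs nothing from its predecessor to rebuild the list.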

 

Does Zeus's management and scheduling of algorithms support simulating results on sample data? Algorithms differ considerably in their scenarios and efficiency.

Answer: Zeus is just a workflow engine; the specific algorithm is implemented by the job itself.

 

How large is the Zeus deployment at Taobao? Could you describe the background of how Zeus came about and developed?

Answer: I am not able to disclose the scale of the deployment. The motivation was mainly to provide friendly scheduling management for Hadoop clusters.

How does Zeus compare with Azkaban and Oozie?

Answer: They are all workflow engines for Hadoop clusters.

 

When HiveQL is run through Zeus task scheduling, the Hive table or the JAR sometimes cannot be found, yet a manual rerun succeeds. What's going on?

Answer: This happens because the environment variables are not configured correctly.
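
One way to catch this kind of problem early is to check the scheduler's environment before launching a job. The answer does not name the variables involved, so the list below (`JAVA_HOME`, `HADOOP_HOME`, `HIVE_HOME`) is an assumption based on common Hadoop and Hive setups, and `missing_env` is a hypothetical helper, not part of Zeus.

```python
import os

# Assumed list of variables; adjust to whatever your Hive jobs actually rely on.
REQUIRED_VARS = ("JAVA_HOME", "HADOOP_HOME", "HIVE_HOME")


def missing_env(required=REQUIRED_VARS, env=None):
    """Return the required variables that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example: the environment seen by the scheduler process lacks HIVE_HOME,
# even though an interactive shell on the same machine may define it.
scheduler_env = {"JAVA_HOME": "/usr/lib/jvm/java", "HADOOP_HOME": "/opt/hadoop"}
print(missing_env(env=scheduler_env))
```

Running the same check from the scheduler process and from the shell used for the manual rerun makes it easy to see which variables differ between the two.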

 

Does Zeus support YARN? Also, what bugs currently exist in Zeus 1?

Answer: zeus1 does not support YARN; zeus2 does. The specific bugs can be found in the wiki of https://github.com/michael8335/zeus2

 

Is there any connection between the Zeus master and YARN's ResourceManager?

Answer: No

 

At our company, tasks sometimes enter the Zeus task queue but are never executed, and the only fix is to restart Zeus. Is this also a Zeus 1 bug?

Answer: This needs to be analyzed case by case; you can contact me privately.

 

Can Zeus connect to Hadoop 2.4 now? When will Hive 0.13 be supported?

Answer: No. It is not needed for the time being.
