Apache's First Asian Online Summit: Workflow & Data Governance Session

Background

Big data has been developing for ten years and has penetrated many industries. The demand for data keeps growing, which makes the dependencies between big data jobs more and more complex. Data practitioners also struggle with how to manage that data. Coupled with the requirements of the cloud-native era, how can we handle the relationships between big data tasks better and more easily, and implement data governance more effectively?

The closely related Apache projects include Apache DolphinScheduler, Apache Atlas, Apache Airflow, Apache Oozie, and Apache Griffin. We have also invited partners from the community of Apache Hudi, a very popular data lake framework, to share "Practice of DolphinScheduler on an Apache Hudi-based data lake", so stay tuned. First, let me introduce the annual event of the Apache Software Foundation.

ApacheCon

Official Global Conference Series

ApacheCon is the official global conference series of the Apache Software Foundation (ASF), held annually. As a prestigious open source event, it is one of the most anticipated conferences in the open source industry.

Since its inception in 1998, ApacheCon has attracted more than 350 technical projects and their communities. It brings together industry experts and educators from around the world to share the latest technology trends and practices and to discuss "tomorrow's technology" together, so that technology enthusiasts can follow the latest progress at the frontier and upgrade their own technology stacks.

This year, the organizing committee is holding an online ApacheCon for the Asia-Pacific region for the first time: ApacheCon Asia. The conference groups 140+ talks from China, Japan, India, the United States, and other countries into 14 forums, including big data, Incubator, API/Microservice, Internet, integration, and open source culture.

About the Workflow and Data Governance Forum

Workflow and data governance schedule and process complex data pipelines in an orderly manner, and manage metadata, data lineage, and data quality. Several ASF projects provide data workflow solutions, such as Apache DolphinScheduler, Apache Airflow, and Apache Oozie, while Apache Atlas and Apache Griffin provide metadata and data quality management. In this forum, you will learn about the practical experience of front-line users applying these Apache projects, hear about the latest progress in their ecosystems, and look ahead to the future of data scheduling and data governance.

Producer

Guo Wei  

Apache Member & Apache DolphinScheduler PMC 

August 7-8 Agenda Highlights


WORKFLOW/DATA GOVERNANCE

Practice of the DolphinScheduler scheduling tool at a telecom operator

Sharing guest: Wang Xingjie

Time: August 7th at 13:30

Topic introduction:

We chose DolphinScheduler, an open source scheduling system that is easy to extend, has a good fault tolerance mechanism, and has a very active community. We will introduce how China Unicom uses DolphinScheduler to handle more than 100,000 scheduled tasks per day.

Guest introduction:

Wang Xingjie

He has worked in software research and development since graduating in 2014 and has 7 years of experience. He is currently mainly responsible for the development and migration of China Unicom's big data scheduling system.

Massive complex task scheduling tool -- Apache DolphinScheduler

Sharing guest: Qiang Guo

Time: August 7th at 14:10

Topic introduction:

Apache DolphinScheduler is a scheduling tool born out of the need for stable scheduling of massive numbers of complex tasks. This talk introduces DolphinScheduler in terms of its stability, ease of use, and other aspects. We will also present the microkernel architecture design of 2.0. From 2.0 on, each component of DolphinScheduler will be exposed as an SPI, so users can quickly implement their own features on top of it.

Guest introduction: 

Qiang Guo

Apache DolphinScheduler PMC member and senior software engineer, specializing in network communication and in big data processing and computing.

Airflow in-depth practice

Sharing guest: Wu Lian

Time: August 7th at 14:50

Topic introduction:

Based on the real-world Airflow platform at Shanghai Shuhe Technology, this talk introduces the practice of applying, operating, and customizing Airflow in complex scenarios.

Challenges in complex scenarios:

  1. How to ensure high availability in cross-cloud distributed deployment;

  2. How to effectively support multiple types of scheduling scenarios;

  3. How to ensure high availability of ETL jobs;

  4. How scheduling governance is carried out;

  5. How to achieve maximum automation;

Business needs that arise at the same time:

  1. Data analysts have many scheduling needs, but find it difficult to write DAG Python scripts.

  2. A department or individual may not want its DAGs to be edited, viewed, or manually triggered by people from other departments.

  3. Online approval of jobs in a DAG is inefficient and labor-intensive. How can efficiency be improved and non-standard operations be avoided?

  4. How can a messaging system trigger batches of jobs?

The corresponding optimizations we will share:

  1. DAG configuration visualization: DAG parameters are configured in the UI, and DAG files are generated automatically in the background.

  2. DAG permission control: permissions are granted per department and per DAG, distinguishing read, write, and execute.

  3. Job standardization monitoring: detection rules are configured to check whether jobs comply, and corresponding alerts are raised when they do not.

  4. Event trigger plug-in: receives various messages, such as Sensor jobs and AMQP messages, and triggers the corresponding job execution.
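To make the first point more concrete, here is a minimal sketch of how a background service might turn a UI form into a DAG file. It is not the speaker's implementation: the config fields (`dag_id`, `schedule`, `tasks`, `upstream`) and the function name `render_dag_file` are hypothetical, and the generated text assumes Airflow 2.x-style imports and the `schedule_interval` argument. The generator itself only needs the Python standard library.

```python
import ast
import keyword
from datetime import date

# Hypothetical payload, as a UI form might submit it. All field names are illustrative.
EXAMPLE_CONFIG = {
    "dag_id": "daily_sales_report",
    "schedule": "0 2 * * *",
    "start_date": date(2021, 8, 1),
    "tasks": [
        {"task_id": "extract", "command": "echo extract", "upstream": []},
        {"task_id": "transform", "command": "echo transform", "upstream": ["extract"]},
        {"task_id": "load", "command": "echo load", "upstream": ["transform"]},
    ],
}


def render_dag_file(config: dict) -> str:
    """Render the source text of an Airflow DAG file from a config dict."""
    tasks = config["tasks"]
    if not tasks:
        raise ValueError("a DAG needs at least one task")
    known_ids = {task["task_id"] for task in tasks}
    for task in tasks:
        task_id = task["task_id"]
        # task_id doubles as a Python variable name in the generated file.
        if not task_id.isidentifier() or keyword.iskeyword(task_id):
            raise ValueError(f"task_id {task_id!r} is not a valid variable name")
        missing = set(task["upstream"]) - known_ids
        if missing:
            raise ValueError(f"{task_id!r} depends on unknown tasks: {sorted(missing)}")

    start = config["start_date"]
    lines = [
        "from datetime import datetime",
        "",
        "from airflow import DAG",
        "from airflow.operators.bash import BashOperator",
        "",
        "with DAG(",
        f"    dag_id={config['dag_id']!r},",
        f"    schedule_interval={config['schedule']!r},",
        f"    start_date=datetime({start.year}, {start.month}, {start.day}),",
        "    catchup=False,",
        ") as dag:",
    ]
    # One operator per configured task.
    for task in tasks:
        lines.append(
            f"    {task['task_id']} = BashOperator("
            f"task_id={task['task_id']!r}, bash_command={task['command']!r})"
        )
    # One dependency edge per upstream entry.
    for task in tasks:
        for parent in task["upstream"]:
            lines.append(f"    {parent} >> {task['task_id']}")

    source = "\n".join(lines) + "\n"
    ast.parse(source)  # fail fast if the generated file is not valid Python
    return source


if __name__ == "__main__":
    print(render_dag_file(EXAMPLE_CONFIG))
```

Generating plain source text keeps the scheduler unchanged: the output is an ordinary DAG file that can be dropped into the DAGs folder. Validating task names and dependencies before rendering is one way to keep malformed form input out of that folder.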

Guest introduction: 

Wu Lian

Big data development engineer at Shanghai DataSeed Information Technology, with 2 years of experience using, maintaining, and developing Airflow and a deep understanding of it. Hopes to contribute this experience and understanding to the Airflow open source community.

Practice of DolphinScheduler on an Apache Hudi-based data lake

Sharing guest: Zhao Yuwei

Time: August 7th at 15:30

Topic introduction:

A data lake is an enterprise-level data management platform for analyzing different types of data sources. The data lake architecture integrates multiple data sources and supports multiple data models to ensure data accuracy. It can serve real-time analysis and can also act as a data warehouse for batch data mining. We therefore need an efficient, stable, and easily scalable task scheduling system to coordinate the capabilities the data lake exposes, such as data ingestion, data storage, data exploration, data discovery, and data governance. I will share why we chose Apache DolphinScheduler as the task scheduling system, and how we let data users interact with the data lake easily without having to pay much attention to technical details.

Guest introduction:

Zhao Yuwei

Works on Hadoop-related development, currently focusing on the research and development of task scheduling systems.

Architecture Evolution of Apache DolphinScheduler, a New Generation Big Data Workflow Scheduling Platform

Sharing guest: Lidong Dai

Time: August 8th at 13:30

Topic introduction:

The talk consists of six parts:

First, an introduction to DolphinScheduler

Second, the pain points of big data workflow scheduling platforms

Third, the advantages of DolphinScheduler

Fourth, the architectural evolution from version 1.2 to version 1.3

Fifth, the roadmap for architecture 2.0

Finally, some user stories

Guest introduction:

Lidong Dai

Apache DolphinScheduler PMC Chair & Apache Incubator PMC member, with 10+ years of big data experience, skilled at building and optimizing large data platforms

Data Quality Service Practice Based on Apache DolphinScheduler

Sharing guest: Sun Chaohe

Time: August 8th at 14:10

Topic introduction:

This talk shares the design ideas and implementation of a data quality service based on DolphinScheduler, and how to apply it in real scenarios.

Guest introduction: 

Sun Chaohe

Has extensive experience in big data platform development, is an enthusiastic and active participant in open source, and is a senior code contributor to DolphinScheduler.

Data processing in Kubernetes using Airflow

Sharing guest: Luan Peng

Time: August 8th at 14:50

Topic introduction:

1. Why we use Airflow + K8s

2. Airflow oa/rbac/web

3. Running Airflow on Docker/docker-compose/K8s

4. Airflow kubernetes-operator

5. Airflow K8s pod plugin

6. Airflow update friendly

7. Usage at Tencent Music

Guest introduction: 

Luan Peng

Works at the Tencent Music Data Center on building cloud-native machine learning and data platforms.

Detailed explanation of and plan for splitting the large JSON of DolphinScheduler workflow DAGs

Sharing guest: lijinyong

Time: August 8th at 15:30

Topic introduction:

At present, a DolphinScheduler process definition is stored as one large JSON document, which becomes inefficient as the definition grows. I will introduce our solution to this problem, which has already been submitted to Apache DolphinScheduler and will be released in the near future.
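As a rough illustration of the general idea of splitting one large definition into smaller records, consider the sketch below. It does not reproduce DolphinScheduler's actual storage schema or the submitted solution: the JSON layout (`name`, `tasks`, `preTasks`) and the function `split_definition` are hypothetical and exist only to show the technique.

```python
import json

# Hypothetical monolithic definition: every task and edge lives in one JSON blob.
BIG_JSON = json.dumps({
    "name": "etl_flow",
    "tasks": [
        {"name": "extract", "type": "SHELL",
         "params": {"script": "echo extract"}, "preTasks": []},
        {"name": "transform", "type": "SHELL",
         "params": {"script": "echo transform"}, "preTasks": ["extract"]},
        {"name": "load", "type": "SHELL",
         "params": {"script": "echo load"}, "preTasks": ["transform"]},
    ],
})


def split_definition(big_json: str):
    """Split one large definition into process, task, and relation records."""
    doc = json.loads(big_json)
    process_row = {"name": doc["name"]}
    task_rows = []
    relation_rows = []
    for task in doc["tasks"]:
        # Each task becomes its own small record that can be read or updated alone.
        task_rows.append({
            "process": doc["name"],
            "name": task["name"],
            "type": task["type"],
            "params": json.dumps(task["params"]),
        })
        # Each dependency edge becomes one relation record.
        for parent in task["preTasks"]:
            relation_rows.append({
                "process": doc["name"],
                "pre_task": parent,
                "post_task": task["name"],
            })
    return process_row, task_rows, relation_rows


if __name__ == "__main__":
    process, tasks, relations = split_definition(BIG_JSON)
    print(process)
    for row in tasks + relations:
        print(row)
```

With records shaped like this, changing one task's parameters means rewriting one small row rather than parsing and re-serializing the whole definition, which is the kind of cost the talk abstract points at.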

Guest introduction: 

lijinyong

An active contributor to the DolphinScheduler community and an open source advocate. Currently works in the big data department of Zhengcai Cloud on big data platform architecture, and is experienced in designing and developing big data platforms and data warehouse tools and in troubleshooting production issues.

See you at the Workflow and Data Governance forum!

How to register

ApacheCon Asia 2021

August 6-8  

14 forums, 100+ technical projects

140+ topic speeches

Online dialogue with technology experts from around the world

3 days of round-the-clock exchange

Free to attend

ApacheCon Asia's first online virtual conference

August 6-8, 2021

We look forward to seeing you there.

Register at the link below:

ApacheCon Asia 2021

https://www.apachecon.com/acasia2021/


Origin blog.csdn.net/DolphinScheduler/article/details/119259610