FFA 2023 session highlights: unified stream and batch processing, platform construction, and cloud native

This year, Flink Forward Asia (hereinafter FFA) returns as an in-person event and will be held at the Hyatt Regency Wangjing Hotel in Beijing on December 8 and 9. The Flink Forward Asia 2023 conference agenda is now online!

Flink Forward is the technical conference of the Apache Flink community, officially authorized by Apache. As one of the most anticipated annual summits among Apache Flink developers, FFA 2023 will continue to bring together industry best practices and the latest Flink technology trends. It is the most anticipated Flink event in China and a technology feast that developers and users should not miss.

Unified stream and batch processing

The unified stream and batch session features technical experts from Alibaba Cloud Intelligence, Xiaohongshu, ByteDance, Xiaomi, and other companies, who will present practical cases of unified stream and batch processing applied at scale and discuss the business pain points and lessons learned in building data warehouses. In addition, senior technical experts from Alibaba Cloud Intelligence will share the concepts and advantages of stream-batch fusion, introduce its technical challenges and design in the context of unified full and incremental data processing, and cover the Flink community's latest progress and future plans in this area.

Apache Flink: stream-batch fusion computing engine

Song Xintong|Alibaba Cloud Intelligence Senior Technical Expert, Flink Distributed Execution Team Leader, Apache Flink PMC

Su Xuannan|Alibaba Cloud Intelligence Senior Development Engineer, Apache Flink Contributor

Topic introduction:

Stream processing and batch processing have long been the two major categories of large-scale data processing. As the concepts and technologies of stream-batch unification have matured, Apache Flink now offers both stream and batch processing in a single engine. This blurs the distinction between the two, and a new stream-batch fusion processing model has emerged in response. In this model, a job is no longer assigned a stream or batch execution mode up front; instead, Flink adaptively selects and switches execution modes based on the timeliness of the data, reducing latency and improving throughput. In this talk, we will share the concepts and advantages of stream-batch fusion, introduce its technical challenges and design in the context of unified full and incremental data processing, and cover the Flink community's latest progress and future plans in this area.
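To make the idea of switching modes by data timeliness concrete, here is a minimal sketch in plain Python. It is not Flink's API or its actual mechanism; the `Record` type, the `run` function, and the `backlog_threshold` parameter are hypothetical names chosen for illustration. The sketch assumes records arrive roughly in event-time order: while records lag far behind the current time they are only accumulated (batch-style, favoring throughput), and once fresh data arrives the accumulated totals are flushed and every later record produces an immediate update (streaming-style, favoring latency).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Record:
    key: str
    value: int
    event_time: float  # seconds, on the same clock as `now`


def run(records: Iterable[Record], now: float,
        backlog_threshold: float = 60.0) -> List[Tuple[str, str, int]]:
    """Return (mode, key, running_sum) outputs for a keyed sum.

    Hypothetical illustration only: records older than `backlog_threshold`
    seconds are treated as backlog and aggregated without emitting.
    """
    totals: Dict[str, int] = {}
    output: List[Tuple[str, str, int]] = []
    mode = "batch"
    for record in records:
        lag = now - record.event_time
        if mode == "batch" and lag <= backlog_threshold:
            # Backlog is consumed: emit one result per key, then switch modes.
            output.extend(("batch", k, v) for k, v in sorted(totals.items()))
            mode = "streaming"
        totals[record.key] = totals.get(record.key, 0) + record.value
        if mode == "streaming":
            # Fresh data: emit an update for every record.
            output.append(("streaming", record.key, totals[record.key]))
    if mode == "batch":
        # Input ended while still in backlog: emit the final totals once.
        output.extend(("batch", k, v) for k, v in sorted(totals.items()))
    return output
```

In this toy version the switch is one-way and driven by a fixed threshold; a real engine would have to decide this per source and handle state handover between modes.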

Xiaohongshu’s exploration of unified stream and batch processing with a near-real-time data warehouse

Tang Yun|Head of Xiaohongshu real-time computing engine team, Apache Flink committer

Topic introduction:

1. Flink offers unified interfaces for stream and batch processing (in both Flink SQL and the DataStream API). Xiaohongshu adopted Flink to generate index data for search and recommendation, using a single set of interfaces for both streaming and batch jobs, which greatly improved development efficiency. We upgraded Flink Batch to version 1.17, introduced Apache Celeborn, and resolved Celeborn's deployment and stability issues in the K8S environment, improving the performance, stability, and efficiency of Flink Batch.

2. Working with the Xiaohongshu data lake team, we used Flink CDC to make the ODS layer real-time, improving the timeliness of the offline data warehouse.

3. We further pushed the DWD layer of the offline data warehouse toward near-real-time. We explored the differences and characteristics of processing methods such as data lake lookup join, left join, and partial update; implemented mechanisms such as mini-batch join/agg to reduce costs as much as possible; implemented state schema evolution for checkpoints to improve data portability; and analyzed the core technical difficulties behind making an offline data warehouse near-real-time. Based on this experience, we are optimistic that incremental data warehouse processing based on IVM (incremental view maintenance) can break the Lambda architecture and truly unify the stream and batch architecture.
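The mini-batch aggregation mentioned in item 3 can be sketched as follows. This is a simplified illustration in plain Python, not Xiaohongshu's implementation and not Flink code; the class name `MiniBatchSum` and its fields are hypothetical. The idea it shows is that records are first pre-aggregated in a small in-memory buffer, and the keyed state is touched only once per key per batch, which trades a little latency for fewer state accesses and fewer downstream updates.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class MiniBatchSum:
    """Keyed sum that updates state once per key per mini-batch."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.state: Dict[str, int] = {}           # stands in for keyed state
        self.buffer: Dict[str, int] = defaultdict(int)
        self.buffered = 0                         # records in current batch
        self.state_writes = 0                     # counts state accesses

    def add(self, key: str, value: int) -> List[Tuple[str, int]]:
        """Buffer one record; flush when the batch is full."""
        self.buffer[key] += value
        self.buffered += 1
        if self.buffered >= self.batch_size:
            return self.flush()
        return []

    def flush(self) -> List[Tuple[str, int]]:
        """Apply buffered deltas to state and return the emitted updates."""
        updates: List[Tuple[str, int]] = []
        for key, delta in self.buffer.items():
            self.state[key] = self.state.get(key, 0) + delta
            self.state_writes += 1
            updates.append((key, self.state[key]))
        self.buffer.clear()
        self.buffered = 0
        return updates
```

With a batch size of 4, five input records spread over two keys cause three state writes instead of five. A production version would also flush on a timer so that a partially filled batch does not wait indefinitely.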

Large-scale adoption of unified stream and batch processing at ByteDance

Su Dewei|ByteDance Infrastructure Engineer

Topic introduction:

Flink is the de facto standard for stream computing, but it is not yet widely used for batch computing. As the Flink engine's unified stream and batch capabilities have improved, we have migrated more than 22,000 Spark SQL jobs in offline data synchronization scenarios at ByteDance to Flink Batch SQL, advancing the adoption of unified stream and batch processing. Spark SQL jobs come in many patterns. By resolving a large number of Spark SQL compatibility issues and applying data accuracy verification and automated migration, we were able to smoothly migrate routine daily jobs to Flink Batch and run them stably in production. This talk covers the large-scale adoption of unified stream and batch processing at ByteDance in five parts:

1. Adoption and challenges of unified stream and batch processing at ByteDance

2. Improved Spark SQL compatibility

3. Flink Batch performance optimization

4. Migration process and tool support

5. Benefits and future plans

Xiaomi’s practice of building a unified stream-batch data warehouse on Flink

Wu Junsheng|Xiaomi Software R&D Engineer

Topic introduction:

This talk focuses on building a unified stream-batch data warehouse for Xiaomi's TV and video business. Drawing on concrete business conditions, we will share Xiaomi's exploration in this area and how it built an efficient and stable data foundation. The content is divided into four parts:

1. Data warehouse evolution of Xiaomi TV and video business

2. Building the unified stream-batch data warehouse, and the problems and lessons encountered along the way

3. Application scenarios of the unified stream-batch data warehouse

4. Summary and Outlook

ByteDance's solution for building real-time data with unified full and incremental processing

Qin Binglun|ByteDance Real-time Data Engineer

Liu Xiang|ByteDance Real-time Data Engineer

Topic introduction:

ByteDance's business includes many scenarios that combine full and incremental data, such as user profiles, risk control, and real estate. Operations need long-term historical data to uncover potential value and also place high demands on data freshness, so real-time data that unifies full and incremental processing is of great value. This talk introduces the challenges and solutions encountered in building unified full and incremental stream computing:

1. The value and challenges of unified full and incremental real-time data

2. Storage construction

3. Compute construction

4. Future planning and prospects

Platform construction

The platform construction session will feature technical experts from Alibaba Cloud Intelligence, NetEase, Xiaomi, and SelectDB sharing the evolution and practice of real-time computing platforms based on Apache Flink.

Alibaba Cloud real-time computing Flink productization thinking and practice

Huang Pengcheng|Alibaba Cloud Intelligence Senior Product Expert

Chen Jingmin|Apache Flink Committer, Alibaba Cloud Intelligence Technical Expert

Topic introduction:

1. Alibaba Cloud real-time computing Flink product introduction

2. Thoughts on real-time computing functions and scenarios on the cloud

3. Productization Practice

4. Outlook

Alibaba Lingyang’s optimization and practice of real-time computing based on Flink

Wang Liuzhen|Technical expert of Alibaba Cloud Intelligence Group

Topic introduction:

This talk introduces Alibaba Lingyang's optimization and practice of Flink-based real-time computing over the years. Real-time computing on the Dataphin platform has long supported the business units within Alibaba Group, for example business consulting on the merchant side, the Double Eleven large-screen displays on the media side, and marketing campaign analysis for internal operations staff. It was later moved to the cloud and offered externally, empowering enterprises and advancing their digital transformation. The outline is as follows:

1. The evolution of Alibaba’s real-time computing platform based on Flink

2. Flink capability optimization and construction

3. Best practices based on Flink

4. Future planning

NetEase Interactive Entertainment’s one-stop real-time data mart based on the Flink ecosystem

Lin Jia | Senior development engineer at NetEase Games, Apache Flink Contributor, Flink CDC Contributor

Topic introduction:

As Flink usage within NetEase Interactive Entertainment grows steadily, more and more businesses are migrating from traditional architectures to real-time ones. Taking the critically important billing business as an example, nearly a thousand offline Spark computing tasks were converted to Flink batch jobs over the past year. How to manage thousands of real-time streaming jobs and their upstream and downstream data assets, and how to let data users easily query, transfer, and compute on this data without complicating the technology stack, was the original motivation for building our one-stop data mart product.

This talk starts from the real needs of data center analysts. From the perspective of user experience and product design, we will show how we combined the technical results accumulated on Flink with the way users work to create a one-stop real-time data mart that they love. Agenda:

  • Let’s start with a need from data analysts

  • Construction of Flink infrastructure

  • One-stop real-time data mart

  • Realizing reliable and energy-efficient real-time data value

Construction practice of Xiaomi Flink real-time computing platform

Chen Zihao|Xiaomi Software R&D Engineer, Apache Flink Contributor

Topic introduction:

This talk focuses on building a real-time computing platform. Drawing on Xiaomi's own business practice, we will share Xiaomi's exploration and construction work in real-time computing, and how it created a unified real-time computing platform that is easy to use, low in cost, and quality-assured. Main content:

1. Introduction to Xiaomi’s real-time computing platform

2. Building platform usability

3. Cost management and quality reinforcement

4. Summary and Outlook

Apache Flink’s extreme speed experience and platform practice on StreamPark

Wang Huajie|SelectDB senior architect, Apache StreamPark PPMC member

Topic introduction:

Apache Flink is already the de facto standard for real-time computing and is used at large scale. However, because of its specialized nature, the barrier to entry remains high. In particular, the Flink community has not fully solved the deployment, management, and operation of real-time jobs, which is a common problem for enterprises in practice. In this talk, we will discuss how StreamPark approaches and solves this problem and how it seamlessly supports the components of the Flink ecosystem to provide one-stop management. We will then introduce how various companies use StreamPark along with some best practices, and finally summarize why StreamPark makes stream processing simpler.

Cloud native

The cloud native session features technical experts from Alibaba Cloud Intelligence, OPPO, Lalamove, and Yishijie, who will share the application and practice of Flink in multi-cloud architectures.

Serverless Flink multi-cloud architecture practice

Wang Yang | Senior R&D expert at Alibaba Cloud Intelligence, leader of the open source big data Serverless platform team, Apache Flink PMC

Topic introduction:

1. Serverless Flink architecture

2. Core technologies (tenant K8S management and control plane isolation, tenant computing resource isolation, tenant network isolation and connection)

3. Multi-cloud deployment (AWS, Azure, GCP)

4. Future Outlook (BYOC Model)

Apache Celeborn: Help Flink become a better streaming and batch integrated engine

Zhou Keyong | Head of Alibaba Cloud Intelligence EMR Spark Engine, member of Apache Celeborn (Incubating) PPMC

Topic introduction:

1. Challenges faced by Flink Batch on Shuffle

2. How Apache Celeborn improves the stability and performance of Flink Batch

3. Apache Celeborn community today and tomorrow

The evolution of OPPO’s cloud-native real-time computing platform based on Flink

Jiang Long|OPPO Senior Big Data R&D Engineer, Apache Flink Contributor

Topic introduction:

1. Current status, architecture, and bottlenecks of the OPPO real-time computing platform: a detailed introduction to the platform's current status, including its architectural design and the functions of key components. We also analyze the bottlenecks the platform faces, such as data processing performance and resource utilization, and propose solutions.

2. Core technologies and improvements for migrating to the cloud: the core technologies and improvements OPPO adopted while migrating the real-time computing platform to the cloud, including the implementation plan, changes to Flink and Kubernetes resource management and scheduling, a smooth elastic scaling mode (scaling based on CPU, memory, lag, or the DS2 algorithm), a pluggable history service, ChatGPT-based exception diagnosis, and deployment acceleration through pre-compilation.

3. Benefits of moving to the cloud and solutions to problems: the benefits OPPO gained from moving its real-time computing platform to the cloud, the problems encountered, and the corresponding solutions. This covers work on co-locating offline and real-time workloads and on peak shaving and valley filling, as well as approaches to common issues such as TM heartbeat timeouts, single-partition delay, automatic node blocklisting, and resource mutual exclusion.

4. Real-time diagnosis for operations, maintenance, and testing: an introduction to the platform's real-time diagnosis feature and how to use it to quickly locate and resolve problems. We also share the open source status of this feature so that other users can benefit.

5. Future outlook: the platform will continue to evolve toward greater stability and intelligence. We explore possible directions such as performance optimization, intelligent scheduling, and automated operations and maintenance to meet growing business needs.
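Item 2 of this outline mentions scaling based on the DS2 algorithm. The outline does not describe OPPO's implementation, so the following is only a simplified, single-operator sketch of the core DS2 idea in plain Python: estimate an operator instance's true processing rate from the time it actually spends processing, then size the operator so that its combined true rate covers the target input rate. The function names and the `max_parallelism` cap are assumptions made for illustration.

```python
import math


def true_processing_rate(records_processed: int, busy_seconds: float) -> float:
    """Records per second while the instance is actually doing useful work.

    `busy_seconds` excludes time spent idle or blocked on input/output.
    """
    if busy_seconds <= 0:
        raise ValueError("busy_seconds must be positive")
    return records_processed / busy_seconds


def target_parallelism(target_rate: float, per_instance_true_rate: float,
                       max_parallelism: int) -> int:
    """Smallest parallelism whose combined true rate covers `target_rate`,
    clamped to the range [1, max_parallelism]."""
    if per_instance_true_rate <= 0:
        raise ValueError("per_instance_true_rate must be positive")
    needed = math.ceil(target_rate / per_instance_true_rate)
    return max(1, min(needed, max_parallelism))
```

For example, an instance that processed 5,000 records in 2.5 busy seconds has a true rate of 2,000 records per second, so sustaining 9,000 records per second would call for 5 instances. The full algorithm applies this along the whole dataflow graph, propagating each operator's output rate to its downstream operators.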

Lalamove Flink cloud native application and practice

Wang Shitao|Head of Lalamove big data real-time offline platform

Chen Haiqing | Head of Lalamove big data overseas real-time platform

Topic introduction:

1. How does Flink become cloud native?

1.1 Use and optimize K8S operator to implement Flink on K8S

1.2 Metric and log collection for K8S clusters and K8S tasks, plus adaptation and optimization of monitoring and scheduling

1.3 Automatically and quickly switch from YARN to K8S at the task level and cluster level

2. How to separate storage and compute in cloud-native Flink

2.1 Implement remote-statebackend in Redis/HBase mode

2.2 Adaptation and optimization of remote-statebackend, including but not limited to multi-layer cache optimization, read and write performance optimization under different workloads, and remote storage design

2.3 Implement conversion between the Redis/HBase statebackend modes, and conversion to the native statebackend mode

3. Flink cloud native benefits

3.1 Cost and stability benefits

3.2 Application scenarios enabled by remote-statebackend, including but not limited to queryable state, shareable state, and editable state
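The remote state backend with cache layers described in section 2 of this outline can be pictured with a small sketch. It is written in plain Python and is not Lalamove's implementation or a Flink `StateBackend`; the class `CachedRemoteState` is hypothetical, and an ordinary dictionary stands in for the remote Redis or HBase store. The sketch shows one local LRU cache layer in front of the remote store, with write-through updates so the remote copy is always current.

```python
from collections import OrderedDict
from typing import Dict, Optional


class CachedRemoteState:
    """Key-value state kept in a remote store with a local LRU cache."""

    def __init__(self, remote: Dict[str, bytes], cache_capacity: int) -> None:
        self.remote = remote                      # stand-in for Redis/HBase
        self.cache: "OrderedDict[str, bytes]" = OrderedDict()
        self.cache_capacity = cache_capacity
        self.remote_reads = 0                     # counts cache misses

    def _remember(self, key: str, value: bytes) -> None:
        """Insert into the cache and evict the least recently used entry."""
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)

    def get(self, key: str) -> Optional[bytes]:
        if key in self.cache:
            self.cache.move_to_end(key)           # mark as recently used
            return self.cache[key]
        self.remote_reads += 1                    # cache miss: go remote
        value = self.remote.get(key)
        if value is not None:
            self._remember(key, value)
        return value

    def put(self, key: str, value: bytes) -> None:
        self.remote[key] = value                  # write-through
        self._remember(key, value)
```

Because the state lives outside the job, other readers could in principle query or share it, which is the kind of scenario item 3.2 refers to. A real backend would also need batching, serialization, and consistency with checkpoints, none of which this sketch covers.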

Flink Kubernetes Operator: Flink’s next stop in cloud native

Chen Zhengyu|Senior big data development engineer at Yishijie Games, Apache Flink/StreamPark Contributor

Topic introduction:

After more than a year of development, the Flink Kubernetes Operator now provides the basic capabilities for cloud-native, automated deployment of Flink. This talk takes you into the cloud-native Flink era and describes the Operator's work in cloud native, including Flink job deployment and tracking, automatic tuning, and observability. It also covers some of Flink's current cloud-native development work, the Operator's ongoing work, and features expected for cloud-native Flink in the future.


