Event Preview | Streaming Lakehouse Meetup, July 29 · Beijing

Frustrated that getting data into the lake is so difficult?

Troubled by inconsistencies between your streaming and batch storage?

Felt helpless because ingestion latency just can't keep up?

The first Streaming Lakehouse Meetup is here!

July 29 | Beijing | Offline

Come experience the real-time data lake at the heart of the Streaming Lakehouse!

This Meetup brings together seven technical experts from Alibaba and ByteDance, focusing on large-scale CDC data ingestion into the lake, one-stop integrated lakehouse construction with Flink, key features of streaming data lakes, unified remote shuffle service (RSS), and more, analyzing the most cutting-edge technology and the latest industrial practice of the Streaming Lakehouse! Flink, Paimon, Celeborn, Flink CDC, and StarRocks: with all of these open source projects gathered in one place, what sparks will fly? Stay tuned!

Event Highlights

  • Plenty of practical, hands-on content. This Meetup covers the complete Streaming Lakehouse pipeline: Flink CDC for ingestion, Paimon for lake storage, Flink as the compute engine, Celeborn as the remote shuffle service (RSS) for batch processing, and StarRocks for OLAP analysis. You will learn why a Streaming Lakehouse is worth building, and how to build a low-cost, near-real-time one that brings your offline data warehouse into real time!
  • Multiple ways to participate, both offline and online. Join the offline Meetup for face-to-face exchange if you are in Beijing, or watch the live stream from anywhere else; the exciting content is not to be missed.
  • Great swag awaits. Sign up for a chance to win exclusive merchandise customized by the Flink and Paimon communities!

Event Agenda

Speakers and Topics

Topic 1 | Apache Paimon Real-time Data Lake: The Storage Foundation of the Streaming Lakehouse

■ Presentation Introduction

At present, mainstream data lake storage projects in the industry are all designed for batch scenarios and cannot meet the Streaming Lakehouse's requirements for timely data updates and processing. Apache Paimon, a real-time data lake, serves as the storage foundation of the Streaming Lakehouse, unlocking real-time scenarios for offline data and delivering a real-time, low-cost Lakehouse.

  • Data Lake 2023: Opportunities and Challenges
  • Paimon Real-time Updates & Offline Views
  • Paimon Changelog and Its Scenarios
  • The Paimon Ecosystem
  • Summary and Roadmap
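The real-time update and changelog capabilities above revolve around Paimon's primary-key tables. As a hedged sketch (the catalog, warehouse path, table, and option values below are illustrative assumptions, not from the talk), defining such a table in Flink SQL might look like:

```sql
-- Illustrative sketch: a Paimon primary-key table that accepts real-time
-- updates and produces a changelog for downstream streaming consumers.
CREATE CATALOG paimon WITH (
    'type'      = 'paimon',
    'warehouse' = 'file:///tmp/paimon'  -- example path only
);
USE CATALOG paimon;

CREATE TABLE orders (
    order_id BIGINT,
    status   STRING,
    amount   DECIMAL(10, 2),
    PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
    -- produce a complete changelog so streaming reads see
    -- update_before/update_after rows
    'changelog-producer' = 'lookup'
);
```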

Topic 2 | Efficiently Building a Lake Ingestion Pipeline with Flink CDC

■ Presentation Introduction

The data stored in databases is among the most valuable data sources for a business, and efficiently ingesting it into the data lake is a highly valuable topic. Flink CDC is a leading open source framework for real-time data integration. It offers technical advantages such as unified full and incremental integration, lock-free reads, parallel reads, and a distributed architecture, and also provides rich SQL processing capabilities, making it very popular in the open source community. Apache Paimon is an emerging data lake project incubated from the Flink community, providing users with high-throughput, low-latency data ingestion, streaming subscription, and real-time query capabilities. When users build a streaming lakehouse around Paimon, using Flink CDC as the ingestion pipeline greatly reduces the cost of building the lakehouse while unlocking advanced features such as whole-database synchronization and schema evolution.
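As a minimal, hedged sketch of such an ingestion pipeline (hostnames, credentials, and table names below are hypothetical, and a Paimon catalog named `paimon_catalog` is assumed to be registered already):

```sql
-- Illustrative sketch: a MySQL CDC source that reads a full snapshot
-- first, then incremental binlog changes, with no table locks.
CREATE TABLE orders_src (
    order_id BIGINT,
    status   STRING,
    PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
    'connector'     = 'mysql-cdc',
    'hostname'      = 'localhost',
    'port'          = '3306',
    'username'      = 'flink',
    'password'      = '******',
    'database-name' = 'shop',
    'table-name'    = 'orders'
);

-- Continuously ingest the change stream into a Paimon table in the lake.
INSERT INTO paimon_catalog.shop.orders
SELECT * FROM orders_src;
```

For the whole-database synchronization and schema evolution mentioned above, Paimon also ships dedicated ingestion actions that sync an entire database in one job rather than one SQL pipeline per table.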

Topic 3 | Flink Batch SQL Improvements on the Lakehouse

■ Presentation Introduction

In recent versions, the Flink community has invested heavily in improving batch processing capabilities to make batch processing faster, more stable, and easier to use. This includes supporting more API syntax and improving data management capabilities. At the query optimization (QO) level, a dense-tree-based join reorder algorithm improves the performance of multi-table joins, and dynamic partition pruning (DPP) has been optimized to cover more business scenarios. At the query execution (QE) level, features such as Adaptive Local HashAgg, Runtime Filter, and multi-operator fused code generation greatly improve batch performance. On the SQL service side, the Gateway now supports a JDBC driver and remains compatible with existing job submission modes, making job submission more convenient. Together, this work makes the Lakehouse architecture on Flink batch simpler, more efficient, and better at processing data. This talk introduces these optimizations and new features, as well as future development plans.

Topic 4 | Streaming Data Warehouse Practice at Xingfuli Based on Flink & Paimon

■ Presentation Introduction

Xingfuli's business is a typical transaction-oriented scenario, and this kind of scenario poses many challenges for real-time data warehouse modeling. This talk shares Xingfuli's practical experience building a streaming data warehouse on Flink & Paimon, covering the business background, the unified stream-batch data warehouse architecture, the problems encountered in practice and their solutions, the benefits ultimately gained with the help of Paimon, and several aspects of future planning.

Topic 5 | Apache Celeborn: Making Spark and Flink Faster, More Stable, and More Elastic

■ Presentation Introduction

Apache Celeborn (Incubating) is a high-performance, highly available, and scalable general-purpose shuffle service that supports the two major engines, Spark and Flink (more engines such as Tez/MR will be supported in the future). Celeborn handles tens of petabytes of production shuffle data per day at Alibaba and many well-known enterprises, improving stability and performance while reducing costs. This talk introduces Celeborn's core designs for high performance and high availability, its unified architecture supporting multiple engines, user cases, and how to get more involved in the community.

Topic 6 | Building an Integrated Lakehouse Data Analysis Solution with Paimon + StarRocks

■ Presentation Introduction

  • An overview of mainstream big data analysis solutions today and the advantages of the integrated lakehouse approach
  • How to build an integrated lakehouse data analysis system with Paimon + StarRocks
  • The technical principles behind StarRocks analyzing the Paimon table format
  • Using Paimon + StarRocks to build a real-time data warehouse analysis solution, and the StarRocks community's future plans for Paimon
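As a rough, hedged sketch of the StarRocks-side setup described above (the catalog name, warehouse path, and table names are illustrative assumptions, not from the talk):

```sql
-- Illustrative sketch: register Paimon as an external catalog in StarRocks,
-- then query Paimon tables in place, with no separate data loading step.
CREATE EXTERNAL CATALOG paimon_catalog
PROPERTIES (
    "type" = "paimon",
    "paimon.catalog.type" = "filesystem",
    "paimon.catalog.warehouse" = "s3://my-bucket/warehouse"  -- example path
);

SELECT order_id, amount
FROM paimon_catalog.shop.orders
LIMIT 10;
```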

Hands-on Practice | Use Flink to Discover the Hottest GitHub Projects in Real Time

Friendly reminder: participants in the hands-on session should bring a laptop!

This Meetup adds a hands-on session that walks you through using the Realtime Compute for Apache Flink product.

In just 5 minutes, you can find the top 10 hottest projects on GitHub, claim a free 5000 CU*H cloud resource pack, complete the lab on site, and even receive a surprise gift pack!
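To give a rough idea of what the lab's core query might look like (the `github_events` table and its columns are hypothetical placeholders for the lab's actual dataset), counting star events per repository in Flink SQL:

```sql
-- Illustrative sketch: on GitHub, a WatchEvent corresponds to a star,
-- so the hottest projects are those with the most WatchEvents.
SELECT repo_name,
       COUNT(*) AS star_count
FROM github_events
WHERE type = 'WatchEvent'
GROUP BY repo_name
ORDER BY star_count DESC
LIMIT 10;
```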

Join the Flink-Learning training camp and start your real-time computing journey.

Click the link to start learning now: Flink-Learning Training Camp - Alibaba Cloud Developer Community - Alibaba Cloud

Event Details

Time: 13:00-18:30, July 29th

Location: Hyatt Regency Wangjing, Chaoyang District, Beijing

Click the link to watch the online live broadcast: https://gdcop.h5.xeknow.com/sl/2bTgeB

Scan the image below to register:

Click to register now


Source: blog.csdn.net/weixin_44904816/article/details/131587932