Big data experts teach you how to build a real-time data lake

Data processing technology meets the massive storage and analysis needs of businesses across industries. However, the explosive growth of data volume and the steady enrichment of data types place higher demands on processing technology and timeliness. As a result, technologies such as general-purpose computing engines (such as Spark and Flink), interactive analysis systems (such as ClickHouse), and data lake frameworks (such as Iceberg) are developing rapidly.

As a professional developer community, DEEPNOVA is committed to promoting technical exchange, broadening technical horizons, and building a technical ecosystem. It actively embraces open source communities, conducts in-depth research on open source technologies such as next-generation data lakes and real-time data warehouses, and has optimized some of their functions.

To enable better technical discussion and exchange with developers, DEEPNOVA and the Iceberg community are jointly launching "DEEPNOVA MEETUP Online" from 14:00 to 17:30 on April 16. The theme of this event is "Building a Real-time Data Lake Based on Iceberg". Drawing on the strength of the DEEPNOVA community expert group, it will walk the audience through the complete history of Iceberg's technical development and its application and practice with localized data, giving high-quality technical content back to the community.

Core content

1. Technical Interpretation: "Apache Iceberg Past, Present and Future"

Guest speaker: Hu Zheng, Apache Iceberg and HBase PMC member

Highlights: Apache Iceberg, an open and standardized data lake table format, has been adopted by many domestic and international vendors. Recently, there are plans to launch a commercial Iceberg-based data lake storage service on AWS. At the same time, Snowflake, AWS, Cloudera and other companies have released Iceberg-based data lake offerings. This shows that, after several years of development, Apache Iceberg has achieved rapid growth and great success. This talk covers Iceberg's open source journey, as well as the key technical directions it is pursuing now and in the future.

2. Technical practice: "NetEase Lakehouse Management System Arctic"

Guest speaker: Ma Jin, head of NetEase's data lake and real-time computing team

Highlights: Arctic is an Iceberg-based lakehouse management system developed by NetEase. With Flink and Arctic, NetEase has built a unified stream-batch data production pipeline and a data warehouse that unifies real-time and offline processing. Building on Iceberg, Arctic supports primary keys, self-optimizing table structure, data consistency, real-time subscription, and real-time join. This talk will mainly introduce the core design ideas of Arctic.

3. Technical practice: "Optimization and Practice of Iceberg Index by FastData DLink"

Guest speaker: Zhang Gan, Director of Storage Engine Department of Dipu Technology

Highlights: Z-Order is a technique that maps multi-dimensional data onto a single dimension and is widely used in spatiotemporal indexes and image processing. By sorting on multiple fields and rearranging the original data, it reduces unnecessary I/O and so improves query speed. Building on the primary key deduplication scheme proposed by the Iceberg community, the DEEPNOVA community optimized it with a Bloom filter that filters eq-delete files, which reduces memory usage and improves the efficiency of merging small files. This talk will mainly explain FastData's optimizations in indexing technology.

4. Technical practice: "Optimization and Practice of FastData DLink Building Real-time Data Lake Based on Iceberg"

Guest speaker: Jian Yonghua, database kernel development engineer of Dipu Technology

Highlights: Iceberg's CDC capability is the core capability for building real-time data warehouses. The DEEPNOVA community has fully implemented the Iceberg CDC function and has enabled rapid migration of historical Hive data into the lake. The metadata generation approach from the community PR has been parallelized, improving migration performance several times over. This talk will focus on how to build a real-time data warehouse and demonstrate the technical advantages of FastData.
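To make the Z-Order idea from the third talk more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the Iceberg or FastData implementation: the function names (`interleave_bits`, `chunk`, `files_matching_y`), the 4x4 grid of rows, and the "four rows per file" layout are all hypothetical. It interleaves the bits of two integer columns into one sort key, then counts how many files a min/max filter on the second column would have to read.

```python
from typing import List, Tuple

Row = Tuple[int, int]  # hypothetical two-column row: (x, y)


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # bits of x go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # bits of y go to odd positions
    return key


def chunk(rows: List[Row], size: int) -> List[List[Row]]:
    """Split sorted rows into fixed-size groups, standing in for data files."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def files_matching_y(files: List[List[Row]], y_val: int) -> int:
    """Count files whose min/max range on y could contain y_val."""
    count = 0
    for f in files:
        ys = [r[1] for r in f]
        if min(ys) <= y_val <= max(ys):
            count += 1
    return count


# Assumed toy data set: every (x, y) pair on a 4x4 grid.
rows = [(x, y) for x in range(4) for y in range(4)]

# Layout A: plain sort by (x, y); every file spans the full range of y.
linear_files = chunk(sorted(rows), 4)

# Layout B: sort by the interleaved key; each file covers a 2x2 square.
z_files = chunk(sorted(rows, key=lambda r: interleave_bits(r[0], r[1])), 4)

linear_hits = files_matching_y(linear_files, 3)  # 4 files must be read
z_hits = files_matching_y(z_files, 3)            # 2 files must be read
print(linear_hits, z_hits)
```

In this toy layout a filter on `y` alone reads all four files under the plain sort but only two under the Z-order sort, while a filter on `x` alone goes from one file to two. That trade-off is the point of the technique: pruning works reasonably well on every sorted field instead of only on the leading one.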

What attendees will gain:

1. Understand the architectural principles, features and application scenarios of Apache Iceberg

2. Learn the technical optimizations and business value of implementing different functions on Iceberg

3. Join an open technical exchange community to discuss lakehouse technology with senior technical experts

We firmly believe that technological progress comes from the joint efforts of countless practitioners. We sincerely hope the DEEPNOVA community can become a learning and exchange platform for technology enthusiasts and draw on the strength of more people to build a more complete community. We will continue to uphold the community spirit of openness and sharing, and give back through more technical talks, live events, and other activities, so that digital technology can bring infinite possibilities to the world.

Scan the code to watch the live broadcast, and forward and share it: the top three on the live room's invitation leaderboard will also receive an exclusive DEEPNOVA gift box.

 How does the Trino analysis engine perform extremely fast analysis on the data lake?

Real-time analytical database DLink supports Lookup join on Iceberg dimension table

How to use the integrated lake and warehouse architecture to handle the storage and analysis of multimodal data?

What are the advantages of Analytical Database FastData for DLink?

Success cases

Advanced manufacturing

Chongqing Electromechanical  |  Jiuzhou Electric  |  Kelun Pharmaceutical

Government and dual carbon

Wisdom Longhua  |  Panzhihua East District  |  Shenzhi City

Energy and mobility

Changan New Energy  |  Huasheng Group

Consumption and circulation

Belle International  |  Show Domain Group

Business integration

Guangzhou Urban Investment  |  New Hualian Huafa Co., Ltd.

Smart cultural tourism

Nianhuawan Cultural Tourism  |  Da Hengqin Pan-tourism

More industries

Xinjianyuan Group  |  Tequ Agriculture and Animal Husbandry

Origin blog.csdn.net/zw0Pi8G5C1x/article/details/124206954